Tuesday, 21 January 2014

Purging trace and dump files with 11g ADRCI


In versions of Oracle prior to 11g, we had to use our own housekeeping scripts to purge the udump, cdump and bdump directories.

In Oracle 11g, we now have the ADR (Automatic Diagnostic Repository), whose location is defined by the diagnostic_dest parameter.

So how are unwanted trace and core dump files cleaned out in 11g automatically?

This is done by the MMON background process.

There are two time attributes which are used to manage the retention of information in ADR. Both attributes correspond to a number of hours after which the MMON background process purges the expired ADR data.

LONGP_POLICY (long term) defaults to 365 days and relates to things like Incidents and Health Monitor warnings.

SHORTP_POLICY (short term) defaults to 30 days and relates to things like trace and core dump files.

The ADRCI command show control displays the current purge settings, as shown below.

adrci> show control

ADR Home = /u01/app/oracle/diag/rdbms/orcl11/orcl11:
*************************************************************************
ADRID                SHORTP_POLICY        LONGP_POLICY         LAST_MOD_TIME                            LAST_AUTOPRG_TIME                        LAST_MANUPRG_TIME                        ADRDIR_VERSION       ADRSCHM_VERSION      ADRSCHMV_SUMMARY     ADRALERT_VERSION     CREATE_TIME
-------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ----------------------------------------
1095473802           720                  8760                 2010-07-07 08:46:56.405618 +08:00        2010-08-22 22:14:11.443356 +08:00                                                 1                    2                    76                   1                    2010-07-07 08:46:56.405618 +08:00

In this case both are set to the defaults: 720 hours (30 days) for the short term policy and 8760 hours (one year) for the long term policy.

We can change this by using the ADRCI command 'set control'.

In this example we are changing the retention to 15 days for the short term policy attribute (note that it is defined in hours).

adrci> set control (SHORTP_POLICY =360)

adrci> show control

ADR Home = /u01/app/oracle/diag/rdbms/orcl11/orcl11:
*************************************************************************
ADRID                SHORTP_POLICY        LONGP_POLICY         LAST_MOD_TIME                            LAST_AUTOPRG_TIME                        LAST_MANUPRG_TIME                        ADRDIR_VERSION       ADRSCHM_VERSION      ADRSCHMV_SUMMARY     ADRALERT_VERSION     CREATE_TIME
-------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ----------------------------------------
1095473802           360                  8760                 2010-08-27 09:36:09.385370 +08:00        2010-08-22 22:14:11.443356 +08:00                                                 1                    2                    76                   1                    2010-07-07 08:46:56.405618 +08:00
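
Because both policies are stored in hours, it is easy to slip when converting from days. The arithmetic can be sketched in Python as below. The helper names days_to_hours and set_control_command are made up for illustration and are not part of ADRCI; the sketch only builds the command text, it does not run it.

```python
def days_to_hours(days: int) -> int:
    """Convert a retention period in days to the hours ADRCI expects."""
    return days * 24


def set_control_command(policy: str, days: int) -> str:
    """Build the text of an ADRCI 'set control' command (hypothetical helper)."""
    if policy not in ("SHORTP_POLICY", "LONGP_POLICY"):
        raise ValueError(f"unknown policy: {policy}")
    return f"set control ({policy} = {days_to_hours(days)})"


# 15 days of short term retention is 360 hours, as in the example above.
print(set_control_command("SHORTP_POLICY", 15))
```

The defaults check out the same way: 30 days is 720 hours and 365 days is 8760 hours.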

We can also manually purge information from the ADR using the 'purge' command from ADRCI (note that the age is defined in minutes and not hours!).

In this example we are purging all trace files older than 6 days. We see that the LAST_MANUPRG_TIME column is now populated.

adrci> purge -age 8640 -type TRACE 

adrci> show control

ADR Home = /u01/app/oracle/diag/rdbms/orcl11/orcl11:
*************************************************************************
ADRID                SHORTP_POLICY        LONGP_POLICY         LAST_MOD_TIME                            LAST_AUTOPRG_TIME                        LAST_MANUPRG_TIME                        ADRDIR_VERSION       ADRSCHM_VERSION      ADRSCHMV_SUMMARY     ADRALERT_VERSION     CREATE_TIME
-------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ----------------------------------------
1095473802           360                  8760                 2010-08-27 09:36:09.385370 +08:00        2010-08-22 22:14:11.443356 +08:00        2010-08-27 09:50:07.399853 +08:00        1                    2                    76                   1                    2010-07-07 08:46:56.405618 +08:00
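
Since purge takes its age in minutes while the policies are in hours, a small helper can avoid mistakes when the purge is scripted. The following Python sketch converts days to minutes and builds an argument list for running adrci non-interactively through its exec= command-line option. The function names are hypothetical, the home path is taken from the output above, and the sketch only builds the arguments rather than running anything.

```python
from typing import List


def days_to_minutes(days: int) -> int:
    """Convert an age in days to the minutes that 'purge -age' expects."""
    return days * 24 * 60


def purge_argv(homepath: str, days: int, purge_type: str = "TRACE") -> List[str]:
    """Build an argument list for a non-interactive adrci purge (illustrative only)."""
    commands = (
        f"set homepath {homepath}; "
        f"purge -age {days_to_minutes(days)} -type {purge_type}"
    )
    # Passed as a list (for example to subprocess.run), no shell quoting is needed.
    return ["adrci", f"exec={commands}"]


argv = purge_argv("diag/rdbms/orcl11/orcl11", 6)
print(argv[1])
```

For 6 days this gives the same 8640 minutes used in the manual purge above.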


Collecting diagnostic files such as the database alert log and trace files using ADRCI

Oracle's Automatic Diagnostic Repository (ADR) is a location on the file system used to collect various diagnostic related files, such as the database alert log, trace files, and the new 11g Health Monitor report files. The ADR is not limited to the Oracle Database; it can be used as the diagnostic repository for other Oracle products. The command line tool used to 'manage' the diagnostic data is ADRCI.

Two terms that help 'understand' the adrci tool are Problem and Incident.

Problem – The critical error.

Incident – A single occurrence of the error. Diagnostic data (eg trace files etc) are collected and related to an incident.


To Start the ADRCI command line tool:
adrci

To end an adrci session use either exit or quit:
adrci>exit
adrci>quit

To see the list of available commands:
adrci>help
adrci>help <command name>
eg
adrci>help set home


Setting the Home
The structure of the ADR allows it to be the repository for many database instances, each instance having its own sub-directories within the ADR base directory. The default behaviour of the adrci tool is to 'act' on all the 'homes' within the ADR. You can also 'set' the home to limit the source of diagnostic data to manage.

View the current homes
adrci> show homes;

adrci> show homes;
ADR Homes:
diag/rdbms/orcl2/orcl2
diag/rdbms/orcl/orcl
diag/tnslsnr/oln/listener


Notice the multiple homes (orcl2, orcl ....) listed within the ADR, and that the path information is displayed as 'relative' paths. You can use the relative path when setting the adrci home.

- To set a single home path:
adrci>set homepath diag/rdbms/orcl2/orcl2

or in a shortened form, dropping the diag/rdbms prefix:

adrci>set homepath orcl2

The adrci>show homes command now indicates that the current home is orcl2:

adrci> show homes
ADR Homes:
diag/rdbms/orcl2/orcl2



Viewing the Alert Log.
Use the adrci>show alert command to view the contents of the alert log.

adrci>show alert
or
adrci>show alert -tail
or
adrci>show alert -tail -f


Viewing Problems.
The adrci>show problem command will by default list the last 50 problems. There are various filter and sort options to control the output. The basic adrci>show problem syntax is

adrci>show problem [-p "predicate_string"]

Where the "predicate_string" is one of many possible predicates. Use
adrci>help show problem for the full list. Below are some examples.

adrci>show problem
or
adrci> show problem -p "problem_id=2"
or
adrci>show problem -orderby lastinc_time
or
adrci>show problem -p "problem_id > 500"


Here is a sample of the adrci>show problem command output:

adrci> show problem -p "problem_id=2"
ADR Home = /u01/app/diag/rdbms/orcl/orcl:
*************************************************************************
PROBLEM_ID      PROBLEM_KEY                  LAST_INCIDENT    
--------------- ---------------------------  --------------------
2                ORA 600 [kebm_mmon_main_1]  7489  
1 rows fetched


Viewing Incidents.
To view the incidents of the current ADR home, use the adrci>show incident command. Two arguments that control the output are the -p "predicate string" and the -mode {BASIC | BRIEF | DETAIL} indicator. For example:

adrci>show incident
or
adrci>show incident -p "incident_id=7489"
or
adrci>show incident -p "incident_id=7489" -mode detail

The show incident -p "incident_id=7489" -mode detail command includes in its output the location of the related trace files for the selected incident.


Viewing Trace Files
To display a list of trace files within the currently set home, use adrci> show tracefile. Two useful ways to control the tracefile output are -i <incident_id> and the % wildcard. Use -i <incident_id> to list the trace files related to an incident. Use the % wildcard to limit the trace files to those that match a particular filename. For example:

adrci> show tracefile
or
adrci> show tracefile %dw00%
or
adrci> show tracefile -i 7489


Packing it (For Oracle Support or for yourself)
There are multiple ways to achieve this. Here I am creating an empty 'package' and adding incident data to it. The steps are:
1. Create a package (A logical container for the diagnostic data)
2. Add diagnostic data into the package (from an incident or manually by adding trace files)
3. Generate a 'physical' package (file on the filesystem)
4. Finalize the package


Step 1.
adrci> ips create package
Created package 3 without any contents, correlation level typical

Step 2.
adrci>ips add incident 32578 package 3;
Added incident 32578 to package 3

To manually add a trace file to the package:
adrci> ips add file <path to tracefile> package 3

Step 3.
adrci>ips generate package 3 in /u01/support
Generated package 3 in file /u01/support/IPSPKG_<datestamp>_COM_1.zip, mode complete

Step 4.
adrci> ips finalize package 3


When reviewing a database incident, even if you do not intend to forward it to Oracle Support, it is helpful to generate the package zip file and then unzip it for your own review.


Oracle Docs: http://docs.oracle.com/cd/B28359_01/server.111/b28319/adrci.htm

Friday, 10 January 2014

Optimizer statistics do not get automatically purged (object lock of WRI$_OPTSTAT_HISTHEAD_HISTORY) /// SYSAUX and SYSTEM tablespaces have been continually growing






The old statistics are purged automatically at regular intervals based on the statistics history retention setting and the time of recent statistics gathering performed in the system. Retention is configurable using the ALTER_STATS_HISTORY_RETENTION procedure. The default value is 31 days.

The SYSAUX and SYSTEM tablespaces have been continually growing on the databases where a high volume of imports/exports and RMAN activity is taking place.

SELECT  occupant_name "Item",
    space_usage_kbytes/1048576 "Space Used (GB)",
    schema_name "Schema",
    move_procedure "Move Procedure"
FROM v$sysaux_occupants
ORDER BY 1

/

Item                      Space Used (GB) Schema                    Move Procedure
SM/OPTSTAT                          12.45 SYS


To resolve this I set the stats retention period to 5 days.

SQL> exec dbms_stats.alter_stats_history_retention(5);


To find out the oldest available stats you can issue the following:

SQL> select dbms_stats.get_stats_history_availability from dual;
GET_STATS_HISTORY_AVAILABILITY
---------------------------------------------------------------------------
28-SEP-13 00.00.00.000000000 +04:00

To list how many stats history records were saved on each day between the oldest available stats history and the current date, issue the following:

SQL> select trunc(SAVTIME),count(1) from WRI$_OPTSTAT_HISTHEAD_HISTORY group by  trunc(SAVTIME) order by 1;
TRUNC(SAV COUNT(1)
--------- ----------
28-SEP-13 2920140
29-SEP-13 843683
30-SEP-13 519834
01-OCT-13 958836
02-OCT-13 3158052
03-OCT-13 287
04-OCT-13 1253952
05-OCT-13 732361
06-OCT-13 507186
07-OCT-13 189416
08-OCT-13 2619
09-OCT-13 1491
10-OCT-13 287
11-OCT-13 126324
12-OCT-13 139556
13-OCT-13 181068
14-OCT-13 4832
15-OCT-13 258027
16-OCT-13 1152
17-OCT-13 287
18-OCT-13 27839
21 rows selected.

What has happened here is that the job run by MMON every 24 hours has checked the retention period and tried to purge all stats older than the retention period. Because the job has not completed within 5 minutes, owing to the high number of stats collected on each day, it has given up and rolled back. Therefore the stats are not being purged.

As each day passes, the SYSAUX tablespace continues to fill up because the job fails each night and cannot purge the old stats.

To resolve this we have to issue a manual purge to clear down the old statistics. This can be UNDO intensive, so it is best to keep an eye on the amount of UNDO being generated. I suggest starting with the oldest stats and working forwards.

To manually purge the stats issue the following:

SQL> exec dbms_stats.purge_stats(to_date('28-SEP-13','DD-MON-YY'));
PL/SQL procedure successfully completed.

OR

exec DBMS_STATS.PURGE_STATS(SYSDATE-5);

This purges stats older than 5 days. It is best to do this in stages if there is a lot of data (sysdate-30, sysdate-25 etc).
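
The staged approach can be sketched as a small Python generator of PURGE_STATS calls, working from the oldest data towards the retention period so that each call has less to delete. This is only an illustration: the function name, the 30-day starting point and the 5-day step are assumptions, and the statements still have to be run one at a time in SQL*Plus while watching UNDO usage.

```python
from typing import List


def staged_purge_statements(
    oldest_days: int, retention_days: int, step_days: int = 5
) -> List[str]:
    """Return PURGE_STATS calls from the oldest age down to the retention period."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    if oldest_days < retention_days:
        raise ValueError("oldest_days must not be smaller than retention_days")
    statements = []
    days = oldest_days
    while days > retention_days:
        statements.append(f"exec DBMS_STATS.PURGE_STATS(SYSDATE-{days});")
        days -= step_days
    # Always finish exactly at the retention boundary.
    statements.append(f"exec DBMS_STATS.PURGE_STATS(SYSDATE-{retention_days});")
    return statements


for statement in staged_purge_statements(30, 5):
    print(statement)
```

With these values the sketch produces six statements, from SYSDATE-30 down to SYSDATE-5.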


Then I tried rebuilding the stats indexes and tables as they would now be fragmented.

If you are only running standard edition then you can only rebuild indexes offline. Online index rebuild is a feature of Enterprise Edition.


SELECT
sum(bytes/1024/1024) Mb,
segment_name,
segment_type
FROM
dba_segments
WHERE
tablespace_name = 'SYSAUX'
AND
segment_type in ('INDEX','TABLE')
GROUP BY
segment_name,
segment_type
ORDER BY Mb;
MB  SEGMENT_NAME                             SEGMENT_TYPE
--  ---------------------------------------  ----------------
2   WRH$_SQLTEXT                             TABLE
2   WRH$_ENQUEUE_STAT_PK                     INDEX
2   WRI$_ADV_PARAMETERS                      TABLE
2   WRH$_SEG_STAT_OBJ_PK                     INDEX
3   WRI$_ADV_PARAMETERS_PK                   INDEX
3   WRH$_SQL_PLAN_PK                         INDEX
3   WRH$_SEG_STAT_OBJ                        TABLE
3   WRH$_ENQUEUE_STAT                        TABLE
3   WRH$_SYSMETRIC_SUMMARY_INDEX             INDEX
4   WRH$_SQL_BIND_METADATA_PK                INDEX
4   WRH$_SQL_BIND_METADATA                   TABLE
6   WRH$_SYSMETRIC_SUMMARY                   TABLE
7   WRH$_SQL_PLAN                            TABLE
8   WRI$_OPTSTAT_TAB_HISTORY                 TABLE
8   I_WRI$_OPTSTAT_TAB_ST                    INDEX
9   I_WRI$_OPTSTAT_H_ST                      INDEX
9   I_WRI$_OPTSTAT_TAB_OBJ#_ST               INDEX
12  I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST           INDEX
12  I_WRI$_OPTSTAT_IND_ST                    INDEX
12  WRI$_OPTSTAT_HISTGRM_HISTORY             TABLE
14  I_WRI$_OPTSTAT_IND_OBJ#_ST               INDEX
20  WRI$_OPTSTAT_IND_HISTORY                 TABLE
360 I_WRI$_OPTSTAT_HH_ST                     INDEX
376 WRI$_OPTSTAT_HISTHEAD_HISTORY            TABLE
488 I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST            INDEX

To reduce these tables and indexes you can issue the following:

Note that you cannot enable row movement and shrink the tables, as the indexes are function-based:

alter table WRI$_OPTSTAT_IND_HISTORY enable row movement;
alter table WRI$_OPTSTAT_IND_HISTORY shrink space;
*
ERROR at line 1:
ORA-10631: SHRINK clause should not be specified for this object

select 'alter table '||segment_name||'  move tablespace SYSAUX;' from dba_segments where tablespace_name = 'SYSAUX'
and segment_name like '%OPT%' and segment_type='TABLE';

Run the rebuild table commands. Note that this does cause any gather_stats jobs to fail.

alter table WRI$_OPTSTAT_TAB_HISTORY  move tablespace sysaux;
alter table WRI$_OPTSTAT_IND_HISTORY  move tablespace sysaux;
alter table WRI$_OPTSTAT_HISTHEAD_HISTORY  move tablespace sysaux;
alter table WRI$_OPTSTAT_HISTGRM_HISTORY  move tablespace sysaux;
alter table WRI$_OPTSTAT_AUX_HISTORY  move tablespace sysaux;
alter table WRI$_OPTSTAT_OPR  move tablespace sysaux;
alter table WRH$_OPTIMIZER_ENV  move tablespace sysaux;

Script to generate rebuild statements

select 'alter index '||segment_name||'  rebuild online parallel (degree 14);' from dba_segments where tablespace_name = 'SYSAUX'
and segment_name like '%OPT%' and segment_type='INDEX';

Once completed, it is best to check that the indexes are usable:


select  di.index_name,di.index_type,di.status  from  dba_indexes di , dba_tables dt
where  di.tablespace_name = 'SYSAUX'
and dt.table_name = di.table_name
and di.table_name like '%OPT%'
order by 1 asc

/
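
The index rebuild statements can also be generated outside the database, which makes it easy to switch between online and offline rebuilds depending on the edition. The Python sketch below is only an illustration under assumptions: the function name and its parameters are made up, and the index names are copied from the segment listing earlier in this post. The parallel clause is optional so that it can be left out along with ONLINE.

```python
from typing import Iterable, List, Optional


def rebuild_statements(
    index_names: Iterable[str], online: bool = True, degree: Optional[int] = 14
) -> List[str]:
    """Build 'alter index ... rebuild' statements (hypothetical helper)."""
    online_clause = " online" if online else ""
    parallel_clause = f" parallel (degree {degree})" if degree else ""
    return [
        f"alter index {name} rebuild{online_clause}{parallel_clause};"
        for name in index_names
    ]


indexes = ["I_WRI$_OPTSTAT_HH_ST", "I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST"]

# Offline, serial rebuilds, for example on Standard Edition.
for statement in rebuild_statements(indexes, online=False, degree=None):
    print(statement)
```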


Show how big the tables are and rebuild after stats have been purged

select sum(bytes/1024/1024) Mb, segment_name,segment_type from dba_segments
where  tablespace_name = 'SYSAUX'
and segment_name like 'WRI$_OPTSTAT%'
and segment_type='TABLE'
group by segment_name,segment_type order by 1 asc

Show how big the indexes are ready for a rebuild after stats have been purged

select sum(bytes/1024/1024) Mb, segment_name,segment_type from dba_segments
where  tablespace_name = 'SYSAUX'
and segment_name like '%OPT%'
and segment_type='INDEX'

group by segment_name,segment_type order by 1 asc









Wednesday, 1 January 2014

[SOLVED] Crontab does not run crontab scripts





[oracle@111 db_back]$ ps aux | grep cron
oracle   14930  0.0  0.0 103232   860 pts/0    S+   23:21   0:00 grep cron
[oracle@111 db_back]$ service crond restart
User has insufficient privilege.
[root@111 ~]# service crond restart
Stopping crond:                                            [FAILED]
Starting crond:                                            [  OK  ]


SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/
# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name command to be executed
* * * * * oracle run-parts /scripts/expback
* * * * * oracle run-parts /scripts/rmanback
15 * * * * oracle run-parts /etc/cron.hourly



First, the correct place for this is probably /etc/cron.d, not /etc/crontab. If you really want to keep it where it is now, I'd suggest looking in /var/log/cron and making sure that it is executing at all. I'd also look at `aureport -a` and see if anything is logged as an SELinux denial around the time you expect this to execute, and then use `ausearch -a nnn`, where nnn is the number from the far right-hand end of the aureport output line(s). Trying it in permissive mode by running `setenforce 0` would be a good test of this.
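
When checking entries like the ones above, it helps to split each line into its fields: five time fields, then the user name, then the command. The Python sketch below does this for the /etc/crontab format. The class and function names are made up for illustration, and it does not handle comments or environment lines such as SHELL= and PATH=.

```python
from typing import NamedTuple


class SystemCronEntry(NamedTuple):
    minute: str
    hour: str
    day_of_month: str
    month: str
    day_of_week: str
    user: str
    command: str


def parse_system_cron_line(line: str) -> SystemCronEntry:
    """Split one /etc/crontab job line into its seven fields."""
    # Split on whitespace at most 6 times so the command keeps its own spaces.
    parts = line.split(None, 6)
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {line!r}")
    return SystemCronEntry(*parts)


entry = parse_system_cron_line("15 * * * * oracle run-parts /etc/cron.hourly")
print(entry.user, "->", entry.command)
```

Parsed this way, the two backup entries have * in all five time fields, which means they are scheduled every minute.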