Thursday, November 04, 2010

How to create Datafiles in a Data Guard (10g) environment

I recently came across a Data Guard situation where a datafile was added to the primary database but failed during creation on the standby. The database version was 10.2.0.3 and STANDBY_FILE_MANAGEMENT was set to AUTO.

STANDBY_FILE_MANAGEMENT is a parameter that, when set to AUTO on the standby site, automatically creates a datafile on the standby for every datafile created on the primary.

Within a few minutes I started getting alerts on the standby for the following errors.

WARNING: File being created with same name as in Primary Existing file may be overwritten
File #309 added to control file as 'UNNAMED00309'.
Originally created as: '/disk1/oradata/primarydb01/sys06.dbf'
Recovery was unable to create the file as: '/disk1/oradata/primary/sys06.dbf'
Errors with log /disk1/archive/stbydb/sf2_1_215423_528599320.arc
MRP0: Background Media Recovery terminated with error 1119
Tue Feb 5 07:42:09 2008
Errors in file /disk1/oracle/admin/stbydb2/bdump/stbydb2_mrp0_12552.trc:
ORA-01119: error in creating database file '/disk1/oradata/sf12/sys06.dbf'
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options
SVR4 Error: 13: Permission denied
Some recovered datafiles maybe left media fuzzy
Media recovery may continue but open resetlogs may fail
Tue Feb 5 07:42:13 2008
Errors in file /disk1/oracle/admin/stbydb2/bdump/stbydb2_mrp0_12552.trc:
ORA-01119: error in creating database file '/ora-disk/oradata/sf12/sys06.dbf'
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options
SVR4 Error: 13: Permission denied
Tue Feb 5 07:42:13 2008

From the message it was clear that the datafile creation had failed on the standby, even though standby_file_management was set to AUTO. It failed because the directory the datafile was being created in, "/disk1/oradata/primary/", did not exist on the physical standby. What we actually want is for the primary path "/disk1/oradata/primary/" to be translated to "/disk1/oradata/standby/" on the standby, which is what the DB_FILE_NAME_CONVERT parameter on the standby takes care of. However, if you have several conversion pairs for this parameter,

SQL> Show parameter db_file_name_convert

may display an empty value. At that point, the simplest way to check the setting is to view the parameter file.
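As a side note, the conversion pairs may also show up one value per row in V$SPPARAMETER (assuming the parameter is set in the spfile); a query along these lines can serve as an extra check:

SQL> select name, ordinal, value
     from v$spparameter
     where name = 'db_file_name_convert'
     order by ordinal;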

So log in to the standby database and issue:

Step 1: SQL> create pfile from spfile;

Now go to the $ORACLE_HOME/dbs directory and open that initialization parameter file.

In our case, we had clearly missed the conversion pair for the directory in which the datafile should be created on the standby.

Change the pfile to add that destination:

db_file_name_convert = '/disk1/oradata/primary','/disk1/oradata/standby'

Step 2: Now log in to the standby and cancel the recovery:

SQL> alter database recover managed standby database cancel;

Database altered

Step 3: Shut down and start up the standby database using the new parameter file.

SQL> shutdown immediate;
ORA-01507: database not mounted

ORACLE instance shut down.

SQL> startup mount pfile=$ORACLE_HOME/dbs/initstandby.ora
ORACLE instance started.

SQL> create spfile from pfile;

Step 4: Start the managed recovery on the standby.

SQL> alter database recover managed standby database disconnect from session;

Database altered.

The above command tells the standby to start applying archivelogs from where it left off. This time I got the following errors in the alert log:

MRP0: Background Managed Standby Recovery process started (stbydb2)
Managed Standby Recovery not using Real Time Apply
MRP0: Background Media Recovery terminated with error 1111
Wed Feb 6 13:50:05 2008
Errors in file /disk1/oracle/admin/stbydb2/bdump/stbydb2_mrp0_22855.trc:
ORA-01111: name for data file 309 is unknown - rename to correct file
ORA-01110: data file 309: '/disk1/oracle/product/10.2.0/dbs/UNNAMED00309'
ORA-01157: cannot identify/lock data file 309 - see DBWR trace file
ORA-01111: name for data file 309 is unknown - rename to correct file
ORA-01110: data file 309: '/disk1/oracle/product/10.2.0/dbs/UNNAMED00309'
Wed Feb 6 13:50:05 2008
Errors in file /disk1/oracle/admin/stbydb2/bdump/stbydb2_mrp0_22855.trc:
ORA-01111: name for data file 309 is unknown - rename to correct file
ORA-01110: data file 309: '/disk1/oracle/product/10.2.0/dbs/UNNAMED00309'
ORA-01157: cannot identify/lock data file 309 - see DBWR trace file
ORA-01111: name for data file 309 is unknown - rename to correct file
ORA-01110: data file 309: '/disk1/oracle/product/10.2.0/dbs/UNNAMED00309'
Wed Feb 6 13:50:05 2008
MRP0: Background Media Recovery process shutdown (stbydb2)
Wed Feb 6 13:50:05 2008
Completed: alter database recover managed standby database disconnect from session
Wed Feb 6 13:52:09 2008
Using STANDBY_ARCHIVE_DEST parameter default value as /disk2/archive/stbydb2/
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[1]: Assigned to RFS process 23024

This is because whenever datafile creation fails, Oracle records a placeholder datafile entry named UNNAMEDnnnnn in the standby controlfile and makes it appear to live under $ORACLE_HOME/dbs. When managed recovery is resumed, it hits that UNNAMED entry and the standby fails again with the message:

ORA-01111: name for data file 309 is unknown - rename to correct file

Now you need to rename that file to the one you actually want to create so that the physical standby can continue applying the remaining logs.
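If you want to confirm the placeholder name before renaming, the controlfile entry can be seen on the standby with a simple query (the UNNAMED pattern below is just the naming convention observed in this case):

SQL> select file#, name from v$datafile where name like '%UNNAMED%';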

Step 5: So now when you try to issue the "rename" command, you will get this error:

SQL> alter database rename file '/disk1/oracle/product/10.2.0/dbs/UNNAMED00309' to
'/disk1/oradata/stbydb2/sys06.dbf'
*
ERROR at line 1:
ORA-01511: error in renaming log/data files
ORA-01275: Operation RENAME is not allowed if standby file management is automatic.

This is because RENAME (as well as ADD/DROP and CREATE) operations on the standby are not allowed when STANDBY_FILE_MANAGEMENT is set to AUTO. You need to change the parameter to MANUAL for the above command to work.

Step 6: SQL> alter system set STANDBY_FILE_MANAGEMENT=MANUAL scope=memory;

System altered.

Step 7: SQL> alter database rename file
'/disk1/oracle/product/10.2.0/dbs/UNNAMED00309' to '/disk1/oradata/stbydb2/sys06.dbf';
alter database rename file '/disk1/oracle/product/10.2.0/dbs/UNNAMED00309' to '/disk1/oradata/stbydb2/sys06.dbf'
*
ERROR at line 1:
ORA-01511: error in renaming log/data files
ORA-01141: error renaming data file 309 - new file
'/disk1/oradata/stbydb2/sys06.dbf' not found
ORA-01111: name for data file 309 is unknown - rename to correct file
ORA-01110: data file 309:
'/disk1/oracle/product/10.2.0/dbs/UNNAMED00309'
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3



Oh... what now? Well, we cannot rename to a file that doesn't already exist, so we actually need to create the datafile.

Step 8: SQL> alter database create datafile
'/disk1/oracle/product/10.2.0/dbs/UNNAMED00309' as '/disk1/oradata/stbydb2/sys06.dbf';

Database altered.

Yes, finally! Ok, datafile created. Let's put the automatic recovery back on the standby.

Step 9: SQL> alter database recover standby database;
alter database recover standby database
*
ERROR at line 1:
ORA-00279: change 9999250009 generated at 02/05/2008 07:28:43 needed for thread 1
ORA-00289: suggestion : /disk2/archive/stbydb2/sf2_1_215423_528599320.arc
ORA-00280: change 9999250009 for thread 1 is in sequence #215423



Now Oracle wants me to manually specify the archivelog files to be applied. This is because I had changed the STANDBY_FILE_MANAGEMENT parameter to MANUAL earlier; I need to put it back to AUTO.

Step 10: SQL> alter system set STANDBY_FILE_MANAGEMENT=AUTO scope=memory;

System altered.

Step 11: Now issue the command
SQL> recover automatic standby database;

Media recovery complete.

Check the alert log:

tail -f alert*

RFS[1]: Assigned to RFS process 27167
RFS[1]: Identified database type as 'physical standby'
Wed Feb 6 14:48:32 2008
RFS LogMiner: Client disabled from further notification
Wed Feb 6 14:48:45 2008
Media Recovery Log /disk2/archive/stbydb2/sf2_1_215425_528599320.arc
Wed Feb 6 14:50:04 2008
RFS[1]: No standby redo logfiles created
Wed Feb 6 14:51:32 2008
RFS[1]: Archived Log: '/disk2/archive/stbydb2/sf2_1_215542_528599320.arc'
Wed Feb 6 14:56:42 2008
Recovery created file /disk1/oradata/stbydb2/users121.dbf
Successfully added datafile 310 to media recovery
Datafile #310: '/disk1/oradata/stbydb2/users121.dbf'



Yes, it successfully created the datafile and recovery resumed. All 105 missing archivelog files were applied against the 2.5TB OLTP database.

Best practices and lessons learned: whenever you add a datafile to the primary, log in to the standby and make sure it was created successfully. The best way to do that is to run "tail -f alert*" in the background_dump_dest directory. If for some reason the datafile creation has failed, cancel the recovery on the standby and do only steps 6, 8, 10 and 11 in this document (condensed in the sketch below).
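The file names in the sketch are the ones from this example and must be replaced with your own:

SQL> alter database recover managed standby database cancel;
SQL> alter system set STANDBY_FILE_MANAGEMENT=MANUAL scope=memory;
SQL> alter database create datafile '/disk1/oracle/product/10.2.0/dbs/UNNAMED00309'
     as '/disk1/oradata/stbydb2/sys06.dbf';
SQL> alter system set STANDBY_FILE_MANAGEMENT=AUTO scope=memory;
SQL> alter database recover managed standby database disconnect from session;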

Tuesday, November 02, 2010

Oracle 10g Compression vs. 11g's Advanced Compression

This document is meant to show the difference between the 10gR2 table compression feature and 11g's Advanced Compression. Oracle introduced table-level compression in 10gR2. While it provided some storage reduction, 10g's table compression only compressed data during bulk (direct-path) load operations; newly inserted and updated data was not compressed. With 11g's Advanced Compression, new and updated data is also compressed, achieving the highest level of storage reduction while also improving performance, since compressed blocks mean more data is moved per I/O.

[Please understand that these test cases are only meant as quick examples to analyze and understand the benefits of Advanced Compression in 11g. Depending on the amount of duplicate data in your environment, results may vary.]
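For quick comparison, here is a minimal sketch of the two syntaxes side by side; the table and source names are the ones used in the tests below, and in 11gR2 the second clause can also be written COMPRESS FOR OLTP:

-- 10g basic compression: compresses direct-path (bulk) loads only
create table test_compressed COMPRESS as select * from test;

-- 11g Advanced Compression: compresses conventional DML as well
create table test_compressed COMPRESS FOR ALL OPERATIONS as select * from test;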


Here is the example.

The following test case was executed on a 10.2.0.1 database server.

A table called TEST was created without the COMPRESS option.

SQL> select table_name,compression from dba_tables where table_name = 'TEST';

TABLE_NAME COMPRESS
------------------------- -------------
TEST DISABLED


SQL> select sum(bytes) from dba_segments where segment_name = 'TEST';

SUM(BYTES)
------------------
92274688


The size of the table was around 92MB.

Now create another table called TEST_COMPRESSED with the COMPRESS option.

SQL> create table TEST_COMPRESSED COMPRESS as select * from test;

Table created.

SQL> select table_name, compression from dba_tables where table_name like 'TEST
%';

TABLE_NAME COMPRESS
------------------------------ ---------------
TEST_COMPRESSED ENABLED
TEST DISABLED


Now let’s check the size of the COMPRESSED table.


SQL> select sum(bytes) from dba_segments where segment_name = 'TEST_COMPRESSED';

SUM(BYTES)
----------------
30408704

Check out the size of the COMPRESSED table. It is only about 30MB, roughly a third of the original size (about a 67% reduction). So far so good.

Now let’s do a plain insert into the COMPRESSED table.

SQL> insert into TEST_COMPRESSED select * from TEST;

805040 rows created.

SQL> commit;

Commit complete.

Let’s check the size of the COMPRESSED table.

SQL> select sum(bytes) from dba_segments where segment_name = 'TEST_COMPRESSED'

2 /

SUM(BYTES)
----------
117440512


Wow! From 30MB to 117MB? So, plain INSERT statement does not COMPRESS the data in 10g.

(You will see this is not the case with 11g)

Now let’s do the same insert with a BULK LOAD

SQL> insert /*+ APPEND */ into TEST_COMPRESSED select * from TEST;

805040 rows created.

SQL> commit;

Commit complete.

SQL> select sum(bytes) from dba_segments where segment_name = 'TEST_COMPRESSED';


SUM(BYTES)
----------
142606336

Ok, now the size of the COMPRESSED table has gone from 117MB to 142MB. For the same number of rows, the table only grew by 25MB, so a bulk (direct-path) load does compress the data.

Let’s check other DML statements such as DELETE and UPDATE against the COMPRESSED table.


SQL> delete from test_compressed where rownum < 100000;

99999 rows deleted.

SQL> commit;

Commit complete.

SQL> select sum(bytes) from dba_segments where segment_name = 'TEST_COMPRESSED';

SUM(BYTES)
----------
142606336

No change in total size of the table. DELETE has no impact as expected.


Let’s check UPDATE.

SQL> update test_compressed set object_name = 'XXXXXXXXXXXXXXXXXXXXXXXXXX' where
rownum < 100000;

99999 rows updated.

SQL> commit;


SQL> select sum(bytes) from dba_segments where segment_name = 'TEST_COMPRESSED';

SUM(BYTES)
----------
150994944

The table size increased by 8MB? No compression for the UPDATE statement either.


All this clearly shows that 10g's table compression works great for initial bulk loads; however, subsequent UPDATEs, DELETEs and INSERTs will not result in compressed blocks.

Now, let's look at the 11g test results.

The following SQL statements were executed against an 11.2.0.1 database.

A TEST table of about 100MB was created as before.



SQL> select bytes from dba_segments where segment_name = 'TEST';

BYTES
----------
100663296

So a 100MB table was created.

Let's create a table with the COMPRESS FOR ALL OPERATIONS option. This is only available in 11g.


SQL> create table test_compressed compress for all operations as select * from
test;

Table created.

SQL> select bytes from dba_segments where segment_name = 'TEST_COMPRESSED';

BYTES
----------
31457280


Check out the size of the compressed table vs. the uncompressed table: the compressed table uses roughly 70% less space (about 30MB vs. 100MB), about the same ratio we saw in 10g.
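If you want to confirm which kind of compression a table is using, 11g adds a COMPRESS_FOR column to DBA_TABLES (the column does not exist in 10g), so a check along these lines is possible:

SQL> select table_name, compression, compress_for
     from dba_tables
     where table_name in ('TEST','TEST_COMPRESSED');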




Let’s check other DML statements.

Let’s do a plain insert to the compressed table.

SQL> insert into TEST_COMPRESSED select * from test;

789757 rows created.

SQL> commit;

Commit complete.

SQL> select bytes from dba_segments where segment_name = 'TEST_COMPRESSED';

BYTES
----------
75497472



11g's Advanced Compression compressed the second copy of the 100MB of data down to roughly 44MB in the compressed table, WITHOUT the bulk load option.

Now let's do a BULK LOAD into 11g's COMPRESSED table.


SQL> insert /*+ APPEND */ into test_compressed select * from TEST;

789757 rows created.

SQL> commit;

Commit complete.

SQL> select bytes from dba_segments where segment_name = 'TEST_COMPRESSED';

BYTES
----------
109051904
It has the same impact as a plain INSERT.

What about deletes and updates?


SQL> delete from test_compressed where rownum < 100000;

99999 rows deleted.

SQL> commit;

Commit complete.

SQL> select bytes from dba_segments where segment_name = 'TEST_COMPRESSED';

BYTES
----------
109051904


No change for deletes. This is expected: blocks are compressed when newly added rows fill a block up to the PCTFREE threshold, so deletes alone do not trigger recompression.


SQL> update test_compressed set object_name = 'XXXXXXXXXXXXXXXXXXXXXXXXXX' where

2 rownum < 100000;

99999 rows updated.

SQL> commit;

Commit complete.

SQL> select bytes from dba_segments where segment_name = 'TEST_COMPRESSED';

BYTES
----------
109051904


There is no change in this case, as the existing blocks were able to accommodate the updates. The same update, however, grew the table in 10g.


Summary:

The above tests show that 11g's Advanced Compression compresses table data for all INSERTs and UPDATEs, resulting in significant storage reduction, whereas 10g's compression only compresses data during bulk load operations; subsequently added or updated data is not compressed.

Friday, June 18, 2010

I/O Errors using NFS on Netapps storage on Oracle 10.2.0.4

I recently faced the following issue in one of my client environments.

After successfully creating a 10g database on an NFS file system on Linux, I turned it over to the client for production implementation. Developers started populating the data using a schema-level import. When a large table was imported, the import failed with the following errors:

ORA-01115: IO error reading block from file 6 (block # 5631)
ORA-01110: data file 6: '/u0201/oradata/reportdata.dbf'
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 5630

After several hours of searching Oracle Support, I found a document that talked about setting FILESYSTEMIO_OPTIONS to 'setall' on NFS file systems to address redo log corruption. We did not have any corruption, but I wanted to try it anyway since we were using NFS.

Alter system set filesystemio_options=setall scope=spfile;

After setting this (and restarting the database, since the parameter is static), the developers completed the imports without any problem.


Parameter: FILESYSTEMIO_OPTIONS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"directIO" - This allows directIO to be used where supported by the OS. Direct IO bypasses any Unix buffer cache. As of 10.2 most platforms will try to use "directio" option for NFS mounted disks (and will also check NFS attributes are sensible).
"setall" - Enables both ASYNC and DIRECT IO. The 'setall' option chooses async or direct I/O based on the platform and file system used.
Direct I/O should always be enabled for NFS-hosted datafiles and redo logs so that no writes are ever cached in the OS buffer cache; the consistency of writes cannot be guaranteed when relying on stateless NFS to perform asynchronous writes over the network.
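Since FILESYSTEMIO_OPTIONS is a static parameter, the change only takes effect after a restart. A minimal sketch of the whole change, assuming you can afford a bounce, looks like this:

SQL> show parameter filesystemio_options
SQL> alter system set filesystemio_options=setall scope=spfile;
SQL> shutdown immediate
SQL> startup
SQL> show parameter filesystemio_options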

Saturday, June 05, 2010

Size of RMAN incremental backups

We had a situation recently where the backup destination on a Linux server was 97% full. Our system admin wanted to know why: since the RMAN retention had not changed, we needed to find out whether the database had actually grown enough to make that difference. We were taking Level 0 backups every Sunday, followed by Level 1 and Level 2 backups during the week, and a Level 0 again the next Sunday.

We were keeping 45 days' worth of backups on disk. We needed to find out the size of the backup sets for each week and then work out which week had the enormous growth. The messy way of doing this is, at the UNIX level, to run ls -ltr and total up the backup pieces belonging to each week. The database was 500GB and the backup sets all had different sizes, so that would have been a cumbersome effort.

How did we do this more elegantly with RMAN? There is a recovery catalog view called rc_backup_datafile that you can use for this purpose.

First Week

select
sum(blocks * block_size)/(1024*1024*1024)
from rc_backup_datafile
where completion_time between ('04-APR-2010') and ('10-APR-2010')
order by 1

SQL> /

SUM(BLOCKS*BLOCK_SIZE)/(1024*1024*1024)

---------------------------------------
255.587982

Second week

select
sum(blocks * block_size)/(1024*1024*1024)
from rc_backup_datafile
where completion_time between ('11-APR-2010') and ('17-APR-2010')
order by 1
SQL> /



SUM(BLOCKS*BLOCK_SIZE)/(1024*1024*1024)
---------------------------------------
255.705849


select
sum(blocks * block_size)/(1024*1024*1024)
from rc_backup_datafile
where completion_time between ('26-APR-2010') and ('03-MAY-2010')
order by 1

SQL> /


SUM(BLOCKS*BLOCK_SIZE)/(1024*1024*1024)
---------------------------------------
283.666035


select
sum(blocks * block_size)/(1024*1024*1024)
from rc_backup_datafile
where completion_time between ('04-MAY-2010') and ('11-MAY-2010')
order by 1

SQL> /


SUM(BLOCKS*BLOCK_SIZE)/(1024*1024*1024)
---------------------------------------
279.676537

Ok, between the 26th of April and the 3rd of May the backups had grown by almost 30GB. The client admitted to me that there was indeed a major data upload that week, which explained the sudden increase in backup directory usage.

Ok, demand for more disk was justified!
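As a side note, the four per-week queries above can be collapsed into a single query that groups the backup sizes by week; this is just a sketch against the same rc_backup_datafile catalog view:

select trunc(completion_time, 'IW') week_starting,
       round(sum(blocks * block_size)/(1024*1024*1024), 2) backup_gb
from rc_backup_datafile
group by trunc(completion_time, 'IW')
order by 1;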

Dbca hangs while creating the database

I came across the following weird issue with dbca that I wanted to share with others.
I had to create a database on a Linux x86_64 server. When I ran ./dbca, I went through the screens, answered all the questions, and then clicked the final step that creates the database.

dbca just hung and did not proceed. There were no logs, no errors, just nothing! I was puzzled and didn't know where to start digging. It finally turned out to be a problem with my Xming utility: it was missing several font files that Oracle needs. After the sysadmin reinstalled the X utilities, dbca ran to completion.

So if you are stuck with an unexplainable dbca issue, check your X Windows utilities and make sure the complete required set is installed.

Saturday, May 22, 2010

Flashback Query Usage Example

Flashback Query

In an earlier post, I explained the Flashback Database feature. In this post I want to give an example of Flashback Query usage.

I received a call one Friday afternoon from one of my client DBAs, just as I was getting ready to leave the office. He was extremely nervous on the phone and could not speak clearly. I said "oh no... don't worry" and then asked him "What happened?".

He had just realized that he had deleted a bunch of rows from a critical table in production. He wanted to know how long it would take to restore the database. I asked him if he knew when he had performed the operation; he said he didn't know and wasn't in the mindset to guess either. I asked if the previous RMAN backups were good; he said yes, but he had deleted a bunch of archivelog files that morning, so point-in-time recovery to just before the delete was out of the question.

I asked him if the database had UNDO_RETENTION set; he said yes, with 24-hour retention. I asked him for the name of the table and told him not to call or IM me for the next 10 minutes.

Here are the steps I followed. I estimated a three-hour window and started querying the table as of points within the last three hours, first to see how many rows were there around 3:00 P.M. on Feb 1st.

SELECT count(*) FROM schema_name.card_accounts AS OF TIMESTAMP TO_TIMESTAMP('01-FEB-10 03:00:00 PM','DD-MON-YY HH:MI:SS PM')
/
20004

Ok, there were 20004 records at 3:00 P.M. The table only had 2 records now, at 4:30 P.M. (I have used a dummy schema_name so you won't guess my client.)

Clearly the rows were deleted somewhere between 3:00 P.M. and 4:30 P.M. But that is a wide window.

So let's query again:

SELECT count(*) FROM schema_name.card_accounts AS OF TIMESTAMP TO_TIMESTAMP('01-FEB-10 03:30:00 PM','DD-MON-YY HH:MI:SS PM')
/
20004

SELECT count(*) FROM schema_name.card_accounts AS OF TIMESTAMP TO_TIMESTAMP('01-FEB-10 04:00:00 PM','DD-MON-YY HH:MI:SS PM')
/
20004

SELECT count(*) FROM schema_name.card_accounts AS OF TIMESTAMP TO_TIMESTAMP('01-FEB-10 04:30:00 PM','DD-MON-YY HH:MI:SS PM')
/
2

Alright, so the rows got deleted between 4:00 P.M. and 4:30 P.M. Then my phone rang again. I picked up and asked him, "Hey Mike, does 20004 sound right?" "Yeah, that is what I had before," Mike replied.

I inserted those rows into a temporary table.

insert into card_accounts_tmp2 select * from schema_name.card_accounts AS OF TIMESTAMP TO_TIMESTAMP('01-FEB-10 03:00:00 PM','DD-MON-YY HH:MI:SS PM')
/
20004 rows created.

"Ok, go ahead and check that temporary table to see if you have everything you need!" He said "No". Oh yeah, I had forgotten to commit.


I issued commit.

SQL> commit;

Commit complete

Ok, "Now?" I asked. He typed carelessly and said "Wow! yes they are all there! You are my hero!" , slowly raising his voice.

Now I queried the current table again and found 8 rows; oh yeah, the users were back in and inserting rows into that table.

Mike said that was an easy fix; he could just merge the two tables and insert the rows back.

Ok, Problem Solved and Over! So was my Friday evening! :-(

Tuesday, May 18, 2010

How to resolve ORA-01547 errors during RMAN restore


You just ran an RMAN duplicate script to clone a production database. After restoring datafiles for almost 6 hours, the duplicate script failed with the following errors towards the end:

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/oradata/REP/system01.dbf'

released channel: ch1
released channel: ch2

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 05/17/2010 14:14:10
RMAN-03015: error occurred in stored script Memory Script
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of log thread 1 seq 11366 lowscn 74424083 found to restore
RMAN-06025: no backup of log thread 1 seq 11365 lowscn 74420409 found to restore

RMAN>


As per Oracle Support document 274118.1, you copied the archivelog files for sequence numbers 11365 and 11366 to the auxiliary database server and issued the recover command.

Now Oracle says it needs log seq# 10955 to be restored and recovered. Well, seq# 10278 is from last week's LEVEL 0 backup. At this point, the only solution is to log in to the target using RMAN and back up the archivelog files to the same backup destination where the rest of the RMAN backups were restored from, which is accessible to the auxiliary database.

I issued the following command on the target database instance.

RMAN> backup device type disk archivelog all not backed up;

This way only the archivelog files that have not yet been backed up are included, which keeps the backup time down.

Now go back to the duplicate script via RMAN and rerun it.

RMAN in Oracle 10g lets you restart a failed duplicate script, and it will resume restoring and recovering files from where it left off.

You will notice RMAN starts recovering from the earlier sequence number 10955:


starting media recovery

channel ch2: starting archive log restore to default destination
channel ch1: starting archive log restore to default destination
channel ch2: restoring archive log
archive log thread=1 sequence=10955
channel ch2: reading from backup piece /backups/ROD/ROD_20100514_s2494_p1_arc
channel ch1: restoring archive log
archive log thread=1 sequence=10953
channel ch1: restoring archive log
archive log thread=1 sequence=10954
channel ch1: reading from backup piece /backups/ROD/ROD_20100514_s2492_p1_arc
channel ch2: restored backup piece 1


After that, you can open the database with RESETLOGS without any problem.

When nocatalog is used, RMAN duplicate determines the latest log file it needs from the control file. If it can't find that log in the backup directory, it reports an error immediately.

At that point, you can backup all the archivelog files and rerun the duplicate.

To avoid this issue, run the duplicate command with "set until time" if possible.
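A rough sketch of such a run follows; the timestamp and the target name REP are placeholders taken from this example, and the chosen point in time must be covered by both the datafile and archivelog backups:

RMAN> run {
  set until time "to_date('17-MAY-2010 12:00:00','DD-MON-YYYY HH24:MI:SS')";
  allocate auxiliary channel ch1 device type disk;
  allocate auxiliary channel ch2 device type disk;
  duplicate target database to REP;
}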

Sunday, April 11, 2010

Ora-00600 errors during RMAN duplicate in 10g

If you ever receive the following error while running the RMAN duplicate command, it could be because you did not create a password file before running the duplicate.

Finished restore at 14-APR-10
sql statement: CREATE CONTROLFILE REUSE SET DATABASE "PROD" RESETLOGS ARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
MAXINSTANCES 8
MAXLOGHISTORY 2336
LOGFILE
GROUP 1 '/u0202/oradata/PROD/redo01.log' SIZE 50 M ,
GROUP 2 '/u0202/oradata/PROD/redo02.log' SIZE 50 M ,
GROUP 3 '/u0202/oradata/PROD/redo03.log' SIZE 50 M
DATAFILE
'/u0201/oradata/PROD/system01.dbf'
CHARACTER SET AL32UTF8

released channel: ch1
released channel: ch2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 04/14/2010 12:12:03
RMAN-06136: ORACLE error from auxiliary database: ORA-01503: CREATE CONTROLFILE failed
ORA-00600: internal error code, arguments: [kccsga_update_amx_1], [11], [2656], [2353], [], [], [], []

Solution: Go ahead and create the password file (see the sketch below) and rerun the RMAN duplicate command. You do not have to clean out the old datafiles and start over in 10g; just kick off the RMAN duplicate script again and it will restart from where it left off.
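A minimal sketch of creating that password file on the auxiliary host, assuming the instance name PROD from the excerpt above; the password and entries values are placeholders to adjust for your environment:

$ orapwd file=$ORACLE_HOME/dbs/orapwPROD password=change_me entries=5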

Sunday, April 04, 2010

Flashback Logs Cleanup in Oracle

Flashback logs cleanup without downtime – Step by Step

Summary

Have you ever gotten into a situation where flashback logs have been building up for years in the Flash Recovery Area of your database and you are stumped as to how to clean them up manually? You are dealing with a production database, and no downtime is allowed unless it is of utmost necessity. How do we clean up the flashback logs from the Flash Recovery Area without any downtime? Before we go into the details, let us see what the Flash Recovery Area and Flashback Database are. (Trust me, many people are confused by these two terms and actually think they are the same.)

Flash Recovery Area

The Flash Recovery Area contains all the files you need to completely recover a database from a media failure: archivelog files, RMAN backups, control files and flashback logs. Oracle manages these files automatically when the database experiences space pressure in the Flash Recovery Area. In other words, when the Flash Recovery Area gets about 85% full it issues a warning but keeps filling the FRA towards 100%; as soon as the FRA reaches 100%, Oracle starts deleting eligible files, oldest first, to make room for new files.

Flash Back Database

You can set your database to use the Flashback Database feature, which allows the entire database to be reverted to a point in time in the past. When this feature is enabled, flashback logs are created. You can limit how far back the flashback logs reach by setting the retention target parameter.
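To see how far back your flashback logs actually reach and how much space they occupy, you can query v$flashback_database_log, for example:

SQL> select retention_target, oldest_flashback_time,
            round(flashback_size/(1024*1024*1024),2) flashback_gb
     from v$flashback_database_log;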
In our case we had overestimated the Flash Recovery Area size, and flashback logs had been growing for several years. Oracle does not recommend deleting these files using operating system commands, because that might corrupt your database; you need to let Oracle delete them automatically. But Oracle is not going to delete these files until the space is 100% filled up. I wanted to take manual action at this point and delete the last 4 years' worth of flashback files from the FRA.

There are two ways of achieving this.

1. Disable the Flashback Database feature, bounce the database and enable it back. But as you know, this requires downtime and hence is not the preferred solution.

2. Shrink the FRA size down to the currently used space, tricking Oracle into thinking the FRA is 100% full, in which case it will start deleting files automatically to make room.

Let us now look at the step-by-step process to accomplish this task.

The following are the steps I executed in production to accomplish this using the 2nd option.

First, let us check the size of the FLASH_RECOVERY_AREA.
SQL> show parameter db_recovery

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      /u01/flash_recovery_area
db_recovery_file_dest_size           big integer 190000M

So the size of the FLASH_RECOVERY_AREA is 190GB.

How do we know how much of that 190GB is used and how much is free? Issue the following query to get that information.

SQL> select space_used/(1024*1024*1024),space_limit/(1024*1024*1024) from v$recovery_file_dest;
SPACE_USED/(1024*1024*1024) SPACE_LIMIT/(1024*1024*1024)
--------------------------- ----------------------------
55.6249833 185.546875

Only 55GB of the space is being used. Now we want to see how that space is split among flashback logs, archivelogs and backups. Run the following query to get that information.
SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
------------ ------------------ ------------------------- -----------------------------------------------------------------------
CONTROLFILE 0 0 0
ONLINELOG 0 0 0
ARCHIVELOG 18.17 6.03 4691
BACKUPPIECE 0 0 0
IMAGECOPY 0 0 0
FLASHBACKLOG 11.81 11.73 1077

6 rows selected.

As per the above query output, 18.17% of FRA space is used for archivelog files, out of which 6.03% is reclaimable. 11.81% of the space is used for flashback logs, out of which 11.73% is reclaimable.
Let us now verify the retention target for flashback.
SQL> show parameter db_flashback
NAME TYPE VALUE
------------------------------------ ----------- --------------------------------------
db_flashback_retention_target integer 1440

Retention is set to 1440 minutes (24 hours), so only the last 24 hours of flashback logs can be used anyway.
We can free up all the space used by flashback logs older than the last 24 hours and reclaim it. The following steps illustrate how to accomplish this.
Make sure the following parameter is set:
SQL> show parameter log_archive_min_succeed_dest;

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_min_succeed_dest integer 1

Set log_archive_dest_1 to a different location where there is enough space. This is required to prevent the database from hanging for lack of space while db_recovery_file_dest_size is reduced, which could otherwise block archivelog file creation.
SQL> alter system set log_archive_dest_1='LOCATION=/home/oracle/temp_archivelog' scope=memory;
System altered.

DEFER the archive destination that points to the DB_RECOVERY_FILE_DEST (log_archive_dest_10 here) so archivelogs won't be written to the default destination.
SQL> alter system set log_archive_dest_state_10=defer scope=memory;
System altered.

Switch logfiles and make sure archivelog files are created in the new location /home/oracle/temp_archivelog.
SQL> alter system switch logfile;
System altered.
SQL> /
System altered.

Set db_recovery_file_dest_size to 55GB, the same as the actual usage, to make Oracle delete older files.
SQL> alter system set db_recovery_file_dest_size=55G scope=memory;
System altered.
You will see the following messages in the alert log:

ALTER SYSTEM SET db_recovery_file_dest_size='55G' SCOPE=MEMORY;
Mon Jul 27 13:27:45 2009
Deleted Oracle managed file /u01/flash_recovery_area/PRODDB/flashback/o1_mf_4zngo02h_.flb
Deleted Oracle managed file /u01/flash_recovery_area/PRODDB/flashback/o1_mf_4zngookw_.flb
Deleted Oracle managed file /u01/flash_recovery_area/PRODDB/flashback/o1_mf_4zngpd5j_.flb
Deleted Oracle managed file /u01/flash_recovery_area/PRODDB/flashback/o1_mf_4zngq3l6_.flb
Deleted Oracle managed file /u01/flash_recovery_area/PRODDB/flashback/o1_mf_4zngqw4j_.flb
Deleted Oracle managed file /u01/flash_recovery_area/PRODDB/flashback/o1_mf_4zngrpfp_.flb
Deleted Oracle managed file /u01/flash_recovery_area/PRODDB/flashback/o1_mf_4zngsjpb_.flb

Now let us check the space by issuing the same query used before.

SQL> select * from v$flash_recovery_area_usage;

FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
------------ ------------------ ------------------------- ----------------------------------------------------------------------------
CONTROLFILE 0 0 0
ONLINELOG 0 0 0
ARCHIVELOG 61.3 20.34 4692
BACKUPPIECE 0 0 0
IMAGECOPY 0 0 0
FLASHBACKLOG 38.7 38.44 1046

6 rows selected.

SQL> select space_used/(1024*1024*1024),space_limit/(1024*1024*1024) from v$recovery_file_dest;
SPACE_USED/(1024*1024*1024) SPACE_LIMIT/(1024*1024*1024)
--------------------------- ----------------------------
54.9990034 55
Note the new sizes and the number of files. It looks like Oracle deleted some flashback logs, but none of the archivelog files were deleted.

The alert log showed the following warnings:
Mon Jul 27 13:31:42 2009
Errors in file /u01/app/oracle/product/admin/PRODDB/bdump/db_mmon_22830.trc:
ORA-19815: WARNING: db_recovery_file_dest_size of 53687091200 bytes is 87.63% used, and has 6642196992 remaining bytes available.

Oracle could not delete any more files because the archivelog files had not been backed up or cleaned out. For archivelog files to become eligible for automatic cleanout, they need to be either backed up or deleted manually.
So I removed the archivelog files from the OS manually and ran the following commands from RMAN:

RMAN> crosscheck archivelog all;
RMAN> delete expired archivelog all;
Now check the space again:
SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
------------ ------------------ ------------------------- ------------------------------------------------------------------------
CONTROLFILE 0 0 0
ONLINELOG 0 0 0
ARCHIVELOG 3.83 0 1274
BACKUPPIECE 0 0 0
IMAGECOPY 0 0 0
FLASHBACKLOG 51.23 50.93 1007

6 rows selected.
Reclaimable space for ARCHIVELOG is now 0 and the number of files matches the RMAN retention period; in other words, all remaining archivelog files belong to the retention window set in RMAN.
Now check the new sizes.

SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
------------ ------------------ ------------------------- --------------------------------------------------------------------------
CONTROLFILE 0 0 0
ONLINELOG 0 0 0
ARCHIVELOG 97.95 0 1274
BACKUPPIECE 0 0 0
IMAGECOPY 0 0 0
FLASHBACKLOG 7.76 0 6

6 rows selected.

SQL> select space_used/(1024*1024*1024),space_limit/(1024*1024*1024) from v$recovery_file_dest;
SPACE_USED/(1024*1024*1024) SPACE_LIMIT/(1024*1024*1024)
--------------------------- --------------------------------------------------
1.6516757 1.66015625




Reclaimable space for FLASHBACKLOG is 0 and there are only 6 files left. I also confirmed that those were from the last 24 hours.
Put the log_archive_dest settings back:
SQL> alter system set log_archive_dest_1=' ' scope=memory;
System altered.

SQL> alter system set log_archive_dest_10='LOCATION=USE_DB_RECOVERY_FILE_DEST' scope=memory;
System altered.
SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 4253
Next log sequence to archive 4254
Current log sequence 4254

Finally, resize db_recovery_file_dest_size so there is enough space to write new files.

SQL> alter system set db_recovery_file_dest_size=50G scope=memory;
System altered.

SQL> show parameter db_recovery
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest string /u01/flash_recovery_area
db_recovery_file_dest_size big integer 50G
Now finally re-enable log_archive_dest_state_10:

SQL> alter system set log_archive_dest_state_10=enable scope=memory;
System altered.