Showing posts with label Data Guard. Show all posts
Showing posts with label Data Guard. Show all posts

Monday, June 28, 2010

Restoring Standby Controlfile results in ORA-00600

I received ORA-00600 error while restoring Controlfile on a RAC-standby database (Physical). We purposely disabled redo log shipping on the Primary database and as a result the Standby Database was lagging way behind the production. Instead of applying all the Archive logs I took the shortest path of taking SCN based RMAN backup on the Primary and apply it to the Standby.

While doing so, I took a backup of Controlfile from primary database and shipped it to the standby site. I was able to successfully restore the controlfile as shown below:


RMAN> restore controlfile from '/tmp/control01.ctl';

Starting restore at 08-JUN-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=1083 instance=qtprod1 devtype=DISK

channel ORA_DISK_1: copied control file copy
output filename=+DATADG/control01.ctl
output filename=+FRADG/control02.ctl
output filename=+DATADG/control03.ctl
Finished restore at 08-JUN-10

RMAN> exit

However, while mounting the database with this new controlfile, I was greeted with ORA-00600. Below is the error:


SQL> alter database mount standby database;
alter database mount standby database
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kccsga_update_ckpt_5], [1453], [0],
[62], [0], [], [], []

Searching for the above error in "Troubleshoot an ORA-600 or ORA-7445 Error Using the Error Lookup Tool" did not provide any information. :(

Looking back at the steps that lead me to ORA-00600 error, I found the solution to the problem. The controlfile should have been restored as a standby CF. After correcting the mistake database happily mounted.


RMAN> restore standby controlfile from '/tmp/control01.ctl';

Starting restore at 08-JUN-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=1082 instance=biodb1 devtype=DISK

channel ORA_DISK_1: copied control file copy
output filename=+DATADG/control01.ctl
output filename=+FRADG/control02.ctl
output filename=+DATADG/control03.ctl
Finished restore at 08-JUN-10

RMAN> exit


SQL> alter database mount standby database;

Database altered.

After applying the RMAN backup, standby database got in sync with the primary.

Tuesday, March 17, 2009

Strange Data Guard Issue

We have recently failed over one of our database to the DRC. At that time the production archivelog sequence was 80,000 plus. After staying there for couple of hours we recreated a physical standby database on the primary and switched back to original primary location. Everything went fine without any hoo-ha.

The failover and switchover took place on February 20, 2009. We have an automated job on all our standby databases to check whether the Standby is lagging behind the Primary, if so, then it automatically restores the missing archivelogs and applies them.

This job was working fine until day before yesterday (March 15, 2009) but started reporting errors thereafter. It failed with “RMAN-20242” error.

Starting restore at 15-MAR-09
released channel: ch12
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 03/15/2009 07:26:59
RMAN-06004: ORACLE error from recovery catalog database: 
RMAN-20242: specification does not match any archive log in the recovery catalog

When notified, I queried V$ARCHIVE_GAP and the output of this query took me to a surprise:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bi
PL/SQL Release 10.2.0.3.0 - Production
CORE    10.2.0.3.0      Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.3.0 - Production
NLSRTL Version 10.2.0.3.0 - Production

SQL>
SQL> select * from v$archive_gap;

   THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------
         1           776          82468

SQL> 

Well, all of a sudden Oracle thinks that it is 81 thousand archive logs behind the primary. I tried to switching couple of archive logs on the production and the gap was still intact. I don't really know from where did Oracle took this information.

The automated job was trying to restore archives between 776 and 82468 and was kicked out by the recovery catalog database saying, there are no such archive logs (in fact there exist no such archivelogs).

I created a standby controlfile on the production database and replaced the controlfile on the standby database with the new one. The standby database seems to be happy with this version of controlfile and have stopped reporting any gap and the automated job is also leading a happy life.

SQL> select * from v$archive_gap;

no rows selected

SQL>