Monday, February 04, 2008

Practicing Block Recovery in Oracle Database

Dear all,

During backup and recovery practice sessions, we often struggle to perform block recovery scenario. This is because we find it difficult to corrupt an Oracle block.

I performed this test on Oracle 10g Release 2 (10.2.0.1) on Windows XP and for the purpose of this test we need to keep our database in archivelog mode. In this article, I will discuss how to corrupt an Oracle data block, but before beginning this discussion, I would like to answer:

Why to corrupt an Oracle Block?

We will be corrupting an Oracle block in order to practice recovery procedures involved when one encounters a Block Corruption in a production environment. If a block gets corrupted in any of our production databases we will be in a position to rectify and correct the error instead of wandering for help.

This is purely for educational purpose and please do not practice this on any of your production/development/testing databases, rather create a new database for this purpose and practice it there.

For the purpose of this test, I have created a separate tablespace and a new schema.

SQL> create tablespace corrupt_ts datafile 'c:\mydb\data\corrupt01.dbf' size 10m; 
Tablespace created. 
SQL>
SQL> create user test identified by test default tablespace corrupt_ts; 
User created. 
SQL>
SQL> grant connect, resource to test;
Grant succeeded. 
SQL>

Create and populate test table with dummy data as shown:

SQL> conn test/test
Connected.
SQL>
SQL> create table t1 as select rownum rno, object_name from all_objects
  2  where object_name like 'AQ%';

Table created.

SQL> select count(*) from t1;

  COUNT(*)
----------
        42

Insert a record into this table which we will be corrupting:

SQL> insert into t1 values (99, 'LET ME CORRUPT');

1 row created.

SQL> commit;

Commit complete.

Let us take RMAN full database backup before we corrupt the block.

RMAN> backup format 'c:\mydb\rman\fulldb_%U' database plus archivelog;

Starting backup at 01-FEB-08
current log archived
:
:
piece handle=C:\MYDB\RMAN\FULLDB_0LJ7LIML_1_1 tag=TAG20080201T234641 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:28
Finished backup at 02-FEB-08

Starting backup at 02-FEB-08
current log archived
using channel ORA_DISK_1
channel ORA_DISK_1: starting archive log backupset
channel ORA_DISK_1: specifying archive log(s) in backup set
input archive log thread=1 sequence=105 recid=105 stamp=645581560
channel ORA_DISK_1: starting piece 1 at 02-FEB-08
channel ORA_DISK_1: finished piece 1 at 02-FEB-08
piece handle=C:\MYDB\RMAN\FULLDB_0MJ7LINS_1_1 tag=TAG20080202T001242 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:04
Finished backup at 02-FEB-08

RMAN>

Take the tablespace offline so that we can make changes to the datafile. There are many freeware and shareware Hex Editors available in the market. I am using UltraEdit editor to make changes in our datafile.
SQL> alter tablespace corrupt_ts offline;

Tablespace altered.

Open datafile “'c:\mydb\data\corrupt01.dbf” using UltraEdit (press “Ctrl+h” to toggle between Hex Mode). Search for our record entry “LET ME CORRUPT” in the file and changed “CORRUPT” to “NORRUPT” and save the file and close UltraEdit. I just changed “C” to “N”.

Bring back the tablespace to online mode.

SQL> alter tablespace corrupt_ts online;

Tablespace altered.

You may notice that Oracle doesn’t complain when it brings the datafile online because the file header wasn’t modified. Oracle will complain only when it tries to access the corrupt blocks. Let’s see what happens when we try to query table “T1”.

SQL> conn test/test
Connected.
SQL> select * from t1;

       RNO OBJECT_NAME
---------- ------------------------------
         1 AQ$_AGENT
         2 AQ$_DEQUEUE_HISTORY
          :
          :

          30 AQ$_JMS_NAMEARRAY
ERROR:
ORA-01578: ORACLE data block corrupted (file # 6, block # 13)
ORA-01110: data file 6: 'C:\MYDB\DATA\CORRUPT01.DBF'


30 rows selected.

Query returns 30 records and then complains of block corruption in file 6. Block numbered 13 is being reported as corrupt. Let us see what all blocks are corrupt in “corruption01.dbf” datafile by running dbv utility.

C:\ora10g\BIN>dbv file=C:\MYDB\data\corrupt01.dbf blocksize=8192

DBVERIFY: Release 10.2.0.1.0 - Production on Mon Feb 4 00:00:11 2008

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = C:\MYDB\data\corrupt01.dbf
Page 13 is marked corrupt
Corrupt block relative dba: 0x0180000d (file 6, block 13)
Bad check value found during dbv:
Data in bad block:
 type: 6 format: 2 rdba: 0x0180000d
 last change scn: 0x0000.0039aa9f seq: 0x3 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xaa9f0603
 check value in block header: 0x85b0
 computed block checksum: 0x1b00

DBVERIFY - Verification complete

Total Pages Examined         : 1280
Total Pages Processed (Data) : 4
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 11
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 1264
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 0
Highest block SCN            : 3779231 (0.3779231)
C:\ora10g\BIN>

This utility scans all the blocks in a given datafile and outputs the corrupt ones. In my case, I have one block marked as corrupt. Make a note of all the corrupt blocks as we need to recover them to previous state.

Start RMAN session and recover all the corrupt blocks. The beauty of RMAN is that it leaves the entire datafile online except the corrupted blocks and we need to recover only those corrupt blocks instead of entire datafile.

 
RMAN> blockrecover datafile 6 block 13;

Starting blockrecover at 04-FEB-08
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=44 devtype=DISK

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00006
channel ORA_DISK_1: reading from backup piece C:\MYDB\RMAN\FULLDB_0KJ7LH72_1_1
channel ORA_DISK_1: restored block(s) from backup piece 1
piece handle=C:\MYDB\RMAN\FULLDB_0KJ7LH72_1_1 tag=TAG20080201T234641
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:36

starting media recovery

archive log thread 1 sequence 105 is already on disk as file C:\MYDB\FRA\MYDB\ARCHIVELOG\2008_02_02\O1_MF_1_10
5_3T72T48S_.ARC
archive log thread 1 sequence 106 is already on disk as file C:\MYDB\FRA\MYDB\ARCHIVELOG\2008_02_03\O1_MF_1_10
6_3TD97K0Z_.ARC
media recovery complete, elapsed time: 00:00:35
Finished blockrecover at 04-FEB-08

RMAN>

RMAN reports success of block recovery command. Let us query the table again by logging in to SQL*Plus:

 
SQL> select * from t1;

       RNO OBJECT_NAME
---------- ------------------------------
         1 AQ$_AGENT
         2 AQ$_DEQUEUE_HISTORY
          :
          :

        41 AQ$_JMS_ARRAY_ERROR_INFO
        42 AQ$_JMS_ARRAY_ERRORS
        99 LET ME CORRUPT

43 rows selected.

SQL>

Wow, the query runs successfully and our original record is restored. Similar article on block recovery in UNIX environment can be found here.

Happy recovery (block)!!!

49 comments:

Gary Coy said...

You rock. This is a great article. I'm surprised no on has commented on this. This is a good exercise, well presented. Thank you for this.

Claude said...

Thanks for the Nice Artice..But when I try it, I have always the following problem :
SQL> alter tablespace corrupt_ts online;
alter tablespace corrupt_ts online
*
ERROR at line 1:
ORA-01157: cannot identify/lock data file 8 - see DBWR trace file
ORA-01110: data file 8: 'C:\MYDB\DATA\CORRUPT01.DBF'

i am using the UltraEdit and I did the same steps by changing the letter C to N.

Is there anything missing to do? Thanks
Claude

Asif Momen said...

Hi,

Did you place your datafile under 'C:\MYDB\DATA' folder?

Please paste your "CREATE TABLESPACE' command.

Regards

Claude said...

Yes, i did the same steps exactly that you have described in your document(Tablespace, user and Table). I think the problem is related to the UltraEdit. I am using UltraEdit Professional Version 14.
Whenever I open the file with UltraEdit, I see a list of 0000000h :20...etc So I press the ctrl+h to be in Hex Mode , and I search for the word 'LET ME CORRUPT' and I could find it. I changed manually the letter C with N.
UltraEdit is creating a file called corrupt01.dbf.bak. Then I save the file.

Do you think there is another tool better than UltraEdit, in whihc I can change the datafile contents?

Thanks very much

Claude said...

I could do the exrecise..The problem was when I use UltraEdit, I didn't use the HexMode. Thanks very much for this practice. It's very useful

Regards
Claude

Asif Momen said...

Glad to know that you did it successfully.

Anonymous said...

All you did was change a character, you didn't mess with the block header or row header. I don't think that the block tail indicated the change because its values if SCN+block type+ SCN sequence #, right?

How exactly did Oracle spot the change? What mechanism did Oracle use? Is there some checksum value kept per row? If so, where is it kept? Thanks so much for the good lesson.

Vipul Kapadia said...

This is an excellent article. To test out our BMR this works perfectly in a non-ASM area. Is there anyway we can do this in a RAC environment running in ASM? We have made several attempts but it does not work and requires a tablespace recovery instead.

Vipul Kapadia said...

This is an excellent article. To test out our BMR this works perfectly in a non-ASM area. Is there anyway we can do this in a RAC environment running in ASM? We have made several attempts but it does not work and requires a tablespace recovery instead.

Vipul said...

This is an excellent article. This works perfectly in a non-ASM area. We have a RAC/ASM environment and we are wondering if there's anyway we can do this. We have made several attempts but it does not work and requires a tablespace recovery instead because we can't edit the file in ASM so we move the tablespace out to file system and after editing we put it back in ASM. That's the only way we can think of doing this.

Asif Momen said...

Hi Vipul,

Yes, the only way to edit a datafile in ASM is to copy it to a filesystem, modify it, and then place it back into ASM.

Happy testing !!!

Razi said...

Dear Asif Sb

very nice article.....

This is Razi Ahmad who lived in ur building at 3rd floor.

regards

Razi Ahmad

Asif Momen said...

@Razi

Glad to know that you found it informative.

Anonymous said...

Asif,
Great article. It took me a long time to find this one. I am surprised that nobody else has anything on this. Unfortunately, it didn't work for me. I have a RAC with ASM, and when I copy the corrupted file back, ASM complains that the file type was not recognized:
ASMCMD [+DATA/TEST/DATAFILE] > cp /tmp/CORRUPT_TS.268.692961805 CORRUPT_TS.268.692961805
source /tmp/CORRUPT_TS.268.692961805
target CORRUPT_TS.268.692961805
ASMCMD-08012: can not determine file type for file->'/tmp/CORRUPT_TS.268.692961805'

I am using UltraEdit. Do you have any idea what I might be doing wrong?

Thanks

Asif Momen said...

@Anonymous

I haven't testing this with ASM, will do that once I find sufficient time. I will update my results here, to keep watching.

Mansi said...

great article...thanks

Dmitry said...

Hi Asif, don't you think that the procedure can be automated by the file fix dbf program, it automates the procedure of database recovery. there are some similar application that I have evaluated

kiran said...

Can you also let us know what kind of block corruption is this PHYSICAL or LOGICAL.

A great article thanks for such detailed explanation.

Thanks,
Kiran

Anonymous said...

Hi Asif,

Nice article, small question, what i understand that we must have to have the backup taken in the begining to be in place so as to recover block. Am i right?
Also i saw your profile as you are working in banking environment, can you please advise how the responsibilities are shared amongst dba's (specially application user (dbuser through which application is connecting to db) password will be with one dba or shared with all dba's and if with one dba then how it is managed during his vacation (specially when changing application user password is delicate and can't be done to avaoid unexpected issues on production env.)

Anonymous said...


Hi Asif,
Nice article. what i understand before corrupting we must have to have backup through which we can recover corrupted block.Am i right?
Secondly since you are in banking environment, how it is managed among all dba's in the team responsiblity vice or all dba's are looking all databases. What if main application's database user through which application is connecting to the db and that password is hardcoded in the application, then it will be with one dba only? if yes then in his absence/vacation who is taking care of work related with that password? Further if you can describe the roles & responsibilities distributed in banks dba team

ana said...

hi,

I tried the step, I am not using rman can you give some practices using ordinary os command backup like copy or cp. I follow the step but no effect i think it's applicable only for RMAN.

Thank you.

Asif Momen said...

@Kiran

This is a physical block corruption.

@Anonymous

Yes, we need to have a backup in order to perform recovery. Drop me a mail in person to discuss how responsibilities are shared in my environment.

@Ana

The only valid database backup using OS commands is when the database is down. RMAN is the most powerful backup and recovery tool for Oracle database.

Raghu Bandla said...

Good Article...!! Thanks

Dinesh said...

its really wonderfull artical

meeran said...

Hai asif,

I followed the same steps as u did to practice block corruption but facing the same error as claude faced.

1.After opening the file with ultra edit pressed crtl + h then ctrl + f typed the word LET ME CORRUPT and changed the word C to N in the file.

2.Saved the file by ctl + s ,closed ultra editor
and i could see new file created with name CORRUPT01.DBF.bak

3.SQL> alter tablespace corrupt_ts online;
alter tablespace corrupt_ts online
*
ERROR at line 1:
ORA-01157: cannot identify/lock data file 8 - see DBWR trace file
ORA-01110: data file 8: 'C:\MYDB\DATA\CORRUPT01.DBF'

Please correct me if i am doing anything wrong .

4.What i observed with new file CORRUPT01.DBF.bak is the original file CORRUPT01.DBF is renamed to CORRUPT01.DBF.bak and created new file CORRUPT01.DBF which contains entry "LET ME NCORRUTPT" and
CORRUPT01.DBF.bak contains entry "LET ME CORRUPT".

5.I have deleted the file CORRUPT01.DBF and renamed CORRUPT01.DBF.bak as CORRUPT01.DBF.


SQL> alter tablespace corrupt_ts online;

Tablespace altered.

SQL> show user;
USER is "SYS"
SQL> conn
Enter user-name: testcrpt
Enter password:
Connected.
SQL> select * from t1;

RNO OBJECT_NAME
---------- ------------------------------
1 AQ$_AGENT
2 AQ$_DEQUEUE_HISTORY
3 AQ$_SUBSCRIBERS
4 AQ$_RECIPIENTS
5 AQ$_HISTORY
6 AQ$_NOTIFY_MSG
7 AQ$_SIG_PROP
8 AQ$_MIDARRAY
9 AQ$INTERNET_USERS
10 AQ$_NTFN_DESCRIPTOR
11 AQ$_DESCRIPTOR

RNO OBJECT_NAME
---------- ------------------------------
12 AQ$_REG_INFO
13 AQ$_POST_INFO
14 AQ$_REG_INFO_LIST
15 AQ$_POST_INFO_LIST
16 AQ$_SUBSCRIBER
17 AQ$_SUBSCRIBER_T
18 AQ$_DUMMY_T
19 AQ$_JMS_USERPROPERTY
20 AQ$_JMS_USERPROPARRAY
21 AQ$_JMS_HEADER
22 AQ$_JMS_MESSAGE

RNO OBJECT_NAME
---------- ------------------------------
23 AQ$_JMS_TEXT_MESSAGE
24 AQ$_JMS_BYTES_MESSAGE
25 AQ$_JMS_STREAM_MESSAGE
26 AQ$_JMS_MAP_MESSAGE
27 AQ$_JMS_OBJECT_MESSAGE
28 AQ$_JMS_VALUE
29 AQ$_JMS_EXCEPTION
30 AQ$_JMS_NAMEARRAY
31 AQ$_JMS_MESSAGES
32 AQ$_JMS_TEXT_MESSAGES
33 AQ$_JMS_BYTES_MESSAGES

RNO OBJECT_NAME
---------- ------------------------------
34 AQ$_JMS_MAP_MESSAGES
35 AQ$_JMS_STREAM_MESSAGES
36 AQ$_JMS_OBJECT_MESSAGES
37 AQ$_JMS_MESSAGE_PROPERTY
38 AQ$_JMS_MESSAGE_PROPERTIES
39 AQ$_JMS_ARRAY_MSGID_INFO
40 AQ$_JMS_ARRAY_MSGIDS
41 AQ$_JMS_ARRAY_ERROR_INFO
42 AQ$_JMS_ARRAY_ERRORS
99 LET ME CORRUPT

43 rows selected.

Asif Momen said...

@Meeran,

When you save a file in UltraEdit it automatically backs up the original file with a "bak" extension.

Since you reverted the modified file with the backed up file, you were able to bring the tablespace online.

You should ignore this backed up file and continue as mentioned in the post.

Happy testing !

meeran said...

Hai Asif,

Thank U for ur reply.
Again i repeated the same step what u have mentioned in the post.
Facing the same error

SQL> alter tablespace corrupt_ts online;
alter tablespace corrupt_ts online
*
ERROR at line 1:
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01110: data file 6: 'C:\MYDB\DATA\CORRUPT02.DBF'


In the alert log file

ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01110: data file 6: 'C:\MYDB\DATA\CORRUPT02.DBF'
ORA-27046: file size is not a multiple of logical block size
OSD-04000: logical block size mismatch (OS 8192)

How i can resolve this error and continue in doing the block recovery practice.

Prett said...

Hello Asif Momen,

Very nice Post but There are many scenarios in which you corrupt your data.In my knowledge RMAN detects only the physically corrupt data blocks on Oracle database. It does not automatically find the logical corruption of the blocks. You must specify the logical block corruption checking using the 'CHECK LOGICAL' option.If RMAN cannot detect logical and physical corruption to Oracle database and fix it than this is cause of severe data loss.In this case you can use Good Oracle Database Recovery Software.

Asif Momen said...

@Meeran

Seems you are missing some steps in between. In your earlier message you were trying to work with "CORRUPT01.DBF" file where as in the last message you are trying to bring "CORRUPT02.DBF" online.

meeran said...

Hai Asif,

Again thank u for ur reply . To try newly i have created new datbase named sil on the same machine itself . Datafile CORRUPT02.DBF belong new database sil and CORRUPT01.DBF belongs to my old database test123 ,tablespace name is same on both the database.

I have tried so many times facing the same error. I am keen to know the reason for the error and practicing block recovery.

Asif Momen said...

Make sure that you are not missing any step. All the best.

Anonymous said...

I found your article very helpful. Thank you very much. I would like to ask you however one question: when we recover the corrupted block, why do we roll back to the state when the block was not corrupted? I mean, the changes that we make in order to corrupt the block, are they not stored in the archive redo log files?

Asif Momen said...

Glad that you found it helpful.

We did not rollback the block. We recovered the block to the last good state. We made changes to the datafile (in order to corrupt it) using a Hex Editor. These changes are not part of the database and hence will not find place in archivelog.

I hope I answered your question.

Anonymous said...

Yes, thank you very much, it answered my question. Yeah, I should speak correctly and use recover instead of roll-back when I mean to 'go back' to a previous state.By the way, when we recover, which is our starting SCN?the SCN of the last backup?should not we firstly restore?

Thank you very much again.

Asif Momen said...

@Anonymous

We are performing a BLOCK recover and when we say "blockrecover datafile n block x", Oracle will restore and recover only the mentioned block(s). Block restore is implicit with this command.

Anonymous said...

Ok Asif, that vas also very helpful, thank you very much for all your support. Best wishes!!!

Excel Repair said...

Thank you for posting the great post…I was looking for something like this…I found it quiet interesting, hopefully you will keep posting such blogs….Keep sharing

Anonymous said...

SQL> alter tablespace corrupt online;
alter tablespace corrupt online
*
ERROR at line 1:
ORA-01157: cannot identify/lock data file 7 - see DBWR trace file
ORA-01110: data file 7: '/u01/TESTDB3/oradata/corrupt.dbf'


i tried to edit the dbf file using the 'vi editor' on linux!

Could you please tell me how to deal with this problem?

raj said...

Hi Asif,

i dont see the same article for UNIX. can you give me the link for UNIX

raj said...

can you please provide me the link for unix. the link posted in the article doesnt work

Thanks

Asif Momen said...

@Raj

I will try to add the UNIX version also.

pritesh ranjan said...

hi thanks for a helpful post of blok recovery.plz clear me it is necessary to open the datafile in ultra edit mode after the tablespace offline to corrupt the block it cant be performed in open the datafile in note pad. i try your post step in note pad and facing the problem when again back the tablespace online.the error maintain below

sql>alter tablespace corrupt_ts online;
ERROR at line 1:
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01110: data file 6: 'D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\CORRUPT01.DBF'.

plz clear me this error due to use of note pad rahter than ultra edit;
the use of ultra edit is must??

regards
pritesh ranjan
priteshranjan23@gmail.com

Asif Momen said...

@pritesh

You should use HexEditor to modify the datafile. Notepad is NOT an HexEditor. Use any HexEditor like UltraEdit.

Iresh said...

Thanks Asif .
I would like to know how to corrupt asm file if u have any article for that.

On Linux ASM setup i was able to copy file to ext3 file system and follow the same procedure, it worked :)

But on windows i am facing problem to copy ASM file to local file system
error:
"ASMCMD [+data/guidb] > cp diffcrr01.dbf c:\\
ASMCMD-08010: no usr name is specified in remote instance connect_string->'c'"

Thanks in advance

Asif Momen said...

Iresh,

I did not test corrupting datafiles residing in ASM. But that should be straight forward. You copy the datafile to OS, modify, and copy back to ASM.

What version are you on???

Anonymous said...

I opened the data file using ultraedit & changed the block content from corrupt to norrupt. When I tried to save the file , i showed error message that file my be read only or used by another program or application. Save as with another name. What could be the reason for the error.? Please provide solution to save the corrupted dbf file using ultraedit.

Asif Momen said...

You have to place the tablespace in offline mode before making any modifications. Hope this helps.

Anonymous said...

Thanks for this article.
Even I got error
ERROR at line 1:
ORA-01157: cannot identify/lock data file 8 - see DBWR trace file
ORA-01110: data file 8: 'C:\MYDB\DATA\test.DBF'

Anonymous said...

Can you please tell me how you know the corruption is not already in the backed up file, I realise from your example it obviously isn't but how would you tell in a real situation.