Showing posts with label Oracle 10g. Show all posts
Showing posts with label Oracle 10g. Show all posts

Tuesday, July 14, 2009

Who says Full Table Scans are Evil?

Recently, I had a developer complaining that a query which normally runs very fast is now taking huge amount of time to complete.

Following is the culprit query and its execution plan:

SELECT  DISTINCT SUBSTR(A.MF_NO,4) MF_NO,
 B.SRT_SEQ,
 SUBSTR(B.MAIN_CODE,1,6) MAIN_CODE
  FROM  DETAIL_TBL A,
        MASTER_TBL B    
  WHERE A.MRF_ID   = B.MRF_ID    
    AND A.MRF_CODE = B.MRF_CODE    
    AND TRIM(SUBSTR(A.MF_NO,4)) IN (SELECT LPAD(MF_NO,4,0) 
                                     FROM REMOTE_TBL@RMLINK 
                                    WHERE USER_LEVEL = 'RO'  
                                      AND STATUS_CODE <> 'D');

Execution Plan
----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                  | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                       |       |       |  6646 (100)|          |        |      |
|   1 |  HASH UNIQUE                 |                       |     1 |    62 |  6646   (2)| 00:01:34 |        |      |
|   2 |   TABLE ACCESS BY INDEX ROWID| MASTER_TBL            |     1 |    30 |     5   (0)| 00:00:01 |        |      |
|   3 |    NESTED LOOPS              |                       |     1 |    62 |  6645   (2)| 00:01:34 |        |      |
|*  4 |     HASH JOIN                |                       |     1 |    32 |  6640   (2)| 00:01:33 |        |      |
|   5 |      REMOTE                  | REMOTE_TBL            |     1 |    12 |    24   (0)| 00:00:01 |  REMDB | R->S |
|   6 |      TABLE ACCESS FULL       | DETAIL_TBL            |  4159K|    79M|  6589   (2)| 00:01:33 |        |      |
|*  7 |     INDEX RANGE SCAN         | SYS_C0011692          |     3 |       |     2   (0)| 00:00:01 |        |      |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access(TRIM(SUBSTR("A"."MF_NO",:SYS_B_3))=LPAD("MF_NO",:SYS_B_4,TO_CHAR(:SYS_B_5)))
   7 - access("A"."MRF_ID"="B"."MRF_ID" AND "A"."MRF_CODE"="B"."MRF_CODE")

Using AWR SQL report (awrsqrpt.sql) for the time when the query was performing better shows that a Full Table Scan was performed on the MASTER_TBL.

Execution Plan
----------------------------------------------------------
Plan hash value: 1931109479

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name                  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Inst   |IN-OUT|
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                       |  2919K|   172M|       | 42564   (2)| 00:09:56 |        |      |
|   1 |  HASH UNIQUE         |                       |  2919K|   172M|   424M| 42564   (2)| 00:09:56 |        |      |
|*  2 |   HASH JOIN          |                       |  2919K|   172M|    11M| 15147   (2)| 00:03:33 |        |      |
|   3 |    TABLE ACCESS FULL | MASTER_TBL            |   289K|  8494K|       |  5590   (2)| 00:01:19 |        |      |
|*  4 |    HASH JOIN         |                       |  1225K|    37M|       |  6833   (2)| 00:01:36 |        |      |
|   5 |     REMOTE           | REMOTE_TBL            |    29 |   348 |       |    24   (0)| 00:00:01 |  REMDB | R->S |
|   6 |     TABLE ACCESS FULL| DETAIL_TBL            |  4191K|    79M|       |  6781   (2)| 00:01:35 |        |      |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"."MRF_ID"="B"."MRF_ID" AND "A"."MRF_CODE"="B"."MRF_CODE")
   4 - access(TRIM(SUBSTR("A"."MF_NO",4))=LPAD("MF_NO",4,'0'))

When we add a Full Table Scan (FTS) hint on the MASTER_TBL, the same query ran in less than 20 seconds. We can modify the query to include FTS hint as it is a home grown application.

SELECT  /*+ FULL(A) */ DISTINCT SUBSTR(A.MF_NO,4) MF_NO,
 B.SRT_SEQ,
 SUBSTR(B.MAIN_CODE,1,6) MAIN_CODE
  FROM  DETAIL_TBL A,
        MASTER_TBL B    
  WHERE A.MRF_ID   = B.MRF_ID    
    AND A.MRF_CODE = B.MRF_CODE    
    AND TRIM(SUBSTR(A.MF_NO,4)) IN (SELECT LPAD(MF_NO,4,0) 
                                     FROM REMOTE_TBL@RMLINK 
                                    WHERE USER_LEVEL = 'RO'  
                                      AND STATUS_CODE <> 'D');


Execution Plan
----------------------------------------------------------
Plan hash value: 1931109479

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name                  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Inst   |IN-OUT|
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                       |  2919K|   172M|       | 42564   (2)| 00:09:56 |        |      |
|   1 |  HASH UNIQUE         |                       |  2919K|   172M|   424M| 42564   (2)| 00:09:56 |        |      |
|*  2 |   HASH JOIN          |                       |  2919K|   172M|    11M| 15147   (2)| 00:03:33 |        |      |
|   3 |    TABLE ACCESS FULL | MASTER_TBL            |   289K|  8494K|       |  5590   (2)| 00:01:19 |        |      |
|*  4 |    HASH JOIN         |                       |  1225K|    37M|       |  6833   (2)| 00:01:36 |        |      |
|   5 |     REMOTE           | REMOTE_TBL            |    29 |   348 |       |    24   (0)| 00:00:01 |  REMDB | R->S |
|   6 |     TABLE ACCESS FULL| DETAIL_TBL            |  4191K|    79M|       |  6781   (2)| 00:01:35 |        |      |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"."MRF_ID"="B"."MRF_ID" AND "A"."MRF_CODE"="B"."MRF_CODE")
   4 - access(TRIM(SUBSTR("A"."MF_NO",4))=LPAD("MF_NO",4,'0'))

This worked as an emergency painkiller before the routine lab tests are performed and an appropriate action is decided.

In the bad execution plan, to access the master table (MASTER_TBL), optimizer is doing an index range scan (“SYS_C0011692”) and the cardinality reported is “3”. This is where the real problem is hiding. Drilling into the tables, columns and indexes for both the tables revealed this information:

Table Name: MASTER_TBL
Primary Key: MRF_ID, MRF_CODE, ORDNO
Num Rows: 287,456

Table Name: CHILD_TBL
Primary Key: MRF_ID, MRF_CODE, ORDNO
Num Rows: 4,159,964

Oops… there are nearly 300,000 records in the MASTER_TBL, but the execution plan indicates a cardinality of merely 3. So, there is something’s serious wrong somewhere. First suspect is table and index statistics. As suspected, statistics on MASTER_TBL were botched up. These were the steps performed on the MASTER_TBL:

1) Data was cleaned 2) Initial data load was run 3) Statistics were collected 4) Remaining major chunk of the data loaded

After collecting fresh statistics on the MASTER_TBL things got back on track. Also, data belonging to columns “MRF_ID” & “MRF_CODE” is evenly spread across all the used blocks in MASTER_TBL & CHILD_TBL tables. Moreover, there is no limiting condition on these columns which could encourage optimizer to go with an index access path. So, it would make no sense for the optimizer to go with an index access path.

Wednesday, April 15, 2009

ORA-07445 When Querying V$SQL_PLAN View

On one of our production databases, I was looking for a particular index usage as how often was it used and at what times it was used. To achieve this, I did a little mining on AWR repository tables and found out that it was used 11 times in the last one month (AWR retention period).

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Productio
NLSRTL Version 10.2.0.4.0 - Production

Elapsed: 00:00:00.01
SQL>


SQL> select sp.sql_id,
  2         sp.options,
  3         count(1) cnt
  4    from dba_hist_sql_plan sp,
  5         dba_hist_sqlstat ss
  6    where sp.sql_id = ss.sql_id
  7      and sp.object_owner  =  'PAYMAST' 
  8      and sp.operation like '%INDEX%'
  9      and sp.object_name = 'IDX_COMMIT_DET'
 10  group by sp.sql_id, 
 11           sp.options
 12  order by sp.sql_id, 
 13           sp.options;

SQL_ID        OPTIONS                               CNT
------------- ------------------------------ ----------
1b4xf87ntvfgn SAMPLE FAST FULL SCAN                   1
8qaj7c6wqcyvg SAMPLE FAST FULL SCAN                   7
9jcrv4kunuhg4 SAMPLE FAST FULL SCAN                   1
afzq942kp848t RANGE SCAN                              1
ar6c0r4s4kntp RANGE SCAN                              1

Elapsed: 00:00:00.54
SQL>          

However, querying V$SQL_PLAN view for the same index threw ORA-03113 error. Hmm, I connected to the database and ran the same query again and it resulted in the same error.

SQL> select * from V$SQL_PLAN where  operation  ='INDEX' and object_name ='IDX_COMMIT_DET';

ADDRESS          HASH_VALUE SQL_ID        PLAN_HASH_VALUE CHILD_ADDRESS    CHILD_NUMBER TIMESTAMP       
---------------- ---------- ------------- --------------- ---------------- ------------ ------------
FILTER_PREDICATES                                                                                       
----------------------------------------------------------------------------------------------------
REMARKS                                                                                                 
----------------------------------------------------------------------------------------------------
07000000D98392B0 2773750041 afzq942kp848t       380836487 07000000D986D420            0 15-APR2009 12:2
"VOU_TYPE_NAME"=:B1                                                                                     


ERROR:
ORA-03113: end-of-file on communication channel



ERROR:
ORA-03114: not connected to ORACLE


Elapsed: 00:00:03.82
SQL>

Oracle has bumped some information into the alert log and here it is:

Errors in file /dbdata1/oradba/admin/PAYPROD/udump/payprod_ora_6488244.trc:
ORA-07445: exception encountered: core dump [msqsub+0008] [SIGSEGV] [Address not mapped to object] [0x35B000000000070] [] []

and it created a 18MB trace file in the USER_DUMP_DEST which has all the gory details for the Oracle Support to work on. Below is an excerpt from the trace file:

*** 2009-04-15 12:27:46.646
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [msqsub+0008] [SIGSEGV] [Address not mapped to object] [0x35B000000000070] [] []
Current SQL statement for this session:
select * from V$SQL_PLAN where  operation  ='INDEX' and object_name ='IDX_COMMIT_DET'

The ORA-00600/ORA-07445 Lookup Tool in Oracle Metalink reports “A description for this ORA-07445 error is not yet available”.

I have to end up by lodging an SR and wait for Oracle Support’s response.

Tuesday, March 17, 2009

Strange Data Guard Issue

We have recently failed over one of our database to the DRC. At that time the production archivelog sequence was 80,000 plus. After staying there for couple of hours we recreated a physical standby database on the primary and switched back to original primary location. Everything went fine without any hoo-ha.

The failover and switchover took place on February 20, 2009. We have an automated job on all our standby databases to check whether the Standby is lagging behind the Primary, if so, then it automatically restores the missing archivelogs and applies them.

This job was working fine until day before yesterday (March 15, 2009) but started reporting errors thereafter. It failed with “RMAN-20242” error.

Starting restore at 15-MAR-09
released channel: ch12
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 03/15/2009 07:26:59
RMAN-06004: ORACLE error from recovery catalog database: 
RMAN-20242: specification does not match any archive log in the recovery catalog

When notified, I queried V$ARCHIVE_GAP and the output of this query took me to a surprise:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bi
PL/SQL Release 10.2.0.3.0 - Production
CORE    10.2.0.3.0      Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.3.0 - Production
NLSRTL Version 10.2.0.3.0 - Production

SQL>
SQL> select * from v$archive_gap;

   THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------
         1           776          82468

SQL> 

Well, all of a sudden Oracle thinks that it is 81 thousand archive logs behind the primary. I tried to switching couple of archive logs on the production and the gap was still intact. I don't really know from where did Oracle took this information.

The automated job was trying to restore archives between 776 and 82468 and was kicked out by the recovery catalog database saying, there are no such archive logs (in fact there exist no such archivelogs).

I created a standby controlfile on the production database and replaced the controlfile on the standby database with the new one. The standby database seems to be happy with this version of controlfile and have stopped reporting any gap and the automated job is also leading a happy life.

SQL> select * from v$archive_gap;

no rows selected

SQL>