Monday, May 26, 2008

Database is very slow!!!

Dear Readers,

I received a call from one of our developers and following was the conversation that took place between us:

He: “Database is very slow”

Me: What is very slow? Can you tell me what actually you are doing?

He: I am running a simple report which is supposed to return less than 100 records. But it’s been more than 4 hours and the query is still running.

Me: How long it use to take before?

He: This is the first time we are running this query.

Me: Ok, let me log in to the database.



I logged into the database and ran my set of commands to trace the culprit SQL. I was able to identify the query and here it is:

SELECT
      TO_CHAR(TRN_DATE, 'YYYY-MM-DD HH24:MI:SS') TRN_DATE,
      TRAN_TYPE,
      TRAN_REF,
      KEY_NO,
      IN_AMT A1,
      0 A2
 FROM TRANSACTION_DAILY
WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' )
      AND SRC_FLAG ='L'
      AND (SUBSTR(OUT_KEY_NO,1,10) = 'DXIY0-19XV' or SUBSTR(IN_KEY_NO,1,10) = 'DXIY0-19XV')
      AND TRAN_TYPE  NVL(KEY_NO,'A')  NVL(IN_AMT,0) NOT IN (
                    SELECT  TD_TRAN_TYPE  TD_KEY_NO_REF  NVL(OUT_AMT,0)
                      FROM TRANSACTION_DETAILS
                     WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) 
                       AND RM_CODE = 'XXX123'  )
UNION
SELECT
      TO_CHAR(TRN_DATE, 'YYYY-MM-DD HH24:MI:SS') TRN_DATE,
      TD_TRAN_TYPE,
      TRAN_REF, 
      TD_KEY_NO_REF, 
      0 A1,
      OUT_AMT A2
 FROM TRANSACTION_DETAILS
WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' )
  AND RM_CODE = 'XXX123' 
  AND TD_TRAN_TYPE  TD_KEY_NO_REF  NVL(OUT_AMT,0)   NOT IN (
                    SELECT TRAN_TYPE  NVL(KEY_NO,'A')  NVL(IN_AMT,0)
                      FROM TRANSACTION_DAILY
                     WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' )
                       AND SRC_FLAG ='L'
                       AND (SUBSTR(OUT_KEY_NO,1,10) = 'DXIY0-19XV' 
                            or SUBSTR(IN_KEY_NO,1,10) = 'DXIY0-19XV') )

and this is the execution plan:

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                       |       |       |    51M(100)|          |       |       |
|   1 |  SORT UNIQUE                          |                       |  2842 |   159K|    51M (70)|198:47:09 |       |       |
|   2 |   UNION-ALL                           |                       |       |       |            |          |       |       |
|*  3 |    FILTER                             |                       |       |       |            |          |       |       |
|   4 |     PARTITION RANGE SINGLE            |                       |  2042 |   135K|  2854   (2)| 00:00:40 |   KEY |   KEY |
|*  5 |      TABLE ACCESS FULL                | TRANSACTION_DAILY     |  2042 |   135K|  2854   (2)| 00:00:40 |   KEY |   KEY |
|   6 |     PARTITION RANGE SINGLE            |                       |   760 | 19000 | 15269   (1)| 00:03:34 |   KEY |   KEY |
|*  7 |      TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTION_DETAILS   |   760 | 19000 | 15269   (1)| 00:03:34 |   KEY |   KEY |
|*  8 |       INDEX RANGE SCAN                | TRAN_DET_IDX          |   434K|       |   980   (1)| 00:00:14 |   KEY |   KEY |
|*  9 |    FILTER                             |                       |       |       |            |          |       |       |
|  10 |     PARTITION RANGE SINGLE            |                       |   800 | 24800 | 15269   (1)| 00:03:34 |   KEY |   KEY |
|* 11 |      TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTION_DETAILS   |   800 | 24800 | 15269   (1)| 00:03:34 |   KEY |   KEY |
|* 12 |       INDEX RANGE SCAN                | TRAN_DET_IDX          |   434K|       |   980   (1)| 00:00:14 |   KEY |   KEY |
|* 13 |     TABLE ACCESS BY GLOBAL INDEX ROWID| TRANSACTION_DAILY     |  1940 |   111K| 88735   (1)| 00:20:43 | ROW L | ROW L |
|* 14 |      INDEX RANGE SCAN                 | PK_TRANSACTION_DAILY  |   102K|       |   413   (1)| 00:00:06 |       |       |
-------------------------------------------------------------------------------------------------------------------------------


From the above execution plan, Optimizer thinks it needs “198:47:09” hours to “Sort Unique” the result set. But the actual time the query took to complete was 6 hours and 49 minute (24540 seconds).

I then simplified the query in order to understand it clearly by removing some of the predicates. The simplified query is:

 
SELECT *
 FROM TRANSACTION_DAILY
WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) 
      AND TRAN_TYPE || NVL(KEY_NO,'A') || NVL(IN_AMT,0) NOT IN (
                    SELECT  TD_TRAN_TYPE || 
                            TD_KEY_NO_REF || 
                            NVL(OUT_AMT,0) 
                      FROM TRANSACTION_DETAILS
                     WHERE TRN_DATE = 
                           TO_DATE ( '#BUSDATE#', 'DDMMYYYY' )  )
UNION
SELECT *
 FROM TRANSACTION_DETAILS
WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) 
  AND TD_TRAN_TYPE || TD_KEY_NO_REF || NVL(OUT_AMT,0)   NOT IN (
                    SELECT TRAN_TYPE || 
                           NVL(KEY_NO,'A') || 
                           NVL(IN_AMT,0)
                      FROM TRANSACTION_DAILY
                     WHERE TRN_DATE = 
                            TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) )

Hmmmm, this query seems to be doing a full outer join on “TRANSACTION_DAILY” and “TRANSACTION_DETAILS” tables. I called him back:

Me: What is the objective of this report?

He: We need to find all the information from “TRANSACTION_DAILY” table which does not exist in “TRANSACTION_DETAILS” table and vice versa.

Yes, I was right. It’s doing a “Full Outer Join”. I modified his original query and proposed him a new query which is executing at lighting speed and more importantly yielding the same result:

 
SELECT  nvl(a.TRN_DATE, b.TRN_DATE) TRN_DATE,
        a.TRN_TYPE, b.OTH_TRN_TYPE,
        a.KEY_NO, b.KEY_NO_REF,
        nvl(a.IN_AMT, 0) A1, 
        nvl(b.OUT_AMT, 0) A2
FROM  (select TRN_DATE, TRN_TYPE, KEY_NO, IN_AMT, 
         from TRANSACTION_DAILY 
        where TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY')
          and (substr(IN_KEY_NO, 1, 10) = 'DXIY0-19XV' OR
               substr(OUT_KEY_NO, 1, 10) = 'DXIY0-19XV')) a 
  FULL OUTER JOIN 
      (select TRN_DATE, OTH_TRN_TYPE, KEY_NO_REF, OUT_AMT, 
         from TRANSACTION_DETAILS 
        where RM_CODE = 'XXX123'
          and TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY')) b
  ON (a.TRN_DATE = b.TRN_DATE and 
      a.TRN_TYPE = b.OTH_TRN_TYPE and 
      a.KEY_NO = b.KEY_NO_REF and 
      a.IN_AMT = b.OUT_AMT and 
      a.rno = b.rno)
WHERE (a.TRN_TYPE is null or 
       b.OTH_TRN_TYPE is null or 
       a.KEY_NO is null or 
       b.KEY_NO_REF is null or 
       a.IN_AMT is null or 
       b.OUT_AMT is null);

I called him back with over-excitement announcing him that the query is ready and let us test it:

Me: I have re-written the query, kindly come over to my office so that we can test it together.

He: How long does it take to execute?

Me: It completes in less than 4 seconds.

He: What...? … less than 4 seconds … ? … Are you sure, have you included all the condition that I have mentioned?

Me: Yes, let us test it.

He made couple of tests and the query was perfect to all the test cases but except one. Requirements started getting little messy so, I decided to create two tables and test the query against them. I created two tables, A and B, of same table structure with sample data:

 
create table a(id number, amt number);
insert into a values (1, 10);
insert into a values (2, 20);
insert into a values (3, 30);
insert into a values (4, 40);

create table b(id number, amt number);
insert into b values (2, 20);
insert into b values (3, 30);
insert into b values (5, 50);
insert into b values (6, 60);

commit;

Basically, we need to find out data that exists in A and is not in B and also records that exist in B which are not A. Here’s the query:

 
SQL> select *
  2    from a FULL OUTER JOIN b
  3      on a.id = b.id
  4   where a.id is null or b.id is null;

        ID        AMT         ID        AMT
---------- ---------- ---------- ----------
         4         40
                               6         60
                               5         50

SQL>

He said: “Yes, the query is fetching right data, but what happens when you have two records with same id and amt in table A and only a single record in B? In this case, the query should display one record from table A.”

Adding more to the complexity he said: “There is no third column in either table to distinguish this occurrence.”

So, we inserted a record in each table and ran the same query:

 
insert into a values (1, 10);
insert into b values (1, 10);


SQL> select *
  2    from a FULL OUTER JOIN b
  3      on a.id = b.id
  4   where a.id is null or b.id is null;

        ID        AMT         ID        AMT
---------- ---------- ---------- ----------
         4         40
                               6         60
                               5         50

SQL>

Oops!! The query fails at this point. Ok, I then decided to re-write this query and this time use analytical functions to rescue me from the current problematic situation.

 
SQL> select x.id, x.amt, y.id, y.amt
  2    from (select id, amt, 
  3                 row_number() over (partition by id, amt 
  4                                    order by id, amt) rno 
  5            from a) x
  6         FULL OUTER JOIN
  7         (select id, amt, 
  8                 row_number() over (partition by id, amt 
  9                                     order by id, amt) rno 
 10                 from b) y
 11      on x.id = y.id and x.rno = y.rno
 12   where x.id is null 
 13      or y.id is null 
 14      or x.rno is null 
 15      or y.rno is null;

        ID        AMT         ID        AMT
---------- ---------- ---------- ----------
         1         10
         4         40
                               5         50
                               6         60

SQL>

Yippy!!! This query rocks. I then quickly transformed the original query into this form and ran it again:

 
SELECT  nvl(a.TRN_DATE, b.TRN_DATE) TRN_DATE,
        a.TRN_TYPE, b.OTH_TRN_TYPE,
        a.KEY_NO, b.KEY_NO_REF,
        nvl(a.IN_AMT, 0) A1, 
        nvl(b.OUT_AMT, 0) A2
FROM  (select TRN_DATE, TRN_TYPE, KEY_NO, IN_AMT, 
              row_number() over (partition by KEY_NO, 
                                              TRN_TYPE, 
                                              IN_AMT 
                                     order by KEY_NO, 
                                              TRN_TYPE, 
                                              IN_AMT) rno 
         from TRANSACTION_DAILY 
        where TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY')
          and (substr(IN_KEY_NO, 1, 10) = 'DXIY0-19XV' OR
               substr(OUT_KEY_NO, 1, 10) = 'DXIY0-19XV')) a 
  FULL OUTER JOIN 
      (select TRN_DATE, OTH_TRN_TYPE, KEY_NO_REF, OUT_AMT, 
              row_number() over (partition by KEY_NO_REF, 
                                              OTH_TRN_TYPE, 
                                              OUT_AMT 
                                     order by KEY_NO_REF, 
                                              OTH_TRN_TYPE, 
                                              OUT_AMT) rno    
         from TRANSACTION_DETAILS 
        where RM_CODE = 'XXX123'
          and TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY')) b
  ON (a.TRN_DATE = b.TRN_DATE and 
      a.TRN_TYPE = b.OTH_TRN_TYPE and 
      a.KEY_NO = b.KEY_NO_REF and
      a.IN_AMT = b.OUT_AMT and 
      a.rno = b.rno)
WHERE (a.TRN_TYPE is null or 
       b.OTH_TRN_TYPE is null or 
       a.KEY_NO is null or 
       b.KEY_NO_REF is null or 
       a.IN_AMT is null or 
       b.OUT_AMT is null or
       a.rno is null or
       b.rno is null);

Execution plan of this query is:

 
----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                       |  2230 |   409K| 48722   (1)| 00:11:23 |       |       |
|   1 |  VIEW                                    |                       |  2230 |   409K| 48722   (1)| 00:11:23 |       |       |
|   2 |   UNION-ALL                              |                       |       |       |            |          |       |       |
|*  3 |    FILTER                                |                       |       |       |            |          |       |       |
|*  4 |     HASH JOIN RIGHT OUTER                |                       |  1552 |   175K| 24361   (1)| 00:05:42 |       |       |
|   5 |      VIEW                                |                       |   678 | 40002 | 14852   (1)| 00:03:28 |       |       |
|   6 |       WINDOW SORT                        |                       |   678 | 12882 | 14852   (1)| 00:03:28 |       |       |
|   7 |        PARTITION RANGE SINGLE            |                       |   678 | 12882 | 14851   (1)| 00:03:28 |    14 |    14 |
|*  8 |         TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTION_DETAILS   |   678 | 12882 | 14851   (1)| 00:03:28 |    14 |    14 |
|*  9 |          INDEX RANGE SCAN                | TRAN_DET_IDX          |   422K|       |   897   (1)| 00:00:13 |    14 |    14 |
|  10 |      VIEW                                |                       |  1552 | 88464 |  9508   (1)| 00:02:14 |       |       |
|  11 |       WINDOW SORT                        |                       |  1552 | 69840 |  9508   (1)| 00:02:14 |       |       |
|  12 |        PARTITION RANGE SINGLE            |                       |  1552 | 69840 |  9507   (1)| 00:02:14 |     7 |     7 |
|* 13 |         TABLE ACCESS FULL                | TRANSACTION_DAILY     |  1552 | 69840 |  9507   (1)| 00:02:14 |     7 |     7 |
|* 14 |    HASH JOIN ANTI                        |                       |   678 | 78648 | 24361   (1)| 00:05:42 |       |       |
|  15 |     VIEW                                 |                       |   678 | 40002 | 14852   (1)| 00:03:28 |       |       |
|  16 |      WINDOW SORT                         |                       |   678 | 12882 | 14852   (1)| 00:03:28 |       |       |
|  17 |       PARTITION RANGE SINGLE             |                       |   678 | 12882 | 14851   (1)| 00:03:28 |    14 |    14 |
|* 18 |        TABLE ACCESS BY LOCAL INDEX ROWID | TRANSACTION_DETAILS   |   678 | 12882 | 14851   (1)| 00:03:28 |    14 |    14 |
|* 19 |         INDEX RANGE SCAN                 | TRAN_DET_IDX          |   422K|       |   897   (1)| 00:00:13 |    14 |    14 |
|  20 |     VIEW                                 |                       |  1552 | 88464 |  9508   (1)| 00:02:14 |       |       |
|  21 |      WINDOW SORT                         |                       |  1552 | 69840 |  9508   (1)| 00:02:14 |       |       |
|  22 |       PARTITION RANGE SINGLE             |                       |  1552 | 69840 |  9507   (1)| 00:02:14 |     7 |     7 |
|* 23 |        TABLE ACCESS FULL                 | TRANSACTION_DAILY     |  1552 | 69840 |  9507   (1)| 00:02:14 |     7 |     7 |
----------------------------------------------------------------------------------------------------------------------------------

The results were convincing and it still takes 4 seconds to fetch the data compared to 6 hours and 49 minutes. It is 6135 times faster than the original one.

This query left End-Users rejoicing and he left my desk smiling but without answering my question, “Is the database very slow?”

P.S.: I have renamed all the table names and columns names so as not to reveal official and sensitive information on my personal blog, yet preserving the reality.

Regards

No comments: