Me: What is very slow? Can you tell me what actually you are doing?
He: I am running a simple report which is supposed to return less than 100 records. But it’s been more than 4 hours and the query is still running.
Me: How long it use to take before?
He: This is the first time we are running this query.
Me: Ok, let me log in to the database.
I logged into the database and ran my set of commands to trace the culprit SQL. I was able to identify the query and here it is:
SELECT TO_CHAR(TRN_DATE, 'YYYY-MM-DD HH24:MI:SS') TRN_DATE, TRAN_TYPE, TRAN_REF, KEY_NO, IN_AMT A1, 0 A2 FROM TRANSACTION_DAILY WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) AND SRC_FLAG ='L' AND (SUBSTR(OUT_KEY_NO,1,10) = 'DXIY0-19XV' or SUBSTR(IN_KEY_NO,1,10) = 'DXIY0-19XV') AND TRAN_TYPE NVL(KEY_NO,'A') NVL(IN_AMT,0) NOT IN ( SELECT TD_TRAN_TYPE TD_KEY_NO_REF NVL(OUT_AMT,0) FROM TRANSACTION_DETAILS WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) AND RM_CODE = 'XXX123' ) UNION SELECT TO_CHAR(TRN_DATE, 'YYYY-MM-DD HH24:MI:SS') TRN_DATE, TD_TRAN_TYPE, TRAN_REF, TD_KEY_NO_REF, 0 A1, OUT_AMT A2 FROM TRANSACTION_DETAILS WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) AND RM_CODE = 'XXX123' AND TD_TRAN_TYPE TD_KEY_NO_REF NVL(OUT_AMT,0) NOT IN ( SELECT TRAN_TYPE NVL(KEY_NO,'A') NVL(IN_AMT,0) FROM TRANSACTION_DAILY WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) AND SRC_FLAG ='L' AND (SUBSTR(OUT_KEY_NO,1,10) = 'DXIY0-19XV' or SUBSTR(IN_KEY_NO,1,10) = 'DXIY0-19XV') )and this is the execution plan:
------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 51M(100)| | | | | 1 | SORT UNIQUE | | 2842 | 159K| 51M (70)|198:47:09 | | | | 2 | UNION-ALL | | | | | | | | |* 3 | FILTER | | | | | | | | | 4 | PARTITION RANGE SINGLE | | 2042 | 135K| 2854 (2)| 00:00:40 | KEY | KEY | |* 5 | TABLE ACCESS FULL | TRANSACTION_DAILY | 2042 | 135K| 2854 (2)| 00:00:40 | KEY | KEY | | 6 | PARTITION RANGE SINGLE | | 760 | 19000 | 15269 (1)| 00:03:34 | KEY | KEY | |* 7 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTION_DETAILS | 760 | 19000 | 15269 (1)| 00:03:34 | KEY | KEY | |* 8 | INDEX RANGE SCAN | TRAN_DET_IDX | 434K| | 980 (1)| 00:00:14 | KEY | KEY | |* 9 | FILTER | | | | | | | | | 10 | PARTITION RANGE SINGLE | | 800 | 24800 | 15269 (1)| 00:03:34 | KEY | KEY | |* 11 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTION_DETAILS | 800 | 24800 | 15269 (1)| 00:03:34 | KEY | KEY | |* 12 | INDEX RANGE SCAN | TRAN_DET_IDX | 434K| | 980 (1)| 00:00:14 | KEY | KEY | |* 13 | TABLE ACCESS BY GLOBAL INDEX ROWID| TRANSACTION_DAILY | 1940 | 111K| 88735 (1)| 00:20:43 | ROW L | ROW L | |* 14 | INDEX RANGE SCAN | PK_TRANSACTION_DAILY | 102K| | 413 (1)| 00:00:06 | | | -------------------------------------------------------------------------------------------------------------------------------From the above execution plan, Optimizer thinks it needs “198:47:09” hours to “Sort Unique” the result set. But the actual time the query took to complete was 6 hours and 49 minute (24540 seconds). I then simplified the query in order to understand it clearly by removing some of the predicates. The simplified query is:
SELECT * FROM TRANSACTION_DAILY WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) AND TRAN_TYPE || NVL(KEY_NO,'A') || NVL(IN_AMT,0) NOT IN ( SELECT TD_TRAN_TYPE || TD_KEY_NO_REF || NVL(OUT_AMT,0) FROM TRANSACTION_DETAILS WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) ) UNION SELECT * FROM TRANSACTION_DETAILS WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) AND TD_TRAN_TYPE || TD_KEY_NO_REF || NVL(OUT_AMT,0) NOT IN ( SELECT TRAN_TYPE || NVL(KEY_NO,'A') || NVL(IN_AMT,0) FROM TRANSACTION_DAILY WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) )Hmmmm, this query seems to be doing a full outer join on “TRANSACTION_DAILY” and “TRANSACTION_DETAILS” tables. I called him back: Me: What is the objective of this report?
He: We need to find all the information from “TRANSACTION_DAILY” table which does not exist in “TRANSACTION_DETAILS” table and vice versa. Yes, I was right. It’s doing a “Full Outer Join”. I modified his original query and proposed him a new query which is executing at lighting speed and more importantly yielding the same result:
SELECT nvl(a.TRN_DATE, b.TRN_DATE) TRN_DATE, a.TRN_TYPE, b.OTH_TRN_TYPE, a.KEY_NO, b.KEY_NO_REF, nvl(a.IN_AMT, 0) A1, nvl(b.OUT_AMT, 0) A2 FROM (select TRN_DATE, TRN_TYPE, KEY_NO, IN_AMT, from TRANSACTION_DAILY where TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY') and (substr(IN_KEY_NO, 1, 10) = 'DXIY0-19XV' OR substr(OUT_KEY_NO, 1, 10) = 'DXIY0-19XV')) a FULL OUTER JOIN (select TRN_DATE, OTH_TRN_TYPE, KEY_NO_REF, OUT_AMT, from TRANSACTION_DETAILS where RM_CODE = 'XXX123' and TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY')) b ON (a.TRN_DATE = b.TRN_DATE and a.TRN_TYPE = b.OTH_TRN_TYPE and a.KEY_NO = b.KEY_NO_REF and a.IN_AMT = b.OUT_AMT and a.rno = b.rno) WHERE (a.TRN_TYPE is null or b.OTH_TRN_TYPE is null or a.KEY_NO is null or b.KEY_NO_REF is null or a.IN_AMT is null or b.OUT_AMT is null);I called him back with over-excitement announcing him that the query is ready and let us test it: Me: I have re-written the query, kindly come over to my office so that we can test it together.
He: How long does it take to execute?
Me: It completes in less than 4 seconds.
He: What...? … less than 4 seconds … ? … Are you sure, have you included all the condition that I have mentioned?
Me: Yes, let us test it. He made couple of tests and the query was perfect to all the test cases but except one. Requirements started getting little messy so, I decided to create two tables and test the query against them. I created two tables, A and B, of same table structure with sample data:
create table a(id number, amt number); insert into a values (1, 10); insert into a values (2, 20); insert into a values (3, 30); insert into a values (4, 40); create table b(id number, amt number); insert into b values (2, 20); insert into b values (3, 30); insert into b values (5, 50); insert into b values (6, 60); commit;Basically, we need to find out data that exists in A and is not in B and also records that exist in B which are not A. Here’s the query:
SQL> select * 2 from a FULL OUTER JOIN b 3 on a.id = b.id 4 where a.id is null or b.id is null; ID AMT ID AMT ---------- ---------- ---------- ---------- 4 40 6 60 5 50 SQL>He said: “Yes, the query is fetching right data, but what happens when you have two records with same id and amt in table A and only a single record in B? In this case, the query should display one record from table A.” Adding more to the complexity he said: “There is no third column in either table to distinguish this occurrence.” So, we inserted a record in each table and ran the same query:
insert into a values (1, 10); insert into b values (1, 10); SQL> select * 2 from a FULL OUTER JOIN b 3 on a.id = b.id 4 where a.id is null or b.id is null; ID AMT ID AMT ---------- ---------- ---------- ---------- 4 40 6 60 5 50 SQL>Oops!! The query fails at this point. Ok, I then decided to re-write this query and this time use analytical functions to rescue me from the current problematic situation.
SQL> select x.id, x.amt, y.id, y.amt 2 from (select id, amt, 3 row_number() over (partition by id, amt 4 order by id, amt) rno 5 from a) x 6 FULL OUTER JOIN 7 (select id, amt, 8 row_number() over (partition by id, amt 9 order by id, amt) rno 10 from b) y 11 on x.id = y.id and x.rno = y.rno 12 where x.id is null 13 or y.id is null 14 or x.rno is null 15 or y.rno is null; ID AMT ID AMT ---------- ---------- ---------- ---------- 1 10 4 40 5 50 6 60 SQL>Yippy!!! This query rocks. I then quickly transformed the original query into this form and ran it again:
SELECT nvl(a.TRN_DATE, b.TRN_DATE) TRN_DATE, a.TRN_TYPE, b.OTH_TRN_TYPE, a.KEY_NO, b.KEY_NO_REF, nvl(a.IN_AMT, 0) A1, nvl(b.OUT_AMT, 0) A2 FROM (select TRN_DATE, TRN_TYPE, KEY_NO, IN_AMT, row_number() over (partition by KEY_NO, TRN_TYPE, IN_AMT order by KEY_NO, TRN_TYPE, IN_AMT) rno from TRANSACTION_DAILY where TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY') and (substr(IN_KEY_NO, 1, 10) = 'DXIY0-19XV' OR substr(OUT_KEY_NO, 1, 10) = 'DXIY0-19XV')) a FULL OUTER JOIN (select TRN_DATE, OTH_TRN_TYPE, KEY_NO_REF, OUT_AMT, row_number() over (partition by KEY_NO_REF, OTH_TRN_TYPE, OUT_AMT order by KEY_NO_REF, OTH_TRN_TYPE, OUT_AMT) rno from TRANSACTION_DETAILS where RM_CODE = 'XXX123' and TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY')) b ON (a.TRN_DATE = b.TRN_DATE and a.TRN_TYPE = b.OTH_TRN_TYPE and a.KEY_NO = b.KEY_NO_REF and a.IN_AMT = b.OUT_AMT and a.rno = b.rno) WHERE (a.TRN_TYPE is null or b.OTH_TRN_TYPE is null or a.KEY_NO is null or b.KEY_NO_REF is null or a.IN_AMT is null or b.OUT_AMT is null or a.rno is null or b.rno is null);Execution plan of this query is:
---------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ---------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2230 | 409K| 48722 (1)| 00:11:23 | | | | 1 | VIEW | | 2230 | 409K| 48722 (1)| 00:11:23 | | | | 2 | UNION-ALL | | | | | | | | |* 3 | FILTER | | | | | | | | |* 4 | HASH JOIN RIGHT OUTER | | 1552 | 175K| 24361 (1)| 00:05:42 | | | | 5 | VIEW | | 678 | 40002 | 14852 (1)| 00:03:28 | | | | 6 | WINDOW SORT | | 678 | 12882 | 14852 (1)| 00:03:28 | | | | 7 | PARTITION RANGE SINGLE | | 678 | 12882 | 14851 (1)| 00:03:28 | 14 | 14 | |* 8 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTION_DETAILS | 678 | 12882 | 14851 (1)| 00:03:28 | 14 | 14 | |* 9 | INDEX RANGE SCAN | TRAN_DET_IDX | 422K| | 897 (1)| 00:00:13 | 14 | 14 | | 10 | VIEW | | 1552 | 88464 | 9508 (1)| 00:02:14 | | | | 11 | WINDOW SORT | | 1552 | 69840 | 9508 (1)| 00:02:14 | | | | 12 | PARTITION RANGE SINGLE | | 1552 | 69840 | 9507 (1)| 00:02:14 | 7 | 7 | |* 13 | TABLE ACCESS FULL | TRANSACTION_DAILY | 1552 | 69840 | 9507 (1)| 00:02:14 | 7 | 7 | |* 14 | HASH JOIN ANTI | | 678 | 78648 | 24361 (1)| 00:05:42 | | | | 15 | VIEW | | 678 | 40002 | 14852 (1)| 00:03:28 | | | | 16 | WINDOW SORT | | 678 | 12882 | 14852 (1)| 00:03:28 | | | | 17 | PARTITION RANGE SINGLE | | 678 | 12882 | 14851 (1)| 00:03:28 | 14 | 14 | |* 18 | TABLE ACCESS BY LOCAL INDEX ROWID | TRANSACTION_DETAILS | 678 | 12882 | 14851 (1)| 00:03:28 | 14 | 14 | |* 19 | INDEX RANGE SCAN | TRAN_DET_IDX | 422K| | 897 (1)| 00:00:13 | 14 | 14 | | 20 | VIEW | | 1552 | 88464 | 9508 (1)| 00:02:14 | | | | 21 | WINDOW SORT | | 1552 | 69840 | 9508 (1)| 00:02:14 | | | | 22 | PARTITION RANGE SINGLE | | 1552 | 69840 | 9507 (1)| 00:02:14 | 7 | 7 | |* 23 | TABLE ACCESS FULL | TRANSACTION_DAILY | 1552 | 69840 | 9507 (1)| 00:02:14 | 7 | 7 | ----------------------------------------------------------------------------------------------------------------------------------The results were convincing and it still takes 4 seconds to fetch the data compared to 6 hours and 49 minutes. It is 6135 times faster than the original one. This query left End-Users rejoicing and he left my desk smiling but without answering my question, “Is the database very slow?” P.S.: I have renamed all the table names and columns names so as not to reveal official and sensitive information on my personal blog, yet preserving the reality. Regards