Me: What is very slow? Can you tell me what actually you are doing?
He: I am running a simple report which is supposed to return less than 100 records. But it’s been more than 4 hours and the query is still running.
Me: How long it use to take before?
He: This is the first time we are running this query.
Me: Ok, let me log in to the database.
I logged into the database and ran my set of commands to trace the culprit SQL. I was able to identify the query and here it is:
SELECT
TO_CHAR(TRN_DATE, 'YYYY-MM-DD HH24:MI:SS') TRN_DATE,
TRAN_TYPE,
TRAN_REF,
KEY_NO,
IN_AMT A1,
0 A2
FROM TRANSACTION_DAILY
WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' )
AND SRC_FLAG ='L'
AND (SUBSTR(OUT_KEY_NO,1,10) = 'DXIY0-19XV' or SUBSTR(IN_KEY_NO,1,10) = 'DXIY0-19XV')
AND TRAN_TYPE NVL(KEY_NO,'A') NVL(IN_AMT,0) NOT IN (
SELECT TD_TRAN_TYPE TD_KEY_NO_REF NVL(OUT_AMT,0)
FROM TRANSACTION_DETAILS
WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' )
AND RM_CODE = 'XXX123' )
UNION
SELECT
TO_CHAR(TRN_DATE, 'YYYY-MM-DD HH24:MI:SS') TRN_DATE,
TD_TRAN_TYPE,
TRAN_REF,
TD_KEY_NO_REF,
0 A1,
OUT_AMT A2
FROM TRANSACTION_DETAILS
WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' )
AND RM_CODE = 'XXX123'
AND TD_TRAN_TYPE TD_KEY_NO_REF NVL(OUT_AMT,0) NOT IN (
SELECT TRAN_TYPE NVL(KEY_NO,'A') NVL(IN_AMT,0)
FROM TRANSACTION_DAILY
WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' )
AND SRC_FLAG ='L'
AND (SUBSTR(OUT_KEY_NO,1,10) = 'DXIY0-19XV'
or SUBSTR(IN_KEY_NO,1,10) = 'DXIY0-19XV') )
and this is the execution plan:
------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 51M(100)| | | | | 1 | SORT UNIQUE | | 2842 | 159K| 51M (70)|198:47:09 | | | | 2 | UNION-ALL | | | | | | | | |* 3 | FILTER | | | | | | | | | 4 | PARTITION RANGE SINGLE | | 2042 | 135K| 2854 (2)| 00:00:40 | KEY | KEY | |* 5 | TABLE ACCESS FULL | TRANSACTION_DAILY | 2042 | 135K| 2854 (2)| 00:00:40 | KEY | KEY | | 6 | PARTITION RANGE SINGLE | | 760 | 19000 | 15269 (1)| 00:03:34 | KEY | KEY | |* 7 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTION_DETAILS | 760 | 19000 | 15269 (1)| 00:03:34 | KEY | KEY | |* 8 | INDEX RANGE SCAN | TRAN_DET_IDX | 434K| | 980 (1)| 00:00:14 | KEY | KEY | |* 9 | FILTER | | | | | | | | | 10 | PARTITION RANGE SINGLE | | 800 | 24800 | 15269 (1)| 00:03:34 | KEY | KEY | |* 11 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTION_DETAILS | 800 | 24800 | 15269 (1)| 00:03:34 | KEY | KEY | |* 12 | INDEX RANGE SCAN | TRAN_DET_IDX | 434K| | 980 (1)| 00:00:14 | KEY | KEY | |* 13 | TABLE ACCESS BY GLOBAL INDEX ROWID| TRANSACTION_DAILY | 1940 | 111K| 88735 (1)| 00:20:43 | ROW L | ROW L | |* 14 | INDEX RANGE SCAN | PK_TRANSACTION_DAILY | 102K| | 413 (1)| 00:00:06 | | | -------------------------------------------------------------------------------------------------------------------------------From the above execution plan, Optimizer thinks it needs “198:47:09” hours to “Sort Unique” the result set. But the actual time the query took to complete was 6 hours and 49 minute (24540 seconds). I then simplified the query in order to understand it clearly by removing some of the predicates. The simplified query is:
SELECT *
FROM TRANSACTION_DAILY
WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' )
AND TRAN_TYPE || NVL(KEY_NO,'A') || NVL(IN_AMT,0) NOT IN (
SELECT TD_TRAN_TYPE ||
TD_KEY_NO_REF ||
NVL(OUT_AMT,0)
FROM TRANSACTION_DETAILS
WHERE TRN_DATE =
TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) )
UNION
SELECT *
FROM TRANSACTION_DETAILS
WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' )
AND TD_TRAN_TYPE || TD_KEY_NO_REF || NVL(OUT_AMT,0) NOT IN (
SELECT TRAN_TYPE ||
NVL(KEY_NO,'A') ||
NVL(IN_AMT,0)
FROM TRANSACTION_DAILY
WHERE TRN_DATE =
TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) )
Hmmmm, this query seems to be doing a full outer join on “TRANSACTION_DAILY” and “TRANSACTION_DETAILS” tables. I called him back:
Me: What is the objective of this report?He: We need to find all the information from “TRANSACTION_DAILY” table which does not exist in “TRANSACTION_DETAILS” table and vice versa. Yes, I was right. It’s doing a “Full Outer Join”. I modified his original query and proposed him a new query which is executing at lighting speed and more importantly yielding the same result:
SELECT nvl(a.TRN_DATE, b.TRN_DATE) TRN_DATE,
a.TRN_TYPE, b.OTH_TRN_TYPE,
a.KEY_NO, b.KEY_NO_REF,
nvl(a.IN_AMT, 0) A1,
nvl(b.OUT_AMT, 0) A2
FROM (select TRN_DATE, TRN_TYPE, KEY_NO, IN_AMT,
from TRANSACTION_DAILY
where TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY')
and (substr(IN_KEY_NO, 1, 10) = 'DXIY0-19XV' OR
substr(OUT_KEY_NO, 1, 10) = 'DXIY0-19XV')) a
FULL OUTER JOIN
(select TRN_DATE, OTH_TRN_TYPE, KEY_NO_REF, OUT_AMT,
from TRANSACTION_DETAILS
where RM_CODE = 'XXX123'
and TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY')) b
ON (a.TRN_DATE = b.TRN_DATE and
a.TRN_TYPE = b.OTH_TRN_TYPE and
a.KEY_NO = b.KEY_NO_REF and
a.IN_AMT = b.OUT_AMT and
a.rno = b.rno)
WHERE (a.TRN_TYPE is null or
b.OTH_TRN_TYPE is null or
a.KEY_NO is null or
b.KEY_NO_REF is null or
a.IN_AMT is null or
b.OUT_AMT is null);
I called him back with over-excitement announcing him that the query is ready and let us test it:
Me: I have re-written the query, kindly come over to my office so that we can test it together.He: How long does it take to execute?
Me: It completes in less than 4 seconds.
He: What...? … less than 4 seconds … ? … Are you sure, have you included all the condition that I have mentioned?
Me: Yes, let us test it. He made couple of tests and the query was perfect to all the test cases but except one. Requirements started getting little messy so, I decided to create two tables and test the query against them. I created two tables, A and B, of same table structure with sample data:
create table a(id number, amt number); insert into a values (1, 10); insert into a values (2, 20); insert into a values (3, 30); insert into a values (4, 40); create table b(id number, amt number); insert into b values (2, 20); insert into b values (3, 30); insert into b values (5, 50); insert into b values (6, 60); commit;Basically, we need to find out data that exists in A and is not in B and also records that exist in B which are not A. Here’s the query:
SQL> select *
2 from a FULL OUTER JOIN b
3 on a.id = b.id
4 where a.id is null or b.id is null;
ID AMT ID AMT
---------- ---------- ---------- ----------
4 40
6 60
5 50
SQL>
He said: “Yes, the query is fetching right data, but what happens when you have two records with same id and amt in table A and only a single record in B? In this case, the query should display one record from table A.”
Adding more to the complexity he said: “There is no third column in either table to distinguish this occurrence.”
So, we inserted a record in each table and ran the same query:
insert into a values (1, 10);
insert into b values (1, 10);
SQL> select *
2 from a FULL OUTER JOIN b
3 on a.id = b.id
4 where a.id is null or b.id is null;
ID AMT ID AMT
---------- ---------- ---------- ----------
4 40
6 60
5 50
SQL>
Oops!! The query fails at this point. Ok, I then decided to re-write this query and this time use analytical functions to rescue me from the current problematic situation.
SQL> select x.id, x.amt, y.id, y.amt
2 from (select id, amt,
3 row_number() over (partition by id, amt
4 order by id, amt) rno
5 from a) x
6 FULL OUTER JOIN
7 (select id, amt,
8 row_number() over (partition by id, amt
9 order by id, amt) rno
10 from b) y
11 on x.id = y.id and x.rno = y.rno
12 where x.id is null
13 or y.id is null
14 or x.rno is null
15 or y.rno is null;
ID AMT ID AMT
---------- ---------- ---------- ----------
1 10
4 40
5 50
6 60
SQL>
Yippy!!! This query rocks. I then quickly transformed the original query into this form and ran it again:
SELECT nvl(a.TRN_DATE, b.TRN_DATE) TRN_DATE,
a.TRN_TYPE, b.OTH_TRN_TYPE,
a.KEY_NO, b.KEY_NO_REF,
nvl(a.IN_AMT, 0) A1,
nvl(b.OUT_AMT, 0) A2
FROM (select TRN_DATE, TRN_TYPE, KEY_NO, IN_AMT,
row_number() over (partition by KEY_NO,
TRN_TYPE,
IN_AMT
order by KEY_NO,
TRN_TYPE,
IN_AMT) rno
from TRANSACTION_DAILY
where TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY')
and (substr(IN_KEY_NO, 1, 10) = 'DXIY0-19XV' OR
substr(OUT_KEY_NO, 1, 10) = 'DXIY0-19XV')) a
FULL OUTER JOIN
(select TRN_DATE, OTH_TRN_TYPE, KEY_NO_REF, OUT_AMT,
row_number() over (partition by KEY_NO_REF,
OTH_TRN_TYPE,
OUT_AMT
order by KEY_NO_REF,
OTH_TRN_TYPE,
OUT_AMT) rno
from TRANSACTION_DETAILS
where RM_CODE = 'XXX123'
and TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY')) b
ON (a.TRN_DATE = b.TRN_DATE and
a.TRN_TYPE = b.OTH_TRN_TYPE and
a.KEY_NO = b.KEY_NO_REF and
a.IN_AMT = b.OUT_AMT and
a.rno = b.rno)
WHERE (a.TRN_TYPE is null or
b.OTH_TRN_TYPE is null or
a.KEY_NO is null or
b.KEY_NO_REF is null or
a.IN_AMT is null or
b.OUT_AMT is null or
a.rno is null or
b.rno is null);
Execution plan of this query is:
---------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ---------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2230 | 409K| 48722 (1)| 00:11:23 | | | | 1 | VIEW | | 2230 | 409K| 48722 (1)| 00:11:23 | | | | 2 | UNION-ALL | | | | | | | | |* 3 | FILTER | | | | | | | | |* 4 | HASH JOIN RIGHT OUTER | | 1552 | 175K| 24361 (1)| 00:05:42 | | | | 5 | VIEW | | 678 | 40002 | 14852 (1)| 00:03:28 | | | | 6 | WINDOW SORT | | 678 | 12882 | 14852 (1)| 00:03:28 | | | | 7 | PARTITION RANGE SINGLE | | 678 | 12882 | 14851 (1)| 00:03:28 | 14 | 14 | |* 8 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTION_DETAILS | 678 | 12882 | 14851 (1)| 00:03:28 | 14 | 14 | |* 9 | INDEX RANGE SCAN | TRAN_DET_IDX | 422K| | 897 (1)| 00:00:13 | 14 | 14 | | 10 | VIEW | | 1552 | 88464 | 9508 (1)| 00:02:14 | | | | 11 | WINDOW SORT | | 1552 | 69840 | 9508 (1)| 00:02:14 | | | | 12 | PARTITION RANGE SINGLE | | 1552 | 69840 | 9507 (1)| 00:02:14 | 7 | 7 | |* 13 | TABLE ACCESS FULL | TRANSACTION_DAILY | 1552 | 69840 | 9507 (1)| 00:02:14 | 7 | 7 | |* 14 | HASH JOIN ANTI | | 678 | 78648 | 24361 (1)| 00:05:42 | | | | 15 | VIEW | | 678 | 40002 | 14852 (1)| 00:03:28 | | | | 16 | WINDOW SORT | | 678 | 12882 | 14852 (1)| 00:03:28 | | | | 17 | PARTITION RANGE SINGLE | | 678 | 12882 | 14851 (1)| 00:03:28 | 14 | 14 | |* 18 | TABLE ACCESS BY LOCAL INDEX ROWID | TRANSACTION_DETAILS | 678 | 12882 | 14851 (1)| 00:03:28 | 14 | 14 | |* 19 | INDEX RANGE SCAN | TRAN_DET_IDX | 422K| | 897 (1)| 00:00:13 | 14 | 14 | | 20 | VIEW | | 1552 | 88464 | 9508 (1)| 00:02:14 | | | | 21 | WINDOW SORT | | 1552 | 69840 | 9508 (1)| 00:02:14 | | | | 22 | PARTITION RANGE SINGLE | | 1552 | 69840 | 9507 (1)| 00:02:14 | 7 | 7 | |* 23 | TABLE ACCESS FULL | TRANSACTION_DAILY | 1552 | 69840 | 9507 (1)| 00:02:14 | 7 | 7 | ----------------------------------------------------------------------------------------------------------------------------------The results were convincing and it still takes 4 seconds to fetch the data compared to 6 hours and 49 minutes. It is 6135 times faster than the original one. This query left End-Users rejoicing and he left my desk smiling but without answering my question, “Is the database very slow?” P.S.: I have renamed all the table names and columns names so as not to reveal official and sensitive information on my personal blog, yet preserving the reality. Regards
No comments:
Post a Comment