Wednesday, November 05, 2008

Analytic Functions: A Savior

Hi,

Yesterday, I have come across yet another performance issue. I received a complaint from one of our developers that they have a business report which is running extremely slow.

I received his mail and attached was the query:

SELECT   
  XLT.VOU_DET.VOU_DATE,
  XLT.VOU_DET.VALUE_DATE,
  XLT.VOU_DET.DESCRIPTION,
  XLT.VOU_DET.PIX_CODE,
  VOU_TYPE.DESCR,
  XLT.VOU_DET.PIX_BRN,
  XLT.VOU_DET.DR_BAL_ORIG,
  XLT.VOU_DET.CR_BAL_ORIG,
  XLT.VOU_DET.CLOSING_BAL_ORIG,
  XLT.VOU_LOOKUP.NAME,
  XLT.VOU_DET.VOU_SEQ,
  (SELECT TO_CHAR(X.OPEN_BAL_ORIG) 
     FROM XLT.VOU_DET X 
    WHERE X.ASOF_DATE BETWEEN to_date('01-05-2007', 'dd-mm-yyyy') 
                          AND to_date('30-09-2008', 'dd-mm-yyyy')
      AND X.VOU_CODE  = '9900016WXYRT01'
      AND X.VOU_SEQ = 1
      AND ROWNUM = 1) OPEN_BAL_ORIG,
  (SELECT count(  XLT.VOU_DET.DR_BAL_ORIG) 
     FROM (select DR_BAL_ORIG 
              From XLT.VOU_DET X 
             WHERE X.ASOF_DATE BETWEEN to_date('01-05-2007', 'dd-mm-yyyy') 
                                   AND to_date('30-09-2008', 'dd-mm-yyyy')
               AND X.VOU_CODE  = '9900016WXYRT01'
               AND X.DR_BAL_ORIG <>0)) DR_BAL_ORIG_CNT,
  (SELECT count(  XLT.VOU_DET.CR_BAL_ORIG) 
     FROM (select CR_BAL_ORIG 
             From XLT.VOU_DET X 
            WHERE X.ASOF_DATE BETWEEN to_date('01-05-2007', 'dd-mm-yyyy') 
                                  AND to_date('30-09-2008', 'dd-mm-yyyy')
              AND X.VOU_CODE  = '9900016WXYRT01'
              AND X.CR_BAL_ORIG <>0)) CR_BAL_ORIG_CNT 
FROM
  XLT.VOU_DET,
  XLT.X_VOU_TYPE  VOU_TYPE,
  XLT.VOU_LOOKUP
WHERE XLT.VOU_DET.VOU_TYPE_ID=VOU_TYPE.VOU_TYPE_ID
  AND XLT.VOU_DET.VOU_REF(+)=XLT.VOU_LOOKUP.CUST_CODE_WNG
  AND XLT.VOU_DET.ASOF_DATE  BETWEEN  to_date('01-05-2007', 'dd-mm-yyyy') AND to_date('30-09-2008', 'dd-mm-yyyy')
  AND XLT.VOU_DET.VOU_CODE  =  '9900016WXYRT01'
ORDER BY
  XLT.VOU_DET.VOU_SEQ,
  XLT.VOU_DET.ASOF_DATE;

This looks like a case of missing Analytic functions.

The three SELECT statements within the query can be easily replaced with simple Analytic functions. The first one is to fetch the opening balance:

  (SELECT TO_CHAR(X.OPEN_BAL_ORIG) 
     FROM XLT.VOU_DET X 
    WHERE X.ASOF_DATE BETWEEN to_date('01-05-2007', 'dd-mm-yyyy') 
                          AND to_date('30-09-2008', 'dd-mm-yyyy')
      AND X.VOU_CODE  = '9900016WXYRT01'
      AND X.VOU_SEQ = 1
      AND ROWNUM = 1) OPEN_BAL_ORIG,

and could be easily rewritten as:

first_value(OPEN_BAL_ORIG) 
      over (partition by VOU_CODE 
                Order by ASOF_DATE, decode(VOU_SEQ, 0, 99, VOU_SEQ)) VOU_SEQ,

The second and third are the number of Debit and Credit transactions respectively and can also be written as:

  count(decode(DR_BAL_ORIG, null, null, 0, null, 1)) 
      over (partition by  VOU_CODE) DR_BAL_ORIG_cnt,

and

  count(decode(CR_BAL_ORIG, null, null, 0, null, 1)) 
      over (partition by  VOU_CODE) CR_BAL_ORIG_cnt

Now, its time to execute the modified query and (1) compare the result set, (2) measure time taken, (3) and compare statistics with the original query.

Well, the result set was compared and was convincing. The enhanced query completes in less than 20 seconds while the original took just over 17 minutes. So, the new query is 173 times faster than the original one.

Lastly, let’s compare statistics.

Statistics of Original query:

SQL> set autotrace traceonly stat
SQL> @tuneme

661 rows selected.

Elapsed: 00:17:26.91

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
    2066944  consistent gets
     753812  physical reads
        116  redo size
      68506  bytes sent via SQL*Net to client
        842  bytes received via SQL*Net from client
         46  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        661  rows processed

SQL> 

Statistics of Modified query:

SQL> set autotrace traceonly exp stat
SQL> @tunedquery

661 rows selected.

Elapsed: 00:00:17.35
Statistics
----------------------------------------------------------
          2  recursive calls
          0  db block gets
      22514  consistent gets
       2580  physical reads
          0  redo size
      68003  bytes sent via SQL*Net to client
        547  bytes received via SQL*Net from client
         46  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
        661  rows processed

SQL>

Instead of “2,066,944” consistent gets and “753,812” physical reads, it took merely “22,514” consistent gets and “2,580” physical reads. That’s 98% reduction in LIO’s and 99%+ reduction in PIO.

Again Analytics Functions is the winner over traditional query writing style.

5 comments:

Bruno said...

Nice one! I especially liked the trick of using decode to remove the zeroes from the count instead of the subquery predicate.

Though you're clearly avoiding many table accesses, you may consider adding the before/after execution plans -or better, the actual execution statistics ;)

Anonymous said...

Momen, I dont know anything analytic function. But, was having a doubt, did you flush the buffer cache in between.

-Nand

Asif Momen said...

Hi Nand,

No, I did not flush the cache and I cannot afford to, as it is a production database.

Anonymous said...

So, might be in the second run(with analytic function), it got the blocks from the buffer cache, instead of getting them from datafile, thus its faster. PLease clarify...am i missing something?

-Nand

Asif Momen said...

Nand,

True, but if you analyze the query, instead of accessing VOU_DET table only once to retrieve the required output, the Developer was accessing it four times. This is where the LIO's were incurred from.

I will try to post execution plans of both the queries, which would enlighten the difference between both the queries.

Asif Momen