Sunday, September 21, 2008

Analytic Functions: The most ignored ones

Dear Readers,

Analytic Functions were first introduced in Oracle 8i, way back in 1999. Tom Kyte’s book “Expert One-on-One Oracle” has a dedicated chapter on the topic. Not just Tom’s book: every book on SQL I have seen so far has a separate chapter on Analytic Functions. Yet the developer community seems to pay the least attention to understanding and using them. I think that before educating students, tutors should be trained to emphasize the importance of Analytic Functions in the day-to-day life of a developer.

Anyway, here is one such case from one of our very busy OLTP databases.

The following SQL statement topped the AWR report, consuming nearly 84% of database time, and was executed more than 1,200 times during a 45-minute AWR snapshot interval.

The query is required to fetch TECH_PAC_ID for a customer along with the number of records for that customer.

Here is the original query, along with its execution plan and statistics:

SQL> set autotrace on
SQL> SELECT Tech_PAC_ID, Users_USER_CD, TOT_PAC 
  2   FROM (SELECT Tech_PAC_ID, Users_USER_CD 
  3          FROM Tech, Users 
  4         WHERE Tech_STS = 'Y' AND Users_USER_CD = UPPER('ABX65842' )
  5           AND Users_USER_CD = Tech_ETI_USER_CD) DET, 
  6        (SELECT COUNT(*) TOT_PAC 
  7           FROM Tech, Users 
  8          WHERE Tech_STS = 'Y' AND Users_USER_CD = UPPER('ABX65842' ) 
  9            AND Users_USER_CD = Tech_ETI_USER_CD) TOT;

TECH_PAC_N USERS_USER_CD        TOT_PAC
---------- -------------------- ----------
236XXX123  ABX65842                     5

Elapsed: 00:00:00.06

Execution Plan
----------------------------------------------------------
Plan hash value: 3349818188

------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |              |     1 |    54 |  1280   (1)| 00:00:16 |
|   1 |  MERGE JOIN CARTESIAN           |              |     1 |    54 |  1280   (1)| 00:00:16 |
|*  2 |   TABLE ACCESS BY INDEX ROWID   | TECH         |     1 |    18 |     4   (0)| 00:00:01 |
|   3 |    NESTED LOOPS                 |              |     1 |    41 |   640   (1)| 00:00:08 |
|*  4 |     TABLE ACCESS FULL           | USERS        |     1 |    23 |   636   (1)| 00:00:08 |
|*  5 |     INDEX RANGE SCAN            | TECH_USER_CD |     3 |       |     1   (0)| 00:00:01 |
|   6 |   BUFFER SORT                   |              |     1 |    13 |  1276   (1)| 00:00:16 |
|   7 |    VIEW                         |              |     1 |    13 |   640   (1)| 00:00:08 |
|   8 |     SORT AGGREGATE              |              |     1 |   104 |            |          |
|*  9 |      TABLE ACCESS BY INDEX ROWID| TECH         |     1 |    12 |     4   (0)| 00:00:01 |
|  10 |       NESTED LOOPS              |              |     1 |   104 |   640   (1)| 00:00:08 |
|* 11 |        TABLE ACCESS FULL        | USERS        |     1 |    92 |   636   (1)| 00:00:08 |
|* 12 |        INDEX RANGE SCAN         | TECH_USER_CD |     3 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TECH_STS"='Y')
   4 - filter("USERS_USER_CD"='ABX65842')
   5 - access("USERS_USER_CD"="Tech_ETI_USER_CD")
       filter("TECH_ETI_USER_CD" IS NOT NULL)
   9 - filter("TECH_STS"='Y')
  11 - filter("USERS_USER_CD"='ABX65842')
  12 - access("USERS_USER_CD"="Tech_ETI_USER_CD")
       filter("TECH_ETI_USER_CD" IS NOT NULL)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       4671  consistent gets
          0  physical reads
          0  redo size
        558  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

Instead of writing two queries, one to fetch TECH_PAC_ID and another to compute the count, we can achieve the same result using an analytic function. The new query not only returns the required result, but is also less resource-intensive and more database-friendly.

Here is the enhanced query using an analytic function:

SQL> SELECT Tech_PAC_ID, Users_USER_CD, 
  2         count(1) over (partition by Users_USER_CD) TOT_PAC
  3          FROM Tech, Users 
  4         WHERE Tech_STS = 'Y' AND Users_USER_CD = UPPER('ABX65842' )
  5           AND Users_USER_CD = Tech_ETI_USER_CD
  6  GROUP BY Tech_PAC_ID, Users_USER_CD;

TECH_PAC_N USERS_USER_CD        TOT_PAC
---------- -------------------- ----------
236XXX123  ABX65842                     5

Elapsed: 00:00:00.04

Execution Plan
----------------------------------------------------------
Plan hash value: 1328229640

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     1 |    41 |   641   (1)| 00:00:08 |
|   1 |  WINDOW BUFFER                |              |     1 |    41 |   641   (1)| 00:00:08 |
|   2 |   SORT GROUP BY               |              |     1 |    41 |   641   (1)| 00:00:08 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| TECH         |     1 |    18 |     4   (0)| 00:00:01 |
|   4 |     NESTED LOOPS              |              |     1 |    41 |   640   (1)| 00:00:08 |
|*  5 |      TABLE ACCESS FULL        | USERS        |     1 |    23 |   636   (1)| 00:00:08 |
|*  6 |      INDEX RANGE SCAN         | TECH_USER_CD |     3 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("TECH_STS"='Y')
   5 - filter("USERS_USER_CD"='ABX65842')
   6 - access("USERS_USER_CD"="Tech_ETI_USER_CD")
       filter("TECH_ETI_USER_CD" IS NOT NULL)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2335  consistent gets
          0  physical reads
          0  redo size
        558  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> 
SQL>

The new query seems to be doing a good job. The “consistent gets” have dropped from 4,671 to 2,335, nearly half of the original. The reason: instead of hitting the tables twice, we get the work done in one pass. Still, 2,335 consistent gets is reasonably high for a single-row result.

Although Rows=1 is shown in the execution plan for the USERS table, most of the cost is incurred by the full table scan at that step. Adding an index on the USERS_USER_CD column of the USERS table should do the trick.
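The DDL is not shown in the post; judging from the index name that appears in the subsequent execution plan (USERS_MUB_USER_CD_IDX), it would have been something along these lines (the storage options, if any, are unknown and omitted):

```sql
-- Index name taken from the later execution plan; storage clauses omitted
CREATE INDEX USERS_MUB_USER_CD_IDX ON USERS (USERS_USER_CD);
```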

After adding the index on USERS (USERS_USER_CD), the query flies.

SQL> SELECT Tech_PAC_ID, Users_USER_CD, 
  2         count(1) over (partition by Users_USER_CD) TOT_PAC
  3          FROM Tech, Users 
  4         WHERE Tech_STS = 'Y' AND Users_USER_CD = UPPER('ABX65842' )
  5           AND Users_USER_CD = Tech_ETI_USER_CD
  6  GROUP BY Tech_PAC_ID, Users_USER_CD;

TECH_PAC_N USERS_USER_CD        TOT_PAC
---------- -------------------- ----------
236XXX123  ABX65842                     5


Execution Plan
----------------------------------------------------------
Plan hash value: 1753916289

---------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                       |     1 |    33 |     7  (15)| 00:00:01 |
|   1 |  WINDOW BUFFER                  |                       |     1 |    33 |     7  (15)| 00:00:01 |
|   2 |   SORT GROUP BY                 |                       |     1 |    33 |     7  (15)| 00:00:01 |
|*  3 |    TABLE ACCESS BY INDEX ROWID  | TECH                  |     1 |    15 |     4   (0)| 00:00:01 |
|   4 |     NESTED LOOPS                |                       |     1 |    33 |     6   (0)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID| USERS                 |     1 |    18 |     2   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN          | USERS_MUB_USER_CD_IDX |     1 |       |     1   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN           | TECH_USER_CD          |     3 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("TECH_STS"='Y')
   6 - filter("USERS_USER_CD"='ABX65842')
   7 - access("USERS_USER_CD"="Tech_ETI_USER_CD")
       filter("TECH_ETI_USER_CD" IS NOT NULL)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
          1  physical reads
          0  redo size
        558  bytes sent via SQL*Net to client
        399  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

This time merely 6 “consistent gets” were required, a reduction of more than 99% from the original 4,671.

As I mentioned earlier in the post, I see SQL statements similar to this one over and over again. Analytic functions are a nice alternative to the traditional way of writing these types of queries: they work extremely well and provide the performance needed for a high number of executions in a high-data-volume environment.
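The general pattern is worth remembering. Using a hypothetical EMP table (the table and column names here are for illustration only), the traditional two-pass approach and the single-pass analytic approach look like this:

```sql
-- Traditional: hit EMP twice, once for the detail rows and once for the count
SELECT e.empno, e.deptno, t.dept_cnt
  FROM emp e,
       (SELECT deptno, COUNT(*) dept_cnt
          FROM emp
         GROUP BY deptno) t
 WHERE e.deptno = t.deptno;

-- Analytic: one pass over EMP; the count is computed per partition
SELECT empno, deptno,
       COUNT(*) OVER (PARTITION BY deptno) AS dept_cnt
  FROM emp;
```

Both return one row per employee with the department head-count attached; the analytic version reads the table only once.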

More information on Analytic Functions can (should) be obtained from here:

Oracle Database SQL Reference 10g Release 2

On Top-n and Pagination Queries by Tom Kyte

AskTom

One Analytic Function Can do More Than a 1000 Lines of Code by Alex Nuijten

Happy reading.

2 comments:

mohamed said...

You know that analytic functions cannot be used within PL/SQL packages and procedures (at least in Oracle 8.1.7.4).
Of course, we can use dynamic SQL in order to use them within PL/SQL.

However, doing so, we lose the automatic bind variables one profits from when using static SQL (PL/SQL packages).
Could you please give me concrete situations where using analytic functions is useful?

In my current work, 90% of the Oracle work is done via PL/SQL.

Thanks

Asif Momen said...

Mohamed,

I suggest you upgrade to Oracle 10g. Oracle 8i was de-supported a long time back, and even Oracle 9i is no longer a supported version.

Not only will you be able to use Analytic Functions in stored procedures, you will also be able to use the hundreds of other new features added since then.