Monday, December 08, 2008

"Heh, heh, that"s a joke, son"

Well, the title of this post comes from an error message in the "Oracle® Database Error Messages: 10g Release 2 (10.2)" document.

PLS-00101: reserved for future use

Cause: This error message is not used yet. (Heh, heh, that's a joke, son.) Action: none

Although it has been reserved for future use, it somehow leaked into the Oracle Database 10g documentation. However, this same error is missing from the Oracle 11g Database Error Messages. Perhaps it is reserved for Oracle 12g, who knows? Has anybody encountered this error?

Sunday, November 23, 2008

Dynamic Sampling Myths Dispelled

There are a couple of myths associated with Dynamic Sampling, and I thought of dispelling them with the help of this post. The two myths are:

i) The default number of blocks sampled is 32 in Oracle 10g, and
ii) Tables having fewer than 32 blocks are not considered by the optimizer for dynamic sampling.

Myth 1: The default number of blocks sampled is 32 in Oracle 10g

The default number of dynamic sampling blocks is 32. This is set using the hidden parameter “_optimizer_dyn_smp_blks” (of course, one should not change it). Here’s the query to see the default settings:


SQL> column "Parameter" format a30
SQL> column "Session Value" format a20
SQL> column "Instance Value" format a20
SQL>
SQL> select a.ksppinm  "Parameter", b.ksppstvl "Session Value", c.ksppstvl "Instance Value"
  2    from x$ksppi a, x$ksppcv b, x$ksppsv c
  3   where a.indx = b.indx and a.indx = c.indx
  4     and ksppinm= '_optimizer_dyn_smp_blks';

Parameter                      Session Value        Instance Value
------------------------------ -------------------- --------------------
_optimizer_dyn_smp_blks        32                   32

Starting with Oracle 10g, the “optimizer_dynamic_sampling” initialization parameter defaults to “2”, meaning the optimizer will dynamically sample all unanalyzed tables. The number of blocks sampled at this level is two times the default number of dynamic sampling blocks, so the optimizer will sample 64 blocks.
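
As an aside, dynamic sampling can also be requested per query with the DYNAMIC_SAMPLING hint — a minimal sketch against the TEST table created below:

SQL> select /*+ dynamic_sampling(test 2) */ * from test where object_id = 123;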

To verify how many blocks are sampled, we need to dig into the 10053 trace output. For the purpose of this test, I will create a test table and dump some data into it, then enable the 10053 trace, execute a query, and disable the trace.


SQL> drop table test purge;

Table dropped.

SQL> create table test as select * from all_objects where 1=2;

Table created.

SQL> create index t_object_id on test(object_id);

Index created.

SQL> insert into test select * from all_objects;

35023 rows created.

SQL> commit;

Commit complete.

SQL> alter session set tracefile_identifier = 'blksize';

Session altered.

SQL> alter session set events '10053 trace name context forever, level 1';

Session altered.

SQL> select * from test where object_id = 123;

no rows selected

SQL> alter session set events '10053 trace name context off';

Session altered.

SQL> disc
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL>

Digging into the trace, we see the following:

** Executed dynamic sampling query:
    level : 2
    sample pct. : 12.701613
    actual sample size : 4097
    filtered sample card. : 0
    orig. card. : 82
    block cnt. table stat. : 496
    block cnt. for sampling: 496
    max. sample block cnt. : 64
    sample block cnt. : 63
    min. sel. est. : 0.01000000

Well, it’s clear from the trace that at level 2, the maximum number of blocks sampled (max. sample block cnt) is 64 and the actual number of blocks sampled (sample block cnt) is 63. (For some reason, Oracle always samples one block less.)

So, this myth has been dispelled, and we know how many blocks are actually sampled.

Myth 2: Tables having fewer than 32 blocks are not considered by the optimizer for dynamic sampling

It is widely believed that tables having fewer than 32 blocks are ignored by the optimizer for dynamic sampling.

Let’s continue using the same table as in case 1. However, it contains data, so we will truncate it. After truncating, verify the number of blocks allocated by querying the USER_SEGMENTS view.


SQL> truncate table test;

Table truncated.

SQL> select blocks from user_segments where segment_name = 'TEST';

    BLOCKS
----------
         8

This table has only 8 blocks allocated. Let’s see whether the optimizer considers dynamic sampling on this table by executing a simple query.

SQL> set autotrace traceonly exp
SQL> select * from test where object_id = 123;

Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   128 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST |     1 |   128 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID"=123)

Note
-----
   - dynamic sampling used for this statement

SQL> set autotrace off
SQL>

Yes, the Oracle optimizer did consider dynamic sampling although the number of blocks was fewer than 32.

So, in this post we were able to dispel the myths associated with dynamic sampling through very simple and reproducible examples.

Wednesday, November 05, 2008

Analytic Functions: A Savior

Hi,

Yesterday, I came across yet another performance issue. I received a complaint from one of our developers that a business report was running extremely slowly.

I received his mail and attached was the query:

SELECT   
  XLT.VOU_DET.VOU_DATE,
  XLT.VOU_DET.VALUE_DATE,
  XLT.VOU_DET.DESCRIPTION,
  XLT.VOU_DET.PIX_CODE,
  VOU_TYPE.DESCR,
  XLT.VOU_DET.PIX_BRN,
  XLT.VOU_DET.DR_BAL_ORIG,
  XLT.VOU_DET.CR_BAL_ORIG,
  XLT.VOU_DET.CLOSING_BAL_ORIG,
  XLT.VOU_LOOKUP.NAME,
  XLT.VOU_DET.VOU_SEQ,
  (SELECT TO_CHAR(X.OPEN_BAL_ORIG) 
     FROM XLT.VOU_DET X 
    WHERE X.ASOF_DATE BETWEEN to_date('01-05-2007', 'dd-mm-yyyy') 
                          AND to_date('30-09-2008', 'dd-mm-yyyy')
      AND X.VOU_CODE  = '9900016WXYRT01'
      AND X.VOU_SEQ = 1
      AND ROWNUM = 1) OPEN_BAL_ORIG,
  (SELECT count(  XLT.VOU_DET.DR_BAL_ORIG) 
     FROM (select DR_BAL_ORIG 
              From XLT.VOU_DET X 
             WHERE X.ASOF_DATE BETWEEN to_date('01-05-2007', 'dd-mm-yyyy') 
                                   AND to_date('30-09-2008', 'dd-mm-yyyy')
               AND X.VOU_CODE  = '9900016WXYRT01'
               AND X.DR_BAL_ORIG <>0)) DR_BAL_ORIG_CNT,
  (SELECT count(  XLT.VOU_DET.CR_BAL_ORIG) 
     FROM (select CR_BAL_ORIG 
             From XLT.VOU_DET X 
            WHERE X.ASOF_DATE BETWEEN to_date('01-05-2007', 'dd-mm-yyyy') 
                                  AND to_date('30-09-2008', 'dd-mm-yyyy')
              AND X.VOU_CODE  = '9900016WXYRT01'
              AND X.CR_BAL_ORIG <>0)) CR_BAL_ORIG_CNT 
FROM
  XLT.VOU_DET,
  XLT.X_VOU_TYPE  VOU_TYPE,
  XLT.VOU_LOOKUP
WHERE XLT.VOU_DET.VOU_TYPE_ID=VOU_TYPE.VOU_TYPE_ID
  AND XLT.VOU_DET.VOU_REF(+)=XLT.VOU_LOOKUP.CUST_CODE_WNG
  AND XLT.VOU_DET.ASOF_DATE  BETWEEN  to_date('01-05-2007', 'dd-mm-yyyy') AND to_date('30-09-2008', 'dd-mm-yyyy')
  AND XLT.VOU_DET.VOU_CODE  =  '9900016WXYRT01'
ORDER BY
  XLT.VOU_DET.VOU_SEQ,
  XLT.VOU_DET.ASOF_DATE;

This looks like a case of missing Analytic functions.

The three SELECT statements within the query can be easily replaced with simple Analytic functions. The first one is to fetch the opening balance:

  (SELECT TO_CHAR(X.OPEN_BAL_ORIG) 
     FROM XLT.VOU_DET X 
    WHERE X.ASOF_DATE BETWEEN to_date('01-05-2007', 'dd-mm-yyyy') 
                          AND to_date('30-09-2008', 'dd-mm-yyyy')
      AND X.VOU_CODE  = '9900016WXYRT01'
      AND X.VOU_SEQ = 1
      AND ROWNUM = 1) OPEN_BAL_ORIG,

and can easily be rewritten as:

first_value(OPEN_BAL_ORIG) 
      over (partition by VOU_CODE 
                Order by ASOF_DATE, decode(VOU_SEQ, 0, 99, VOU_SEQ)) OPEN_BAL_ORIG,

The second and third are the number of Debit and Credit transactions respectively and can also be written as:

  count(decode(DR_BAL_ORIG, null, null, 0, null, 1)) 
      over (partition by  VOU_CODE) DR_BAL_ORIG_cnt,

and

  count(decode(CR_BAL_ORIG, null, null, 0, null, 1)) 
      over (partition by  VOU_CODE) CR_BAL_ORIG_cnt
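
Putting the pieces together, the complete modified query would look roughly like this (a sketch — the remaining pass-through columns from the original select list are unchanged and omitted for brevity):

SELECT
  XLT.VOU_DET.VOU_SEQ,
  XLT.VOU_DET.ASOF_DATE,
  first_value(XLT.VOU_DET.OPEN_BAL_ORIG)
      over (partition by XLT.VOU_DET.VOU_CODE
                order by XLT.VOU_DET.ASOF_DATE,
                         decode(XLT.VOU_DET.VOU_SEQ, 0, 99, XLT.VOU_DET.VOU_SEQ)) OPEN_BAL_ORIG,
  count(decode(XLT.VOU_DET.DR_BAL_ORIG, null, null, 0, null, 1))
      over (partition by XLT.VOU_DET.VOU_CODE) DR_BAL_ORIG_CNT,
  count(decode(XLT.VOU_DET.CR_BAL_ORIG, null, null, 0, null, 1))
      over (partition by XLT.VOU_DET.VOU_CODE) CR_BAL_ORIG_CNT
FROM
  XLT.VOU_DET,
  XLT.X_VOU_TYPE  VOU_TYPE,
  XLT.VOU_LOOKUP
WHERE XLT.VOU_DET.VOU_TYPE_ID=VOU_TYPE.VOU_TYPE_ID
  AND XLT.VOU_DET.VOU_REF(+)=XLT.VOU_LOOKUP.CUST_CODE_WNG
  AND XLT.VOU_DET.ASOF_DATE BETWEEN to_date('01-05-2007', 'dd-mm-yyyy') AND to_date('30-09-2008', 'dd-mm-yyyy')
  AND XLT.VOU_DET.VOU_CODE = '9900016WXYRT01'
ORDER BY
  XLT.VOU_DET.VOU_SEQ,
  XLT.VOU_DET.ASOF_DATE;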

Now, it’s time to execute the modified query and (1) compare the result sets, (2) measure the time taken, and (3) compare statistics with the original query.

Well, the result sets were compared and matched. The enhanced query completes in less than 20 seconds while the original took just over 17 minutes, making the new query roughly 60 times faster than the original one.

Lastly, let’s compare statistics.

Statistics of Original query:

SQL> set autotrace traceonly stat
SQL> @tuneme

661 rows selected.

Elapsed: 00:17:26.91

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
    2066944  consistent gets
     753812  physical reads
        116  redo size
      68506  bytes sent via SQL*Net to client
        842  bytes received via SQL*Net from client
         46  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        661  rows processed

SQL> 

Statistics of Modified query:

SQL> set autotrace traceonly exp stat
SQL> @tunedquery

661 rows selected.

Elapsed: 00:00:17.35
Statistics
----------------------------------------------------------
          2  recursive calls
          0  db block gets
      22514  consistent gets
       2580  physical reads
          0  redo size
      68003  bytes sent via SQL*Net to client
        547  bytes received via SQL*Net from client
         46  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
        661  rows processed

SQL>

Instead of “2,066,944” consistent gets and “753,812” physical reads, the new query needed merely “22,514” consistent gets and “2,580” physical reads. That’s a 98%+ reduction in LIOs and a 99%+ reduction in PIOs.

Once again, Analytic Functions win over the traditional query-writing style.

Monday, November 03, 2008

How to read Data between different Characterset Databases - Part 2

Hazem Ameen has left a comment on my previous post, where he says he is able to read/write data over a database link between two databases having different charactersets.

We have your same environment and reading/writing over db link is fine. Oracle does character-set conversion between the 2 Arabic character-sets as expected.
Client NLS_LANG: America_America.ar8mswin1256
Server 1: ar8mswin1256
Server 2: ar8iso8859p6
I can only suspect your client NLS_LANG. Our version is 10.2.0.4 on Linux. Client is MS Windows Vista 10.2.0.4.

reading/writing over db link is fine

What I would assert from this is: although you have databases with two different charactersets, your client characterset is the SAME. This is the reason you are able to read/write over the db link. But in our case, clients connecting to database A have the same characterset as A, while clients connecting to database B have the same characterset as B. Here's a test case with my client characterset set to "AR8MSWIN1256".

Create tables in both the databases:

Create database links in both databases and query data from other side:
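
(The original post showed these two steps as screenshots. A minimal sketch of the setup, reusing the TEST_ISO table and TESTDB link names that appear in the related post below, with assumed credentials and TNS alias:)

SQL> create table test_iso (id number, name varchar2(100));  -- run on both databases

SQL> create database link testdb connect to scott identified by tiger using 'TESTDB';  -- credentials/alias assumed

SQL> select id, name from test_iso@testdb;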

In the above example, we were able to read data over db links, meaning the client characterset plays the vital role.

This means all your clients' characterset is set to "AR8MSWIN1256" for both types of databases. This is the reason you are able to read/write over a db link.

I hope I was clear in my examples. Thanks for leaving your comment.

Regards,

Asif Momen

Tuesday, October 14, 2008

How to read Data between different Characterset Databases

We have two 24x7 production databases with different charactersets. Yes, both charactersets are related to Arabic language. Database A’s characterset is AR8MSWIN1256 while database B’s characterset is AR8ISO8859P6.

We had a requirement wherein database A needs to read Arabic data from database B. If you create a database link and try to query data from database B, it appears to be garbage, as shown below:

whereas when you run the same query on database B, this is how the output looks:

Even if you are unaware of the Arabic language, you can visually compare and see that the output is garbage when queried from database A.

The DUMP function comes in handy in these situations. It returns the internal representation of an expression. When used against the “name” field, I get this output:

SQL> select dump(name) from test_iso@testdb;

DUMP(NAME)
--------------------------------------------------------------------
Typ=1 Len=13: 227,205,227,207,32,194,213,221,32,227,196,227,228

where “Typ” refers to the datatype, “Len” is the length of the string, and everything after “:” is the comma-separated list of decimal byte values of the characters in the “name” field.

I wrote a small function that converts all the comma-separated byte values back into a character string.

create or replace function read_name (p_id in number) return varchar2 is
  t_str    varchar2(2000);
  l_output varchar2(2000);
begin
  -- Build a statement of the form "select chr(n1)||chr(n2)||... from dual"
  -- out of the decimal byte values that DUMP reports for the remote column
  select 'select '||'chr('||replace(substr(dump(name, 1010), instr(dump(name, 1010), ':')+2), ',', ')|| chr(')||') from dual' 
    into t_str from test_iso@testdb
   where id = p_id;
 
--  dbms_output.put_line(t_str);
  execute immediate t_str into l_output;  
  return l_output;
end;
/

Now, using this function, I am able to query the remote table in a readable format.
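
Here's a usage sketch (assuming a row with id = 1 exists in the remote table):

SQL> select read_name(1) name from dual;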

Here’s the output:

Sunday, September 21, 2008

Analytic Functions: The most ignored ones

Dear Readers,

Analytic Functions were first introduced in Oracle 8i, way back in 1999. Tom Kyte’s book “Expert One-on-One Oracle” has a dedicated chapter on the topic. Not only Tom’s book: every book on SQL I have seen so far has a separate chapter on Analytic Functions. Yet the developer community seems to pay the least attention to understanding and using them. I think before educating students, tutors should be trained to emphasize the importance of Analytic Functions in the day-to-day life of a developer.

Anyway, here’s a similar case on one of our very busy OLTP databases.

This SQL statement topped the AWR report, consuming nearly 84% of database time, and was executed more than 1200 times during a 45-minute AWR period.

A query is required to fetch the TECH_PAC_ID for a customer along with the number of records for that customer.

Here’s the original query, its Explain Plan and Statistics:

SQL> set autotrace on
SQL> SELECT Tech_PAC_ID, Users_USER_CD, TOT_PAC 
  2   FROM (SELECT Tech_PAC_ID, Users_USER_CD 
  3          FROM Tech, Users 
  4         WHERE Tech_STS = 'Y' AND Users_USER_CD = UPPER('ABX65842' )
  5           AND Users_USER_CD = Tech_ETI_USER_CD) DET, 
  6        (SELECT COUNT(*) TOT_PAC 
  7           FROM Tech, Users 
  8          WHERE Tech_STS = 'Y' AND Users_USER_CD = UPPER('ABX65842' ) 
  9            AND Users_USER_CD = Tech_ETI_USER_CD) TOT;

TECH_PAC_N USERS_USER_CD        TOT_PAC
---------- -------------------- ----------
236XXX123  ABX65842                     5

Elapsed: 00:00:00.06

Execution Plan
----------------------------------------------------------
Plan hash value: 3349818188

------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |              |     1 |    54 |  1280   (1)| 00:00:16 |
|   1 |  MERGE JOIN CARTESIAN           |              |     1 |    54 |  1280   (1)| 00:00:16 |
|*  2 |   TABLE ACCESS BY INDEX ROWID   | TECH         |     1 |    18 |     4   (0)| 00:00:01 |
|   3 |    NESTED LOOPS                 |              |     1 |    41 |   640   (1)| 00:00:08 |
|*  4 |     TABLE ACCESS FULL           | USERS        |     1 |    23 |   636   (1)| 00:00:08 |
|*  5 |     INDEX RANGE SCAN            | TECH_USER_CD |     3 |       |     1   (0)| 00:00:01 |
|   6 |   BUFFER SORT                   |              |     1 |    13 |  1276   (1)| 00:00:16 |
|   7 |    VIEW                         |              |     1 |    13 |   640   (1)| 00:00:08 |
|   8 |     SORT AGGREGATE              |              |     1 |   104 |            |          |
|*  9 |      TABLE ACCESS BY INDEX ROWID| TECH         |     1 |    12 |     4   (0)| 00:00:01 |
|  10 |       NESTED LOOPS              |              |     1 |   104 |   640   (1)| 00:00:08 |
|* 11 |        TABLE ACCESS FULL        | USERS        |     1 |    92 |   636   (1)| 00:00:08 |
|* 12 |        INDEX RANGE SCAN         | TECH_USER_CD |     3 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TECH_STS"='Y')
   4 - filter("USERS_USER_CD"='ABX65842')
   5 - access("USERS_USER_CD"="Tech_ETI_USER_CD")
       filter("TECH_ETI_USER_CD" IS NOT NULL)
   9 - filter("TECH_STS"='Y')
  11 - filter("USERS_USER_CD"='ABX65842')
  12 - access("USERS_USER_CD"="Tech_ETI_USER_CD")
       filter("TECH_ETI_USER_CD" IS NOT NULL)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       4671  consistent gets
          0  physical reads
          0  redo size
        558  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

Instead of writing two queries to fetch TECH_PAC_ID and COUNT(USERS_USER_CD), we can achieve the same result using Analytic Functions. The new query not only returns the required result but is also less resource intensive and more database-friendly.

Here goes the enhanced query using Analytic Function:

SQL> SELECT Tech_PAC_ID, Users_USER_CD, 
  2         count(1) over (partition by Users_USER_CD) TOT_PAC
  3          FROM Tech, Users 
  4         WHERE Tech_STS = 'Y' AND Users_USER_CD = UPPER('ABX65842' )
  5           AND Users_USER_CD = Tech_ETI_USER_CD
  6  GROUP BY Tech_PAC_ID, Users_USER_CD;

TECH_PAC_N USERS_USER_CD        TOT_PAC
---------- -------------------- ----------
236XXX123  ABX65842                     5

Elapsed: 00:00:00.04

Execution Plan
----------------------------------------------------------
Plan hash value: 1328229640

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     1 |    41 |   641   (1)| 00:00:08 |
|   1 |  WINDOW BUFFER                |              |     1 |    41 |   641   (1)| 00:00:08 |
|   2 |   SORT GROUP BY               |              |     1 |    41 |   641   (1)| 00:00:08 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| TECH         |     1 |    18 |     4   (0)| 00:00:01 |
|   4 |     NESTED LOOPS              |              |     1 |    41 |   640   (1)| 00:00:08 |
|*  5 |      TABLE ACCESS FULL        | USERS        |     1 |    23 |   636   (1)| 00:00:08 |
|*  6 |      INDEX RANGE SCAN         | TECH_USER_CD |     3 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("TECH_STS"='Y')
   5 - filter("USERS_USER_CD"='ABX65842')
   6 - access("USERS_USER_CD"="Tech_ETI_USER_CD")
       filter("TECH_ETI_USER_CD" IS NOT NULL)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2335  consistent gets
          0  physical reads
          0  redo size
        558  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> 
SQL>

The new query seems to be doing a good job. The “consistent gets” have dropped from 4671 to 2335, nearly half of the original; instead of hitting the table twice, we are getting the work done in one hit. But the consistent gets still seem reasonably high.

Although ROWS=1 is shown in the Explain Plan for the USERS table, the optimizer is spending most of its time doing a Full Table Scan at this step. Adding an index on the USERS_USER_CD column of the USERS table should do the trick.
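
A sketch of the index creation, matching the index name that appears in the plan below:

SQL> create index USERS_MUB_USER_CD_IDX on USERS(USERS_USER_CD);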

After adding the index on USERS (USERS_USER_CD) column, the query seems to be flying.

SQL> SELECT Tech_PAC_ID, Users_USER_CD, 
  2         count(1) over (partition by Users_USER_CD) TOT_PAC
  3          FROM Tech, Users 
  4         WHERE Tech_STS = 'Y' AND Users_USER_CD = UPPER('ABX65842' )
  5           AND Users_USER_CD = Tech_ETI_USER_CD
  6  GROUP BY Tech_PAC_ID, Users_USER_CD;

TECH_PAC_N USERS_USER_CD        TOT_PAC
---------- -------------------- ----------
236XXX123  ABX65842                     5


Execution Plan
----------------------------------------------------------
Plan hash value: 1753916289

---------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                  | Rows  | Bytes | Cost (%CPU)| Time  |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                       |     1 |    33 |     7  (15)| 00:00:01 |
|   1 |  WINDOW BUFFER                  |                       |     1 |    33 |     7  (15)| 00:00:01 |
|   2 |   SORT GROUP BY                 |                       |     1 |    33 |     7  (15)| 00:00:01 |
|*  3 |    TABLE ACCESS BY INDEX ROWID  | TECH                  |     1 |    15 |     4   (0)| 00:00:01 |
|   4 |     NESTED LOOPS                |                       |     1 |    33 |     6   (0)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID| USERS                 |     1 |    18 |     2   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN          | USERS_MUB_USER_CD_IDX |     1 |       |     1   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN           | TECH_USER_CD          |     3 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("TECH_STS"='Y')
   6 - filter("USERS_USER_CD"='ABX65842')
   7 - access("USERS_USER_CD"="Tech_ETI_USER_CD")
       filter("TECH_ETI_USER_CD" IS NOT NULL)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
          1  physical reads
          0  redo size
        558  bytes sent via SQL*Net to client
        399  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

This time merely 6 “consistent gets” were required. It’s a 99%+ reduction in overall consistent gets.

As I mentioned earlier in the post, I see SQL statements similar to this one over and over again. Analytic Functions are a nice alternative to the traditional way of writing these types of queries; they work extremely well and provide the performance needed for high numbers of executions in a high-data-volume environment.

More information on Analytic Functions can (should) be obtained from here:

Oracle Database SQL Reference 10g Release 2

On Top-n and Pagination Queries by Tom Kyte

AskTom

One Analytic Function Can do More Than a 1000 Lines of Code by Alex Nuijten

Happy reading.

Saturday, September 20, 2008

Syntax Highlighter

Dear Readers,

Most of you might have experienced difficulties in placing code snippets in a nicely formatted way on Blogger.com. I had previously been using Greg Houston's code formatter, but when you have pipes ("|") in the code, as in an Explain Plan, Blogger fails to display the code correctly.

But recently I stumbled on one of Karen Morton's posts where she discusses a way to format your code snippets on Blogger. Here are more details on highlighting your code on blogger.com using Syntax Highlighter.

The code below has been formatted using SyntaxHighlighter:


SQL> conn scott/tiger
Connected.
SQL> select * from tab;

TNAME                          TABTYPE  CLUSTERID
------------------------------ ------- ----------
DEPT                           TABLE
EMP                            TABLE
BONUS                          TABLE
SALGRADE                       TABLE
TEST_RUN                       TABLE
PLAN_TABLE                     TABLE
EMP_VIEW                       VIEW
EMP2                           TABLE

8 rows selected.

SQL>

More interesting stuff can be found on Fahd Shariff's blog and Morten Lyhr's blog.

Happy formatting !!!

Saturday, August 02, 2008

Dynamic Sampling and Table Partitions

Dynamic Sampling was first introduced in Oracle 9i Release 2; its purpose is to improve query performance by determining more accurate estimates for predicate selectivity and statistics for tables and indexes.

Dynamic sampling is used predominantly with un-analyzed tables. Before hard parsing a query, Oracle determines whether dynamic sampling would improve query performance. If so, the optimizer issues recursive SQL statements to estimate the necessary statistics dynamically.

Here’s an example of dynamic sampling:

Operating System: Windows XP
Database: Oracle 10g Release 2 (10.2.0.4)

  
SQL> create table t (a number);

Table created.

SQL> set autotrace traceonly exp
SQL> select * from t;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    13 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    |     1 |    13 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement

SQL> set autotrace off

It's clear from the above Execution Plan that dynamic sampling was performed.

But it seems dynamic sampling is not partition friendly, or rather, dynamic sampling is partition unaware. What I mean is: if you have a partitioned table and at least one partition is analyzed, then dynamic sampling is NOT performed on any of the un-analyzed partitions. This is because the optimizer looks at columns like NUM_ROWS of the DBA_TABLES view and considers the table analyzed.

Let us create a partitioned table with a local index.

  
SQL> create table part(
  2         sno     number,
  3         dt      date,
  4         remark  varchar2(30))
  5      partition by range(dt) (
  6      partition p1 values less than (to_date('01-02-2008', 'dd-mm-yyyy')),
  7      partition p2 values less than (to_date('01-03-2008', 'dd-mm-yyyy')),
  8      partition p3 values less than (to_date('01-04-2008', 'dd-mm-yyyy')),
  9      partition p4 values less than (to_date('01-05-2008', 'dd-mm-yyyy'))
 10     );

Table created.

SQL>
SQL> create index part_idx on part(dt, sno) local;

Index created.

SQL>

Now, populate all but the last partition with test data.

SQL> insert into part
  2      select level,
  3             to_date('01-01-2008', 'dd-mm-yyyy') + level - 1 ,
  4             'test'
  5        from dual
  6      connect by level <=90;

90 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>

Let us now collect statistics for partitions P1, P2, and P3.

SQL> exec dbms_stats.gather_table_stats(user, tabname => 'part', -
>       partname => 'p1', estimate_percent => null, -
>       method_opt => 'for all columns size 1', cascade => true);

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> exec dbms_stats.gather_table_stats(user, tabname => 'part', -
>       partname => 'p2', estimate_percent => null, -
>       method_opt => 'for all columns size 1', cascade => true);

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> exec dbms_stats.gather_table_stats(user, tabname => 'part', -
>       partname => 'p3', estimate_percent => null, -
>       method_opt => 'for all columns size 1', cascade => true);

PL/SQL procedure successfully completed.

SQL>
SQL> select table_name, num_rows
  2    from user_tables
  3   where table_name ='PART';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
PART                                   90

SQL>
SQL>
SQL> select partition_name, num_rows
  2    from user_tab_partitions
  3   where table_name = 'PART';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
P1                                     31
P2                                     29
P3                                     30
P4

SQL>
SQL>
SQL> select partition_name, num_rows
  2    from user_ind_partitions
  3   where index_name = 'PART_IDX';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
P1                                     31
P2                                     29
P3                                     30
P4

SQL>

It’s time to test whether optimizer considers dynamic sampling or not.

SQL> set autotrace traceonly exp
SQL>
SQL> select *
  2    from part
  3   where dt = to_date('01-04-2008', 'dd-mm-yyyy')
  4     and sno = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2633488982

-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |     1 |    16 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|      |     1 |    16 |     2   (0)| 00:00:01 |     4 |     4 |
|*  2 |   TABLE ACCESS FULL    | PART |     1 |    16 |     2   (0)| 00:00:01 |     4 |     4 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("DT"=TO_DATE(' 2008-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "SNO"=1)

SQL>
SQL> set autotrace off
SQL>

No! The optimizer thinks it has enough information at hand to work with, rather than sampling dynamically.

Now, let us populate the unanalyzed partition with data and re-run the same query.

SQL> insert into part select 1, to_date('01-04-2008', 'dd-mm-yyyy'), 'test'
  2        from dual connect by level <=1000;

1000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> set autotrace traceonly exp
SQL>
SQL>
SQL> select * from part
  2   where dt = to_date('01-04-2008', 'dd-mm-yyyy')
  3     and sno = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2633488982

-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |     1 |    16 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|      |     1 |    16 |     2   (0)| 00:00:01 |     4 |     4 |
|*  2 |   TABLE ACCESS FULL    | PART |     1 |    16 |     2   (0)| 00:00:01 |     4 |     4 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("DT"=TO_DATE(' 2008-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "SNO"=1)

SQL>
SQL> set autotrace off

Well, the optimizer chose the same plan because the statistics were not up to date. Let us gather statistics on the partition in question and execute the same query again.

 
SQL> exec dbms_stats.gather_table_stats(user, tabname => 'part', -
>       partname => 'p4', estimate_percent => null, -
>       method_opt => 'for all columns size 1', cascade => true);

PL/SQL procedure successfully completed.

SQL>
SQL> set autotrace traceonly exp
SQL>
SQL>
SQL> select * from part
  2   where dt = to_date('01-04-2008', 'dd-mm-yyyy')
  3     and sno = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2633488982

-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |  1000 | 16000 |     3   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|      |  1000 | 16000 |     3   (0)| 00:00:01 |     4 |     4 |
|*  2 |   TABLE ACCESS FULL    | PART |  1000 | 16000 |     3   (0)| 00:00:01 |     4 |     4 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("DT"=TO_DATE(' 2008-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "SNO"=1)

SQL>
SQL> set autotrace off
SQL>

Well, this time the optimizer had updated statistics, which gives a better and clearer picture for picking the most appropriate access path. So the optimizer thinks a “TABLE ACCESS FULL” on partition P4 is appropriate, which makes sense too.

I have purposely populated partition P4 with duplicate data so that we see a difference in the optimizer's access path.

Finally, the conclusions are:

• Dynamic sampling ignores partition-level statistical information.
• Always gather statistics after a data load process, or assign default statistics to the unanalyzed partition (a sketch of the latter follows).
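
For the latter option, DBMS_STATS.SET_TABLE_STATS can seed an unanalyzed partition with representative figures — a minimal sketch with assumed values:

SQL> exec dbms_stats.set_table_stats(user, tabname => 'part', -
>       partname => 'p4', numrows => 1000, numblks => 8);

Happy reading.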

Friday, August 01, 2008

Bugs in REMAINDER Function (Updated)

Dear all,

Regarding Bugs in REMAINDER Function, I have opened a call with Oracle Support and they say "REMAINDER function works as intended and this is a documentation issue which requires minor clarification".

Oracle documentation has modified the text as follows:

"If n1 != 0, then the remainder is n2 - (n1*N) where N is the integer nearest n2/n1. If n2/n1 equals x.5, then N is the nearest even integer."

A bug (7236077) has also been logged.

Regards

Sunday, July 27, 2008

Funny “ORA-00942: table or view does not exist” Error Message

Dear all,

We all know when the “ORA-00942” error message is returned. For instance, if I try to query a non-existent table, Oracle joyfully returns this error message.

  
SQL> select * from funny;
select * from funny
              *
ERROR at line 1:
ORA-00942: table or view does not exist

The “ORA-00942” error message is thrown in our face whenever we try to perform any action like CREATE, ALTER, DROP, INSERT, UPDATE, DELETE ... against a non-existent table. But this error is sometimes misleading, like the one below:

  
SQL> create table t(a number);

Table created.

SQL> create view v as select * from t;

View created.

SQL> drop table v;
drop table v
           *
ERROR at line 1:
ORA-00942: table or view does not exist

The error message reports “table or view does not exist”, but a view named “V” definitely exists in the same schema.

Wouldn’t it be more appropriate if Oracle reported “table does not exist”?

Thursday, July 10, 2008

MIN and MAX Functions in a Single Query are Disastrous

Dear Readers,

I would like to discuss a very interesting point about indexes in this post. When we are interested in finding the minimum value of an indexed column, instead of reading the entire table or the entire index, Oracle intelligently uses the index to navigate to the first index leaf block (the leftmost index block) and quickly finds the minimum value of the indexed column.

A simple demo proves this:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for 32-bit Windows: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL>
SQL> drop table t purge;

Table dropped.

SQL>
SQL>
SQL> create table t as select level sno, 'name ' || level name 
  2  from dual connect by level <= 10000000;

Table created.

SQL>

SQL> create unique index t_idx on t(sno);

Index created.

SQL> exec dbms_stats.gather_table_stats(user, 't');

PL/SQL procedure successfully completed.

SQL> set autotrace on
SQL>
SQL> select min(sno) from t;

  MIN(SNO)
----------
         1


Execution Plan
----------------------------------------------------------
Plan hash value: 2683064407

------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |     1 |     6 |  9034   (2)| 00:01:49 |
|   1 |  SORT AGGREGATE            |       |     1 |     6 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| T_IDX |    10M|    57M|            |          |
------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          1  physical reads
          0  redo size
        411  bytes sent via SQL*Net to client
        399  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
SQL> set autotrace off
SQL>

Although this table contains 10 Million rows, Oracle required only 3 consistent gets to fetch the minimum value. Superb !!!

Similarly, when finding out the maximum value, Oracle reads the last block on the right-hand side of the index structure.

SQL> set autotrace on
SQL> select max(sno) from t;

  MAX(SNO)
----------
  10000000


Execution Plan
----------------------------------------------------------
Plan hash value: 2683064407

------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |     1 |     6 |  9034   (2)| 00:01:49 |
|   1 |  SORT AGGREGATE            |       |     1 |     6 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| T_IDX |    10M|    57M|            |          |
------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          2  physical reads
          0  redo size
        411  bytes sent via SQL*Net to client
        399  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> set autotrace off
SQL>

Once again only 3 consistent gets were required to fetch the maximum value.

But things get messy when you use both the MIN and MAX functions in the same query. Instead of using the same "INDEX FULL SCAN (MIN/MAX)" path to read the left-most and right-most blocks to arrive at the minimum and maximum values, Oracle goes with a FULL TABLE SCAN. A Full Table Scan on 10 million rows !!!

SQL> set autotrace on
SQL> select min(sno), max(sno) from t;

  MIN(SNO)   MAX(SNO)
---------- ----------
         1   10000000


Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |  9034   (2)| 00:01:49 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|   2 |   TABLE ACCESS FULL| T    |    10M|    57M|  9034   (2)| 00:01:49 |
---------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      32928  consistent gets
      11391  physical reads
          0  redo size
        472  bytes sent via SQL*Net to client
        399  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> set autotrace off
SQL>

Given this terrible behavior of the Oracle optimizer, what I infer and suggest is:

"Write separate queries to fetch MIN and MAX values instead of combining them into one query".

Saturday, July 05, 2008

Bugs in REMAINDER Function

While browsing through the SQL Reference documentation (Oracle 10g Release 2), I stopped at the REMAINDER function. REMAINDER returns the remainder of n2 divided by n1 and internally uses the ROUND function to arrive at the remainder value.

Oracle documentation also says, "If n1 != 0, then the remainder is n2 - (n1*N) where N is the integer nearest n2/n1."

SQL> Select REMAINDER(501, 5) from dual;

REMAINDER(501,5)
----------------
               1

SQL>

But, because of an internal bug, the REMAINDER function returns incorrect results when "n2/n1" results in 0.5, 1.5, 2.5, and so on.

SQL> Select REMAINDER(5, 2) from dual;

REMAINDER(5,2)
--------------
             1

SQL>

According to the formula,

n2 - (n1 * N) = 5 - (2 * ROUND(5/2)) = 5 - (2 * 3) = 5 - 6 = -1

the value returned should be "-1" whereas the REMAINDER function returns "+1".

My observation is that, for example, "1.5" is getting rounded to "1" instead of "2", which spoils the outcome of REMAINDER.
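
In fact, the observed results match rounding halves to the nearest even integer — exactly what the documentation clarification quoted in the August 1 update above says. Worked through for REMAINDER(5, 2): n2/n1 = 2.5, the nearest even integer is N = 2, so n2 - (n1*N) = 5 - (2*2) = +1, which is what the function returns.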

I wrote a small PL/SQL block to verify my observation:

SQL> set linesize 1000
SQL> set pages     100
SQL> set serveroutput on
SQL>
SQL>    declare
  2       n1  number;
  3       n2  number;
  4       N   number;
  5
  6       sys_rem number;
  7       my_rem number;
  8     begin
  9       for n1 in 1..5 loop
 10        --for n2 in 1..50 loop
 11        n2 := 1;
 12        while n2 <= 20 loop
 13          Sys_rem := REMAINDER(n2, n1);
 14          N := Round(n2/n1);
 15          My_rem := n2 - (n1 * N);
 16          If sys_rem <> my_rem then
 17            Dbms_output.put_line('n2 : ' || rpad(to_char(n2), 4, ' ') ||
 18                                 '  n1 : ' || to_char(n1) ||
 19                                 '   ' ||
 20                                 ' n2/n1 = ' || rpad(to_char(n2/n1), 4, ' ') ||
 21                                 '     ' ||
 22                                 ' ORCL rem = ' || rpad(to_char(sys_rem), 4, ' ') ||
 23                                 '     ' ||
 24                                 ' Calc Rem [' || rpad(to_char(n2), 4, ' ') ||
 25                                 ' - (' || to_char(n1) || '*' ||
 26                                 rpad(to_char(N), 4, ' ') || ')] = '||
 27                                 to_char(my_rem));
 28          End If;
 29          n2 := n2 + 0.1;
 30        end loop;
 31      end loop;
 32    end;
 33    /
n2 : 2.5   n1 : 1    n2/n1 = 2.5       ORCL rem = .5        Calc Rem [2.5  - (1*3   )] = -.5
n2 : 4.5   n1 : 1    n2/n1 = 4.5       ORCL rem = .5        Calc Rem [4.5  - (1*5   )] = -.5
n2 : 6.5   n1 : 1    n2/n1 = 6.5       ORCL rem = .5        Calc Rem [6.5  - (1*7   )] = -.5
n2 : 8.5   n1 : 1    n2/n1 = 8.5       ORCL rem = .5        Calc Rem [8.5  - (1*9   )] = -.5
n2 : 10.5  n1 : 1    n2/n1 = 10.5      ORCL rem = .5        Calc Rem [10.5 - (1*11  )] = -.5
n2 : 12.5  n1 : 1    n2/n1 = 12.5      ORCL rem = .5        Calc Rem [12.5 - (1*13  )] = -.5
n2 : 14.5  n1 : 1    n2/n1 = 14.5      ORCL rem = .5        Calc Rem [14.5 - (1*15  )] = -.5
n2 : 16.5  n1 : 1    n2/n1 = 16.5      ORCL rem = .5        Calc Rem [16.5 - (1*17  )] = -.5
n2 : 18.5  n1 : 1    n2/n1 = 18.5      ORCL rem = .5        Calc Rem [18.5 - (1*19  )] = -.5
n2 : 1     n1 : 2    n2/n1 = .5        ORCL rem = 1         Calc Rem [1    - (2*1   )] = -1
n2 : 5     n1 : 2    n2/n1 = 2.5       ORCL rem = 1         Calc Rem [5    - (2*3   )] = -1
n2 : 9     n1 : 2    n2/n1 = 4.5       ORCL rem = 1         Calc Rem [9    - (2*5   )] = -1
n2 : 13    n1 : 2    n2/n1 = 6.5       ORCL rem = 1         Calc Rem [13   - (2*7   )] = -1
n2 : 17    n1 : 2    n2/n1 = 8.5       ORCL rem = 1         Calc Rem [17   - (2*9   )] = -1
n2 : 1.5   n1 : 3    n2/n1 = .5        ORCL rem = 1.5       Calc Rem [1.5  - (3*1   )] = -1.5
n2 : 7.5   n1 : 3    n2/n1 = 2.5       ORCL rem = 1.5       Calc Rem [7.5  - (3*3   )] = -1.5
n2 : 13.5  n1 : 3    n2/n1 = 4.5       ORCL rem = 1.5       Calc Rem [13.5 - (3*5   )] = -1.5
n2 : 19.5  n1 : 3    n2/n1 = 6.5       ORCL rem = 1.5       Calc Rem [19.5 - (3*7   )] = -1.5
n2 : 2     n1 : 4    n2/n1 = .5        ORCL rem = 2         Calc Rem [2    - (4*1   )] = -2
n2 : 10    n1 : 4    n2/n1 = 2.5       ORCL rem = 2         Calc Rem [10   - (4*3   )] = -2
n2 : 18    n1 : 4    n2/n1 = 4.5       ORCL rem = 2         Calc Rem [18   - (4*5   )] = -2
n2 : 2.5   n1 : 5    n2/n1 = .5        ORCL rem = 2.5       Calc Rem [2.5  - (5*1   )] = -2.5
n2 : 12.5  n1 : 5    n2/n1 = 2.5       ORCL rem = 2.5       Calc Rem [12.5 - (5*3   )] = -2.5

PL/SQL procedure successfully completed.

SQL>

I have tested this code on Oracle 10g Release 2. Unfortunately, this bug also exists in Oracle 11g.

I have raised a Service Request and also filed a bug with Oracle Support.

Regards

Saturday, June 28, 2008

OTN - Discussion Forums in a great new look

Dear Readers,

This morning (GMT+3), when I tried to log in to the OTN Discussion Forums, they were down. But when I checked again later, they were back with a great new look.

With the new look come new features. Following are the new features I have noticed:

* Great new look

* Add smileys to messages

* Spell Check

* Formatting

* Reward points

* Save messages

* Tag messages and

* Abuse alerts

It should be easier for OTN newbies to post their questions in a well-formatted and readable way.

Enjoy discussions !!!

Tuesday, June 24, 2008

Why my index is not used?

Well, this question keeps popping up now and then. Yesterday, one of my colleagues also came up with this question: "Why is it that Oracle is not using the index even though I am selecting less than 10% of the data?".

We ran the query with autotrace enabled, and the execution plan showed a Full Table Scan. This table contains over 29 million records, and adding the predicate reduces the result set to 2.3 million records, which is 8% of the total.

SQL> set autotrace traceonly exp
SQL> SELECT *
  2      FROM quint_sec_tbl
  3      WHERE quint_type = 'XX06FR';

Execution Plan
----------------------------------------------------------
Plan hash value: 3917650069

----------------------------------------------------------------------------------------
| Id  | Operation         | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                    |  3440 |   618K|   454K  (1)| 01:30:55 |
|*  1 |  TABLE ACCESS FULL| QUINT_SEC_TBL      |  3440 |   618K|   454K  (1)| 01:30:55 |
----------------------------------------------------------------------------------------

Cary Millsap's very old (but still valid) article "When to Use an Index" greatly helped me in this regard. The article reveals that index consideration is based on block selectivity, not on row selectivity.

He also defines what Row Selectivity and Block Selectivity are in the article.

Row Selectivity:

You can define the row selectivity of a given where-clause predicate as the number of rows returned by the predicate (r) divided by the total number of rows in the table (R):


        P(r) =  r / R

Block Selectivity:

You can define the block selectivity of a given where-clause predicate analogously, as the number of data blocks containing at least one row matching the predicate condition (b) divided by the total number of data blocks below the high-water mark (B):


        P(b) = b / B
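
For a single predicate, b can also be counted directly — a quick sketch using DBMS_ROWID (the file number is included so that blocks from different datafiles are not conflated):

SQL> select count(distinct dbms_rowid.rowid_relative_fno(rowid) || '.' ||
  2                        dbms_rowid.rowid_block_number(rowid)) b
  3    from quint_sec_tbl
  4   where quint_type = 'XX06FR';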

We can calculate block selectivity and row selectivity using SQL provided in this article. I used this SQL against my query and following are the results:

SQL> @hds
TableOwner : MYPRODUSER
TableName : QUINT_SEC_TBL
ColumnList : QUINT_TYPE
WhereClause: 
PageSize : 100

Table blocks below hwm    Table rows
         (B)                 (R)
---------------------- ----------------
             1,672,704       29,270,757
More: 

       Block selectivity  Block count    Row selectivity     Row count
QUINT_    (pb = b/B)          (b)          (pr = r/R)           (r)
------ ----------------- -------------- ----------------- ----------------
TT34DV            45.03%        753,277            37.99%       11,120,869
FG76SC            44.47%        743,788            13.67%        4,000,205
LH23Q2            42.78%        715,558             9.44%        2,762,284
XX06FR            42.32%        707,894             8.02%        2,346,846
:
:
:

Output of this SQL is sorted in descending order of block selectivity.

Looking at this output, the row selectivity is only 8%, but to fetch those 8% of rows Oracle has to visit 42% of the blocks (block selectivity). That means nearly half of the table's blocks contain at least one row for which QUINT_TYPE='XX06FR'. Instead of going through the hard path of index access, it’s more efficient for the optimizer to do a Full Table Scan.

So, now we know why the index was ignored and a Full Table Scan was preferred.

P.S.: For security reasons, the username, table name, column name and column values have been modified.

Wednesday, June 18, 2008

SQL for Loan Payment

Dear all,

Inspired by "SQL for Buying a New Car", I thought of converting a very old excel sheet of mine into an SQL query. This excel sheet lists equated monthly installment (EMI), principal amount and the interest amount for duration of loan payment. I feed Rate of Interest, Loan Amount and Period of Loan (in years) to the excel sheet. This was very convenient for me to analyze what part of my EMI is going towards interest payment and what portion goes as principal repayment. It is extremely important to understand what goes for interest and what goes for principal repayment when you are planning for a loan.

Output from Excel sheet:

This can easily be converted into an SQL query using the MODEL clause introduced in Oracle 10g.

The output of this query includes a summary of the total interest to be paid, the principal to be paid and the net payment to be made. Of course, we need to ignore the summary of the "Remaining" column, as it is meaningless.
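
For reference, the EMI computed in the innermost inline view below is the standard annuity formula, with P = loan amount, r = periodic interest rate = roi/(period*100), and n = number of installments = yrs*period:

        EMI = P * r / (1 - (1 + r)^(-n))

For the defaults used below (P = 100,000, r = 10/1200, n = 24), this works out to the 4,614.49 seen in the output.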

SQL> set line 10000
SQL> set pages 500
SQL> set verify off
SQL>
SQL> column interest  format 999,999,999.99
SQL> column principal format 999,999,999.99
SQL> column remaining format 999,999,999.99
SQL>
SQL> define roi = 10          -- Rate of Interest
SQL> define years = 2         -- Payment Period
SQL> define amt = 100000      -- Loan Amount
SQL> define period = 12       -- Mode of payment.
SQL>                          --     Monthly=12, Bi-monthly=48, Quarterly=4, Half-yearly=2, Yearly=1, Adhoc=n
SQL>
SQL>
SQL> Select inst_no,
  2         sum(new_emi) emi,
  3         sum(new_interest) interest,
  4         sum(principal) principal,
  5         sum(remaining) remaining
  6    From ( Select inst_no + 1 inst_no,
  7                  new_emi,
  8                  interest,
  9                  decode( inst_no + 1, &period*&years, interest - remaining, interest)  new_interest,
 10                  decode( inst_no + 1, &period*&years, principal + remaining, principal)  principal,
 11                  decode( inst_no + 1, &period*&years, 0, remaining) remaining
 12             From  (select  rownum inst_no, roi, yrs, amt, period,
 13                            round( (roi * amt/(period * 100)) /
 14                                   (1 - power((1 + roi/(period * 100)), -yrs * period)), 2) emi
 15                      From (select &roi roi, &years yrs, &amt amt, &period period
 16                              From Dual) )
 17    MODEL
 18    DIMENSION BY ( inst_no )
 19    MEASURES     ( roi, yrs, amt, emi , period, 0 as new_emi, 0 as interest, 0 as principal, 0 as remaining)
 20    RULES
 21      UPSERT
 22      ITERATE (1000)
 23      (
 24        new_emi[ITERATION_NUMBER] = emi[1],
 25        interest[ITERATION_NUMBER]  = round(&roi/(period[1] * 100) *
 26                                      nvl(remaining[ITERATION_NUMBER-1], amt[1]), 2),
 27        principal[ITERATION_NUMBER] = emi[1] - interest[cv()],
 28        remaining[ITERATION_NUMBER] = nvl(remaining[ITERATION_NUMBER-1], amt[1]) - principal[cv()]))
 29  Where interest > 0
 30  Group by rollup( inst_no)
 31  order by  inst_no;

   INST_NO        EMI        INTEREST       PRINCIPAL       REMAINING
---------- ---------- --------------- --------------- ---------------
         1    4614.49          833.33        3,781.16       96,218.84
         2    4614.49          801.82        3,812.67       92,406.17
         3    4614.49          770.05        3,844.44       88,561.73
         4    4614.49          738.01        3,876.48       84,685.25
         5    4614.49          705.71        3,908.78       80,776.47
         6    4614.49          673.14        3,941.35       76,835.12
         7    4614.49          640.29        3,974.20       72,860.92
         8    4614.49          607.17        4,007.32       68,853.60
         9    4614.49          573.78        4,040.71       64,812.89
        10    4614.49          540.11        4,074.38       60,738.51
        11    4614.49          506.15        4,108.34       56,630.17
        12    4614.49          471.92        4,142.57       52,487.60
        13    4614.49          437.40        4,177.09       48,310.51
        14    4614.49          402.59        4,211.90       44,098.61
        15    4614.49          367.49        4,247.00       39,851.61
        16    4614.49          332.10        4,282.39       35,569.22
        17    4614.49          296.41        4,318.08       31,251.14
        18    4614.49          260.43        4,354.06       26,897.08
        19    4614.49          224.14        4,390.35       22,506.73
        20    4614.49          187.56        4,426.93       18,079.80
        21    4614.49          150.67        4,463.82       13,615.98
        22    4614.49          113.47        4,501.02        9,114.96
        23    4614.49           75.96        4,538.53        4,576.43
        24    4614.49           38.06        4,576.43             .00
            110747.76       10,747.76      100,000.00    1,189,739.34

25 rows selected.

SQL>

Results of this query can be verified here.

References:

1. Loan Calculator.

2. How to calculate EMI.

Happy reading.

Friday, June 06, 2008

Create a Unique Constraint Ignoring Existing Duplicate Data

Hi,

There was an interesting post on OTN forums, where the OP asks:

Can we create a unique index on a column having duplicate values which can enforce the uniqueness for future values?

Well, this is possible, as Oracle is rich in features. It can be achieved by creating a unique constraint with the DEFERRABLE and NOVALIDATE keywords. By default, constraints are created as NON DEFERRABLE and VALIDATE. When we create a constraint with DEFERRABLE and NOVALIDATE, we instruct Oracle not to check the existing data for uniqueness but to always check new or modified rows.

Here's how we can do this:
SQL> drop table t purge;

Table dropped.

SQL>
SQL> create table t
  2  ( x int );

Table created.

SQL>
SQL>
SQL> insert into t values ( 1 );

1 row created.

SQL> insert into t values ( 1);

1 row created.

SQL>
SQL> alter table t add constraint t_uk unique (x)
  2    deferrable initially immediate novalidate;

Table altered.

SQL> select * from t;

         X
----------
         1
         1

SQL>
SQL> alter table t modify constraint t_uk enable novalidate;

Table altered.

SQL>
SQL> insert into t values ( 1);
insert into t values ( 1)
*
ERROR at line 1:
ORA-00001: unique constraint (TEST.T_UK) violated


SQL> select * from t;

         X
----------
         1
         1

SQL> insert into t values (2);

1 row created.

SQL>
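
As a quick sanity check (not part of the original demo), the data dictionary confirms that Oracle is policing the constraint with a non-unique index, which is exactly what makes this trick possible:

select c.constraint_name, c.status, c.validated, c.deferrable, i.uniqueness
  from user_constraints c, user_indexes i
 where c.table_name = 'T'
   and i.index_name = c.index_name;

The UNIQUENESS column should show NONUNIQUE, while new DML is still checked against the constraint.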

Happy reading !!!

Monday, May 26, 2008

Database is very slow!!!

Dear Readers,

I received a call from one of our developers and following was the conversation that took place between us:

He: “Database is very slow”

Me: What is very slow? Can you tell me what you are actually doing?

He: I am running a simple report which is supposed to return less than 100 records. But it’s been more than 4 hours and the query is still running.

Me: How long did it take to run before?

He: This is the first time we are running this query.

Me: Ok, let me log in to the database.



I logged in to the database and ran my usual set of commands to hunt down the culprit SQL.
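
Those commands boil down to something like this (a sketch only; 'APP_USER' is a placeholder for the real application user):

select s.sid, s.serial#, s.sql_id,
       q.elapsed_time / 1000000 as elapsed_secs, q.sql_text
  from v$session s, v$sql q
 where q.sql_id = s.sql_id
   and q.child_number = s.sql_child_number   -- match the exact cursor being run
   and s.status = 'ACTIVE'
   and s.username = 'APP_USER'               -- placeholder application user
 order by q.elapsed_time desc;

The culprit query was: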

SELECT
      TO_CHAR(TRN_DATE, 'YYYY-MM-DD HH24:MI:SS') TRN_DATE,
      TRAN_TYPE,
      TRAN_REF,
      KEY_NO,
      IN_AMT A1,
      0 A2
 FROM TRANSACTION_DAILY
WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' )
      AND SRC_FLAG ='L'
      AND (SUBSTR(OUT_KEY_NO,1,10) = 'DXIY0-19XV' or SUBSTR(IN_KEY_NO,1,10) = 'DXIY0-19XV')
      AND TRAN_TYPE || NVL(KEY_NO,'A') || NVL(IN_AMT,0) NOT IN (
                    SELECT  TD_TRAN_TYPE || TD_KEY_NO_REF || NVL(OUT_AMT,0)
                      FROM TRANSACTION_DETAILS
                     WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) 
                       AND RM_CODE = 'XXX123'  )
UNION
SELECT
      TO_CHAR(TRN_DATE, 'YYYY-MM-DD HH24:MI:SS') TRN_DATE,
      TD_TRAN_TYPE,
      TRAN_REF, 
      TD_KEY_NO_REF, 
      0 A1,
      OUT_AMT A2
 FROM TRANSACTION_DETAILS
WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' )
  AND RM_CODE = 'XXX123' 
  AND TD_TRAN_TYPE || TD_KEY_NO_REF || NVL(OUT_AMT,0)   NOT IN (
                    SELECT TRAN_TYPE || NVL(KEY_NO,'A') || NVL(IN_AMT,0)
                      FROM TRANSACTION_DAILY
                     WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' )
                       AND SRC_FLAG ='L'
                       AND (SUBSTR(OUT_KEY_NO,1,10) = 'DXIY0-19XV' 
                            or SUBSTR(IN_KEY_NO,1,10) = 'DXIY0-19XV') )

and this is the execution plan:

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                       |       |       |    51M(100)|          |       |       |
|   1 |  SORT UNIQUE                          |                       |  2842 |   159K|    51M (70)|198:47:09 |       |       |
|   2 |   UNION-ALL                           |                       |       |       |            |          |       |       |
|*  3 |    FILTER                             |                       |       |       |            |          |       |       |
|   4 |     PARTITION RANGE SINGLE            |                       |  2042 |   135K|  2854   (2)| 00:00:40 |   KEY |   KEY |
|*  5 |      TABLE ACCESS FULL                | TRANSACTION_DAILY     |  2042 |   135K|  2854   (2)| 00:00:40 |   KEY |   KEY |
|   6 |     PARTITION RANGE SINGLE            |                       |   760 | 19000 | 15269   (1)| 00:03:34 |   KEY |   KEY |
|*  7 |      TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTION_DETAILS   |   760 | 19000 | 15269   (1)| 00:03:34 |   KEY |   KEY |
|*  8 |       INDEX RANGE SCAN                | TRAN_DET_IDX          |   434K|       |   980   (1)| 00:00:14 |   KEY |   KEY |
|*  9 |    FILTER                             |                       |       |       |            |          |       |       |
|  10 |     PARTITION RANGE SINGLE            |                       |   800 | 24800 | 15269   (1)| 00:03:34 |   KEY |   KEY |
|* 11 |      TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTION_DETAILS   |   800 | 24800 | 15269   (1)| 00:03:34 |   KEY |   KEY |
|* 12 |       INDEX RANGE SCAN                | TRAN_DET_IDX          |   434K|       |   980   (1)| 00:00:14 |   KEY |   KEY |
|* 13 |     TABLE ACCESS BY GLOBAL INDEX ROWID| TRANSACTION_DAILY     |  1940 |   111K| 88735   (1)| 00:20:43 | ROW L | ROW L |
|* 14 |      INDEX RANGE SCAN                 | PK_TRANSACTION_DAILY  |   102K|       |   413   (1)| 00:00:06 |       |       |
-------------------------------------------------------------------------------------------------------------------------------


From the above execution plan, the optimizer estimates "198:47:09" (nearly 199 hours) for the "SORT UNIQUE" step. The actual time the query took to complete was 6 hours and 49 minutes (24,540 seconds).

I then simplified the query by removing some of the predicates in order to understand it more clearly. The simplified query is:

 
SELECT *
 FROM TRANSACTION_DAILY
WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) 
      AND TRAN_TYPE || NVL(KEY_NO,'A') || NVL(IN_AMT,0) NOT IN (
                    SELECT  TD_TRAN_TYPE || 
                            TD_KEY_NO_REF || 
                            NVL(OUT_AMT,0) 
                      FROM TRANSACTION_DETAILS
                     WHERE TRN_DATE = 
                           TO_DATE ( '#BUSDATE#', 'DDMMYYYY' )  )
UNION
SELECT *
 FROM TRANSACTION_DETAILS
WHERE TRN_DATE = TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) 
  AND TD_TRAN_TYPE || TD_KEY_NO_REF || NVL(OUT_AMT,0)   NOT IN (
                    SELECT TRAN_TYPE || 
                           NVL(KEY_NO,'A') || 
                           NVL(IN_AMT,0)
                      FROM TRANSACTION_DAILY
                     WHERE TRN_DATE = 
                            TO_DATE ( '#BUSDATE#', 'DDMMYYYY' ) )

Hmmmm, this query is effectively doing a full outer join between the “TRANSACTION_DAILY” and “TRANSACTION_DETAILS” tables and keeping only the non-matching rows. I called him back:

Me: What is the objective of this report?

He: We need to find all the information from “TRANSACTION_DAILY” table which does not exist in “TRANSACTION_DETAILS” table and vice versa.

Yes, I was right. It’s doing a “Full Outer Join”. I modified his original query and proposed a new one, which executes at lightning speed and, more importantly, yields the same result:

 
SELECT  nvl(a.TRN_DATE, b.TRN_DATE) TRN_DATE,
        a.TRN_TYPE, b.OTH_TRN_TYPE,
        a.KEY_NO, b.KEY_NO_REF,
        nvl(a.IN_AMT, 0) A1, 
        nvl(b.OUT_AMT, 0) A2
FROM  (select TRN_DATE, TRN_TYPE, KEY_NO, IN_AMT
         from TRANSACTION_DAILY 
        where TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY')
          and (substr(IN_KEY_NO, 1, 10) = 'DXIY0-19XV' OR
               substr(OUT_KEY_NO, 1, 10) = 'DXIY0-19XV')) a 
  FULL OUTER JOIN 
      (select TRN_DATE, OTH_TRN_TYPE, KEY_NO_REF, OUT_AMT
         from TRANSACTION_DETAILS 
        where RM_CODE = 'XXX123'
          and TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY')) b
  ON (a.TRN_DATE = b.TRN_DATE and 
      a.TRN_TYPE = b.OTH_TRN_TYPE and 
      a.KEY_NO = b.KEY_NO_REF and 
      a.IN_AMT = b.OUT_AMT)
WHERE (a.TRN_TYPE is null or 
       b.OTH_TRN_TYPE is null or 
       a.KEY_NO is null or 
       b.KEY_NO_REF is null or 
       a.IN_AMT is null or 
       b.OUT_AMT is null);

I called him back with over-excitement, announcing that the query was ready to be tested:

Me: I have re-written the query, kindly come over to my office so that we can test it together.

He: How long does it take to execute?

Me: It completes in less than 4 seconds.

He: What...? … less than 4 seconds … ? … Are you sure? Have you included all the conditions that I mentioned?

Me: Yes, let us test it.

He ran a couple of tests, and the query passed all of them except one. The requirements were getting a little messy, so I decided to create two small tables and test the query against them. I created tables A and B, with the same structure, and some sample data:

 
create table a(id number, amt number);
insert into a values (1, 10);
insert into a values (2, 20);
insert into a values (3, 30);
insert into a values (4, 40);

create table b(id number, amt number);
insert into b values (2, 20);
insert into b values (3, 30);
insert into b values (5, 50);
insert into b values (6, 60);

commit;

Basically, we need to find the records that exist in A but not in B, and also the records that exist in B but not in A. Here’s the query:

 
SQL> select *
  2    from a FULL OUTER JOIN b
  3      on a.id = b.id
  4   where a.id is null or b.id is null;

        ID        AMT         ID        AMT
---------- ---------- ---------- ----------
         4         40
                               6         60
                               5         50

SQL>

He said: “Yes, the query is fetching the right data, but what happens when you have two records with the same id and amt in table A and only a single record in B? In this case, the query should display one record from table A.”

Adding more to the complexity, he said: “There is no third column in either table to distinguish this occurrence.”

So, we inserted a record in each table and ran the same query:

 
insert into a values (1, 10);
insert into b values (1, 10);


SQL> select *
  2    from a FULL OUTER JOIN b
  3      on a.id = b.id
  4   where a.id is null or b.id is null;

        ID        AMT         ID        AMT
---------- ---------- ---------- ----------
         4         40
                               6         60
                               5         50

SQL>

Oops!! The query fails at this point. Ok, I then decided to re-write the query, this time using analytic functions to rescue me from this problematic situation.

 
SQL> select x.id, x.amt, y.id, y.amt
  2    from (select id, amt, 
  3                 row_number() over (partition by id, amt 
  4                                    order by id, amt) rno 
  5            from a) x
  6         FULL OUTER JOIN
  7         (select id, amt, 
  8                 row_number() over (partition by id, amt 
  9                                     order by id, amt) rno 
 10                 from b) y
 11      on x.id = y.id and x.rno = y.rno
 12   where x.id is null 
 13      or y.id is null 
 14      or x.rno is null 
 15      or y.rno is null;

        ID        AMT         ID        AMT
---------- ---------- ---------- ----------
         1         10
         4         40
                               5         50
                               6         60

SQL>

Yippee!!! This query rocks. I then quickly transformed the original query into this form and ran it again:

 
SELECT  nvl(a.TRN_DATE, b.TRN_DATE) TRN_DATE,
        a.TRN_TYPE, b.OTH_TRN_TYPE,
        a.KEY_NO, b.KEY_NO_REF,
        nvl(a.IN_AMT, 0) A1, 
        nvl(b.OUT_AMT, 0) A2
FROM  (select TRN_DATE, TRN_TYPE, KEY_NO, IN_AMT, 
              row_number() over (partition by KEY_NO, 
                                              TRN_TYPE, 
                                              IN_AMT 
                                     order by KEY_NO, 
                                              TRN_TYPE, 
                                              IN_AMT) rno 
         from TRANSACTION_DAILY 
        where TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY')
          and (substr(IN_KEY_NO, 1, 10) = 'DXIY0-19XV' OR
               substr(OUT_KEY_NO, 1, 10) = 'DXIY0-19XV')) a 
  FULL OUTER JOIN 
      (select TRN_DATE, OTH_TRN_TYPE, KEY_NO_REF, OUT_AMT, 
              row_number() over (partition by KEY_NO_REF, 
                                              OTH_TRN_TYPE, 
                                              OUT_AMT 
                                     order by KEY_NO_REF, 
                                              OTH_TRN_TYPE, 
                                              OUT_AMT) rno    
         from TRANSACTION_DETAILS 
        where RM_CODE = 'XXX123'
          and TRN_DATE = to_date('16-05-2008', 'DD-MM-YYYY')) b
  ON (a.TRN_DATE = b.TRN_DATE and 
      a.TRN_TYPE = b.OTH_TRN_TYPE and 
      a.KEY_NO = b.KEY_NO_REF and
      a.IN_AMT = b.OUT_AMT and 
      a.rno = b.rno)
WHERE (a.TRN_TYPE is null or 
       b.OTH_TRN_TYPE is null or 
       a.KEY_NO is null or 
       b.KEY_NO_REF is null or 
       a.IN_AMT is null or 
       b.OUT_AMT is null or
       a.rno is null or
       b.rno is null);

Execution plan of this query is:

 
----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                       |  2230 |   409K| 48722   (1)| 00:11:23 |       |       |
|   1 |  VIEW                                    |                       |  2230 |   409K| 48722   (1)| 00:11:23 |       |       |
|   2 |   UNION-ALL                              |                       |       |       |            |          |       |       |
|*  3 |    FILTER                                |                       |       |       |            |          |       |       |
|*  4 |     HASH JOIN RIGHT OUTER                |                       |  1552 |   175K| 24361   (1)| 00:05:42 |       |       |
|   5 |      VIEW                                |                       |   678 | 40002 | 14852   (1)| 00:03:28 |       |       |
|   6 |       WINDOW SORT                        |                       |   678 | 12882 | 14852   (1)| 00:03:28 |       |       |
|   7 |        PARTITION RANGE SINGLE            |                       |   678 | 12882 | 14851   (1)| 00:03:28 |    14 |    14 |
|*  8 |         TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTION_DETAILS   |   678 | 12882 | 14851   (1)| 00:03:28 |    14 |    14 |
|*  9 |          INDEX RANGE SCAN                | TRAN_DET_IDX          |   422K|       |   897   (1)| 00:00:13 |    14 |    14 |
|  10 |      VIEW                                |                       |  1552 | 88464 |  9508   (1)| 00:02:14 |       |       |
|  11 |       WINDOW SORT                        |                       |  1552 | 69840 |  9508   (1)| 00:02:14 |       |       |
|  12 |        PARTITION RANGE SINGLE            |                       |  1552 | 69840 |  9507   (1)| 00:02:14 |     7 |     7 |
|* 13 |         TABLE ACCESS FULL                | TRANSACTION_DAILY     |  1552 | 69840 |  9507   (1)| 00:02:14 |     7 |     7 |
|* 14 |    HASH JOIN ANTI                        |                       |   678 | 78648 | 24361   (1)| 00:05:42 |       |       |
|  15 |     VIEW                                 |                       |   678 | 40002 | 14852   (1)| 00:03:28 |       |       |
|  16 |      WINDOW SORT                         |                       |   678 | 12882 | 14852   (1)| 00:03:28 |       |       |
|  17 |       PARTITION RANGE SINGLE             |                       |   678 | 12882 | 14851   (1)| 00:03:28 |    14 |    14 |
|* 18 |        TABLE ACCESS BY LOCAL INDEX ROWID | TRANSACTION_DETAILS   |   678 | 12882 | 14851   (1)| 00:03:28 |    14 |    14 |
|* 19 |         INDEX RANGE SCAN                 | TRAN_DET_IDX          |   422K|       |   897   (1)| 00:00:13 |    14 |    14 |
|  20 |     VIEW                                 |                       |  1552 | 88464 |  9508   (1)| 00:02:14 |       |       |
|  21 |      WINDOW SORT                         |                       |  1552 | 69840 |  9508   (1)| 00:02:14 |       |       |
|  22 |       PARTITION RANGE SINGLE             |                       |  1552 | 69840 |  9507   (1)| 00:02:14 |     7 |     7 |
|* 23 |        TABLE ACCESS FULL                 | TRANSACTION_DAILY     |  1552 | 69840 |  9507   (1)| 00:02:14 |     7 |     7 |
----------------------------------------------------------------------------------------------------------------------------------

The results were convincing, and the query still takes only 4 seconds to fetch the data, compared to 6 hours and 49 minutes. That is 6,135 times faster than the original.

This query left the end users rejoicing, and he left my desk smiling, but without answering my question: “Is the database very slow?”

P.S.: I have renamed all the tables and columns so as not to reveal official and sensitive information on my personal blog, while preserving the reality of the case.
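
P.P.S.: Another well-known way to diff two tables while respecting duplicate counts is a GROUP BY over a UNION ALL. It is not the approach used above, but here is a sketch against the same A and B demo tables:

select id, amt,
       count(case when src = 'A' then 1 end) cnt_a,
       count(case when src = 'B' then 1 end) cnt_b
  from (select a.*, 'A' src from a
        union all
        select b.*, 'B' src from b)
 group by id, amt
having count(case when src = 'A' then 1 end) <>
       count(case when src = 'B' then 1 end);

Rows where the two counts differ appear in one table more often than in the other, which also catches the duplicate (1, 10) case.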

Regards

Friday, May 23, 2008

Automatic statistics gathering during Index Creation and Rebuilds

Dear all,

While testing dynamic sampling in Oracle 10g, I came to learn about a 10g new feature (very late, though).

My test case for dynamic sampling goes like this:

(a) Create a table with data,
(b) Create an index, and
(c) Execute the query and note the execution plan.
SQL> Create table t as select * from scott.emp;

Table created.

SQL> Create index t_idx on t(empno);

Index created.

SQL> set autotrace traceonly explain
SQL> Select * from t where empno = 7788;

Execution Plan
----------------------------------------------------------
Plan hash value: 1020776977

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    87 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |    87 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("EMPNO"=7788)

Note
-----
   - dynamic sampling used for this statement

SQL> Set autotrace off

Yes, the optimizer did dynamic sampling and picked the index plan as the optimal execution plan. Now, let me delete the statistics and re-run the same SQL.

SQL> Exec dbms_stats.delete_table_stats(user, 't');

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> Select * from t where empno = 7788;

Execution Plan
----------------------------------------------------------
Plan hash value: 2153619298

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    87 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    87 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("EMPNO"=7788)

Note
-----
   - dynamic sampling used for this statement

SQL> Set autotrace off

Oops, the optimizer did consider dynamic sampling but this time chose a full table scan. How come the same optimizer picked the index access plan on the first run?

Ok, let me do it again, this time being very careful.

(a) Create table with data
SQL> Drop table t purge;

Table dropped.

SQL> Create table t as select * from scott.emp;

Table created.

SQL> Select num_rows, last_analyzed from user_tables where table_name = 'T';

  NUM_ROWS LAST_ANALYZED
---------- --------------


SQL>

(b) Create an index, and

SQL> Create index t_idx on t(empno);

Index created.

SQL> column index_name format a10
SQL> set line 10000

SQL> select blevel, leaf_blocks, distinct_keys, clustering_factor, 
  2  num_rows, last_analyzed, user_stats, global_stats
  3  from user_indexes
  4  where index_name = 'T_IDX';

    BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR   NUM_ROWS LAST_ANALYZED        USE GLO
---------- ----------- ------------- ----------------- ---------- -------------------- --- ---
         0           1            14                 1         14 05-May-2008 10:14:48 NO  NO

SQL> 

Oh! I did not gather statistics, yet the index already has them (note the populated NUM_ROWS and LAST_ANALYZED above).

(c) Execute the query and note the execution plan

SQL> set autotrace traceonly explain
SQL> Select * from t where empno = 7788;

Execution Plan
----------------------------------------------------------
Plan hash value: 1020776977

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    87 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |    87 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("EMPNO"=7788)

Note
-----
   - dynamic sampling used for this statement

SQL> Set autotrace off

Now it’s clear why the optimizer finds the index access path to be the best plan: the index statistics were already in place. Let’s continue, delete the statistics, and then execute the query again:

SQL> Exec dbms_stats.delete_table_stats(user, 't');

PL/SQL procedure successfully completed.

SQL> select blevel, leaf_blocks, distinct_keys, clustering_factor, 
  2  num_rows, last_analyzed, user_stats, global_stats
  3  from user_indexes
  4  where index_name = 'T_IDX';

    BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR   NUM_ROWS LAST_ANALYZED        USE GLO
---------- ----------- ------------- ----------------- ---------- -------------------- --- ---
                                                                                       NO  NO

SQL> set autotrace traceonly explain
SQL> Select * from t where empno = 7788;

Execution Plan
----------------------------------------------------------
Plan hash value: 2153619298

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    87 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    87 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("EMPNO"=7788)

Note
-----
   - dynamic sampling used for this statement

SQL>  Set autotrace off

This time the optimizer did dynamic sampling and found the full table scan to be less expensive; note that deleting the table statistics also wiped out the index statistics (see the empty USER_INDEXES row above).

In Oracle 10g, when an index is created, Oracle automatically gathers statistics on the newly created index. The same is true for index rebuilds. That is why the optimizer picked the index path as the best plan on the first run.
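
A quick way to see the same behavior on a rebuild, using the same T/T_IDX demo table (not part of the trace above, and the output will vary):

alter index t_idx rebuild;

select num_rows, last_analyzed
  from user_indexes
 where index_name = 'T_IDX';

LAST_ANALYZED should now show the time of the rebuild, without any explicit DBMS_STATS call.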

References:

Oracle® Database SQL Reference 10g Release 2 (10.2)

Regards

Monday, May 12, 2008

ORA-27061: waiting for async I/Os failed

Hi

One of our physical standby databases aborted with an "ORA-27061: waiting for async I/Os failed" error. The instance was killed by the DBWR process. It turned out to be an OS bug (IBM AIX5L based systems), and an OS patch resolves the issue.

Our current OS patch level is AIX5.3 TL5-CSP, and we need to upgrade to either AIX5.3 TL6 SP4 or AIX5.3 TL7 to prevent this problem from recurring.

References:

Metalink Note Id: 467613.1 addresses this issue.

Happy reading :-)

Monday, May 05, 2008

Be careful when using DBMS_UTILITY to analyze

Hello,

If you are still using DBMS_UTILITY.ANALYZE_DATABASE or DBMS_UTILITY.ANALYZE_SCHEMA to analyze your databases or schemas, then you need to be very cautious.

I have observed a very strange behavior of this procedure with partitioned tables in one of our databases: statistics are not being updated at the table level, although the partition statistics are up to date.

The reason for the stale table-level statistics is this:

If you use the DBMS_STATS package to gather statistics on a partitioned table and then later run DBMS_UTILITY.ANALYZE_SCHEMA, the table-level statistics are NOT updated; only the statistics on the partitions and indexes are modified.

This peculiar behavior is observed only with partitioned tables.

The ANALYZE_SCHEMA procedure is obsolete, and anyone still using it to gather statistics should seriously consider moving to the DBMS_STATS package (see the sketch after the demo below).

Following is a simple demo:

SQL> CREATE TABLE part_tab
  2    (id  NUMBER(5),
  3     dt    DATE)
  4     PARTITION BY RANGE(dt)
  5     (
  6     PARTITION part1_jan2008 VALUES LESS THAN(TO_DATE('01/02/2008','DD/MM/YYYY')),
  7     PARTITION part2_feb2008 VALUES LESS THAN(TO_DATE('01/03/2008','DD/MM/YYYY')),
  8     PARTITION part3_mar2008 VALUES LESS THAN(TO_DATE('01/04/2008','DD/MM/YYYY')),
  9     PARTITION part4_apr2008 VALUES LESS THAN(TO_DATE('01/05/2008','DD/MM/YYYY'))
 10    );

Table created.

SQL>
SQL>
SQL> create table non_part_tab (id number, dt date);

Table created.

SQL> select table_name, num_rows, last_analyzed from user_tables ;

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -------------------
NON_PART_TAB
PART_TAB

SQL> select partition_name, num_rows, last_analyzed from user_tab_partitions;

PARTITION_NAME                   NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -------------------
PART2_FEB2008
PART1_JAN2008
PART3_MAR2008
PART4_APR2008

SQL> exec dbms_utility.analyze_schema('TEST', 'COMPUTE');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, last_analyzed from user_tables ;

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -------------------
NON_PART_TAB                            0 05-05-2008 00:05:43
PART_TAB                                0 05-05-2008 00:05:43

SQL> select partition_name, num_rows, last_analyzed from user_tab_partitions;

PARTITION_NAME                   NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -------------------
PART2_FEB2008                           0 05-05-2008 00:05:43
PART1_JAN2008                           0 05-05-2008 00:05:43
PART3_MAR2008                           0 05-05-2008 00:05:43
PART4_APR2008                           0 05-05-2008 00:05:43

SQL> exec dbms_utility.analyze_schema('TEST', 'COMPUTE');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, last_analyzed from user_tables ;

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -------------------
NON_PART_TAB                            0 05-05-2008 00:05:59
PART_TAB                                0 05-05-2008 00:05:59

SQL> select partition_name, num_rows, last_analyzed from user_tab_partitions;

PARTITION_NAME                   NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -------------------
PART2_FEB2008                           0 05-05-2008 00:05:59
PART1_JAN2008                           0 05-05-2008 00:05:59
PART3_MAR2008                           0 05-05-2008 00:05:59
PART4_APR2008                           0 05-05-2008 00:05:59

SQL> exec dbms_stats.gather_schema_stats('TEST');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, last_analyzed from user_tables ;

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -------------------
NON_PART_TAB                            0 05-05-2008 00:06:11
PART_TAB                                0 05-05-2008 00:06:11

SQL> select partition_name, num_rows, last_analyzed from user_tab_partitions;

PARTITION_NAME                   NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -------------------
PART2_FEB2008                           0 05-05-2008 00:06:11
PART1_JAN2008                           0 05-05-2008 00:06:11
PART3_MAR2008                           0 05-05-2008 00:06:11
PART4_APR2008                           0 05-05-2008 00:06:11

SQL> exec dbms_utility.analyze_schema('TEST', 'COMPUTE');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, last_analyzed from user_tables ;

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -------------------
NON_PART_TAB                            0 05-05-2008 00:06:23
PART_TAB                                0 05-05-2008 00:06:11 ---> Statistics are NOT updated.

SQL> select partition_name, num_rows, last_analyzed from user_tab_partitions;

PARTITION_NAME                   NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -------------------
PART2_FEB2008                           0 05-05-2008 00:06:23
PART1_JAN2008                           0 05-05-2008 00:06:23
PART3_MAR2008                           0 05-05-2008 00:06:23
PART4_APR2008                           0 05-05-2008 00:06:23

SQL> exec dbms_utility.analyze_schema('TEST', 'COMPUTE');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, last_analyzed from user_tables ;

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -------------------
NON_PART_TAB                            0 05-05-2008 00:08:53
PART_TAB                                0 05-05-2008 00:06:11 ---> Statistics are NOT updated.

SQL> select partition_name, num_rows, last_analyzed from user_tab_partitions;

PARTITION_NAME                   NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -------------------
PART2_FEB2008                           0 05-05-2008 00:08:53
PART1_JAN2008                           0 05-05-2008 00:08:53
PART3_MAR2008                           0 05-05-2008 00:08:53
PART4_APR2008                           0 05-05-2008 00:08:53

SQL> exec dbms_stats.gather_schema_stats('TEST');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, last_analyzed from user_tables ;

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -------------------
NON_PART_TAB                            0 05-05-2008 00:09:05
PART_TAB                                0 05-05-2008 00:09:05

SQL> select partition_name, num_rows, last_analyzed from user_tab_partitions;

PARTITION_NAME                   NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -------------------
PART2_FEB2008                           0 05-05-2008 00:09:05
PART1_JAN2008                           0 05-05-2008 00:09:05
PART3_MAR2008                           0 05-05-2008 00:09:05
PART4_APR2008                           0 05-05-2008 00:09:05

SQL>
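
The demo above calls GATHER_SCHEMA_STATS with its defaults; if you are moving off ANALYZE_SCHEMA, a slightly fuller call might look like the following (a sketch; the parameter values are illustrative, not a recommendation):

begin
  dbms_stats.gather_schema_stats(
    ownname          => 'TEST',
    estimate_percent => dbms_stats.auto_sample_size, -- let Oracle pick the sample size
    granularity      => 'ALL',   -- table, partition and subpartition level statistics
    cascade          => true);   -- gather index statistics as well
end;
/

Unlike ANALYZE_SCHEMA, this keeps the table-level and partition-level statistics of PART_TAB consistent.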

Useful references:

1) How to Move from ANALYZE (using DBMS_UTILITY) to DBMS_STATS (Metalink Note 237397.1)

2) Oracle® Database PL/SQL Packages and Types Reference 10g Release 2 (10.2)

Happy reading !!!

Thursday, May 01, 2008

Small Change and A Huge Gain

Hi,

The average elapsed time of one of our data warehouse procedures was 17 minutes. Following is the skeleton of the procedure:

Create or replace procedure Update_Customers Is
  Cursor Cust_Cur Is
         Select * From Stg_Customers;

  l_Cust_Value Customers.Cust_Value%Type;
Begin
  For Cust_Rec In Cust_Cur Loop
    :
    :
    Select Cust_Value Into l_Cust_Value
      From Customers
     Where Cust_Code = Cust_Rec.Cust_Code
       And Cust_Key = Cust_Rec.Cust_Key;
    :
    :
  End Loop;
End Update_Customers;

Upon taking snapshots before and after executing the procedure, it was evident that most of the time was consumed by the "SELECT ... FROM Customers ..." statement.

Elapsed Time from AWR report:

Procedure: 1021 Seconds

SQL Statement: 925 Seconds

There is a composite index on the "Cust_Code" and "Cust_Key" columns, and when I ran the same statement in SQL*Plus it fetched results very fast using that index.

STG_CUSTOMERS is a staging table of nearly 250,000 records; its data is purged and repopulated daily. The CUSTOMERS table was therefore probed 250,000 times in the loop. Each individual execution was very fast, but the sheer number of repeated executions made this query the dominant time consumer.

I replaced the original cursor with a join between CUSTOMERS and STG_CUSTOMERS, as shown below:

Create or replace procedure Update_Customers Is
  Cursor Cust_Cur Is
         Select * From Stg_Customers a, Customers b
          Where a.cust_code = b.cust_code
            And a.cust_key  = b.cust_key;

  l_Cust_Value Customers.Cust_Value%Type;
Begin
  For Cust_Rec In Cust_Cur Loop
    :
    :
    :
  End Loop;
End Update_Customers;

When this modified procedure was executed, performance improved remarkably: the elapsed time dropped to less than 75 seconds.

Below is the elapsed time of the same procedure before and after modification:

TYPE            DATE           Elapsed(Min)        CPU(Min)
--------------- ----------- --------------- ---------------
Procedure       19-Apr-2008           17.70            1.87
SQL Statement   19-Apr-2008           15.98           13.15
Procedure       20-Apr-2008           17.67            1.80
SQL Statement   20-Apr-2008           16.05           12.95
Procedure       21-Apr-2008           16.93            1.82
SQL Statement   21-Apr-2008           15.35           12.85
Procedure       22-Apr-2008           16.68            1.78
SQL Statement   22-Apr-2008           15.08           12.42
Procedure       23-Apr-2008           16.38            1.78
SQL Statement   23-Apr-2008           14.77           12.43
Procedure       24-Apr-2008            1.15             .92
Procedure       25-Apr-2008            1.13             .92
Procedure       26-Apr-2008            1.20             .93
Procedure       27-Apr-2008            1.23             .93
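
If still more speed were ever needed, the same join cursor could be consumed in batches with BULK COLLECT. A sketch along the lines of the skeleton above (the column list and batch size are illustrative, and the per-row processing is elided):

Create Or Replace Procedure Update_Customers Is
  Cursor Cust_Cur Is
         Select a.Cust_Code, b.Cust_Value
           From Stg_Customers a, Customers b
          Where a.Cust_Code = b.Cust_Code
            And a.Cust_Key  = b.Cust_Key;

  Type Cust_Tab Is Table Of Cust_Cur%RowType;
  l_Rows Cust_Tab;
Begin
  Open Cust_Cur;
  Loop
    -- fetch 500 joined rows at a time instead of one by one
    Fetch Cust_Cur Bulk Collect Into l_Rows Limit 500;
    Exit When l_Rows.Count = 0;
    For i In 1 .. l_Rows.Count Loop
      Null; -- per-row processing would go here
    End Loop;
  End Loop;
  Close Cust_Cur;
End Update_Customers;
/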

Regards