Sunday, November 23, 2008

Dynamic Sampling Myths Dispelled

There are a couple of myths associated with Dynamic Sampling, and I thought of dispelling them with the help of this post. The two myths are:

i) The default number of blocks sampled is 32 in Oracle 10g, and
ii) Tables having fewer than 32 blocks are not considered by the optimizer for dynamic sampling.

Myth 1: The default number of blocks sampled is 32 in Oracle 10g

The default number of dynamic sampling blocks is 32. It is set using the hidden parameter “_optimizer_dyn_smp_blks” (of course, one should not change it). Here’s a query to see the default setting:


SQL> column "Parameter" format a30
SQL> column "Session Value" format a20
SQL> column "Instance Value" format a20
SQL>
SQL> select a.ksppinm  "Parameter", b.ksppstvl "Session Value", c.ksppstvl "Instance Value"
  2    from x$ksppi a, x$ksppcv b, x$ksppsv c
  3   where a.indx = b.indx and a.indx = c.indx
  4     and ksppinm= '_optimizer_dyn_smp_blks';

Parameter                      Session Value        Instance Value
------------------------------ -------------------- --------------------
_optimizer_dyn_smp_blks        32                   32

Starting with Oracle 10g, the “optimizer_dynamic_sampling” initialization parameter is set to “2” by default, meaning the optimizer will dynamically sample all unanalyzed tables. The number of blocks sampled at this level is two times the default number of dynamic sampling blocks, so the optimizer will sample up to 64 blocks.
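
For reference, the sampling level can be inspected and changed at session level, or per statement via the DYNAMIC_SAMPLING hint. A quick sketch (the table name and level below are purely illustrative):

SQL> show parameter optimizer_dynamic_sampling

SQL> alter session set optimizer_dynamic_sampling = 4;

SQL> select /*+ dynamic_sampling(t 4) */ * from some_table t where id = 1;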

To verify how many blocks are sampled, we need to dig into the 10053 trace output. For the purpose of this test, I will create a test table and dump some data into it, then enable the 10053 trace, execute a query against the table, and disable the trace.


SQL> drop table test purge;

Table dropped.

SQL> create table test as select * from all_objects where 1=2;

Table created.

SQL> create index t_object_id on test(object_id);

Index created.

SQL> insert into test select * from all_objects;

35023 rows created.

SQL> commit;

Commit complete.

SQL> alter session set tracefile_identifier = 'blksize';

Session altered.

SQL> alter session set events '10053 trace name context forever, level 1';

Session altered.

SQL> select * from test where object_id = 123;

no rows selected

SQL> alter session set events '10053 trace name context off';

Session altered.

SQL> disc
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL>
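
Since we set tracefile_identifier to 'blksize', the resulting trace file is easy to spot: it sits in the user dump destination and has 'blksize' embedded in its name. In 10g the directory can be found with:

SQL> select value from v$parameter where name = 'user_dump_dest';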

Digging into the trace, we see the following:

** Executed dynamic sampling query:
    level : 2
    sample pct. : 12.701613
    actual sample size : 4097
    filtered sample card. : 0
    orig. card. : 82
    block cnt. table stat. : 496
    block cnt. for sampling: 496
    max. sample block cnt. : 64
    sample block cnt. : 63
    min. sel. est. : 0.01000000

Well, it’s clear from the trace that at level 2 the maximum number of blocks sampled (max. sample block cnt.) is 64 and the actual number of blocks sampled (sample block cnt.) is 63. (For some reason, Oracle always samples one block less.)
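
As a side note, the same 10053 exercise can be repeated at a higher sampling level using the DYNAMIC_SAMPLING hint. The documentation lists larger sample sizes at the higher levels (four times the default at level 5, if I remember the table correctly), so I would expect "max. sample block cnt." to grow accordingly:

SQL> alter session set events '10053 trace name context forever, level 1';

SQL> select /*+ dynamic_sampling(test 5) */ * from test where object_id = 123;

SQL> alter session set events '10053 trace name context off';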

So, this myth has been dispelled, and we now know how many blocks are actually sampled.

Myth 2: Tables having fewer than 32 blocks are not considered by the optimizer for dynamic sampling

It is widely understood that tables having fewer than 32 blocks are ignored by the optimizer for dynamic sampling.

Let’s continue with the same table as in Myth 1. However, it now contains data, so we will truncate it first. After truncating, verify the number of blocks allocated by querying the USER_SEGMENTS view.


SQL> truncate table test;

Table truncated.

SQL> select blocks from user_segments where segment_name = 'TEST';

    BLOCKS
----------
         8

This table has only 8 blocks allocated. Let’s see whether the optimizer considers dynamic sampling on this table by executing a simple query.

SQL> set autotrace traceonly exp
SQL> select * from test where object_id = 123;

Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   128 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST |     1 |   128 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID"=123)

Note
-----
   - dynamic sampling used for this statement

SQL> set autotrace off
SQL>

Yes, the Oracle optimizer did consider dynamic sampling even though the number of blocks was less than 32.
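
Just to double-check, dynamic sampling can be switched off for this one statement with a level-0 hint, in which case the "dynamic sampling used for this statement" note should disappear from the plan:

SQL> set autotrace traceonly exp
SQL> select /*+ dynamic_sampling(test 0) */ * from test where object_id = 123;
SQL> set autotrace off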

So, in this post we were able to dispel the myths associated with dynamic sampling through very simple and reproducible examples.

Wednesday, November 05, 2008

Analytic Functions: A Savior

Hi,

Yesterday, I came across yet another performance issue. One of our developers complained that a business report was running extremely slowly.

I received his mail with the query attached:

SELECT   
  XLT.VOU_DET.VOU_DATE,
  XLT.VOU_DET.VALUE_DATE,
  XLT.VOU_DET.DESCRIPTION,
  XLT.VOU_DET.PIX_CODE,
  VOU_TYPE.DESCR,
  XLT.VOU_DET.PIX_BRN,
  XLT.VOU_DET.DR_BAL_ORIG,
  XLT.VOU_DET.CR_BAL_ORIG,
  XLT.VOU_DET.CLOSING_BAL_ORIG,
  XLT.VOU_LOOKUP.NAME,
  XLT.VOU_DET.VOU_SEQ,
  (SELECT TO_CHAR(X.OPEN_BAL_ORIG) 
     FROM XLT.VOU_DET X 
    WHERE X.ASOF_DATE BETWEEN to_date('01-05-2007', 'dd-mm-yyyy') 
                          AND to_date('30-09-2008', 'dd-mm-yyyy')
      AND X.VOU_CODE  = '9900016WXYRT01'
      AND X.VOU_SEQ = 1
      AND ROWNUM = 1) OPEN_BAL_ORIG,
  (SELECT count(  XLT.VOU_DET.DR_BAL_ORIG) 
     FROM (select DR_BAL_ORIG 
              From XLT.VOU_DET X 
             WHERE X.ASOF_DATE BETWEEN to_date('01-05-2007', 'dd-mm-yyyy') 
                                   AND to_date('30-09-2008', 'dd-mm-yyyy')
               AND X.VOU_CODE  = '9900016WXYRT01'
               AND X.DR_BAL_ORIG <>0)) DR_BAL_ORIG_CNT,
  (SELECT count(  XLT.VOU_DET.CR_BAL_ORIG) 
     FROM (select CR_BAL_ORIG 
             From XLT.VOU_DET X 
            WHERE X.ASOF_DATE BETWEEN to_date('01-05-2007', 'dd-mm-yyyy') 
                                  AND to_date('30-09-2008', 'dd-mm-yyyy')
              AND X.VOU_CODE  = '9900016WXYRT01'
              AND X.CR_BAL_ORIG <>0)) CR_BAL_ORIG_CNT 
FROM
  XLT.VOU_DET,
  XLT.X_VOU_TYPE  VOU_TYPE,
  XLT.VOU_LOOKUP
WHERE XLT.VOU_DET.VOU_TYPE_ID=VOU_TYPE.VOU_TYPE_ID
  AND XLT.VOU_DET.VOU_REF(+)=XLT.VOU_LOOKUP.CUST_CODE_WNG
  AND XLT.VOU_DET.ASOF_DATE  BETWEEN  to_date('01-05-2007', 'dd-mm-yyyy') AND to_date('30-09-2008', 'dd-mm-yyyy')
  AND XLT.VOU_DET.VOU_CODE  =  '9900016WXYRT01'
ORDER BY
  XLT.VOU_DET.VOU_SEQ,
  XLT.VOU_DET.ASOF_DATE;

This looks like a case of missing Analytic functions.

The three scalar subqueries within the query can easily be replaced with simple analytic functions. The first one fetches the opening balance:

  (SELECT TO_CHAR(X.OPEN_BAL_ORIG) 
     FROM XLT.VOU_DET X 
    WHERE X.ASOF_DATE BETWEEN to_date('01-05-2007', 'dd-mm-yyyy') 
                          AND to_date('30-09-2008', 'dd-mm-yyyy')
      AND X.VOU_CODE  = '9900016WXYRT01'
      AND X.VOU_SEQ = 1
      AND ROWNUM = 1) OPEN_BAL_ORIG,

and can easily be rewritten as:

first_value(OPEN_BAL_ORIG) 
      over (partition by VOU_CODE 
                Order by ASOF_DATE, decode(VOU_SEQ, 0, 99, VOU_SEQ)) OPEN_BAL_ORIG,

The second and third return the number of debit and credit transactions respectively, and can be rewritten as:

  count(decode(DR_BAL_ORIG, null, null, 0, null, 1)) 
      over (partition by  VOU_CODE) DR_BAL_ORIG_cnt,

and

  count(decode(CR_BAL_ORIG, null, null, 0, null, 1)) 
      over (partition by  VOU_CODE) CR_BAL_ORIG_cnt
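
For anyone who finds DECODE's null handling hard to read, the same count can be expressed with CASE; this sketch counts exactly the same non-zero rows:

  count(case when CR_BAL_ORIG <> 0 then 1 end) 
      over (partition by VOU_CODE) CR_BAL_ORIG_cnt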

Now, it’s time to execute the modified query and (1) compare the result sets, (2) measure the time taken, and (3) compare statistics with the original query.
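
A quick way to compare the result sets is a pair of MINUS queries; a sketch, with <original query> and <rewritten query> standing in for the two statements (MINUS ignores ordering, so the ORDER BY has to be checked separately):

SQL> select * from (<original query>) minus select * from (<rewritten query>);
SQL> select * from (<rewritten query>) minus select * from (<original query>);

Both should return no rows.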

Well, the result sets were compared and matched. The enhanced query completes in less than 20 seconds while the original took just over 17 minutes, making the new query roughly 60 times faster than the original one.

Lastly, let’s compare statistics.

Statistics of Original query:

SQL> set autotrace traceonly stat
SQL> @tuneme

661 rows selected.

Elapsed: 00:17:26.91

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
    2066944  consistent gets
     753812  physical reads
        116  redo size
      68506  bytes sent via SQL*Net to client
        842  bytes received via SQL*Net from client
         46  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        661  rows processed

SQL> 

Statistics of Modified query:

SQL> set autotrace traceonly exp stat
SQL> @tunedquery

661 rows selected.

Elapsed: 00:00:17.35
Statistics
----------------------------------------------------------
          2  recursive calls
          0  db block gets
      22514  consistent gets
       2580  physical reads
          0  redo size
      68003  bytes sent via SQL*Net to client
        547  bytes received via SQL*Net from client
         46  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
        661  rows processed

SQL>

Instead of “2,066,944” consistent gets and “753,812” physical reads, it took merely “22,514” consistent gets and “2,580” physical reads. That’s a 98%+ reduction in LIOs and a 99%+ reduction in PIOs.

Once again, analytic functions win over the traditional query-writing style.

Monday, November 03, 2008

How to read Data between different Characterset Databases - Part 2

Hazem Ameen left a comment on my previous post, where he says he is able to read/write data over a database link between two databases having different charactersets:

We have your same environment and reading/writing over db link is fine. Oracle does character-set conversion between the 2 Arabic character-sets as expected.
Client NLS_LANG: America_America.ar8mswin1256
Server1: ar8mswin1256
Server 2: ar8iso8859p6
I can only suspect your client NLS_LANG. Our version is 10.2.0.4 on Linux. Client is MS Windows Vista 10.2.0.4.

What I would assert from this is that although you have databases with two different charactersets, your client characterset is the SAME. That is why you are able to read/write over the db link. In our case, however, clients connecting to database A have the same characterset as A, while clients connecting to database B have the same characterset as B. Here's a test case with my client characterset set to "AR8MSWIN1256".
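
To see which characterset is in play on each side, check the database characterset on both servers and the NLS_LANG setting on each client (on Windows, NLS_LANG lives in the environment or the registry):

SQL> select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET';

C:\> set NLS_LANG=AMERICAN_AMERICA.AR8MSWIN1256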

Create tables in both databases:
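
(A minimal sketch of the idea; the table name and data here are purely illustrative.)

-- run on both database A (AR8MSWIN1256) and database B (AR8ISO8859P6)
SQL> create table arabic_test (id number, txt varchar2(100));
SQL> insert into arabic_test values (1, 'sample Arabic text');
SQL> commit;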

Create database links in both databases and query the data from the other side:
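
(Again a sketch; the link, user, and TNS alias names are hypothetical.)

-- on database A, pointing to database B
SQL> create database link lnk_b connect to test_user identified by test_pwd using 'DBB';
SQL> select id, txt from arabic_test@lnk_b;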

In the above example, we were able to read data over the db links, meaning the client charactersets play the vital role.

This means all your clients' charactersets are set to "AR8MSWIN1256" for both types of databases. That is why you are able to read/write over a db link.

I hope I was clear in my examples. Thanks for leaving your comment.

Regards,
Asif Momen