Saturday, August 02, 2008

Dynamic Sampling and Table Partitions

Dynamic Sampling was first introduced in Oracle 9i Release 2 and the purpose is to improve query performance by determining more accurate estimates for predicate selectivity and statistics for tables and indexes.

Dynamic sampling is to be used predominantly with un-analyzed tables. Before hard parsing a query Oracle determines if dynamic sampling would improve query performance. If so, then the optimizer issues recursive SQL statements to estimate the necessary statistics dynamically.

Here’s an example of dynamic sampling:

Operating System: Windows XP Database: Oracle 10g Release 2 (10.2.0.4)

  
SQL> create table t (a number);

Table created.

SQL> set autotrace traceonly exp
SQL> select * from t;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    13 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    |     1 |    13 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement

SQL> set autotrace off

Its clear from the above Execution Plan that dynamic sampling was performed.

But it seems dynamic sampling is not partition friendly or rather dynamic sampling is partition unaware. What I mean is if you have a partitioned table and at least one partition is analyzed then dynamic sampling is NOT performed on any of the un-analyzed partitions. This is because Optimizer looks at columns like NUM_ROWS of DBA_TABLES view and considers that the table is analyzed.

Let us create a partitioned table with a local index.

  
SQL> create table part(
  2         sno     number,
  3         dt      date,
  4              remark     varchar2(30))
  5      partition by range(dt) (
  6      partition p1 values less than (to_date('01-02-2008', 'dd-mm-yyyy')),
  7      partition p2 values less than (to_date('01-03-2008', 'dd-mm-yyyy')),
  8      partition p3 values less than (to_date('01-04-2008', 'dd-mm-yyyy')),
  9      partition p4 values less than (to_date('01-05-2008', 'dd-mm-yyyy'))
 10     );

Table created.

SQL>
SQL> create index part_idx on part(dt, sno) local;

Index created.

SQL>

Now, populate all but the last partition with test data.

SQL> insert into part
  2      select level,
  3             to_date('01-01-2008', 'dd-mm-yyyy') + level - 1 ,
  4             'test'
  5        from dual
  6      connect by level <=90;

90 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>

Let us now collect statistics for partitions P1, P2, and P3.

SQL> exec dbms_stats.gather_table_stats(user, tabname => 'part', -
>       partname => 'p1', estimate_percent => null, -
>       method_opt => 'for all columns size 1', cascade => true);

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> exec dbms_stats.gather_table_stats(user, tabname => 'part', -
>       partname => 'p2', estimate_percent => null, -
>       method_opt => 'for all columns size 1', cascade => true);

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> exec dbms_stats.gather_table_stats(user, tabname => 'part', -
>       partname => 'p3', estimate_percent => null, -
>       method_opt => 'for all columns size 1', cascade => true);

PL/SQL procedure successfully completed.

SQL>
SQL> select table_name, num_rows
  2    from user_tables
  3   where table_name ='PART';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
PART                                   90

SQL>
SQL>
SQL> select partition_name, num_rows
  2    from user_tab_partitions
  3   where table_name = 'PART';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
P1                                     31
P2                                     29
P3                                     30
P4

SQL>
SQL>
SQL> select partition_name, num_rows
  2    from user_ind_partitions
  3   where index_name = 'PART_IDX';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
P1                                     31
P2                                     29
P3                                     30
P4

SQL>

It’s time to test whether optimizer considers dynamic sampling or not.

SQL> set autotrace traceonly exp
SQL>
SQL> select *
  2    from part
  3   where dt = to_date('01-04-2008', 'dd-mm-yyyy')
  4     and sno = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2633488982

-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |     1 |    16 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|      |     1 |    16 |     2   (0)| 00:00:01 |     4 |     4 |
|*  2 |   TABLE ACCESS FULL    | PART |     1 |    16 |     2   (0)| 00:00:01 |     4 |     4 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("DT"=TO_DATE(' 2008-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "SNO"=1)

SQL>
SQL> set autotrace off
SQL>

No! Optimizer thinks that it has enough information available at hand to with it rather than sampling dynamically.

Now, let us populate the unanalyzed partition with data and re-run the same query.

SQL> insert into part select 1, to_date('01-04-2008', 'dd-mm-yyyy'), 'test'
  2        from dual connect by level <=1000;

1000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> set autotrace traceonly exp
SQL>
SQL>
SQL> select * from part
  2   where dt = to_date('01-04-2008', 'dd-mm-yyyy')
  3     and sno = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2633488982

-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |     1 |    16 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|      |     1 |    16 |     2   (0)| 00:00:01 |     4 |     4 |
|*  2 |   TABLE ACCESS FULL    | PART |     1 |    16 |     2   (0)| 00:00:01 |     4 |     4 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("DT"=TO_DATE(' 2008-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "SNO"=1)

SQL>
SQL> set autotrace off

Well, the optimizer took the same plan because statistics were not up to date. Consider gathering statistics on the partition in concern and execute the same query again.

 
SQL> exec dbms_stats.gather_table_stats(user, tabname => 'part', -
>       partname => 'p4', estimate_percent => null, -
>       method_opt => 'for all columns size 1', cascade => true);

PL/SQL procedure successfully completed.

SQL>
SQL> set autotrace traceonly exp
SQL>
SQL>
SQL> select * from part
  2   where dt = to_date('01-04-2008', 'dd-mm-yyyy')
  3     and sno = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2633488982

-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |  1000 | 16000 |     3   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|      |  1000 | 16000 |     3   (0)| 00:00:01 |     4 |     4 |
|*  2 |   TABLE ACCESS FULL    | PART |  1000 | 16000 |     3   (0)| 00:00:01 |     4 |     4 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("DT"=TO_DATE(' 2008-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "SNO"=1)

SQL>
SQL> set autotrace off
SQL>

Well this time optimizer had updated statistics which gives a better and clear picture to pick the most appropriate access path. So, optimizer thinks “TABLE ACCESS FULL” on partition P4 is appropriate and which makes sense too.

I have purposely populated partition P4 with duplicate data so that we see a difference in optimizer access path.

Finally, the conclusions are:

• Dynamic sampling ignores partition level statistical information. • Always update statistics after data load process, or assign default statistics to unanalyzed partition. Happy reading.

2 comments:

Govind said...

It is good article to read. One good question. How are you displaying SQL screen print in the blog? I am regular reader of your blog...

Asif Momen said...

Glad that you found my post useful.

Have a look at: http://momendba.blogspot.com/2008/09/syntax-highlighter.html