After reading Michelle Deng’s “Tuning by Cardinality Feedback” paper on
Jonathan's blog, I thought of re-creating the whole scenario to actually feel what is going on.
I did this test on Oracle 10g Release 2 (10.2.0.3) database with a block size of 16K. We first create the table and populate it in such a way that we match exactly with Michelle’s data.
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bi
PL/SQL Release 10.2.0.3.0 - Production
CORE 10.2.0.3.0 Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.3.0 - Productio
NLSRTL Version 10.2.0.3.0 - Production
Elapsed: 00:00:00.03
SQL>
SQL> show parameter db_block_size
NAME TYPE VALUE
------------------------------------ ----------- ----------
db_block_size integer 16384
SQL>
I performed couple of tests with different data distributions and here’s the first test case:
Test Case 1
Create and Populate Data
drop table prod_fctr_mv purge;
create table prod_fctr_mv(
extract_mkt_ssk varchar2(5),
mkt_ssk varchar2(20),
mkt_id number(16),
hier_type varchar2(20),
txt varchar2(500));
insert into prod_fctr_mv
select '00006', '00006', 3, 'CORP',
rpad('*', 285, '*')
from dual
connect by level <= 42;
insert into prod_fctr_mv
select
lpad(mod(level, 100), 5, '0') extract_mkt_ssk ,
lpad(mod(level, 73), 5, '0') mkt_ssk ,
mod(level, 91) mkt_id ,
decode(mod(level, 5), 0, 'CORP', 1, 'CALL', 2, 'CORP',
3, 'WALL', 'CORP') hier_type ,
rpad('*', 285, '*')
from dual
connect by level <= 4965 - 42 ;
update prod_fctr_mv set mkt_ssk = null
where mkt_ssk <> '00006'
and rownum <= 296;
update (select mkt_id
from prod_fctr_mv
where mkt_id <> 3)
set mkt_id = 3
where rownum <= 72;
update (select extract_mkt_ssk
from prod_fctr_mv
where extract_mkt_ssk <> '00006'
and mkt_id <> 3)
set extract_mkt_ssk = '00006'
where rownum <= 77;
select count(*)
from prod_fctr_mv
where mkt_ssk = '00006' ;
update (select mkt_ssk
from prod_fctr_mv
where mkt_ssk <> '00006'
and mkt_ssk is not null )
set mkt_ssk = '00006'
where rownum <= 59;
update (select hier_type
from prod_fctr_mv
where hier_type = 'CORP'
and mkt_id <> 3
and extract_mkt_ssk <> '00006')
set hier_type = decode(mod(rownum, 2), 0, 'TALL', 'BALL')
where rownum <= 554;
commit;
Verify Data
Now that the data is populated, let’s see did we get on the right track.
SQL> exec dbms_stats.gather_table_stats(user, 'prod_fctr_mv');
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.42
SQL>
SQL> select table_name, num_rows, blocks, empty_blocks, avg_space,
2 chain_cnt, avg_row_len
3 from user_tables
4 where table_name = 'PROD_FCTR_MV';
TABLE_NAME NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE CHAIN_CNT AVG_ROW_LEN
------------------------------ ---------- ---------- ------------ ---------- ---------- -----------
PROD_FCTR_MV 4965 108 0 0 0 305
Elapsed: 00:00:00.01
SQL>
SQL>
SQL> select column_name, num_distinct, density, num_buckets,
2 num_nulls, histogram
3 from user_tab_columns
4 where table_name = 'PROD_FCTR_MV';
COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS NUM_NULLS HISTOGRAM
------------------------------ ------------ ---------- ----------- ---------- ---------------
TXT 1 1 1 0 NONE
HIER_TYPE 5 .000100705 5 0 FREQUENCY
MKT_ID 91 .000100705 91 0 FREQUENCY
MKT_SSK 73 .000107089 73 296 FREQUENCY
EXTRACT_MKT_SSK 100 .000100705 100 0 FREQUENCY
Elapsed: 00:00:00.07
SQL>
SQL>
SQL> select count(decode(mkt_id, 3, 1)) mkt_id,
2 count(decode(extract_mkt_ssk, '00006', 1)) extract_mkt_ssk,
3 count(decode(mkt_ssk, '00006', 1)) mkt_ssk,
4 count(decode(hier_type, 'CORP', 1)) hier_type
5 from prod_fctr_mv ;
MKT_ID EXTRACT_MKT_SSK MKT_SSK HIER_TYPE
---------- --------------- ---------- ----------
169 169 169 2441
Elapsed: 00:00:00.01
SQL>
SQL> select count(*)
2 from prod_fctr_mv
3 where mkt_id = 3
4 and extract_mkt_ssk = '00006'
5 and mkt_ssk = '00006'
6 and hier_type = 'CORP';
COUNT(*)
----------
42
Elapsed: 00:00:00.03
SQL>
SQL>
Search an optimal value for OPTIMIZER_DYNAMIC_SAMPLING parameter
Yea, the total number of rows, the count for the key columns, count for the query in question, and the statistics, they all match to what was presented by Michelle.
The real fun starts here:
SQL> set autotrace traceonly exp
SQL>
SQL> select count(*)
2 from prod_fctr_mv
3 where mkt_id = 3
4 and extract_mkt_ssk = '00006'
5 and mkt_ssk = '00006'
6 and hier_type = 'CORP';
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 42 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 20 | | |
|* 2 | TABLE ACCESS FULL| PROD_FCTR_MV | 1 | 20 | 42 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
"MKT_SSK"='00006' AND "HIER_TYPE"='CORP')
Delete the statistics and run the same query:
SQL> exec dbms_stats.delete_table_stats(user, 'prod_fctr_mv');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.39
SQL>
SQL> select count(*)
2 from prod_fctr_mv
3 where mkt_id = 3
4 and extract_mkt_ssk = '00006'
5 and mkt_ssk = '00006'
6 and hier_type = 'CORP';
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 41 | 42 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 41 | | |
|* 2 | TABLE ACCESS FULL| PROD_FCTR_MV | 72 | 2952 | 42 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
"MKT_SSK"='00006' AND "HIER_TYPE"='CORP')
Note
-----
- dynamic sampling used for this statement
SQL>
Optimizer has considered dynamic sampling but still the number of rows reported is now “72”, which is nearly twice than the actual number of rows. Let’s gradually increase the value of OPTIMIZER_DYNAMIC_SAMPLING initialization parameter and check the plan.
SQL> alter session set optimizer_dynamic_sampling=3;
Session altered.
Elapsed: 00:00:00.00
SQL> @t
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 41 | 42 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 41 | | |
|* 2 | TABLE ACCESS FULL| PROD_FCTR_MV | 72 | 2952 | 42 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
"MKT_SSK"='00006' AND "HIER_TYPE"='CORP')
Note
-----
- dynamic sampling used for this statement
SQL> alter session set optimizer_dynamic_sampling=4;
Session altered.
Elapsed: 00:00:00.00
SQL> @t
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 41 | 42 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 41 | | |
|* 2 | TABLE ACCESS FULL| PROD_FCTR_MV | 72 | 2952 | 42 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
"MKT_SSK"='00006' AND "HIER_TYPE"='CORP')
Note
-----
- dynamic sampling used for this statement
SQL> alter session set optimizer_dynamic_sampling=5;
Session altered.
Elapsed: 00:00:00.00
SQL> @t
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 41 | 42 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 41 | | |
|* 2 | TABLE ACCESS FULL| PROD_FCTR_MV | 72 | 2952 | 42 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
"MKT_SSK"='00006' AND "HIER_TYPE"='CORP')
Note
-----
- dynamic sampling used for this statement
SQL> alter session set optimizer_dynamic_sampling=6;
Session altered.
Elapsed: 00:00:00.01
SQL> @t
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 41 | 42 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 41 | | |
|* 2 | TABLE ACCESS FULL| PROD_FCTR_MV | 42 | 1722 | 42 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
"MKT_SSK"='00006' AND "HIER_TYPE"='CORP')
Note
-----
- dynamic sampling used for this statement
SQL>
SQL> set autotrace off
SQL>
Finally, when OPTIMIZER_DYNAMIC_SAMPLING parameter reached to a value of “6”, the plan is now reporting correct cardinality.
Test Case 2
Let’s run through the entire test again with a slight modification. Instead of inserting all the 42 rows together, we will insert them scattered in between.
Create and Populate Data
In this test case, we will insert one row of our interest after every 100 rows. So, here goes the script:
drop table prod_fctr_mv purge;
create table prod_fctr_mv(
extract_mkt_ssk varchar2(5),
mkt_ssk varchar2(20),
mkt_id number(16),
hier_type varchar2(20),
txt varchar2(500));
begin
for i in 1..42 loop
insert into prod_fctr_mv
select '00006', '00006', 3, 'CORP',
rpad('*', 285, '*')
from dual
connect by level <= 1;
insert into prod_fctr_mv
select
lpad(mod(level, 100), 5, '0') extract_mkt_ssk ,
lpad(mod(level, 73), 5, '0') mkt_ssk ,
mod(level, 91) mkt_id ,
decode(mod(level, 5), 0, 'CORP', 1, 'CALL', 2, 'CORP',
3, 'WALL', 'CORP') hier_type ,
rpad('*', 285, '*')
from dual
connect by level <= 100 ;
end loop;
commit;
end;
/
select count(*)
from prod_fctr_mv;
insert into prod_fctr_mv
select
lpad(mod(level, 100), 5, '0') extract_mkt_ssk ,
lpad(mod(level, 73), 5, '0') mkt_ssk ,
mod(level, 91) mkt_id ,
decode(mod(level, 5), 0, 'CORP', 1, 'CALL', 2, 'CORP',
3, 'WALL', 'CORP') hier_type ,
rpad('*', 285, '*')
from dual
connect by level <= 723 ;
update prod_fctr_mv set mkt_ssk = null
where mkt_ssk <> '00006'
and rownum <= 296;
update (select mkt_id
from prod_fctr_mv
where mkt_id <> 3
and extract_mkt_ssk <> '00006')
set mkt_id = 3
where rownum <= 35;
update (select extract_mkt_ssk
from prod_fctr_mv
where extract_mkt_ssk <> '00006'
and mkt_id <> 3)
set extract_mkt_ssk = '00006'
where rownum <= 77;
update (select mkt_ssk
from prod_fctr_mv
where mkt_ssk <> '00006'
and mkt_ssk is not null )
set mkt_ssk = '00006'
where rownum <= 33;
update (select hier_type
from prod_fctr_mv
where hier_type = 'CORP'
and mkt_id <> 3
and extract_mkt_ssk <> '00006')
set hier_type = decode(mod(rownum, 2), 0, 'TALL', 'BALL')
where rownum <= 554;
commit;
Verify Data
Let us run through the same queries to verify data population.
SQL> exec dbms_stats.gather_table_stats(user, 'prod_fctr_mv');
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.09
SQL> select table_name, num_rows, blocks, empty_blocks, avg_space,
2 chain_cnt, avg_row_len
3 from user_tables
4 where table_name = 'PROD_FCTR_MV';
TABLE_NAME NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE CHAIN_CNT AVG_ROW_LEN
------------------------------ ---------- ---------- ------------ ---------- ---------- -----------
PROD_FCTR_MV 4965 108 0 0 0 305
Elapsed: 00:00:00.01
SQL>
SQL>
SQL> select column_name, num_distinct, density, num_buckets,
2 num_nulls, histogram
3 from user_tab_columns
4 where table_name = 'PROD_FCTR_MV';
COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS NUM_NULLS HISTOGRAM
------------------------------ ------------ ---------- ----------- ---------- ---------------
TXT 1 1 1 0 NONE
HIER_TYPE 5 .000100705 5 0 FREQUENCY
MKT_ID 91 .000100705 91 0 FREQUENCY
MKT_SSK 73 .000107089 73 296 FREQUENCY
EXTRACT_MKT_SSK 100 .000100705 100 0 FREQUENCY
Elapsed: 00:00:00.03
SQL>
SQL> select count(decode(mkt_id, 3, 1)) mkt_id,
2 count(decode(extract_mkt_ssk, '00006', 1)) extract_mkt_ssk,
3 count(decode(mkt_ssk, '00006', 1)) mkt_ssk,
4 count(decode(hier_type, 'CORP', 1)) hier_type
5 from prod_fctr_mv ;
MKT_ID EXTRACT_MKT_SSK MKT_SSK HIER_TYPE
---------- --------------- ---------- ----------
169 169 169 2441
Elapsed: 00:00:00.01
SQL>
SQL> select count(*)
2 from prod_fctr_mv
3 where mkt_id = 3
4 and extract_mkt_ssk = '00006'
5 and mkt_ssk = '00006'
6 and hier_type = 'CORP';
COUNT(*)
----------
42
Elapsed: 00:00:00.01
SQL>
Search an optimal value for OPTIMIZER_DYNAMIC_SAMPLING parameter
We managed to get the number of records and other statistics same as that of the original document. Now, we will hunt for the optimal value of OPTIMIZER_DYNAMIC_SAMPLING parameter.
SQL> set autotrace traceonly exp
SQL>
SQL> @t
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 42 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 20 | | |
|* 2 | TABLE ACCESS FULL| PROD_FCTR_MV | 1 | 20 | 42 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
"MKT_SSK"='00006' AND "HIER_TYPE"='CORP')
SQL> exec dbms_stats.delete_table_stats(user, 'prod_fctr_mv');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.25
SQL>
SQL> @t
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 41 | 42 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 41 | | |
|* 2 | TABLE ACCESS FULL| PROD_FCTR_MV | 36 | 1476 | 42 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
"MKT_SSK"='00006' AND "HIER_TYPE"='CORP')
Note
-----
- dynamic sampling used for this statement
SQL>
We get a better cardinality reported by the Optimizer at the default level of dynamic sampling (OPTIMIZER_DYNAMIC_SAMPLING=2).
Conclusion
In the above test cases the Oracle optimizer started displaying correct number of rows when OPTIMIZER_DYNAMIC_SAMPLING was set to 6 with interested data being inserted together while the default setting for OPTIMIZER_DYNAMIC_SAMPLING worked when data was almost equally distributed.
Michelle’s data distribution might be such that the Optimizer was able report correct number of rows at OPTIMIZER_DYNAMIC_SAMPLING=4.
Lessons learned include:
1) Column correlation could really mislead Optimizer, and
2) Data distribution plays equally critical role.
No comments:
Post a Comment