Saturday, February 07, 2009

Working with DBMS_CRYPTO Package

Oracle Database 10g provides means to encrypt and decrypt your sensitive data using the built-in API called DBMS_CRYPTO. Using DBMS_CRYPTO API one can encrypt sensitive information like SSN, Credit Card Numbers, Debit Card Numbers, etc stored in the database as these are not supposed to be stored in plain text for security reasons. The DBMS_CRYPTO toolkit is easy to use and is intended to replace DBMS_OBFUSCATION_TOOLKIT which was introduced in 8.1.6.

The concept behind any encryption toolkit is to have two things: 1) An algorithm, the actual logic, and 2) The encryption key

Using these combination's we can safely encrypt our data.

In this blog entry, let us see how DBMS_CRYPTO can be used to encrypt data and also consider its performance implications.

To start with, lets create a test table and insert couple of records in plain text:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for 32-bit Windows: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL>
SQL> create table emp(empid number, ename varchar2(100), ssn varchar2(80));

Table created.

SQL> insert into emp values (1, 'SMITH',        123456701);

1 row created.

SQL> insert into emp values (2, 'ALLEN',        123456702);

1 row created.

SQL> insert into emp values (3, 'WARD',         123456703);

1 row created.

SQL> commit;
SQL> set line 10000
SQL> column ename format a30
SQL> column ssn format a40
SQL> select * from emp;

     EMPID ENAME                          SSN
---------- ------------------------------ -------------
         1 SMITH                          123456701
         2 ALLEN                          123456702
         3 WARD                           123456703

SQL>

The SSN has been inserted into the table in plain text and can be read by anybody having SELECT privilege on EMP table. Now, lets create a package using DBMS_CRYPTO API to handle encryption and decryption.

SQL> Create or replace package cryptit is
  2    Function encrypt_data( p_data IN VARCHAR2 ) Return RAW DETERMINISTIC;
  3    Function decrypt_data( p_data IN RAW ) Return VARCHAR2 DETERMINISTIC;
  4  End cryptit;
  5  /

Package created.

SQL>
SQL> Create or replace package body cryptit is
  2    V_Key       RAW(128) := UTL_RAW.cast_to_raw('testkey1');          -- Key
  3
  4    Function encrypt_data( p_data IN VARCHAR2 ) Return RAW DETERMINISTIC
  5    IS
  6      l_data RAW(2048) := utl_raw.cast_to_raw(p_data);
  7      l_encrypted RAW(2048);
  8    BEGIN
  9      NULL;
 10      l_encrypted := dbms_crypto.encrypt                        -- Algorithm
 11                     ( src => l_data,
 12                       typ => DBMS_CRYPTO.DES_CBC_PKCS5,
 13                       key => V_KEY );
 14
 15      Return l_encrypted;
 16    END encrypt_data;
 17
 18    Function decrypt_data( p_data IN RAW ) Return VARCHAR2 DETERMINISTIC
 19    IS
 20      l_decrypted RAW(2048);
 21    BEGIN
 22      l_decrypted := dbms_crypto.decrypt                              -- Algorithm
 23                      ( src => p_data,
 24                        typ => DBMS_CRYPTO.DES_CBC_PKCS5,
 25                        key => V_KEY );
 26
 27      Return utl_raw.cast_to_varchar2(l_decrypted);
 28    END decrypt_data;
 29  End cryptit;
 30  /

Package body created.

SQL>

The package body consists of functions to encrypt and decrypt data along with a key. As these functions will always output the same value for a specific input, we can make them Deterministic functions. (A Deterministic Function always returns the same result any time they are called with a specific set of input values.)

Let's now make use of these functions to encrypt our sensitive data.

SQL> update emp set ssn = cryptit.encrypt_data(ssn);

3 rows updated.

SQL> select * from emp;

     EMPID ENAME                          SSN
---------- ------------------------------ ----------------------------------
         1 SMITH                          5F3168C22E54060DE7D97B31F7E38BB6
         2 ALLEN                          5F3168C22E54060D0C2527FBBD3DCD6C
         3 WARD                           5F3168C22E54060D8166D3757932A112

SQL>
SQL>

When inserting new records we simply make use of "cryptit.encrypt_data" function in the INSERT statement.

SQL> insert into emp values( 4, 'MOMEN', cryptit.encrypt_data(123456704));

1 row created.

SQL> commit;

Commit complete.

SQL> select * from emp;

     EMPID ENAME                          SSN
---------- ------------------------------ ----------------------------------------
         1 SMITH                          5F3168C22E54060DE7D97B31F7E38BB6
         2 ALLEN                          5F3168C22E54060D0C2527FBBD3DCD6C
         3 WARD                           5F3168C22E54060D8166D3757932A112
         4 MOMEN                          5F3168C22E54060D2CFF7E7A35B14187

SQL>

SQL> column decrypted_ssn format a15
SQL> select empid, ename, cryptit.decrypt_data(ssn) decrypted_ssn, ssn from emp;

     EMPID ENAME                          DECRYPTED_SSN   SSN
---------- ------------------------------ --------------- ----------------------------------------
         1 SMITH                          123456701       5F3168C22E54060DE7D97B31F7E38BB6
         2 ALLEN                          123456702       5F3168C22E54060D0C2527FBBD3DCD6C
         3 WARD                           123456703       5F3168C22E54060D8166D3757932A112
         4 MOMEN                          123456704       5F3168C22E54060D2CFF7E7A35B14187

SQL>

All is fine as far as encryption is concerned, but lets see what happens when SSN is used in the WHERE clause of queries. We will create an index on SSN and run a query against it.

SQL> create index t_ssn on emp(ssn);

Index created.

SQL> set autotrace on exp
SQL> select * from emp where ssn = cryptit.encrypt_data('123456701');

     EMPID ENAME                          SSN
---------- ------------------------------ ----------------------------------------
         1 SMITH                          5F3168C22E54060DE7D97B31F7E38BB6


Execution Plan
----------------------------------------------------------
Plan hash value: 2894032564

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |   107 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP   |     1 |   107 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_SSN |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("SSN"=RAWTOHEX("CRYPTIT"."ENCRYPT_DATA"('123456701')))

Note
-----
   - dynamic sampling used for this statement

SQL>

Well, the index "T_SSN" has been selected by the optimizer to be cost efficient. When a range of values is to be scanned, optimizer falls flat on its face as it has no idea and picks up FTS.

SQL> select * from emp where cryptit.decrypt_data(ssn) between '123456702' and '123456704';

     EMPID ENAME                          SSN
---------- ------------------------------ ----------------------------------------
         2 ALLEN                          5F3168C22E54060D0C2527FBBD3DCD6C
         3 WARD                           5F3168C22E54060D8166D3757932A112
         4 MOMEN                          5F3168C22E54060D2CFF7E7A35B14187


Execution Plan
----------------------------------------------------------
Plan hash value: 3956160932

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   107 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMP  |     1 |   107 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CRYPTIT"."DECRYPT_DATA"(HEXTORAW("SSN"))>='123456702'
              AND "CRYPTIT"."DECRYPT_DATA"(HEXTORAW("SSN"))<='123456704')

Note
-----
   - dynamic sampling used for this statement

We can also create Function Based Indexes on encrypted columns such that data is accessed faster with a better execution plan. This is the reason I have marked both ENCRYPT_DATA and DECRYPT_DATA as DETERMINISTIC functions.

SQL> create index f_ssn_idx on emp(cryptit.decrypt_data(ssn));

Index created.

SQL> select * from emp where cryptit.decrypt_data(ssn) between '123456702' and '123456704';

     EMPID ENAME                          SSN
---------- ------------------------------ ----------------------------------------
         2 ALLEN                          5F3168C22E54060D0C2527FBBD3DCD6C
         3 WARD                           5F3168C22E54060D8166D3757932A112
         4 MOMEN                          5F3168C22E54060D2CFF7E7A35B14187


Execution Plan
----------------------------------------------------------
Plan hash value: 9274740

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |     1 |   107 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP       |     1 |   107 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | F_SSN_IDX |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CRYPTIT"."DECRYPT_DATA"(HEXTORAW("SSN"))>='123456702' AND
              "CRYPTIT"."DECRYPT_DATA"(HEXTORAW("SSN"))<='123456704')

Note
-----
   - dynamic sampling used for this statement

SQL>

The optimizer has rightly picked up the Function based index to access requested data quickly.

DBMS_CRYPTO is a great way to encrypt sensitive data and we can also create indexes on the encrypted columns to speedup our queries. Lastly, remember to wrap the CRYPTIT package body so that the key is not exposed.

DBMS_CRYPTO can also be used to encrypted data recursively. Let us try to encrypt the already encrypted SSN from the above example.

SQL> select * from emp;

     EMPID ENAME    SSN
---------- -------- --------------------------------------------------------------------------------
         1 SMITH    5F3168C22E54060DE7D97B31F7E38BB6
         2 ALLEN    5F3168C22E54060D0C2527FBBD3DCD6C
         3 WARD     5F3168C22E54060D8166D3757932A112
         4 MOMEN    5F3168C22E54060D2CFF7E7A35B14187

SQL> update emp set ssn = cryptit.encrypt_data(ssn);

4 rows updated.

SQL> select * from emp;

     EMPID ENAME    SSN
---------- -------- --------------------------------------------------------------------------------
         1 SMITH    455E357D80B1C81F91D52088DA0FE03A0C133D129CFC3879EE7A8DE993FD459DB5C9E1EBEF37062C
         2 ALLEN    455E357D80B1C81F91D52088DA0FE03A98FC410A71CE64D57935691692D433B770C5F8AF07C9113F
         3 WARD     455E357D80B1C81F91D52088DA0FE03A9D32D14FE64F41DD66BE7FEF4C31384F032C16826EAA4830
         4 MOMEN    455E357D80B1C81F91D52088DA0FE03A04D5C0AD79E689EABD46F8EC335308E32D6F7E94167F6EDF

SQL> commit;

Commit complete.

SQL> select empid, ename, cryptit.decrypt_data(cryptit.decrypt_data(ssn)) decrypted_ssn, ssn from emp;

     EMPID ENAME    DECRYPTED_SSN   SSN
---------- -------- --------------- --------------------------------------------------------------------------------
         1 SMITH    123456701       455E357D80B1C81F91D52088DA0FE03A0C133D129CFC3879EE7A8DE993FD459DB5C9E1EBEF37062C
         2 ALLEN    123456702       455E357D80B1C81F91D52088DA0FE03A98FC410A71CE64D57935691692D433B770C5F8AF07C9113F
         3 WARD     123456703       455E357D80B1C81F91D52088DA0FE03A9D32D14FE64F41DD66BE7FEF4C31384F032C16826EAA4830
         4 MOMEN    123456704       455E357D80B1C81F91D52088DA0FE03A04D5C0AD79E689EABD46F8EC335308E32D6F7E94167F6EDF

SQL>
For information on DBMS_CRYPTO check Oracle documentation.

Sunday, February 01, 2009

Performance Impact of NOT NULL Constraint in PL/SQL

A NOT NULL constraint in PL/SQL requires that a variable contains value, if no value is specified for that variable then an error will occur. But, using this constraint in PL/SQL, we have to pay a small performance price.

Let us create two procedures and perform a small test, the first procedure (TEST_NOTNULL) will be contain "NOT NULL" constraint while the second procedure (TEST_NULL) will be without it.

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for 32-bit Windows: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL>
SQL> create or replace procedure test_notnull(p_iterations in number) is
  2    i  number;
  3    x  number NOT NULL := 0;
  4    t1 number;
  5  begin
  6    t1 := dbms_utility.get_time;
  7    for i in 1..p_iterations loop
  8      x := x + i;
  9    end loop;
 10    dbms_output.put_line( 'Time taken : ' || to_char(dbms_utility.get_time - t1));
 11  end;
 12  /

Procedure created.

SQL>
SQL>
SQL> create or replace procedure test_null(p_iterations in number) is
  2    i  number;
  3    x  number := 0;
  4    t1 number;
  5  begin
  6    t1 := dbms_utility.get_time;
  7    for i in 1..p_iterations loop
  8      x := x + i;
  9      if x is null then
 10        dbms_output.put_line('i know this will never happen, but still ... ');
 11      end if;
 12    end loop;
 13    dbms_output.put_line( 'Time taken : ' || to_char(dbms_utility.get_time - t1));
 14  end;
 15  /

Procedure created.

SQL>
SQL>
SQL>
SQL> set serveroutput on
SQL> exec perf_notnull(1000000);
Time taken : 20

PL/SQL procedure successfully completed.

SQL> exec perf_null(1000000);
Time taken : 17

PL/SQL procedure successfully completed.

SQL>

Executing these procedures, we notice that TEST_NOTNULL procedure consumes a little more than its counterpart TEST_NULL. Yes, we are looping heavily as the point is to compare performance of NOT NULL contraint in PL/SQL.

The reason the NOT NULL Constraint incurs performance cost is: As x is constriant by NOT NULL, the value of the expression "x + i" is first assigned to a temporary variable, this temporary variable is validated against nullity. If the variable is not null, its value is assigned to "x", otherwise an exception is raised. So, this is where the extra time is consumed when one applies NOT NULL constraint in PL/SQL.

An alternative approach is to manually test for nullity and raise an exception as we did in TEST_NULL procedure.