Wednesday, March 14, 2007

"TCP Socket (KGAS)" Wait Event

We recently had this issue in our organization. Suddenly our Oracle Database (Oracle Database 10g Enterprise Edition Release on AIX 5.3) was experiencing waits on "TCP Socket (KGAS)" event. Upon investigation, we found out that it was because the mail server was down and the procedure tries to open a new connection with the SMTP server and it takes 75 seconds to timeout on IBM-AIX. (On Windows 2003, the default timeout is 20 seconds.)

There is no way of controlling timeout from within Oracle. When you pass the timeout parameter while opening connection with the SMTP Server, it doesn’t mean that it will timeout after the specified time rather it means that the subsequent Read/Write operations will timeout.

mail_conn := Utl_Smtp.open_connection(mailhost,MAILPORT, 10);

From above, it will take 10 seconds timeout time for the Read/Write operations after establishing the connection.

The only way to reduce the timeout from 75 seconds is to set TCP Keepalive parameters at OS level. It’s nice that we can still control it but at an expense of affecting all OS-level TCP connection.

Following parameters can be used to control the timeout in IBM-AIX:

1) net.ipv4.tcp_keepalive_time (seconds)

2) net.ipv4.tcp_keepalive_intvl (seconds)

3) net.ipv4.tcp_keepalive_probes

Be careful when setting the above parameters, as this will have a Global affect. Every new connection will be affected with the new timeout time. Now we are able to:

1) Control timeout while opening connection with the SMTP server using OS-level parameters, and

2) Pass a timeout in seconds for the subsequent Read/Write operations.

Are we done with our job? No, not yet!

What if the mail server goes down soon after establishing a connection and before starting the Read/Write operation, I mean during the handshake and Email verification procedures?

What happens is that, the sessions keep waiting forever for the SMTP server’s reply. Unfortunately, you have to locate and kill them.

We haven’t tried this ourselves but I have learned that using third party Java tool we can control the entire timeout issue.

Following is the procedure we use to send emails:

Procedure Test_Mail(
sender IN VARCHAR2, -- Mail Sender's Name
recipient IN VARCHAR2, -- Primary Mail ID of the Recipient
P_MESSAGE IN LONG, -- Any mail message
mailhost IN VARCHAR2, -- Mail host (IP Address)
MAILPORT IN NUMBER -- Port Number of the Mail Host
mail_conn Utl_Smtp.connection;
v_mail_reply Utl_Smtp.reply;

mail_conn := Utl_Smtp.open_connection(mailhost,MAILPORT, 10); -- Timeout after 20 seconds

Utl_Smtp.helo(mail_conn, mailhost);
Utl_Smtp.mail(mail_conn, sender);
v_mail_reply := utl_smtp.vrfy(mail_conn, recipient);

MAIL_REPLY_CODE := v_mail_reply.code ;
IF v_mail_reply.code <> 550 THEN
Utl_Smtp.rcpt(mail_conn, recipient);
utl_smtp.write_data( mail_conn, 'MIME-version: 1.0' utl_tcp.CRLF);
utl_smtp.write_data( mail_conn, 'Content-TYPE: text/plain; charset=Arabic(Windows)' utl_tcp.CRLF);
utl_smtp.write_data( mail_conn, 'Content-Transfer-Encoding: 8bit' utl_tcp.CRLF);
utl_smtp.write_raw_data(mail_conn, utl_raw.cast_to_raw(mesgUTL_TCP.CRLF));
Proc_Execlog(' Error in Test_mail :' SQLERRM);


Profiling your PL/SQL code with DBMS_PROFILER

The PL/SQL profiler provides information about PL/SQL code with regard to CPU usage and other resource usage information. When there is a noticeable gap between user elapsed time and SQL processing elapsed time, and there is PL/SQL code involved, the PL/SQL Profilerbecomes a very useful tool. The Profiler helps in identifying the lines of PL/SQL code which are taking longer to process. 

The profiler report consists of 
1) Top Ten profiled source lines in terms of Total Time
2) Total Time taken to process each line of your PL/SQL code.
3) Number of times each line executed in your PL/SQL code. 

Installation Procedure: 
1) Connect as SYS into SQL*Plus, and execute the below command to create the package (DBMS_PROFILER): 

SQL> @ORACLE_HOME/rdbms/admin/profload.sql; 

2) Once DBMS_PROFILER is installed, connect as application user into SQL*Plus, and create the following repository tables:
SQL> @ORACLE_HOME/rdbms/admin/proftab.sql; 

 3) Generating a report 
 The script profiler.sql (available on Metalink, Doc Id: 243755) generates a comprehensive HTML report on the performance data extracted by the DBMS_PROFILER package. 

Useful Procedures in DBMS_PROFILER 
 In Oracle 10g, the DBMS_PROFILER has more than 15 procedures/functions, but we will restrict to the following: .

Using DBMS_Profiler: To profile a PL/SQL Library (package, procedure, function or trigger), include in its body the two calls to actually start, and complete the profiling. Use the example below on any PL/SQL Library to profile. 

    DBMS_PROFILER.START_PROFILER('any comment to identify this execution'); 
    ... /* Your PL/SQL Code */ ... 

Example Usage of DBMS_PROFILER 
Create the following test procedure, execute the procedure and run the prolifer.sql script to produce the report.

create or replace procedure test_proc is 
    u_cnt number; 
    a_cnt number; 
    run_id number; 
    a number; 
    b number; 
    run_id := dbms_profiler.start_profiler(to_char(sysdate,'DD-MM-YYYY HH24:MI:SS')); 
    dbms_output.put_line(' RUN Id: ' run_id); 
    select count(*) into u_cnt from user_objects; 
    select count(*) into a_cnt from all_objects; 
    for i in 1..10000 loop 
        a := i; 
        b := a; 
    end loop; 
    dbms_output.put_line('User count: ' u_cnt); 
    dbms_output.put_line('All count: ' a_cnt); 

SQL> Exec Test_Proc;
SQL> c:\profiler\profiler.sql

Below is an excerpt from the report for the above procedure: