Sunday, February 20, 2011

Redo Log Switches at their Peak

One of our customers was having extreme performance issues with their Oracle database. During my investigation I found they had 215 redo log switches per hour. That is about 3.6 log switches per minute, or a redo log switch every 16.7 seconds.

Although this was not the only reason for the performance issues, it was certainly one of the causes. The redo log files were sized pretty small, and the poor database was spending much of its time switching redo log files.
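
To see how small (or large) the current redo log files are, their sizes can be read from v$log:

-- List the redo log groups with their size in MB, member count, and status
select group#, thread#, bytes/1024/1024 as size_mb, members, status
from v$log
order by group#;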

Here’s a screenshot for you guys:

Below is the query I use to track redo log switches; it lists switches per hour for each of the last seven days.

column day format a10
column Switches_per_day format 9999
column 00 format 999
column 01 format 999
column 02 format 999
column 03 format 999
column 04 format 999
column 05 format 999
column 06 format 999
column 07 format 999
column 08 format 999
column 09 format 999
column 10 format 999
column 11 format 999
column 12 format 999
column 13 format 999
column 14 format 999
column 15 format 999
column 16 format 999
column 17 format 999
column 18 format 999
column 19 format 999
column 20 format 999
column 21 format 999
column 22 format 999
column 23 format 999

select to_char(first_time,'DD-MON') day,
sum(decode(to_char(first_time,'hh24'),'00',1,0)) "00",
sum(decode(to_char(first_time,'hh24'),'01',1,0)) "01",
sum(decode(to_char(first_time,'hh24'),'02',1,0)) "02",
sum(decode(to_char(first_time,'hh24'),'03',1,0)) "03",
sum(decode(to_char(first_time,'hh24'),'04',1,0)) "04",
sum(decode(to_char(first_time,'hh24'),'05',1,0)) "05",
sum(decode(to_char(first_time,'hh24'),'06',1,0)) "06",
sum(decode(to_char(first_time,'hh24'),'07',1,0)) "07",
sum(decode(to_char(first_time,'hh24'),'08',1,0)) "08",
sum(decode(to_char(first_time,'hh24'),'09',1,0)) "09",
sum(decode(to_char(first_time,'hh24'),'10',1,0)) "10",
sum(decode(to_char(first_time,'hh24'),'11',1,0)) "11",
sum(decode(to_char(first_time,'hh24'),'12',1,0)) "12",
sum(decode(to_char(first_time,'hh24'),'13',1,0)) "13",
sum(decode(to_char(first_time,'hh24'),'14',1,0)) "14",
sum(decode(to_char(first_time,'hh24'),'15',1,0)) "15",
sum(decode(to_char(first_time,'hh24'),'16',1,0)) "16",
sum(decode(to_char(first_time,'hh24'),'17',1,0)) "17",
sum(decode(to_char(first_time,'hh24'),'18',1,0)) "18",
sum(decode(to_char(first_time,'hh24'),'19',1,0)) "19",
sum(decode(to_char(first_time,'hh24'),'20',1,0)) "20",
sum(decode(to_char(first_time,'hh24'),'21',1,0)) "21",
sum(decode(to_char(first_time,'hh24'),'22',1,0)) "22",
sum(decode(to_char(first_time,'hh24'),'23',1,0)) "23",
count(*) Switches_per_day
from v$log_history
where trunc(first_time) between trunc(sysdate) - 6 and trunc(sysdate)
group by to_char(first_time,'DD-MON')
order by min(first_time);  -- chronological order; sorting by the DD-MON string would be alphabetical

Updated: The redo log files were sized at 25 MB. With 215 switches an hour, the database was churning through roughly 5 GB of redo per hour in 25 MB chunks. To relieve the database of this pain, I increased the size of the redo log files to 1 GB each.
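
For reference, Oracle has no "resize" for online redo logs; you add new, larger groups and drop the old ones once they go INACTIVE. A minimal sketch of the steps, assuming three existing groups (1-3); the file paths below are placeholders:

-- Add new, larger groups (paths are placeholders)
alter database add logfile group 4 ('/u01/oradata/orcl/redo04.log') size 1024m;
alter database add logfile group 5 ('/u01/oradata/orcl/redo05.log') size 1024m;
alter database add logfile group 6 ('/u01/oradata/orcl/redo06.log') size 1024m;

-- Cycle the logs until the old groups show INACTIVE in v$log, then drop them
-- (the dropped groups' OS files have to be removed manually afterwards)
alter system switch logfile;
alter system checkpoint;
alter database drop logfile group 1;
alter database drop logfile group 2;
alter database drop logfile group 3;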

Updated-2: Here is the post related to this.

7 comments:

  1. Just out of curiosity, could you please post the Top 5 wait events during this interval?

    Thanks

    ReplyDelete
  2. As you mention, your redo size is 1 GB now. Sure, this will not cause frequent log switches, but suppose my redo log was filled to 900 MB and at that moment my database crashed; in that situation I would lose 900 MB of data. So isn't it safer to make the redo logs 100-200 MB so that we face minimum data loss?

    Thanks & Regards,
    Sohail Siddiqui
    Oracle DBA
    Saudi Customs

    ReplyDelete
  3. @Amardeep

    I will try to post the Top 5 wait events ASAP.

    ReplyDelete
  4. @Sohail

    That's a good point. But do you think the end users will be happy to lose 100-200 MB of data?

    Of course it is better to lose 100-200 MB compared to 1024 MB, but it is still a data loss.

    This is the reason DBAs are advised to multiplex the control files and redo log files (see the sketch after the comments). If we lose one set, then we have the other multiplexed set to recover from.

    Hope this helps and thanks for leaving your comment.

    ReplyDelete
  5. What caused the "sudden" increase in redo switches? A new application that became active?

    It seems to me the solution you have chosen only attacks the symptom... and indeed increases the risk of losing too much data.

    ReplyDelete
  6. @Anonymous
    Massive DELETE operations were responsible for the sudden increase in redo.

    The immediate solution was to increase the redo log file size and give the breathless database some relief. The next step was to identify the cause and fix it.
    The massive DELETEs were converted to "ALTER TABLE xxx TRUNCATE PARTITION", which put a brake on the huge amount of redo generation (see the sketch after the comments). :)

    As far as the risk of losing too much data is concerned, the redo log files were multiplexed.

    ReplyDelete
  7. Regarding Sohail's suggestion above to keep the redo logs at 100-200 MB so as to minimize data loss:

    Here, Asif is talking about increasing the redo log file size to 1 GB, not the redo log buffer.
    @Sohail: Are you talking about 900 MB of data in the redo log buffer or in the redo log file? If you are referring to data in the redo log file, there is no need to worry, as that data is persistent (already written to disk).
    Your comment makes sense only when the data is still in the redo log buffer and a crash happens.

    ReplyDelete
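
Regarding the multiplexing mentioned in comment 4, here is a minimal sketch of adding a second member to each redo log group; the paths are placeholders and should point to a separate disk:

-- Add a second member on a different disk to each group (paths are placeholders)
alter database add logfile member '/u02/oradata/orcl/redo01b.log' to group 1;
alter database add logfile member '/u02/oradata/orcl/redo02b.log' to group 2;
alter database add logfile member '/u02/oradata/orcl/redo03b.log' to group 3;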
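
And regarding the DELETE-to-TRUNCATE conversion in comment 6: TRUNCATE PARTITION is DDL, so it generates almost no redo, while a DELETE logs every deleted row. A sketch against a hypothetical partitioned table (sales_history, partition p_2010_q4, and column sale_date are made-up names):

-- Instead of: delete from sales_history where sale_date < date '2011-01-01';
-- UPDATE INDEXES keeps any global indexes usable after the truncate
alter table sales_history truncate partition p_2010_q4 update indexes;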