Monday, November 2, 2009

Recovery from missing or corrupted control file:

Case 1: A multiplexed copy of the control file is available.
--------------------------------------------------------------------

On startup Oracle must read the control file in order to find out where the datafiles and online logs are located. Oracle expects to find control files at locations specified in the CONTROL_FILE initialisation parameter. The instance will fail to mount the database if any one of the control files are missing or corrupt. A brief error message will be displayed, with further details recorded in the alert log. The exact error message will vary depending on what has gone wrong. Here's an example:

C:\>sqlplus /nolog

SQL*Plus: Release 10.2.0.1.0 - Production on Tue Nov 3 14:07:33 2009

Copyright (c) 1982, 2005, Oracle. All rights reserved.

SQL> connect / as sysdba;
Connected to an idle instance.
SQL> startup open;
ORACLE instance started.

Total System Global Area 1317011456 bytes
Fixed Size 1250740 bytes
Variable Size 973081164 bytes
Database Buffers 335544320 bytes
Redo Buffers 7135232 bytes
ORA-00205: error in identifying control file, check alert log for more info


On checking the alert log, as suggested, we find the following:

ALTER DATABASE MOUNT
Tue Nov 03 14:07:42 2009
ORA-00202: control file: 'D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\CONTROL02.CTL'

Tue Nov 03 14:07:45 2009
ORA-205 signalled during: ALTER DATABASE MOUNT...

The above corruption was introduced by manually editing the control file when the database was closed.

The solution is simple, provided you have at least one uncorrupted control file - replace the corrupted control file with a copy using operating system commands. Remember to rename the copied file. The database should now start up without any problems.

SQL> shutdown immediate;
ORA-01507: database not mounted


ORACLE instance shut down.
SQL> connect / as sysdba;
Connected to an idle instance.
SQL> startup open;
ORACLE instance started.

Total System Global Area 1317011456 bytes
Fixed Size 1250740 bytes
Variable Size 973081164 bytes
Database Buffers 335544320 bytes
Redo Buffers 7135232 bytes
Database mounted.
Database opened.
SQL>


Case 2: All control files lost
------------------------------------

What if you lose all your control files? In that case you have no option but to use a backup control file. The recovery needs to be performed from within RMAN, and requires that all logs (archived and current online logs) since the last backup are available. The logs are required because all datafiles must also be restored from backup. The database will then have to be recovered up to the time the control files went missing. This can only be done if all intervening logs are available. Here's an annotated transcript of a recovery session (as usual, lines in bold are commands to be typed, lines in italics are explanatory comments, other lines are RMAN feedback):


C:\>sqlplus /nolog

SQL*Plus: Release 10.2.0.1.0 - Production on Tue Nov 3 14:39:34 2009

Copyright (c) 1982, 2005, Oracle. All rights reserved.

SQL> connect / as sysdba;
Connected to an idle instance.
SQL> startup open;
ORACLE instance started.

Total System Global Area 1317011456 bytes
Fixed Size 1250740 bytes
Variable Size 981469772 bytes
Database Buffers 327155712 bytes
Redo Buffers 7135232 bytes
ORA-00205: error in identifying control file, check alert log for more info


SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options


C:\>rman target /

Recovery Manager: Release 10.2.0.1.0 - Production on Tue Nov 3 14:41:46 2009

Copyright (c) 1982, 2005, Oracle. All rights reserved.

connected to target database: emsts (not mounted)

RMAN> set dbid 849865825

executing command: SET DBID

--restore controlfile from autobackup. The backup is not at the default
--location so the path must be specified

RMAN> restore controlfile from 'C:\OraBkup\emsts\C-849865825-20091102-01';

Starting restore at 03-NOV-09
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=157 devtype=DISK

channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:06
output filename=D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\CONTROL01.CTL
output filename=D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\CONTROL02.CTL
output filename=D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\CONTROL03.CTL
Finished restore at 03-NOV-09

-- Now that control files have been restored, the instance can mount the
-- database.

RMAN> mount database;

database mounted
released channel: ORA_DISK_1

RMAN> restore database;

Starting restore at 03-NOV-09
Starting implicit crosscheck backup at 03-NOV-09
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=156 devtype=DISK
Crosschecked 11 objects
Finished implicit crosscheck backup at 03-NOV-09

Starting implicit crosscheck copy at 03-NOV-09
using channel ORA_DISK_1
Finished implicit crosscheck copy at 03-NOV-09

searching for all files in the recovery area
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\EMSTS\ARCHIVELOG\2009_11_03\O1_MF_1_4_5GZKBQMK_.ARC

using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00001 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\SYSTEM01.DBF
restoring datafile 00002 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\UNDOTBS01.DBF
restoring datafile 00003 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\SYSAUX01.DBF
restoring datafile 00004 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\USERS01.DBF
restoring datafile 00005 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\EXAMPLE01.DBF
restoring datafile 00006 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\ODDDAT01.DBF
restoring datafile 00007 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\ODDTAB01.DBF
restoring datafile 00008 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\SMTTAB01.DBF
restoring datafile 00009 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\RMAN01.DBF
channel ORA_DISK_1: reading from backup piece C:\ORABKUP\EMSTS\BACKUPEMSTS_DB_0AKTC3D8_10_1
channel ORA_DISK_1: restored backup piece 1
piece handle=C:\ORABKUP\EMSTS\BACKUPEMSTS_DB_0AKTC3D8_10_1 tag=TAG20091102T183415
channel ORA_DISK_1: restore complete, elapsed time: 00:03:26
Finished restore at 03-NOV-09

--Database must be recovered because all datafiles have been restored from
-- backup otherwise the following error message will be given out


RMAN> open resetlogs database;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of alter db command at 11/03/2009 14:54:40
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: 'D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\SYSTEM01.DBF'

RMAN> alter database open resetlogs;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of alter db command at 11/03/2009 14:56:56
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: 'D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\SYSTEM01.DBF'

RMAN> recover database;

Starting recover at 03-NOV-09
using channel ORA_DISK_1

starting media recovery

archive log thread 1 sequence 3 is already on disk as file D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\REDO02.LOG
archive log thread 1 sequence 4 is already on disk as file D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\EMSTS\ARC
HIVELOG\2009_11_03\O1_MF_1_4_5GZKBQMK_.ARC
archive log thread 1 sequence 5 is already on disk as file D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\REDO01.LOG
archive log filename=D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\REDO02.LOG thread=1 sequence=3
archive log filename=D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\EMSTS\ARCHIVELOG\2009_11_03\O1_MF_1_4_5GZKBQMK_
.ARC thread=1 sequence=4
archive log filename=D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\REDO01.LOG thread=1 sequence=5
media recovery complete, elapsed time: 00:00:08
Finished recover at 03-NOV-09

RMAN> alter database open resetlogs;

database opened

RMAN>


Several points are worth emphasising.

1. Recovery using a backup controlfile should be done only if a current control file is unavailable.
2. All datafiles must be restored from backup. This means the database will need to be recovered using archived and online redo logs. These MUST be available for recovery until the time of failure.
3. As with any database recovery involving RESETLOGS, take a fresh backup immediately.
4. Technically the above is an example of complete recovery - since all committed transactions were recovered. However, some references consider this to be incomplete recovery because the database log sequence had to be reset.

Recovery from missing or corrupted redo log group

Case 1: A multiplexed copy of the missing log is available.

If a redo log is missing, it should be restored from a multiplexed copy, if possible. This is the only way to recover without any losses. Here's an example, where I attempt to startup from SQLPlus when a redo log is missing:

SQL> connect / as sysdba;
Connected to an idle instance.
SQL> startup open;
ORACLE instance started.

Total System Global Area 1317011456 bytes
Fixed Size 1250740 bytes
Variable Size 939526732 bytes
Database Buffers 369098752 bytes
Redo Buffers 7135232 bytes
Database mounted.
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1:
'D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\REDO03A.LOG'

To fix this we simply copy REDO03A.LOG from its multiplexed location on E: to the above location on E:. After copying the file, we issue an "alter database open" from the above SQLPlus session:

SQL> alter database open;



Case 2: All members of a log group lost.

In this case an incomplete recovery is the best we can do. We will lose all transactions from the missing log and all subsequent logs. We illustrate using the same example as above. The error message indicates that members of log group 3 are missing. We don't have a copy of this file, so we know that an incomplete recovery is required. The first step is to determine how much can be recovered. In order to do this, we query the V$LOG view (when in the mount state) to find the system change number (SCN) that we can recover to (Reminder: the SCN is a monotonically increasing number that is incremented whenever a commit is issued):

--The database should be in the mount state for v$log access

SQL> select first_change# from v$log where group#=3 ;

SQL> connect / as sysdba;
Connected to an idle instance.
SQL> startup open;
ORACLE instance started.

Total System Global Area 1317011456 bytes
Fixed Size 1250740 bytes
Variable Size 956303948 bytes
Database Buffers 352321536 bytes
Redo Buffers 7135232 bytes
Database mounted.
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1:
'D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\REDO03.LOG'


SQL> select first_change# from V$log where group#=3;

FIRST_CHANGE#
-------------
3240563848


The FIRST_CHANGE# is the first SCN stamped in the missing log. This implies that the last SCN stamped in the previous log is 370254 (FIRST_CHANGE#-1). This is the highest SCN that we can recover to. In order to do the recovery we must first restore ALL datafiles to this SCN, followed by recovery (also up to this SCN). This is an incomplete recovery, so we must open the database resetlogs after we're done. Here's a transcript of the recovery session (typed commands in bold, comments in italics, all other lines are RMAN feedback):

SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

C:\>rman target=sys@emsts catalog=rman@emsts

Recovery Manager: Release 10.2.0.1.0 - Production on Mon Nov 2 18:11:07 2009

Copyright (c) 1982, 2005, Oracle. All rights reserved.

target database Password:
connected to target database: EMSTS (DBID=849865825, not open)
recovery catalog database Password:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04004: error from recovery catalog database: ORA-01033: ORACLE initialization or shutdown in progress

C:\>rman target /

Recovery Manager: Release 10.2.0.1.0 - Production on Mon Nov 2 18:11:26 2009

Copyright (c) 1982, 2005, Oracle. All rights reserved.

connected to target database: EMSTS (DBID=849865825, not open)

RMAN> restore database until scn 3240563847;

Starting restore at 02-NOV-09
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=151 devtype=DISK

channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00001 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\SYSTEM01.DBF
restoring datafile 00002 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\UNDOTBS01.DBF
restoring datafile 00003 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\SYSAUX01.DBF
restoring datafile 00004 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\USERS01.DBF
restoring datafile 00005 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\EXAMPLE01.DBF
restoring datafile 00006 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\ODDDAT01.DBF
restoring datafile 00007 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\ODDTAB01.DBF
restoring datafile 00008 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\SMTTAB01.DBF
restoring datafile 00009 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\EMSTS\RMAN01.DBF
channel ORA_DISK_1: reading from backup piece C:\ORABKUP\EMSTS\BACKUPEMSTS_DB_04KT1A8E_4_1
channel ORA_DISK_1: restored backup piece 1
piece handle=C:\ORABKUP\EMSTS\BACKUPEMSTS_DB_04KT1A8E_4_1 tag=TAG20091029T162341
channel ORA_DISK_1: restore complete, elapsed time: 00:03:18
Finished restore at 02-NOV-09

RMAN> recover database until scn 3240563847;

Starting recover at 02-NOV-09
using channel ORA_DISK_1

starting media recovery

archive log thread 1 sequence 109 is already on disk as file D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\EMSTS\A
RCHIVELOG\2009_10_30\O1_MF_1_109_5GNOC06T_.ARC
archive log thread 1 sequence 110 is already on disk as file D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\EMSTS\A
RCHIVELOG\2009_11_02\O1_MF_1_110_5GWOVN4W_.ARC
archive log thread 1 sequence 111 is already on disk as file D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\EMSTS\A
RCHIVELOG\2009_11_02\O1_MF_1_111_5GX91FBQ_.ARC
channel ORA_DISK_1: starting archive log restore to default destination
channel ORA_DISK_1: restoring archive log
archive log thread=1 sequence=108
channel ORA_DISK_1: reading from backup piece C:\ORABKUP\EMSTS\BACKUPEMSTS_DB_05KT1AH4_5_1
channel ORA_DISK_1: restored backup piece 1
piece handle=C:\ORABKUP\EMSTS\BACKUPEMSTS_DB_05KT1AH4_5_1 tag=TAG20091029T162820
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
archive log filename=D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\EMSTS\ARCHIVELOG\2009_11_02\O1_MF_1_108_5GXDFQV
C_.ARC thread=1 sequence=108
channel default: deleting archive log(s)
archive log filename=D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\EMSTS\ARCHIVELOG\2009_11_02\O1_MF_1_108_5GXDFQV
C_.ARC recid=110 stamp=701893047
archive log filename=D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\EMSTS\ARCHIVELOG\2009_10_30\O1_MF_1_109_5GNOC06
T_.ARC thread=1 sequence=109
media recovery complete, elapsed time: 00:00:18
Finished recover at 02-NOV-09

RMAN> alter database open resetlogs;

database opened

RMAN>

The following points should be noted:
  1. The entire database must be restored to the SCN that has been determined by querying v$log.
  2. All changes beyond that SCN are lost. This method of recovery should be used only if you are sure that you cannot do better. Be sure to multiplex your redo logs, and (space permitting) your archived logs!
  3. The database must be opened with RESETLOGS, as a required log has not been applied. This resets the log sequence to zero, thereby rendering all prior backups worthless. Therefore, the first step after opening a database RESETLOGS is to take a fresh backup. Note that the RESETLOGS option must be used for any incomplete recovery.


dbverify utility syntax

The primary tool for checking for corruption in an Oracle database is DBVERIFY. It can be used to perform a physical data structure integrity check on data files whether the database is online or offline. The big benefit of this is that DBVERIFY can be used to check backup data files without adding load to the database server. You invoke DBVERIFY from the operating system command line like this:
$ dbv file=dbfPATH/data01.dbf logfile=logfilepath/verify.log blocksize=8192 feedback=100
In this example data01.dbf is the data file to check, and the tablespace this file belongs to has a block size of 8192 bytes. The feedback parameter tells DBVERIFY to draw a period on the screen after every 100 pages (blocks) of the file are verified.
In the log file you’ll see output like this:

DBVERIFY: Release 10.2.0.1.0 - Production on Mon Nov 2 16:06:38 2009

Copyright (c) 1982, 2005, Oracle. All rights reserved.


DBVERIFY - Verification starting : FILE = d:\oracle\product\10.2.0\oradata\emsts\smttab01.dbf


DBVERIFY - Verification complete

Total Pages Examined : 262144
Total Pages Processed (Data) : 58669
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 50674
Total Pages Failing (Index): 0
Total Pages Processed (Other): 2588
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 150213
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 3240352395 (0.3240352395)