5
Disaster Recovery

This chapter provides information about how you can use LSM to recover from different types of system disasters on your Oracle Server. It stresses the importance of preparing for a disaster. If you back up your Oracle data regularly and implement the planning procedures outlined in this chapter, you will be well prepared to recover from a disaster.

Types of Disaster Recovery

You can use LSM to recover from different types of disaster on your Oracle Server machine. The degree of data loss during a disaster can range from one or more files lost when a disk crashes to the loss of an entire computer system. The severity of the disaster determines the procedures you need to perform to recover data on your Oracle Server.

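The information in this chapter explains how to recover from four different types of disaster on your Oracle Server: loss of the operating system and LSM software, loss of the LSM indexes and configuration files, loss of Oracle data, and destruction of the entire LSM Server.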

Figure 5-1 illustrates how the operating system and LSM software can be damaged or destroyed. In this example, an Oracle Server on UNIX has several physical disks. A power outage has corrupted the filesystem on Disk 0, and the operating system and LSM software residing on Disk 0 have been lost. To recover from the disaster, you need to replace the disk, reinstall the operating system and LSM software, and use LSM to recover the lost server configuration and any data that was lost when the filesystem was corrupted.

Figure 5-1 Damaged Disk Containing OS and LSM Software

Note:
In a situation where the primary disk containing both the operating system and the LSM binaries has been destroyed, you must always reinstall the operating system, reinstall LSM, and then use LSM to recover the remainder of your data. You cannot recover data backed up by LSM without reinstalling the operating system and LSM software first.

Figure 5-2 shows how the directory containing the LSM online indexes and resource configuration files can be damaged or destroyed. In this example, a disaster has corrupted the disk on the Windows NT Oracle Server that contains the LSM indexes and configuration files. To recover from a disaster of this type, you need to recover the contents of the bootstrap save set.

Figure 5-2 Damaged Disk Containing LSM Indexes

Figure 5-3 illustrates how Oracle data can be damaged or destroyed. In this example, a Windows NT Oracle Server has two disks. The second disk containing the Oracle data has been destroyed by a disk crash. However, the disk containing the operating system and LSM software is still operational. To recover from this situation, you can use the Oracle backup/restore utility RMAN.

Figure 5-3 Damaged Disk Containing Oracle Data

In the example in Figure 5-4, the LSM Server is destroyed. To recover from this disaster, you need to recover all the data to a new system with the same name.

Figure 5-4 Destroyed LSM Server

Disaster Preparation

It is important to develop a plan for recovering from a disaster on your Oracle Server. Not only do you need to back up important data on a daily basis, but you need to develop and test a plan for recovering your data should you experience a disk crash or loss of data. The more time and effort you invest in creating and testing your disaster recovery plan, the better prepared you will be should disaster strike.

The section "LSM Server Bootstrap Backups" describes how the LSM Server is preconfigured to perform an automatic nightly bootstrap backup. The bootstrap is a special backup save set that includes the client index, media index, and resource database. You can also perform a manual bootstrap backup by using the procedure in "Manual Bootstrap Backup".

The bootstrap information printed at the end of every bootstrap backup is essential for recovering LSM Server indexes and resource configuration files. As explained in "Maintaining Bootstrap Information", you should keep the bootstrap printout in a safe place, ready for use during a disaster recovery. If you do not keep the bootstrap printout, you will need to determine the save set ID of the most recent bootstrap by means of the procedure in "Bootstrap Save Set ID" below.

For more information on using the bootstrap information, see the section "Recovering LSM Indexes and Configuration Files", under either "Disaster Recovery on UNIX" or "Disaster Recovery on Windows NT".

Along with the bootstrap information, you should keep accurate records of your network and system configurations and maintain all your original software in a safe location. For a comprehensive disaster recovery, you need the following items:

Original operating system media and patches
Original LSM media
Device drivers and media device names
Filesystem configuration
IP addresses and hostnames
Bootstrap information
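
Part of this record can be captured with a few standard commands. The following is a minimal sketch, not an LSM feature: record_system_config and the layout of its output file are hypothetical, and it covers only the hostname, the operating system identification, and the mounted filesystems. The remaining items in the list (media, drivers, IP addresses, and the bootstrap printout) still have to be recorded by hand.

```shell
# Sketch only: gather basic system facts into one text file for printing.
# record_system_config is a hypothetical helper name, not part of LSM.
record_system_config() {
    rsc_out=$1                      # file that receives the record
    {
        echo "recorded: $(date)"
        echo "hostname: $(uname -n)"
        echo "system:   $(uname -a)"
        echo "--- df -k ---"
        df -k                       # mounted filesystems and their sizes
    } > "$rsc_out"
}
```

For example, `record_system_config /tmp/sysinfo.txt` writes the record to /tmp/sysinfo.txt, which you can then print and store with the bootstrap printout.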

Bootstrap Save Set ID

The most efficient way to recover the bootstrap is to save the bootstrap information before a disaster occurs. If you do not have the information, you must scan the most recent backup volume to find the save set ID of the most recent bootstrap. Use the scanner -B command for this purpose, as it will always find a valid bootstrap.

Finding the Bootstrap for UNIX

Use the following steps to find the most recent save set ID for a bootstrap on a UNIX system:

Place the most recent media used for backups in the LSM Server device.
At the system prompt, change to the directory where you originally installed LSM, typically, /usr/sbin.

Use the scanner -B command to locate the most recent bootstrap on the media. For example:

For Solaris systems:

/usr/sbin  scanner -B /dev/rmt/0hbn

For AIX systems:

/usr/bin   scanner -B /dev/rmt0.1

For DIGITAL UNIX systems:

/usr/opt/networker/bin  scanner -B /dev/nrmt0h

For HP-UX systems:

/opt/networker/bin  scanner -B /dev/rmt/0mnb

The scanner -B command displays the latest bootstrap save set information found on the backup volume, as illustrated in the following example:

scanner: scanning 8mm tape jupiter.001 /dev/rmt/0hbn
scanner: Bootstrap 1148869870 of 8/21/96 7:45:15 located on volume jupiter.001, file 88

After you locate the bootstrap with the most recent date, you can run the mmrecov command on a UNIX system to recover the LSM Server indexes and resource configuration. For more information, see "Recovering LSM Indexes and Configuration Files".
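
If the scanner output has been saved to a file, the save set ID and file number can be pulled out with a short filter. The following is a minimal sketch, assuming each Bootstrap message sits on a single line and ends with the file number, as in the example above; parse_bootstrap is a hypothetical helper name, not part of LSM.

```shell
# Sketch only: read saved "scanner -B" output on standard input and print
# "<ssid> <file>" for the last Bootstrap message found.
# Assumes the ssid follows the word "Bootstrap" and the file number is the
# last field on the same line.
parse_bootstrap() {
    awk '/Bootstrap/ {
             for (i = 1; i < NF; i++)
                 if ($i == "Bootstrap") ssid = $(i + 1)
             file = $NF
         }
         END { if (ssid != "") print ssid, file }'
}
```

For example, `parse_bootstrap < scanner.out` prints the two values that mmrecov asks for later in the recovery, where scanner.out is a file of your own choosing.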

Finding the Bootstrap for Windows NT

Use the following steps to find the most recent save set ID for a bootstrap on a Windows NT system:

Place the most recent media used for backups in the LSM Server device.
In a Command Prompt window, change to the directory where you originally installed LSM, typically, C:\win32app\nsr\bin.
Use the scanner -B command to locate the most recent bootstrap. For example:
```
C:\win32app\nsr\bin scanner -B \\.\Tape0
```

The scanner -B command displays the latest bootstrap save set information found on the backup volume, as shown in this example:

scanner: scanning 8mm tape mars.006 on \\.\Tape0
scanner: Bootstrap 1148869870 8/11/96 6:29:58 mars.006, file 88

After you locate the bootstrap with the most recent date, you can run the mmrecov command on a Windows NT system to recover the LSM Server indexes and resource configuration. For more information, see "Recovering LSM Indexes and Configuration Files".

Disk Information

As an additional precautionary step to help you recover from loss of critical data, find out how each disk on your Oracle Server machine is partitioned and formatted before a disaster occurs, and print and save this information. If a disk is damaged or destroyed during a disaster, use the disk information to recreate the disk exactly as it was prior to the disk crash.

Note:
When you recreate your disk configuration, you will need to have partitions large enough to hold all the recovered data. Make the partitions at least as big as they were prior to the crash.

Disk Information on UNIX

Use the df command to find out how the LSM Server disks are partitioned and mounted. Use the appropriate operating system command to print disk partitioning information.

For Solaris, use the df and prtvtoc commands.
For AIX, use the df and lslv commands or the Logical Volume Manager in the System Management Interface Tool (SMIT).
For DIGITAL UNIX, use the df command and the contents of the /etc/fstab file.
For HP-UX, use the df and bdf commands.

For example, the df information looks similar to the following:

Filesystem            kbytes      used    avail  capacity  Mounted on
/dev/dsk/c0t3d0s6     480919    414138    18691     96%    /usr
/dev/dsk/c0t3d0s0    1251422    183449   942833     17%    /
swap                  208112       380   207732      1%    /tmp
/dev/dsk/c0t3d0s5      96031     12799    73632     15%    /var

The prtvtoc command example below provides information about how each disk is partitioned for a Solaris system. The device name is the "raw" device corresponding to the device name used for the output from the df command shown previously.

/dev/dsk/c0t3d0s0 partition map

Dimensions:
     512 bytes/sector
      80 sectors/track
      19 tracks/cylinder
    1520 sectors/cylinder
    3500 cylinders
    2733 accessible cylinders

Flags:
  1: unmountable
 10: read-only

                          First     Sector      Last
Partition  Tag  Flags     Sector     Count      Sector   Mount Directory
    0       2    00           0    2663040    2663039    /
    1       3    01     2663040     261440    2924479
    2       5    00           0    4154160    4154159
    5       7    00     2924480     205200    3129679    /var
    6       4    00     3129680    1024480    4154159    /usr

The lslv command example below gives you information about the logical volumes on an AIX system.

OUTPUT of $ lslv hd6

LOGICAL VOLUME:      hd6                   VOLUME GROUP:    rootvg
LV IDENTIFIER:       00004421b56f747b.1    PERMISSION:      read/write
VG STATE:            active/complete       LV STATE:        opened/syncd
TYPE:                paging                WRITE VERIFY:    off
MAX LPs:             128                   PP SIZE          4 megabyte(s)
COPIES:              1                     SCHED POLICY:    parallel
LPs:                 8                     PPs:             8
STALE PPs:           0                     BB POLICY:       non-relocatable
INTER-POLICY:        minimum               RELOCATABLE:     yes
INTRA-POLICY:        middle                UPPER BOUND      32
MOUNT POINT:         N/A                   LABEL:           None
MIRROR WRITE CONSISTENCY: off
EACH LP COPY ON A SEPARATE PV ?: yes

If a disk was damaged, you will be able to restore it and recover the filesystems to their original state, using the hardcopy information from these disk information commands.
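
After you rebuild a disk, the saved df output can be compared with fresh df output to confirm that every filesystem is at least as large as before. The following is a minimal sketch under stated assumptions: check_partition_sizes is a hypothetical helper, both files use the six-column layout shown in the df example above (size in the second column, mount point in the sixth), and no entry is wrapped across two lines.

```shell
# Sketch only: report mount points that are missing or smaller than before.
#   $1 = df output saved before the disaster
#   $2 = df output taken on the rebuilt system
# Prints nothing when every original filesystem is present and large enough.
check_partition_sizes() {
    awk '
        FNR == 1 || NF < 6  { next }                # skip headers and blank lines
        FILENAME == ARGV[1] { want[$6] = $2; next } # sizes before the disaster
                            { have[$6] = $2 }       # sizes on the rebuilt system
        END {
            for (m in want) {
                if (!(m in have))
                    print m ": not mounted on the rebuilt system"
                else if (have[m] + 0 < want[m] + 0)
                    print m ": " have[m] " KB is smaller than the original " want[m] " KB"
            }
        }' "$1" "$2" | sort
}
```

For example, `check_partition_sizes df.before df.after` lists any filesystem that needs a larger partition before you start recovering data into it; the two file names are placeholders.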

Disk Information on Windows NT

Prior to a disaster, copy the information that appears in the Windows NT Disk Administrator window, including the size of the partitions, the formatting methods, and the drive letters the partitions have been assigned to.

Disaster Recovery Procedures

The procedures to follow for disaster recovery depend on whether your Oracle Server machine is running UNIX or Windows NT. For more information, see either "Disaster Recovery on UNIX" or "Disaster Recovery on Windows NT" below.

For either UNIX or Windows NT, it is difficult to provide step-by-step disaster recovery instructions since every disaster situation is unique. The examples included in the following sections are designed to give you general principles on how to recover critical data and to help you understand the procedures.

Disaster Recovery on UNIX

Recovery Requirements

While performing any disaster recovery procedures on your UNIX system, keep in mind the following hardware, operating system, and LSM requirements. Fulfill the requirements that are pertinent to the disaster recovery procedure that you are following.

Hardware Requirements

Use the following list to install and configure your system hardware correctly:

Replace a damaged disk with a disk of the same size or larger.
When replacing hardware, use the same type of controller, driver, and SCSI ID as used prior to the disaster.
Recreate the disk partitions on the new system to be the same size or larger.
Format the disk partitions using the same formats as used by the original disk.

Operating System Requirements

Adhere to the following list when you reinstall the UNIX operating system:

Reinstall the same version of UNIX.
Use the same computer name, TCP/IP hostname, and DNS Domain name.
Reinstall any operating system patches that existed before the disaster.
Reinstall the device and SCSI drivers.
Make sure all network protocols are working properly.
After reinstalling UNIX, reboot your system and log in as "root" user. Make sure no error messages occur when you start up the system and that all devices are recognized by the operating system.

LSM Requirements

Fulfill the following requirements to ensure that you reinstall LSM successfully. Refer to the Oracle installation guide for your UNIX system for LSM installation instructions.

Reinstall the same version of the LSM software.
Reinstall LSM into the same directory where it originally resided.
Reinstall any patches that were installed prior to the disaster.
Be sure to follow the required procedures for retrieving the LSM Server's index and configuration files. Be sure to stop and restart LSM after you rename the configuration files directory. See "Recovering LSM Indexes and Configuration Files" below for complete details.

Recovering the Operating System and LSM Software

When a disk with the operating system and LSM binaries has been damaged or completely destroyed, you need to replace the damaged disk and reinstall both the operating system and the LSM software. If the disk was not completely destroyed and the operating system or LSM is still operational, use only those steps in this section that apply to your situation.

Note:
When you recover the operating system, you must do so in single-user mode from the system console, not from the X window system.

Use the following steps to guide you through recovering the operating system and LSM software:

Replace the damaged disk if necessary. Make sure the replacement disk is as large as or larger than the original disk.
Use the saved disk partition information to recreate the disk partitions with the same structure as the original disk. See "Disk Information on UNIX".
Use the output from the disk information command to make a filesystem for each raw partition that you plan to recover, then mount the block partition. (LSM does not initialize or create filesystems; it recovers data into existing filesystems.)
Use the appropriate UNIX command to format the replacement disk. For Solaris systems, use newfs or mkfs. For AIX systems, use SMIT. For HP-UX systems, use mkfs.
Reinstall the operating system in the same location where it originally resided, using the original software and documentation. Use the same computer name, TCP/IP hostname, and DNS Domain name used prior to losing the operating system.

You need to fully configure the operating system by recreating any unique configurations that existed before you lost data or experienced a disk crash. If you use a device with a default configuration that is not directly supported by the operating system, you also need to modify the appropriate device configuration files during installation:
- On Solaris systems you might need to modify the /kernel/drv/st.conf file to support a DLT tape drive.
- On AIX, use SMIT to configure the devices.
Install and configure the SCSI controller and tape device drivers.
If you had a link to another disk that contains the LSM indexes and configuration files (/nsr/res) or any other LSM directories located on another disk, recreate it now. For example, on AIX systems, /nsr is a link to /usr/nsr.
Reinstall the LSM software, using the original software and accompanying documentation. Refer to the appropriate Oracle installation guide for your particular UNIX system. When you reinstall the LSM Server software, LSM automatically rediscovers the index and configuration files if they are not corrupted.
Reboot the system and log in as "root" user.
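
The link step in the procedure above can be sketched with the standard ln -s command. This is a minimal illustration, not an LSM utility: restore_nsr_link is a hypothetical helper, and the paths you pass to it must come from your own saved configuration records.

```shell
# Sketch only: recreate a symbolic link such as /nsr -> /usr/nsr.
#   $1 = directory that really holds the files (for example /usr/nsr)
#   $2 = link name to recreate               (for example /nsr)
# Refuses to touch anything that already exists and is not a link.
restore_nsr_link() {
    rnl_target=$1
    rnl_link=$2
    if [ -L "$rnl_link" ]; then
        echo "$rnl_link is already a link; nothing to do"
        return 0
    fi
    if [ -e "$rnl_link" ]; then
        echo "$rnl_link exists and is not a link; leaving it alone" >&2
        return 1
    fi
    if [ ! -d "$rnl_target" ]; then
        echo "$rnl_target does not exist" >&2
        return 1
    fi
    ln -s "$rnl_target" "$rnl_link"
}
```

On an AIX system laid out as described above, the call would be `restore_nsr_link /usr/nsr /nsr`, run as root before you reinstall the LSM software.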

If you lost the LSM indexes and resource configuration files that reside in the /nsr directory, you will need to follow the instructions in the next section to recover them.

Recovering LSM Indexes and Configuration Files

If the LSM Server indexes and configuration files that reside in the /nsr directory have been destroyed, you will need to use the mmrecov command to recover them.

If the operating system and LSM software were also destroyed, they must be reinstalled prior to recovering the /nsr directory contents. See "Recovering the Operating System and LSM Software".

When you use the mmrecov command to recover the /nsr directory, you actually recover the contents of three important directories:

/nsr/mm (media manager) directory - contains the LSM media index that tracks all of the LSM backup volumes.
/nsr/index/server-name directory - contains the LSM client index, which has a list of all the server files that were backed up prior to the disaster.
/nsr/res directory - contains special LSM resource configuration files. The nsr.res file contains the LSM Server configurations including device information. Unlike the indexes, the contents of this directory cannot be reliably overwritten while LSM is running. Therefore, mmrecov recovers the /nsr/res directory as /nsr/res.R. Later, you must change the directory name to /nsr/res.

Using the mmrecov Command

The mmrecov command asks you for the bootstrap save set ID (ssid). If you followed the recommended procedures to prepare for loss of critical data, you have a hardcopy printout of the bootstrap information that lists the backup media you need and the bootstrap ssid.

In the following example, ssid "17851237" is the most recent bootstrap backup:

Jun 17  22:21 1997 mars's LSM bootstrap information
date     time      level  ssid      file  record  volume
6/14/97  23:46:13  full   17826163  48    0       mars.1
6/15/97  22:45:15  9      17836325  87    0       mars.2
6/16/97  22:50:34  9      17846505  134   0       mars.2
6/17/97  22:20:25  9      17851237  52    0       mars.3

If you do not have this information, you can still recover the indexes by finding the ssid using the scanner -B command. See "Bootstrap Save Set ID".
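
If you also keep the bootstrap report as a text file, the most recent entry can be extracted with a short filter. The following is a minimal sketch, assuming the seven-column layout shown in the example above and rows listed oldest first; latest_bootstrap is a hypothetical helper name, not part of LSM.

```shell
# Sketch only: read a saved bootstrap report on standard input and print
# "<ssid> <file> <record> <volume>" for the last data row.
# A data row is taken to be a line whose first field looks like a date
# (m/d/yy) and whose fourth field is a numeric ssid.
latest_bootstrap() {
    awk '$1 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ && $4 ~ /^[0-9]+$/ {
             ssid = $4; file = $5; record = $6; volume = $7
         }
         END { if (ssid != "") print ssid, file, record, volume }'
}
```

Run against the report shown above, the filter prints `17851237 52 0 mars.3`, which are the values to supply to mmrecov and the volume to mount.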

After you locate the bootstrap with the most recent date on your UNIX system, you can run the mmrecov command, supplying the save set ID and file number displayed by the scanner command, to recover the LSM Server indexes and resource configuration.

With the operating system and LSM software in place, recover the indexes and configuration files from the backup media by following these steps:

Find the bootstrap information, which you need for the next two steps.
Mount the backup media that contains the most recent backup named bootstrap in a storage device.

Use the mmrecov command to extract the contents of the bootstrap backup. Never run the mmrecov command from the root directory (/); you can use any other directory. For example:

# mmrecov

Doing mmrecov operation as root on the server !!!
mmrecov: Using mars as server 
NOTICE: mmrecov is used to recover the NetWorker server's on-line file and 
media indexes from media (backup tapes or disks) when either of the server's 
on-line file or media index has been lost or damaged.
Note that this command will OVERWRITE the server's existing on-line file and 
media indexes.  mmrecov is not used to recover NetWorker clients' on-line 
indexes; normal recover procedures may be used for this purpose.  See the 
mmrecov(8) and nsr_crash(8) man pages for more details.
  
rd=mars:/space1/DISKDEV1 rd=mars:/space1/DISKDEV2 /space1/DISKDEV1 
/space1/DISKDEV2 
What is the name of the device you plan on using [rd=mars:/space1/DISKDEV1]? 
/space1/DISKDEV1
Enter the latest bootstrap save set id []: 17851237
Enter starting file number (if known) [0]: 52
Enter starting record number (if known) [0]: 0
   
Please insert the volume on which save set id 17851237 started into 
/space1/DISKDEV1.  When you have done this, press <RETURN>: [Return]
    
Scanning /space1/DISKDEV1 for save set 17851237; this may take a while...
scanner: scanning optical disk TestBackup.199 on /space1/DISKDEV1
/nsr/res/nsr.res
/nsr/res/nsrjb.res
scanner: ssid 17851237: scan complete
scanner: ssid 17851237: 44 KB, 11 file(s)
/nsr/res/nsrla.res
/nsr/res/
/nsr/mm/
/nsr/index/mars/
/nsr/index/
/nsr/
/
nsrmmdbasm -r /nsr/mm/mmvolume/
nsrindexasm -r /nsr/index/mars/db/
/space1/DISKDEV1: mount operation in progress
/space1/DISKDEV1: mounted optical disk TestBackup.199 (write protected)
     
The bootstrap entry in the on-line index for mars has been recovered. The
complete index is now being reconstructed from the various partial indexes 
which were saved during the normal save for this server.
      
If your resource files were lost, they are now recovered in the 
'res.R' directory.  Copy or move them to the 'res' directory, after the index
has been reconstructed and you have shut down the daemons.  Then restart the 
daemons.
Otherwise, just restart the daemons after the index has been reconstructed.
     7 records recovered, 0 discarded.
nsrindexasm: Pursuing index pieces of /nsr/index/mars/db from mars.
Recovering files into their original locations.
nsrindexasm -r ./mars/db/
merging with existing mars index
mars: 2035 records recovered, 0 discarded.
Received 1 matching file(s) from NSR server `mars'
Recover completion time: Tue Jun 24 16:46:38 1997
Cross checking index for client mars to remove duplicate records
The index for `mars' is now fully recovered.

You can use LSM commands such as nsrwatch or nwadmin to watch the progress of the LSM Server during the recovery of the index and configuration files. Open a new window (shell tool) to monitor the recovery so that the mmrecov output does not display on top of the nsrwatch output.

mars# nsrwatch
Tue 16:36:11 server notice: started
Tue 16:36:30 index notice: The client index is missing, recover the index or run nsrck -c
Tue 16:36:30 index notice: completed checking 2 client(s)
Tue 16:36:34 /space1/DISKDEV1 volume TestBackup.199 not found in media index
Tue 16:45:21 /space1/DISKDEV1 mount operation in progress
Tue 16:45:30 /space1/DISKDEV1 mounted optical disk TestBackup.199 (write protected)
Tue 16:45:33 index notice: nsrim has finished cross checking the media db
Tue 16:46:24 index notice: cross-checking index for mars
Tue 16:46:30 /space1/DISKDEV1 mounted optical disk TestBackup.199 (write protected)
Tue 16:46:31 mars:/nsr/index/mars (6/24/97) starting read from TestBackup.199 of 397
Tue 16:46:35 mars:/nsr/index/mars (6/24/97) done reading 397 KB
Tue 16:46:38 index notice: cross-checking index for mars

Renaming the Configuration Files Directory

Unlike the /nsr/index directory, the /nsr/res directory containing the configuration files cannot be reliably overwritten while LSM is running. Therefore, mmrecov recovers the /nsr/res directory as /nsr/res.R.

To complete the recovery of the LSM configuration files:

Shut down LSM.
Rename the existing /nsr/res directory to /nsr/res.orig.
Rename the recovered /nsr/res.R directory to /nsr/res.
Restart LSM.

Complete these steps after mmrecov has finished and this final message appears:

The on-line index for 'server' is now fully recovered.

Shut down the LSM Server using the nsr_shutdown command:
```
# nsr_shutdown
```
Save the original /nsr/res directory as /nsr/res.orig, and rename the recovered directory (res.R) to res.
```
# cd /nsr
# mv res res.orig
# mv res.R res
```
Restart LSM. When it restarts, the Server uses the recovered configuration data residing in the recovered /nsr/res directory.
```
# nsrd
# nsrexecd
```
Once you verify the LSM configurations are correct, you can remove the res.orig directory.
```
# rm -r /nsr/res.orig
```
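
The two mv commands above can be wrapped with a few safety checks so that an earlier res.orig copy is never overwritten. This is a minimal sketch, not an LSM utility: swap_res_dir is a hypothetical helper, it assumes LSM has already been shut down, and it takes the LSM directory as an argument (defaulting to /nsr) so that it can be tried on a scratch directory first.

```shell
# Sketch only: move <base>/res aside and put <base>/res.R in its place.
#   $1 = LSM directory, default /nsr
# Assumes LSM is already shut down; this function does not stop or start it.
swap_res_dir() {
    srd_base=${1:-/nsr}
    if [ ! -d "$srd_base/res.R" ]; then
        echo "no recovered directory $srd_base/res.R" >&2
        return 1
    fi
    if [ -e "$srd_base/res.orig" ]; then
        echo "$srd_base/res.orig already exists; remove or rename it first" >&2
        return 1
    fi
    if [ -d "$srd_base/res" ]; then
        mv "$srd_base/res" "$srd_base/res.orig" || return 1
    fi
    mv "$srd_base/res.R" "$srd_base/res"
}
```

Called with no argument, the function operates on /nsr. Restarting LSM and removing res.orig remain separate manual steps, as described above.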

Restoring Oracle8 or Oracle8i Data on UNIX

This section describes how to recover from a crash in which one or more files of an Oracle8 or Oracle8i database were damaged on your UNIX Oracle Server.

The first sign of a disk crash is usually an I/O error, which Oracle typically records in the trace file and in the alert log.

If only one database file is affected, your database may have encountered a bad spot on the disk. This can be circumvented by reformatting the disk to make a new list of bad blocks. However, this can be time-consuming, so it is prudent to have a spare disk available to swap in while you reformat.

If several database files are affected, all on the same disk, you could have a disk controller problem or a disk head crash. A bad controller can be replaced, and data on the disk will often be in perfect shape. But after a head crash, you will need to use that spare disk as a replacement.

Keep an extra disk on hand as a "hot spare," in case a disk failure occurs. Format it and verify that it works. If a disk failure does occur, it is much faster to swap in a spare disk than it is to rename database files and update the control file accordingly.

After you have determined the Oracle data that needs to be recovered, you must first restore the relevant files.

You can restore and recover the Oracle database files by using one of these programs:

Command-line interface of the Recovery Manager (RMAN) utility
OEM Backup Manager

For more information about the Oracle Enterprise Manager, see "Using the Oracle Enterprise Manager Backup Manager". For complete details about using RMAN in Oracle8i, refer to the Oracle8i Backup and Recovery Guide or, for Oracle8, the comparable guide.

Recovering LSM to a New Machine

This section describes the situation where your original LSM machine is beyond repair, so you want to move LSM to a new machine. This procedure assumes that you are not updating the operating system or the LSM software.

Note:
Do not make major changes to the operating system or LSM software at the same time as you move to a new machine.

If you want to make changes to the operating system or the LSM software, we strongly suggest that you configure the new machine exactly like the original, using the same version of the operating system and LSM software. After configuring the new machine, make sure the system is operational, perform a couple of successful backups, and then update or upgrade the operating system or the LSM software, one at a time.

To move LSM to a new machine, use the same steps for recovering a primary disk and the LSM indexes and configuration files. See "Recovering the Operating System and LSM Software" and "Recovering LSM Indexes and Configuration Files" for complete information.

However, you should be aware of the following requirements for configuring the software:

Use the original hostname for the new LSM machine. You must use the same hostname because the LSM Server indexes were created under the original LSM machine hostname.
Make sure the original server name is listed as an alias for the server in the Client window of the nwadmin program.
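
The hostname requirement above can be checked on the new machine before you reinstall LSM. The following is a minimal sketch: check_hostname is a hypothetical helper, and it assumes that uname -n reports the name in the same form (short or fully qualified) as the name you recorded for the original machine.

```shell
# Sketch only: compare this machine's node name with the recorded original.
#   $1 = hostname recorded for the original LSM machine
# Returns 0 when the names match and 1 when they differ.
check_hostname() {
    ch_expected=$1
    ch_actual=$(uname -n)
    if [ "$ch_actual" = "$ch_expected" ]; then
        echo "hostname matches: $ch_actual"
    else
        echo "hostname is $ch_actual, expected $ch_expected" >&2
        return 1
    fi
}
```

For example, if the original server was named mars, `check_hostname mars` should succeed on the replacement machine before you continue.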

After LSM is moved to another machine, you must recover the LSM resource database (nsr.res file) to have the same resource and attribute settings on your new machine as you had on the previous one.

After you successfully move your server, check the following:

Verify the LSM Server resource configurations by means of the LSM Administrator GUI.
Use the savegrp -O command to perform a manual bootstrap backup as soon as possible. See "Manual Bootstrap Backup" for more information.
Check the Recover window to make sure all the client indexes are browsable and, therefore, recoverable.

Disaster Recovery on Windows NT

Recovery Requirements

While performing any disaster recovery procedures on your Windows NT system, keep in mind the following hardware, operating system, and LSM requirements. Fulfill the requirements that are pertinent to the disaster recovery procedure that you are following.

Hardware Requirements

Use the following list to install and configure your system hardware correctly:

Replace a damaged disk with a disk of the same size or larger.
When replacing hardware, use the same type of controller, driver, and SCSI ID as used prior to the disaster.
Recreate the disk partitions on the new system to be the same size or larger.
Format the disk partitions using the same formats as the original disk (for example, FAT, NTFS, or HPFS).
Assign the same drive letters to each partition as used prior to the disaster.

Operating System Requirements

Adhere to the following list when you reinstall the Windows NT operating system:

Reinstall the same version of Windows NT.
Reinstall Windows NT in the same directory where it originally resided.
Use the same server name, TCP/IP hostname, and DNS Domain name.
Reinstall any Microsoft Service Packs or Hotfixes that existed before the disaster.
Reinstall the device and SCSI drivers.
Make sure all network protocols are working properly.
After reinstalling Windows NT, reboot your system and log on as Administrator. Check the event viewer to make sure no errors occurred during startup. Also make sure that all the devices are recognized by the operating system.

LSM Requirements

Fulfill the following requirements to ensure that you reinstall LSM successfully. Refer to the Oracle installation guide for your Windows NT system for LSM installation instructions.

Reinstall the same version of the LSM software.
Reinstall LSM into the same drive and directory where it was originally installed.
Reinstall any patches that were installed prior to the disaster.
Be sure to stop and restart LSM after you rename the configuration files directory.

Recovering the Operating System and LSM Software

To recover the operating system and LSM software, follow these steps:

Replace the damaged disk. Make sure the replacement disk is as large as or larger than the original disk.
Use the saved disk partition information to recreate the disk partitions with the same structure as the original disk. We recommend that you format each partition on the disk with the same filesystems as before, for example: FAT, NTFS, or HPFS. See "Disk Information on Windows NT" for more information.

Reinstall the operating system into the same directory where the operating system originally resided, using the original software and accompanying documentation. Be sure to use the same computer name, TCP/IP hostname, and DNS Domain name you used prior to losing the operating system.

Note:
Install the Windows NT operating system into a workgroup. Do not install the server in a Domain. When you recover the Registry later in this procedure, the server will be returned to its original Domain after the recovery is complete and you restart the system.

You need to fully configure the operating system by recreating any unique configurations that existed before you lost data or experienced a disk crash.

Install and configure the correct SCSI controller and tape device drivers.
If the system had a Microsoft Service Pack installed prior to the disk crash, reinstall it now.
Reboot the system, and log on as Administrator.
Reinstall the LSM software in the same location where it was originally installed. Refer to the Oracle installation guide for your Windows NT system for LSM installation instructions. At this time, also reinstall any LSM patches you had installed prior to the disaster. When you reinstall the LSM Server software, LSM automatically rediscovers the indexes and configuration files if they are not corrupted.
To complete the recovery of the Windows NT operating system when the LSM index files are intact, start the NetWorker User program provided with the LSM software. For more details on the NetWorker User program, see the NetWorker User online help or refer to Appendix D, "Running the NetWorker User Program".
Click the Recover speedbar button to open the Recover window. The system's directory structure is displayed in the window.
Select and mark the Registry for recovery.
Click the Start speedbar button to begin the recovery.
Reboot the system once the recovery is complete, and log on as Administrator.

Recovering LSM Indexes and Configuration Files

If the LSM indexes and configuration files that reside in the \nsr directory have been destroyed, you need to use the mmrecov command to recover them.

If the operating system and the LSM software were also destroyed, they must be reinstalled prior to recovering the \nsr directory contents. See the preceding section, "Recovering the Operating System and LSM Software".

When you use the mmrecov command to recover the \nsr directory, you actually recover the contents of three important directories:

\nsr\mm (media manager) directory - contains the LSM media index that tracks the LSM backup volumes and their save sets.
\nsr\index\server-name directory - contains the LSM client index, which has a list of all the server files that were backed up prior to the disaster.
\nsr\res directory - contains special LSM resource configuration files. The nsr.res file contains the LSM Server configurations including device information. Unlike the indexes, the contents of this directory cannot be reliably overwritten while LSM is running. Therefore, mmrecov recovers the \nsr\res directory as \nsr\res.R. Later, you must change the directory name to \nsr\res.
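Before running mmrecov, it can help to see which of these three directories are actually missing. The following is a minimal sketch, not part of LSM: the function name missing_nsr_dirs and the way the \nsr location is passed in are assumptions made for illustration. It checks only that each directory exists. It cannot tell whether the contents are corrupted.

```python
import os
import tempfile

def missing_nsr_dirs(nsr_dir, server_name):
    """List which of the three directories recovered by mmrecov are absent.

    nsr_dir and server_name are supplied by the caller (hypothetical helper);
    a directory that exists may still hold damaged data.
    """
    expected = {
        "media index (mm)": os.path.join(nsr_dir, "mm"),
        "client index (index)": os.path.join(nsr_dir, "index", server_name),
        "resource files (res)": os.path.join(nsr_dir, "res"),
    }
    return [label for label, path in expected.items()
            if not os.path.isdir(path)]

# Demonstration on a throwaway tree where only the media index survives.
with tempfile.TemporaryDirectory() as demo:
    os.makedirs(os.path.join(demo, "mm"))
    print(missing_nsr_dirs(demo, "mars.universe.com"))
```

An empty list means all three directories are present. It does not mean they are intact.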

Using the mmrecov Command

In the following example, ssid "1148869870" is the most recent bootstrap backup:

August 20 03:30 1997 LSM bootstrap information Page 1
date     time     level  ssid        file  record  volume
8/08/97  7:44:38  full   1148869706  55    0       mars.004
8/09/97  6:12:09  full   1148869754  48    0       mars.005
8/10/97  6:14:23  full   1148869808  63    0       mars.006
8/11/97  6:29:58  full   1148869870  88    0       mars.006

If you do not have this information, you can still recover the indexes by finding the bootstrap ssid using the scanner -B command. See "Bootstrap Save Set ID".

After you locate the bootstrap with the most recent date on your Windows NT system, you can run the mmrecov command, supplying the save set ID and file number displayed by the scanner command, to recover the LSM Server indexes and resource configuration.
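If the bootstrap listing is available as text, choosing the most recent entry can be sketched as follows. This is an illustration only: the function name latest_bootstrap is hypothetical, and the code assumes the seven-column layout and the month/day/two-digit-year dates shown in the sample listing earlier in this section.

```python
from datetime import datetime

# Sample rows copied from the bootstrap listing shown in this section.
LISTING = """\
date     time     level  ssid        file  record  volume
8/08/97  7:44:38  full   1148869706  55    0       mars.004
8/09/97  6:12:09  full   1148869754  48    0       mars.005
8/10/97  6:14:23  full   1148869808  63    0       mars.006
8/11/97  6:29:58  full   1148869870  88    0       mars.006
"""

def latest_bootstrap(listing):
    """Return the listing row with the most recent date and time."""
    rows = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a 7-column data row.
        if len(fields) != 7 or fields[0] == "date":
            continue
        date, time, level, ssid, file_no, record, volume = fields
        when = datetime.strptime(f"{date} {time}", "%m/%d/%y %H:%M:%S")
        rows.append({"when": when, "level": level, "ssid": ssid,
                     "file": int(file_no), "record": int(record),
                     "volume": volume})
    if not rows:
        raise ValueError("no bootstrap entries found")
    return max(rows, key=lambda row: row["when"])

entry = latest_bootstrap(LISTING)
# These are the values mmrecov prompts for, plus the volume to mount.
print(entry["ssid"], entry["file"], entry["record"], entry["volume"])
# prints: 1148869870 88 0 mars.006
```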

With the operating system and LSM software in place, recover the indexes and configuration files from the backup media:

Find the bootstrap information, which you need for the next two steps.
Retrieve the backup media that contains the most recent backup named bootstrap and mount it in a backup device.

Use the mmrecov command to extract the contents of the bootstrap save set. For example:

mmrecov
C:\win32app\nsr\bin>mmrecov
mmrecov: Using mars.universe.com as server
NOTICE: mmrecov is used to recover the LSM server's on-line file and media 
indexes from media (backup tapes or disks) when either of the server's 
on-line file or media index has been lost or damaged. Note that this command 
will OVERWRITE the server's existing on-line file and media indexes. mmrecov 
is not used to recover LSM clients' on-line indexes; normal recover 
procedures may be used for this purpose.
Enter the latest bootstrap save set id []: 15132
Enter starting file number (if known) [0]: 9
Enter starting record number (if known) [0]:
Please insert the volume on which save set id 15132 started into \\.\Tape0. 
When you have done this, press <RETURN>:
Scanning \\.\Tape0 for save set 15132; this may take a while..
scanner: scanning 4mm tape mars.universe.com.001 on \\.\Tape0
C:\win32app\nsr\res\nsr.res
C:\win32app\nsr\res\nsrjb.res
C:\win32app\nsr\res\nsrla.res
C:\win32app\nsr\res\
nsrmmdbasm -r C:\win32app\nsr\mm\mmvolume
C:\win32app\nsr\mm\mmvolume: file exists, overwriting
nsrindexasm -r C:\win32app\nsr\index\mars.universe.com\db
C:\win32app\nsr\index\mars.universe.com\
C:\win32app\nsr\index\
C:\win32app\nsr\mm\
C:\win32app\nsr\
C:\win32app\
C:\
scanner: ssid 15132: scan complete
scanner: ssid 15132: 290 KB, 12 files
takin.legato.com: 2247 records recovered, 0 discarded.
Cross checking index for client mars.universe.com to remove duplicate 
records
The index for 'mars.universe.com' is now fully recovered.
\\.\Tape0: mount operation in progress
\\.\Tape0: verifying label, moving backward 2 files
\\.\Tape0: mounted 4mm tape mars.universe.com.001
The bootstrap entry in the on-line index for mars.universe.com has been 
recovered.
If your resource files were lost, they are now recovered in the 'res.R' 
directory. Copy or move them to the 'res' directory, after you have shut 
down the service. Then restart the service.
Otherwise, just recycle the service.

The LSM Server indexes and configuration files should be fully recovered.

Renaming the Configuration Files Directory

Unlike the \nsr\index directory, the \nsr\res directory that contains the configuration files cannot be reliably overwritten while LSM is running. Therefore, mmrecov recovers the \nsr\res directory as \nsr\res.R. To complete the recovery of the LSM configuration files, shut down LSM, rename the recovered \nsr\res.R directory to \nsr\res, and then restart LSM.

When the mmrecov program finishes recovering the indexes and configuration files, it displays this final message:

The on-line index for 'server' is now fully recovered.

Complete these steps after mmrecov completes:

Stop the LSM Backup and Recover Server service by using the Windows NT Service Control Panel.
Save the existing \nsr\res directory as \nsr\res.orig.
Rename the recovered directory \nsr\res.R to \nsr\res.
Restart the LSM Backup and Recover Server service by using the Windows NT Service Control Panel.
Once you have verified that the LSM configurations are correct, you can remove the \nsr\res.orig directory.
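The two directory operations in the middle of this procedure can be sketched as a small script. This is a hedged illustration rather than an LSM tool: the function name promote_recovered_res is hypothetical, the location of the \nsr directory is passed in by the caller, and the script assumes that the LSM service has already been stopped by hand. Stopping and restarting the service are left as manual steps.

```python
import os
import tempfile

def promote_recovered_res(nsr_dir):
    """Save res as res.orig, then rename the recovered res.R to res.

    Assumes the LSM service is already stopped (not checked here).
    """
    res = os.path.join(nsr_dir, "res")
    recovered = os.path.join(nsr_dir, "res.R")
    backup = os.path.join(nsr_dir, "res.orig")

    if not os.path.isdir(recovered):
        raise FileNotFoundError(f"{recovered} not found; run mmrecov first")
    if os.path.exists(backup):
        raise FileExistsError(f"{backup} already exists; move it aside first")

    # Keep the existing configuration until the recovered one is verified.
    if os.path.isdir(res):
        os.rename(res, backup)
    os.rename(recovered, res)
    return res, backup

# Demonstration against a throwaway directory, not a real \nsr tree.
with tempfile.TemporaryDirectory() as demo:
    os.makedirs(os.path.join(demo, "res"))
    os.makedirs(os.path.join(demo, "res.R"))
    open(os.path.join(demo, "res.R", "nsr.res"), "w").close()
    promote_recovered_res(demo)
    print(sorted(os.listdir(demo)))  # prints: ['res', 'res.orig']
```

The script refuses to run if res.orig already exists, so that an earlier saved configuration is never overwritten silently.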

Completing the Recovery on Windows NT

Once you recover the LSM Server's indexes and configuration files, you can recover the Windows NT system Registry by using the NetWorker User program.

The NetWorker User program is provided as part of your LSM installation. You can start the NetWorker User program by selecting the NetWorker User icon from the taskbar or double-clicking the NetWorker User icon from the Program Manager. For more information about how to use the program, see the NetWorker User online help or refer to Appendix D, "Running the NetWorker User Program".

To recover the Windows NT system Registry on your LSM Server, follow these steps:

Log on as Administrator.
Start the NetWorker User program.
Click the Recover speedbar button to open the Recover window. LSM displays the system's directory structure in the Recover window.
Select and mark the Registry for recovery.
Click the Start speedbar button to begin the recovery.
Reboot your computer once the recovery is complete, and log on as Administrator.

The system should be restored to its status prior to the disk crash.

Restoring Oracle Data on Windows NT

This section describes how to recover from a crash in which one or more files of an Oracle database were damaged on your Windows NT Oracle Server.

The first sign of a disk crash is usually an I/O error, which Oracle typically records in the trace file and in the alert log.

After you have determined the Oracle data that needs to be recovered, you must first restore the relevant files.

You can restore and recover the Oracle database files by using one of these programs:

Command-line interface of the Oracle restore utility on the Oracle Server
OEM Backup Manager

For more information about the Oracle Enterprise Manager, see "Using the Oracle Enterprise Manager Backup Manager".

To recover an Oracle8i or Oracle8 database using Recovery Manager, see the Oracle8i Backup and Recovery Guide or, for Oracle8, the corresponding Oracle8 guide.

Recovering LSM to a New Machine

Note:
Do not make major changes to the operating system or LSM software at the same time as you move to a new machine.

If you want to make changes to the operating system or the LSM software, we strongly suggest that you first configure the new server exactly like the original, using the same versions of the operating system and LSM software. After configuring the new server, make sure the system is operational, perform a couple of successful backups, and only then update or upgrade the operating system or the LSM software, one at a time.

Be aware of the following requirements when you configure the software on the new machine:

Use the original hostname for the new LSM machine. You must use the same hostname because the LSM Server indexes were created under the original LSM machine hostname.
Make sure the original server name is listed as an alias for the server in the Create Client dialog box of the LSM Administrator program.
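The hostname requirement above can be checked with a short script before any indexes are recovered. This is a sketch under stated assumptions: the helper name same_host is hypothetical, and the comparison is a plain case-insensitive string match that ignores a trailing dot. It does not verify DNS aliases or the LSM client configuration.

```python
import socket

def same_host(original, current):
    """Compare two hostnames case-insensitively, ignoring a trailing dot."""
    def normalize(name):
        return name.strip().rstrip(".").lower()
    return normalize(original) == normalize(current)

# Hostname of the original LSM machine, taken from the example in this chapter.
ORIGINAL_HOSTNAME = "mars.universe.com"

# socket.getfqdn() reports the fully qualified name of the local machine.
if same_host(ORIGINAL_HOSTNAME, socket.getfqdn()):
    print("Hostname matches the original LSM machine.")
else:
    print("Hostname differs from the original LSM machine.")
```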

After LSM is moved to another machine, you must recover the LSM resource database (nsr.res file) to have the same resource and attribute settings on your new machine as you had on the previous one.

After you successfully move your server, check the following:

Verify the LSM Server resource configurations by means of the LSM Administrator GUI.
Use the savegrp -O command to perform a manual bootstrap backup as soon as possible. See "Manual Bootstrap Backup" for more information.
Check the Recover window in the NetWorker User program to make sure all the client indexes are browsable and, therefore, recoverable.

Recovering Oracle Data

The methods for restoring and recovering Oracle database files are outlined in the Oracle8i Backup and Recovery Guide (or comparable Oracle8 guide) for RMAN.
