System Administration Guide
Chapter 8, Administering virtual disks

Possible problems

A virtual disk can be online or offline. In the online state, the virtual disk is active and all data is accessible.

If a failure occurs, appropriate console warning messages and status information will be displayed. If a single drive fails on a RAID array (levels 1, 4 and 5), the virtual disk will remain online and all data will be accessible; in other circumstances, the virtual disk may go offline. Simple, concatenated and stripe virtual disk types always remain online as disk errors are passed back to the application.

Status information is displayed in the Virtual Disk Manager after the disk or piece information. Possible error states are:



OUT-OF-SERVICE (OOS)
identifies a disk failure. One of the pieces in the array is inaccessible. If a single piece is in the OOS state, the array remains online. 

OUT-OF-DATE (OOD)
identifies a piece with corrupt parity. The array will remain online, unless a disk failure occurs. A restore operation should be performed as soon as possible. If a disk fails when one of the disk pieces in the array is in the OOD state, the entire array will go offline. 

spare
indicates which disk piece is the hot spare (when configured). If the hot spare is in use, the status indicator IN USE is added. The hot spare piece will automatically replace the piece identified as OOS.

BAD PARITY
indicates a virtual disk with bad parity. This might also mean a new disk which has not had a restore operation, or an array that has just been brought online.

The array enters the offline state when virtual disk I/O accesses fail. When an array goes offline, repair the virtual disk and restore data from a backup.
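
The rules above for RAID arrays and mirrors can be condensed into a small decision sketch. The function name array_state and its two arguments are hypothetical and exist only for illustration. The sketch models only the cases described in this section (OOS and OOD pieces) and does not apply to simple, concatenated or stripe virtual disks.

```shell
#!/bin/sh
# Illustrative only: predicts the state of a RAID virtual disk from
# the status flags shown in the Virtual Disk Manager.
#   $1 = number of pieces marked OUT-OF-SERVICE (OOS)
#   $2 = 1 if any piece is marked OUT-OF-DATE (OOD), else 0
array_state() {
    oos=$1
    ood=$2
    if [ "$oos" -ge 2 ]; then
        echo offline        # more than one piece is inaccessible
    elif [ "$oos" -eq 1 ] && [ "$ood" -eq 1 ]; then
        echo offline        # disk failure while parity is out of date
    else
        echo online         # at most one problem: data is still accessible
    fi
}
```

For example, array_state 1 0 describes an array with a single failed piece, which remains online, while array_state 1 1 describes a disk failure on an array that already has an OOD piece.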

Invalid timestamp on root device mirror

The Virtual Disk Manager uses timestamps on RAID virtual disk configurations to ensure proper operation and data integrity. If a timestamp on one of the mirror virtual disk pieces becomes invalid, the piece will not be fully configured. You cannot set the timestamps to a known state on a mirror virtual disk that is online.

If this happens, the out-of-service piece on the mirror root device cannot be restored. To correct the problem:

  1. Unmirror the root device.

  2. Shut down the system and reboot, as described in Chapter 10, ``Starting and stopping the system'' in the SCO OpenServer Handbook.

  3. Mirror the root device again.



Mirror root failure

If the primary disk fails during the system reboot (when mirroring the root disk for the first time), the array will go offline and the system boot will fail. At this point in the boot sequence, the system cannot switch over to the secondary disk if it has not been completely restored. Before replacing the primary drive or rebuilding the system, remove power from the secondary disk and try to boot the system. If the primary disk is not completely bad, the system will boot. When the system boots, unmirror the root device. Once the problem has been corrected, try to mirror the root device again.


Offline disk array

When an array or mirrored virtual disk has a mounted filesystem and the array or mirror goes offline due to error conditions, the filesystem becomes unusable. At this point, the filesystem cannot be unmounted (much the same as a hard disk failure). The system must be rebooted to clear this condition.

An array or mirror may go offline when more than one piece is out of service or one piece is out of service and parity is out of date. To rectify this:

  1. Disable the virtual disk.

  2. Force the virtual disk online.

  3. Restore the parity.

  4. Restore the filesystem data from the backup as described in ``Restoring a scheduled filesystem backup''.

If the system fails (crashes) while running on a RAID array, the parity data will be automatically regenerated when the system is booted. This ensures that the parity data accurately reflects the data on the other drives in the array. If an I/O error is encountered while the parity information is being restored, the array will go offline. It is recommended that a UPS (uninterruptible power supply) be used to reduce the risk of power outages on systems using virtual drives.

Kernel virtual memory shortage

When the system has many RAID virtual disk configurations with large cluster sizes or a heavy I/O load, the performance of the array may be reduced by contention for system resources (buffers, kernel virtual memory, and so on). Increasing the total amount of physical memory can improve system and array performance. See ``Warning messages'' for more information on driver error messages related to kernel virtual memory.

Warning messages

Warning messages (error messages that begin with WARNING:) alert you to virtual disk failures that require immediate attention. These messages are recorded in /usr/adm/messages and are displayed on the console.
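
As a minimal sketch, the warnings described in this section can be picked out of the message log with grep. The function name vdisk_warnings is hypothetical, and the sketch assumes that each message is recorded on a single line containing WARNING: followed by the vdisk text shown below.

```shell
#!/bin/sh
# Print virtual disk warnings from a message log.
#   $1 = log file (defaults to /usr/adm/messages)
# Assumption: each message occupies one line that contains
# "WARNING:" followed somewhere by "vdisk".
vdisk_warnings() {
    grep 'WARNING:.*vdisk' "${1:-/usr/adm/messages}"
}
```

Called with no argument, vdisk_warnings reads /usr/adm/messages. Pass a different file name to examine a saved copy of the log.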

vdisk n: too many levels of virtual disks

The virtual disk configuration is not supported. The driver failed the virtual disk because only eight levels of nesting are allowed by the disk array driver. The desired configuration exceeds that level.

vdisk n: failed to open piece m

There are two possibilities:

vdisk n: spare will not be operational

There are three possibilities:


vdisk n: is being taken offline
vdisk n: piece m and piece p are out-of-service

The array or mirror is no longer functional. The driver was unable to access two disk pieces in the virtual disk or the driver was unable to access one disk piece in the virtual disk while the parity data was out of date. You must:

  1. Disable the virtual disk.

  2. Repair or replace the disabled drives.

  3. Force the virtual disk online.

  4. Restore the parity.

  5. Restore the data for this virtual disk from backups as described in ``Restoring a scheduled filesystem backup''.

vdisk: insufficient memory to ...

The system does not have enough memory to initialize the virtual disk driver or perform a driver operation. If this error message persists and there is no ``memory leak'' (a process that progressively uses up memory without releasing it), add physical memory to the system or reduce the virtual disk or other system resources in use. See ``Virtual disks'' in the Performance Guide.

vdisk n: failed to allocate kernel virtual memory

The system does not have enough memory to allocate the necessary buffers for I/O. If this error message persists, consider adding physical memory to the system or reducing system resources.

vdisk n: failed to read/write timestamp on piece m

The driver failed to access disk piece m when reading or writing the timestamp. Piece m will be taken out of service, but the array or mirror will still be functional.

Restore parity on the out-of-service piece. If the restore fails, repair or replace the out-of-service drive as soon as possible and restore the parity.

vdisk n: new configuration offline, repair out-of-service drive

The virtual disk is not functional. The driver failed to open or access a disk piece when initializing a new configuration. Either the disk drive is disabled or the disk piece is not defined properly. Repair or replace the failed drive or ensure the disk piece is available, then force the virtual disk online. If the virtual disk is an array or mirror, restore the parity.

vdisk n: reconfiguration failed, restore from previous backup

There was a system crash while the reconfigure operation was in progress. The array is offline. You should:

  1. Force the virtual disk online.

  2. Restore the parity.

  3. Restore the data for this virtual disk from backups as described in ``Restoring a scheduled filesystem backup''.

vdisk n: too many pieces out of service [ run dkconfig -cf ]

The array or mirror is no longer functional. The driver was unable to access two disk pieces in the virtual disk. You should force the array or mirror online.

If the configuration fails, you should:

  1. Repair or replace the two disabled drives.

  2. Force the array or mirror online again.

  3. Restore the parity.

  4. Restore the data for this virtual disk from backups as described in ``Restoring a scheduled filesystem backup''.

vdisk n: read invalid timestamp on piece m

The timestamp on disk piece m of the array or mirror has become desynchronized. You should:

  1. Disable the virtual disk or mirror.

  2. Force it online.

  3. Restore the parity.

vdisk n: timestamps are not valid [run dkconfig -cf]

The timestamps for the disk pieces that make up the array or mirror are out of synchronization. One or more of these disk pieces were accessed individually prior to a reboot or an enable. You should:

  1. Force the array or mirror online.

  2. Restore the parity.
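
Two of the warnings above carry a bracketed hint to run dkconfig -cf. The following sketch lists the virtual disk numbers for which that hint was logged. The function name vdisks_needing_force is hypothetical, and the sketch assumes that logged messages take the form vdisk 3: ... with a numeric virtual disk number. It does not run dkconfig itself; see the dkconfig manual page for the full command syntax.

```shell
#!/bin/sh
# List the virtual disk numbers for which the driver logged a
# "run dkconfig -cf" hint.
#   $1 = log file (defaults to /usr/adm/messages)
# Assumption: messages look like "vdisk 3: ... [run dkconfig -cf]".
vdisks_needing_force() {
    grep 'run dkconfig -cf' "${1:-/usr/adm/messages}" |
        sed -n 's/.*vdisk \([0-9][0-9]*\):.*/\1/p' |
        sort -un
}
```

Each number is printed once, in ascending order, however many times the hint was logged for that virtual disk.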

vdisk n: failed to bring spare online

The driver was unable to access a disk piece and attempted to replace the failed disk (the disk piece will be taken out of service) with the spare disk piece. While updating the data on the spare piece, an I/O error occurred. The spare piece will not replace the out-of-service piece in the virtual disk. The array or mirror is still functional, but you should replace the out-of-service data drive and spare drive as soon as possible.

vdisk n: timestamps not closed properly, parity must be restored

The virtual disk was fully operational prior to a system crash. Data on the parity piece may not be accurate. The system will automatically restore parity during boot, so no intervention is necessary in this case.

vdisk n: timestamps not closed properly, parity may be out-of-date

The virtual disk had one piece out of service prior to a system crash. Data on the virtual disk may not be accurate. Repair or replace the out-of-service drive and restore the virtual disk data from backups as described in ``Restoring a scheduled filesystem backup''.
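
The two ``timestamps not closed properly'' warnings differ only in their final words, but they call for different responses. The sketch below makes the distinction explicit. The function name timestamp_action and the labels it prints are hypothetical.

```shell
#!/bin/sh
# Classify a "timestamps not closed properly" warning.
#   $1 = the message text
# Prints "automatic" when the system restores parity itself during boot,
# "manual" when the drive must be repaired and data restored from backup,
# and "unknown" for any other text.
timestamp_action() {
    case $1 in
        *'timestamps not closed properly, parity must be restored'*)
            echo automatic ;;
        *'timestamps not closed properly, parity may be out-of-date'*)
            echo manual ;;
        *)
            echo unknown ;;
    esac
}
```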

vdisk: failed to spawn vddaemon error = x

The daemon was not spawned during reboot. RAID virtual disks may not be operational. Reboot the system and use ps -aef to make sure the vddaemon is running. If the problem occurs again, the system may be corrupted. Deconfigure the virtual disks and reinstall the Virtual Disk Manager package.

At least one vddaemon must be running. On multiprocessor systems, one is started per CPU for extra performance. If only one is running, then virtual disk performance may be reduced.
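
A minimal sketch of the ps -aef check follows. The function name count_vddaemon is hypothetical, and the sketch assumes that the command name vddaemon appears as the last field of each matching line of ps output.

```shell
#!/bin/sh
# Count vddaemon processes in "ps -aef" style output read from stdin.
# Assumption: the command name "vddaemon" is the last field of the line.
count_vddaemon() {
    awk '$NF ~ /vddaemon$/ { n++ } END { print n + 0 }'
}
# Usage on a live system:  ps -aef | count_vddaemon
```

Matching on the last field, rather than searching the whole line, keeps the awk command itself from being counted if it appears in the ps listing. A result of 0 means that no vddaemon is running.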

vdisk n: piece m is out of service

The driver was unable to configure disk piece m of the virtual disk. Piece m will be taken out of service. The array or mirror is functional, but you should repair or replace the disk drive as soon as possible and restore the parity of the failed drive. See ``Repairing a failed drive'' for more information.

vdisk n: piece m is being taken out of service

The driver failed to access disk piece m of the virtual disk. This disk piece will be taken out of service. The array or mirror is functional, but you should repair or replace the out-of-service disk drive as soon as possible and restore the parity for the failed drive.
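
To see which pieces need attention, the two out-of-service warnings can be reduced to a list of virtual disk and piece numbers. This is a sketch only. The function name oos_pieces is hypothetical, and it assumes that logged messages take the form vdisk 2: piece 1 is out of service with numeric values.

```shell
#!/bin/sh
# Print "vdisk piece" number pairs for pieces reported out of service.
#   $1 = log file (defaults to /usr/adm/messages)
# Assumption: messages look like "vdisk 2: piece 1 is out of service"
# or "vdisk 2: piece 1 is being taken out of service".
oos_pieces() {
    sed -n \
        -e 's/.*vdisk \([0-9][0-9]*\): piece \([0-9][0-9]*\) is out of service.*/\1 \2/p' \
        -e 's/.*vdisk \([0-9][0-9]*\): piece \([0-9][0-9]*\) is being taken out of service.*/\1 \2/p' \
        "${1:-/usr/adm/messages}" | sort -u
}
```

Each line of output gives a virtual disk number followed by a piece number. Duplicate reports for the same piece are collapsed into one line.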

vdisk n: not enough jobs to restore from spare

The maximum number of outstanding jobs for the specified virtual device has been reached. Restore the parity again. See ``Virtual disks'' in the Performance Guide.

vdisk n: restart of reconfiguration failed, restore from previous backup

A reconfiguration on vdisk n was in progress and was interrupted. The driver attempted to restart the reconfiguration from the interrupted point, but could not. The virtual disk will be offline. You should:

  1. Force the disk online.

  2. Restore the data for this virtual disk from backups as described in ``Restoring a scheduled filesystem backup''.

vdisk: write to data base piece m failed err = x

A write to dktab piece m of vdisk0 failed. Configuration information is not valid on that backup piece. You can either:

vdisk n: cannot restart reconfiguration if interrupted

Reconfiguration of the indicated virtual disk cannot be restarted if it is interrupted before completion. Verify that enough spool pieces are configured in vdisk0 for as many simultaneous fail-safe reconfigurations as will be executed. See ``Adding a configuration backup'' for information on modifying the virtual disk database.

vdisk: piece x is too small to backup configuration information

Piece number x of vdisk0 is too small to back up the current configuration of the configured virtual disks. Increase the length of that piece or reconfigure vdisk0 to include a larger disk piece. See ``Adding a configuration backup'' for information on modifying the virtual disk database.

Notice messages

Notice messages (error messages that begin with NOTICE:) alert you to error conditions where recovery has already occurred, but some operator intervention is necessary. Notice messages are recorded in /usr/adm/messages and are displayed on the console.

vdisk n: new configuration online, parity must be restored

A virtual disk has been configured for the first time.

vdisk n: bringing spare online

One of the data pieces failed in the virtual disk. The spare piece will be brought online to replace the failed drive.

vdisk n: spare is online

One of the data pieces failed in the virtual disk. The spare piece was brought online to replace the failed drive. Replace or repair the out-of-service data drive and restore the parity for the out-of-service disk piece. See ``Repairing a failed drive'' for more information.

vdisk n: cannot bring spare online during reconfiguration

A disk piece was taken out of service in the current or new configuration during the reconfigure operation. Either the current or new configuration contains a spare. The spare piece will not be brought online during the reconfigure operation. After the reconfiguration completes, the spare will be operational.

vdisk n: corrected error on piece m, no data lost

A bad block was detected and the disk array driver was unable to access piece m of the virtual disk during its first attempt. The driver recreated the data, retried the job, and was able to access the disk piece successfully on the second attempt. The array or mirror is functional. Ignore all previous bad block messages displayed by the underlying disk driver.

vdisk: job time-out; restarting job processing

During error recovery, there was a job time-out for one of the disk pieces in the virtual disk. The disk array driver will take the disk piece out of service and continue processing all remaining jobs. Restore the parity on the out-of-service disk piece.

vdisk: job pool is empty

The maximum number of outstanding jobs for the disk array driver has been reached. The driver will not process any more jobs until most of the outstanding jobs are complete. This limit is controlled by the VDJOBS kernel parameter. See ``Using configure to change kernel resources'' in the Performance Guide for information on changing kernel parameters. 

vdisk n: job queue is full
vdisk n: piece pool is empty

The maximum number of outstanding jobs for the specified virtual device has been reached. The driver will not process any more jobs for this device until most of the outstanding jobs for the virtual disk are complete. This limit is controlled by the VDUNITJOBS kernel parameter under the Virtual disks item of performance tunables. See ``Virtual disks'' in the Performance Guide.
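
The job-limit notices above map onto two different kernel parameters. The sketch below records that mapping. The function name job_limit_tunable is hypothetical, and it inspects only the message text passed to it.

```shell
#!/bin/sh
# Name the kernel parameter associated with a job-limit notice.
#   $1 = the message text
# Prints VDJOBS for the driver-wide limit, VDUNITJOBS for the
# per-virtual-disk limit, and "none" for any other message.
job_limit_tunable() {
    case $1 in
        *'job pool is empty'*)
            echo VDJOBS ;;
        *'job queue is full'*|*'piece pool is empty'*)
            echo VDUNITJOBS ;;
        *)
            echo none ;;
    esac
}
```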