VMware vSAN Cluster Maintenance: Remediation Steps for Compliance

Question

During planned maintenance of a four-node vSAN cluster, an outsourced IT contractor accidentally removed a 2.5" SSD cache disk from one of the vSAN nodes.

The storage policy has been configured with FTT=1 RAID 1, and the disk management UI marked the disk group as absent.

Which remediation steps should the administrator select to ensure VMs become compliant with the storage policy as soon as possible?

Answers

A. replace the SSD cache disk > rescan > add disk back in disk group

B. enter host in maintenance > remove from disk group > shutdown host > replace disk > power on host

C. remove disk group > enter host in maintenance > fully evacuate all data > exit maintenance

D. vSAN Health Check > retest > object health > repair objects immediately.

Explanations

Removing the SSD cache disk takes the whole disk group offline, because a vSAN disk group cannot operate without its cache device. The components stored on that disk group become absent, so objects protected with FTT=1 RAID 1 keep running from their surviving mirror copy but are no longer compliant with the storage policy. The four options work as follows:

A. Replace the SSD cache disk > rescan > add disk back in disk group: This option involves physically installing an SSD cache disk, rescanning storage so the host detects the device, and adding it back to the disk group. It is feasible only if the removed disk or a replacement is on hand, and the VMs stay non-compliant until the hardware work is done and the affected components have resynchronized.

B. Enter host in maintenance > remove from disk group > shutdown host > replace disk > power on host: This option involves placing the affected host in maintenance mode, removing the disk group, shutting down the host, replacing the removed disk, and powering the host back on. Once the host is online again, the administrator can recreate the disk group and let the components resynchronize. It is feasible if a disk is available and the host can be taken offline, but the shutdown and restart make it one of the slowest routes back to compliance.

C. Remove disk group > enter host in maintenance > fully evacuate all data > exit maintenance: This option involves removing the disk group, placing the affected host in maintenance mode with full data evacuation, and then exiting maintenance mode. The host is never powered off, but a full evacuation moves every component off the host, which takes time and consumes capacity on the other three nodes. The administrator can recreate the disk group afterwards, once a cache disk is available.

D. vSAN Health Check > retest > object health > repair objects immediately: This option involves rerunning the vSAN Health Check, opening the object health test, and selecting "repair objects immediately". By default vSAN waits 60 minutes before rebuilding absent components, in case the missing device returns; this action skips the wait and starts rebuilding the missing mirror copies on the remaining hosts right away. It requires no hardware work and no downtime, provided the other hosts have enough free capacity.
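
The timing difference behind option D can be illustrated with a toy model. The sketch below is not a vSAN API: the `Component` class and the `rebuild_starts` function are hypothetical names, and the only vSAN behavior assumed is that degraded components are rebuilt immediately while absent components wait for the default 60-minute repair delay.

```python
from dataclasses import dataclass

# Default delay before absent components are rebuilt, in minutes.
DEFAULT_REPAIR_DELAY_MIN = 60


@dataclass
class Component:
    """Hypothetical stand-in for one vSAN component."""
    state: str             # "active", "absent", or "degraded"
    minutes_in_state: int  # how long the component has been in that state


def rebuild_starts(component: Component,
                   repair_now: bool = False,
                   delay_min: int = DEFAULT_REPAIR_DELAY_MIN) -> bool:
    """Return True if this toy model would start rebuilding the component."""
    if component.state == "active":
        return False  # nothing to rebuild
    if component.state == "degraded":
        return True   # a confirmed failure is rebuilt without waiting
    # Absent: wait for the repair delay unless the administrator overrides it.
    return repair_now or component.minutes_in_state >= delay_min


pulled = Component(state="absent", minutes_in_state=5)
print(rebuild_starts(pulled))                   # waits for the timer
print(rebuild_starts(pulled, repair_now=True))  # "repair objects immediately"
```

In this model the pulled cache disk leaves components absent, so without the override nothing is rebuilt for the first hour.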

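In conclusion, options A, B, and C all depend on hardware work or maintenance-mode operations before any rebuild can finish. The question asks for the fastest path back to compliance, and a four-node cluster still has three healthy hosts, which is enough to hold an FTT=1 RAID 1 object. Option D is therefore the best fit: it starts the rebuild immediately instead of waiting for the default 60-minute repair delay that applies to absent components, or for replacement hardware. In practice the choice also depends on the availability of replacement hardware, the tolerance for downtime, and the free capacity on the remaining hosts, and the cache disk should still be replaced afterwards to restore the cluster's full capacity.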
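
A quick arithmetic check shows why the four-node cluster in the question can return to compliance without the affected disk group. RAID 1 mirroring with FTT=n needs at least 2n+1 hosts. The helper names below are hypothetical, and the sketch assumes one disk group per host and ignores free capacity, which a real rebuild also requires.

```python
def min_hosts_raid1(ftt: int) -> int:
    """Minimum hosts for RAID 1 mirroring: ftt+1 data copies plus ftt witnesses."""
    return 2 * ftt + 1


def can_rebuild_elsewhere(total_hosts: int, ftt: int,
                          hosts_unavailable: int = 1) -> bool:
    """True if the remaining hosts can still hold a fully compliant object.

    Assumes one disk group per host, so losing a disk group removes that
    host's storage entirely. Free capacity is not checked.
    """
    return total_hosts - hosts_unavailable >= min_hosts_raid1(ftt)


print(min_hosts_raid1(1))                                # 3
print(can_rebuild_elsewhere(total_hosts=4, ftt=1))       # True
print(can_rebuild_elsewhere(total_hosts=3, ftt=1))       # False
```

With only three nodes the same incident would leave no spare host, and the objects would stay non-compliant until the disk group was restored.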
