VMware ESXi 5.5 host doesn't mount VMFS 5 datastore

Yesterday I stumbled over a post in a German VMware forum. After a vSphere 5.5 update, a user noticed that a newly updated ESXi 5.5 host wasn’t able to mount some datastores. The host had been updated with an HP customized ESXi 5.5 image. The other two hosts, running ESXi 5.1 installed from an HP customized image, had no problems. An HP P2000 G3 MSA array with iSCSI was used as shared storage. The datastores with VMFS version 5.54 were mounted; only the datastores with VMFS 5.58 were not. The user evacuated the VMs from one of the affected datastores, then deleted and recreated it. The recreated datastore appeared for a short moment and then disappeared again.

I knew the problem. It’s a combination of a change in the HP customized images and a known behaviour of ESXi with VMFS 5 datastores.

Changes in the HP customized ESXi images

HP removed the P2000 G3 VAAI plug-in for ESXi 5.x and 4.1 in September 2013 due to an incompatibility with HP Smart Array P711m, P712m and P721m SAS RAID controllers and HP P2000 G3 arrays running firmware TS230 or TS240. The incompatibility can cause messages like this:

2012-10-25T15:22:34.347Z cpu28:8220)WARNING: NMP: nmp_DeviceRetryCommand:133:Device
2012-10-25T15:22:35.249Z cpu24:8839)WARNING: NMP: nmpDeviceAttemptFailover:599:Retry world failover device "naa.600c0ff0001344cecd99725001000000" - issuing command 0x4124419f0100
2012-10-25T15:22:35.249Z cpu4:8196)<4>hpsa 0000:0c:00.0: Device:C1:B0:T2:L1 Command:0xc2 Command Invalid. 
2012-10-25T15:22:35.249Z cpu28:8220)WARNING: NMP: nmpCompleteRetryForPath:348:Retry cmd 0xc2 (0x4124419f0100) to dev "naa.600c0ff0001344cecd99725001000000" failed on path "vmhba8:C0:T2:L1" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-10-25T15:22:35.249Z cpu28:8220)WARNING: NMP: nmpCompleteRetryForPath:378:Logical device "naa.600c0ff0001344cecd99725001000000": awaiting fast path state update before retrying failed command again...

These messages are associated with problems creating or mounting datastores. HP published a customer advisory in February 2013 to address this issue.
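To find affected devices in a larger vmkernel log, the failed-retry warnings quoted above can be picked out with a regular expression. The following is a minimal sketch in Python, not an official tool. The pattern is derived only from the single sample line in this post, and the field names (`opcode`, `host_status` and so on) are my own labels, so other ESXi builds may format the message differently.

```python
import re

# One of the log lines quoted above, used here as sample input.
LOG_LINE = (
    '2012-10-25T15:22:35.249Z cpu28:8220)WARNING: NMP: nmpCompleteRetryForPath:348:'
    'Retry cmd 0xc2 (0x4124419f0100) to dev "naa.600c0ff0001344cecd99725001000000" '
    'failed on path "vmhba8:C0:T2:L1" H:0x1 D:0x0 P:0x0 '
    'Possible sense data: 0x0 0x0 0x0.'
)

# Assumption: the message layout matches the sample line above.
# H, D and P are captured as host, device and plug-in status codes.
RETRY_FAILED = re.compile(
    r'Retry cmd (?P<opcode>0x[0-9a-f]+) \(0x[0-9a-f]+\) '
    r'to dev "(?P<device>[^"]+)" failed on path "(?P<path>[^"]+)" '
    r'H:(?P<host_status>0x[0-9a-f]+) '
    r'D:(?P<device_status>0x[0-9a-f]+) '
    r'P:(?P<plugin_status>0x[0-9a-f]+)'
)


def parse_retry_failure(line):
    """Return the fields of a failed-retry NMP warning, or None if no match."""
    match = RETRY_FAILED.search(line)
    return match.groupdict() if match else None


failure = parse_retry_failure(LOG_LINE)
print(failure)
```

Running this over every line of a log and collecting the distinct `device` values gives a quick list of the LUNs that are rejecting the command.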

VMFS Locking Mechanisms

VMFS supports two locking mechanisms:

  • SCSI reservations
  • Atomic Test and Set (ATS)

Locking is necessary in an environment where multiple hosts write to the same filesystem. It prevents multiple hosts from writing to the same blocks concurrently. SCSI reservations are the good old way, and they are used if a storage system doesn’t support VAAI. A SCSI reservation locks a whole LUN: no other host can write to it until the reservation is released. This can lead to performance problems, and many of you know the problem with too many SCSI reservations. Atomic Test and Set (ATS) is more intelligent. ATS can lock only a specific sector of a LUN, but the storage system has to support this feature. For more information about VAAI, I recommend a blog article written by Chris Wahl.
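The difference between the two mechanisms can be illustrated with a toy model. This is only a sketch of the idea in Python, not how ESXi or an array implements it: the `Lun` class, its method names and the byte strings used as lock records are all made up for illustration.

```python
class Lun:
    """Toy LUN: a list of sectors plus an optional LUN-wide reservation."""

    def __init__(self, sectors):
        self.sectors = [b"free"] * sectors
        self.reserved_by = None  # host holding the SCSI reservation, if any

    def reserve(self, host):
        """SCSI-reservation style: one host takes the whole LUN."""
        if self.reserved_by not in (None, host):
            return False  # reservation conflict for every other host
        self.reserved_by = host
        return True

    def release(self, host):
        """Drop the reservation if this host holds it."""
        if self.reserved_by == host:
            self.reserved_by = None

    def compare_and_write(self, sector, expected, new):
        """ATS style: replace one sector only if it still holds `expected`."""
        if self.sectors[sector] != expected:
            return False  # another host changed this lock record first
        self.sectors[sector] = new
        return True


lun = Lun(sectors=4)

# With a reservation, host B is shut out of the whole LUN while host A holds it.
assert lun.reserve("A") is True
assert lun.reserve("B") is False
lun.release("A")

# With ATS, both hosts take locks on different sectors independently...
assert lun.compare_and_write(0, b"free", b"locked-by-A") is True
assert lun.compare_and_write(1, b"free", b"locked-by-B") is True
# ...and only a competing attempt on the same sector fails.
assert lun.compare_and_write(0, b"free", b"locked-by-B") is False
```

In the model, the reservation blocks host B regardless of which sector it wants, while the compare-and-write only fails when two hosts compete for the same lock record.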

You may know that there are different VMFS versions. Because of this, there are situations where SCSI reservations are used instead of ATS, and others where ATS-only locking is used. I took this table from the vSphere documentation.

Storage Devices  | New VMFS5                           | Upgraded VMFS5                           | VMFS3
Single extent    | ATS only                            | ATS, but can revert to SCSI reservations | ATS, but can revert to SCSI reservations
Multiple extents | Spans only over ATS-capable devices | ATS except when locks on non-head        | ATS except when locks on non-head

An extent is a LUN which is used to expand a VMFS datastore by concatenating multiple LUNs. Multiple extents can be served from different storage systems (with limitations…). A few words about the VMFS 5 versions:

ESXi Release   | ESXi 5.0 | ESXi 5.1 | ESXi 5.5
VMFS 5 Version | 5.54     | 5.58     | 5.60

Putting the pieces together

We have to assume that the datastores that couldn’t be mounted were created on an ESXi 5.1 host with the VAAI plug-in for the P2000 G3 installed. So they were single-extent and ATS-only. One host was updated to ESXi 5.5 with an HP customized image. This image doesn’t include the VAAI plug-in, so the new 5.5 host doesn’t support VAAI. No VAAI, no ATS. Because of this, the host was able to mount the older VMFS 5.54 datastores, but not the newer 5.58 ones. I assume that the 5.54 datastores were upgraded from an older VMFS version, so they use ATS but can also revert to SCSI reservations. When the user deleted the datastore and recreated it on a 5.1 host, the datastore was again version 5.58 and ATS-only.
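The reasoning above boils down to a single rule, which can be written down as a small Python sketch. The host and datastore names are hypothetical, and the `supports_ats` and `ats_only` flags are a simplification of my assumptions about the forum user's environment, not data taken from it.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    name: str
    vmfs_version: str
    ats_only: bool  # True for a newly created single-extent VMFS 5 datastore


@dataclass
class Host:
    name: str
    supports_ats: bool  # here: whether the P2000 G3 VAAI plug-in is installed


def can_mount(host: Host, ds: Datastore) -> bool:
    """An ATS-only datastore needs ATS; others can fall back to SCSI reservations."""
    return host.supports_ats or not ds.ats_only


# Hypothetical names for the situation described in the forum post.
hosts = [
    Host("esxi51-old", supports_ats=True),   # ESXi 5.1 with the plug-in
    Host("esxi55-new", supports_ats=False),  # ESXi 5.5 without the plug-in
]
datastores = [
    Datastore("ds-upgraded", "5.54", ats_only=False),
    Datastore("ds-new", "5.58", ats_only=True),
]

results = {(h.name, d.name): can_mount(h, d) for h in hosts for d in datastores}
for (host_name, ds_name), ok in results.items():
    print(f"{host_name} -> {ds_name}: {'mounts' if ok else 'does not mount'}")
```

The only combination that fails is the host without ATS support and the ATS-only datastore, which matches what the user saw.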

The workaround and the solution

The workaround is to install the HP P2000 Software Plug-in for VMware VAAI. Disabling the Atomic Test and Set (ATS) primitive is not a workaround on its own: if you have VMFS 5 ATS-only datastores, you still wouldn’t be able to mount them. In this case you would also have to disable the ATS-only mode on those datastores. The solution is a firmware update on the P2000 G3. A few weeks ago HP released new firmware for the P2000 G3. With firmware release TS251R004 the P2000 G3 VAAI plug-in is no longer supported, because the firmware adds T10 compliance for VAAI.
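Before updating a host, it helps to know which datastores are ATS-only. On an ESXi host, `vmkfstools -Ph -v1 /vmfs/volumes/<datastore>` prints the VMFS version and the locking mode. The Python sketch below parses such output. The sample text is illustrative only, with made-up values, and I am assuming the `Mode:` line has the form `public ATS-only`, so check it against the real output of your hosts.

```python
import re

# Illustrative sample modelled on `vmkfstools -Ph -v1` output.
# The label and capacity values are made up.
SAMPLE_OUTPUT = """\
VMFS-5.58 file system spanning 1 partitions.
File system label (if any): datastore1
Mode: public ATS-only
Capacity 499.8 GB, 120.3 GB available, file block size 1 MB
"""


def describe_datastore(output: str) -> dict:
    """Pull the VMFS version and locking mode out of the command output."""
    version = re.search(r"^VMFS-(\d+\.\d+)", output, re.MULTILINE)
    mode = re.search(r"^Mode:\s*(.+)$", output, re.MULTILINE)
    mode_text = mode.group(1).strip() if mode else ""
    return {
        "vmfs_version": version.group(1) if version else None,
        "mode": mode_text,
        # Assumption: ATS-only datastores carry "ATS-only" in the Mode line.
        "ats_only": "ATS-only" in mode_text,
    }


info = describe_datastore(SAMPLE_OUTPUT)
print(info)
```

Any datastore reported as ATS-only will need a host with working ATS support, either through the plug-in or through firmware with native VAAI support.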

I’d like to point out that this problem could occur with any other storage system that needs a VAAI plug-in. If you update your hosts and forget to install the plug-in, you will have the same problems.