Out of space - first steps when a datastore runs out of space

This is a situation that should never happen, and I have had to deal with it only a couple of times in more than 10 years of working with VMware vSphere/ESXi. In most cases, the cause was thin-provisioned disks combined with small datastores. Yes, that’s a bad design. Yes, this should never happen.

There is a nearly 100% chance that this setup will fail one day, either because someone dumps a lot of data into the VMs or because of growing VM snapshots. But such a setup WILL FAIL one day.

Yesterday was one of these days: five VMs stopped working on a small ESXi host at one of my customers' sites. A quick look into the vCenter confirmed my first assumption. The datastore was full. My second thought: Why are there so many VMs on that small ESXi host, and why are they thin-provisioned?

The vCenter showed me the following message on each VM:

There is no more space for virtual disk $VMNAME.vmdk. You might be able to continue this session by freeing disk space on the relevant volume, and clicking Retry. Click Cancel to terminate this session.

Okay, what to do? First things first:

  1. Is there any unallocated space left on the RAID group? If yes, expand the VMFS.
  2. Are there any VM snapshots left? If yes, remove them.
  3. Configure a 100% memory reservation for the VMs. This removes the VM memory swap files and releases a decent amount of disk space.
  4. Remove ISO files from the datastore.
  5. Remove VMs (if you have a backup and they are not necessary for the business).
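
To get a feeling for how much space step 3 can release, here is a minimal sketch. It assumes that a VM's swap file is as large as its configured memory minus its memory reservation. The `VmMemory` class, the VM names, and the memory sizes are hypothetical and only serve as an illustration; they are not taken from the affected host.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VmMemory:
    """Hypothetical per-VM memory settings, all values in MB."""
    name: str
    configured_mb: int
    reserved_mb: int = 0


def swap_file_mb(vm: VmMemory) -> int:
    # Assumption: the swap file covers the memory that is not reserved.
    return max(vm.configured_mb - vm.reserved_mb, 0)


def reclaimable_swap_gb(vms: list[VmMemory]) -> float:
    # Space released if every VM were switched to a 100% memory reservation.
    return sum(swap_file_mb(vm) for vm in vms) / 1024


if __name__ == "__main__":
    vms = [
        VmMemory("vm-a", configured_mb=8192),
        VmMemory("vm-b", configured_mb=4096, reserved_mb=1024),
        VmMemory("vm-c", configured_mb=16384, reserved_mb=16384),
    ]
    print(f"{reclaimable_swap_gb(vms):.1f} GB")  # prints "11.0 GB"
```

Keep in mind that a full reservation only helps if the host has enough physical memory to back it.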

This should allow you to continue the operation of the VMs. To solve the problem permanently:

  1. Add disks to the server and expand the VMFS, or create a new datastore
  2. Add an NFS datastore
  3. Remove unnecessary VMs
  4. Set up working monitoring and alarms, do not overprovision datastores, or switch to eager-zeroed thick disks
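
The monitoring part of the last point boils down to two numbers per datastore: how full it is, and how much more disk space has been promised to the VMs than actually exists. The following is a rough sketch of such a check. The function name, the thresholds of 75% and 85%, and the sample sizes are assumptions for illustration; pick values that match your environment.

```python
def datastore_status(capacity_gb, used_gb, provisioned_gb,
                     warn_pct=75.0, crit_pct=85.0):
    """Classify a datastore by usage and report its overcommit ratio.

    provisioned_gb is a list with the provisioned (maximum) disk size of
    each thin-provisioned VM on the datastore.
    """
    used_pct = 100.0 * used_gb / capacity_gb
    # A ratio above 1.0 means the VMs can grow beyond the datastore.
    overcommit_ratio = sum(provisioned_gb) / capacity_gb

    if used_pct >= crit_pct:
        level = "critical"
    elif used_pct >= warn_pct:
        level = "warning"
    else:
        level = "ok"

    return {
        "used_pct": round(used_pct, 1),
        "overcommit_ratio": round(overcommit_ratio, 2),
        "level": level,
    }


if __name__ == "__main__":
    # Hypothetical 500 GB datastore with five thin-provisioned VMs.
    print(datastore_status(500, 430, [200, 200, 150, 100, 100]))
    # {'used_pct': 86.0, 'overcommit_ratio': 1.5, 'level': 'critical'}
```

An overcommit ratio above 1.0 is not a failure by itself, but it is the precondition for the failure described in this post, so it deserves an alarm of its own.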

Such an issue should not happen. It is not rude to say it here: This is simply due to bad design and a lack of operational processes.