Category Archives: Virtualization

VMware Certified Advanced Professional — Data Center Virtualization Design 2019 Study Guide

Last September I passed the VCAP6-DCV Deployment exam. After a busy first half of 2019, it’s time to start preparing for the VMware Certified Advanced Professional — Data Center Virtualization Design 2019 exam.

There are many great study guides out there, but in most cases I need “my own study guide” to feel well prepared. I hope that publishing my notes helps me stay focused and motivated.


In contrast to the Deploy exam, the Design exam is a multiple-choice exam: 135 minutes for 60 questions. Sounds easy, but it is said to be one of the hardest exams VMware offers.

The exam is split into three sections:

  • Section 1 – Create a vSphere 6.5 Conceptual Design
  • Section 2 – Create a vSphere 6.x Logical Design from an Existing Conceptual Design
  • Section 3 – Create a vSphere 6.x Physical Design from an Existing Logical Design

Each section contains several objectives.

  • Objective 3.1 – Transition from a logical design to a vSphere 6.x physical design
  • Objective 3.2 – Create a vSphere 6.x physical network design from an existing logical design
  • Objective 3.3 – Create a vSphere 6.x physical storage design from an existing logical design
  • Objective 3.4 – Determine appropriate compute resources for a vSphere 6.x physical design
  • Objective 3.5 – Determine virtual machine configuration for a vSphere 6.x physical design
  • Objective 3.6 – Determine data center management options for a vSphere 6.x physical design

I will try to cover each objective in a blog post and add a link here. Feel free to add comments, corrections and questions. :)

Out of space – first steps when a datastore runs out of space

This is a situation that should never happen, and I have had to deal with it only a couple of times in more than 10 years of working with VMware vSphere/ESXi. In most cases, the reason was the use of thin-provisioned disks together with small datastores. Yes, that’s a bad design. Yes, this should never happen.

There is a nearly 100% chance that this setup will fail one day, either because someone dumps a lot of data into the VMs, or because of VM snapshots. Such a setup WILL FAIL one day.

Yesterday was one of these days: five VMs stopped working on a small ESXi host at one of my customer’s sites. A quick look at the vCenter confirmed my first assumption: the datastore was full. My second thought: why are there so many VMs on that small ESXi host, and why are they thin-provisioned?

The vCenter showed the following message for each VM:

There is no more space for virtual disk $VMNAME.vmdk. You might be able to continue this session by freeing disk space on the relevant volume, and clicking Retry. Click Cancel to terminate this session.

Okay, what to do? First things first (a command sketch follows the list):

  1. Is there any unallocated space left on the RAID group? If yes, expand the VMFS.
  2. Are there any VM snapshots left? If yes, remove them.
  3. Configure a 100% memory reservation for the VMs. This removes the VM memory swap files and releases a decent amount of disk space.
  4. Remove ISO files from the datastore.
  5. Remove VMs (if you have a backup and they are not necessary for the business).
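For reference, here is a minimal sketch of these first-aid steps from the ESXi shell. The VM ID (42) and the datastore name are placeholders, and the exact commands may vary between ESXi versions:

  # List all registered VMs and note the Vmid of the affected VM
  vim-cmd vmsvc/getallvms

  # Check for snapshots, then remove all snapshots of a VM
  vim-cmd vmsvc/snapshot.get 42
  vim-cmd vmsvc/snapshot.removeall 42

  # Check the free space of the datastores
  df -h

  # Find ISO files on the datastore that can be removed
  find /vmfs/volumes/datastore01 -name '*.iso'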

This should allow you to resume operation of the VMs. To solve the problem permanently (see the sketch after this list):

  1. Add disks to the server and expand the VMFS, or create a new datastore.
  2. Add an NFS datastore.
  3. Remove unnecessary VMs.
  4. Set up working monitoring and alarms, do not overprovision datastores, or switch to eager-zeroed disks.
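As a sketch for the permanent fixes, assuming the LUN behind the VMFS has already been grown (device path and NFS names are hypothetical):

  # Grow the VMFS onto the expanded device (the same device is passed twice;
  # the underlying partition must have been extended beforehand)
  vmkfstools --growfs /vmfs/devices/disks/naa.1234567890:1 /vmfs/devices/disks/naa.1234567890:1

  # Mount an NFS export as an additional datastore
  esxcli storage nfs add --host nfs01.lab.local --share /export/backup --volume-name nfs-datastore01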

Such an issue should not happen. It is not rude to say it here: this is simply the result of bad design and a lack of operational processes.

User vdcs does not have the expected uid 1006

Sorry for the long delay since my last blog post – busy times, but with lots of vSphere. :) Today, I did an upgrade of a standalone vCenter Server Appliance at one of my healthcare customers. The vCenter was on 6.0 U3 and I had to upgrade it to 6.7 U2. It was only a small deployment with three hosts, so nothing fancy. And as in many other vSphere upgrades, I came across this warning message:

Warning User vdcs does not have the expected uid 1006
Resolution Please refer to the corresponding KB article.

I have seen this message multiple times, but in the past there was no KB article about it, only a VMTN thread. And this thread mentioned that you can safely ignore the message if you don’t use the Content Library. Confirmation enough to proceed with the upgrade. :)

Meanwhile, there is a KB article:

Uploading content to the library fails with error: Content Library Service does not have write permission on this storage backing (52559)

This is a statement from the KB article:

Note: You can safely ignore this message if you are not using Content Library Service before the upgrade, or using it only for libraries not backed by NFS storage.

Currently, I don’t have customers with NFS-backed Content Libraries, but if you do, you might want to take a look at it, especially if you have done an upgrade from 6.0 to 6.5 or 6.7 and want to start using Content Libraries now.
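By the way, if you want to verify the UID of the vdcs user before an upgrade, a quick look at the appliance shell is enough (a sketch, assuming SSH and the bash shell are enabled on the VCSA):

  # The installer expects uid 1006 for the vdcs user
  id vdcs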

Poor performance with Windows 10/ 2019 1809 on VMFS 6

THIS IS FIXED in ESXi 6.5 U3 and 6.7 U3.

See KB67426 (Performance issues with Windows 10 version 1809 VMs running on snapshots) for more information.

TL;DR (from the original post): This bug has not been fixed yet! Some users in the VMTN thread mentioned a hot-patch from VMware, which seems to have been pulled. A fix for this issue will be available with ESXi 6.5 U3 and 6.7 U3. The only workaround is to place VMs on VMFS 5 datastores, or to avoid snapshots if you have to use VMFS 6. I can confirm that Windows 10 1903 is also affected.

One of my customers told me that they had massive performance problems with a Horizon View deployment at one of their customers. We talked about the issue, and they mentioned that it was related to Windows 10 1809 and VMFS 6. A short investigation showed that the issue is well known, and that VMware is working on it. In their case, another IT company had installed the Cisco HyperFlex solution, and the engineer was unaware of this issue.


What do we know so far? In October 2018 (!), shortly after the release of Windows 10 1809, a thread came up in the VMTN (windows 10 1809 slow). According to the posted test results, the issue occurs under the following conditions (a quick check sketch follows the list):

  • Windows 10 1809
  • VMware ESXi 6.5 or 6.7 (regardless of build level)
  • VM has at least one snapshot
  • VM is placed on a VMFS 6 datastore
  • Space reclamation enabled or disabled (it makes no difference)
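A quick way to check whether a VM could be affected is to look at the VMFS version of its datastore and at its snapshots. A sketch from the ESXi shell, with placeholder names and VM ID:

  # Print the datastore details; the first line shows the file system version (e.g. VMFS-6.82)
  vmkfstools -Ph /vmfs/volumes/datastore01

  # List the VM IDs, then check whether the VM runs on a snapshot
  vim-cmd vmsvc/getallvms
  vim-cmd vmsvc/snapshot.get 42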

The “official” statement from VMware support is:

The issue is identified to be due to some guest OS behavior change in this version of windows 10, 1809 w.r.t thin provisioned disks and snapshots, this has been confirmed as a bug and will be fixed in the following releases – 6.5 U3 and 6.7U3, which will be released within End of this year (2019).

https://communities.vmware.com/message/2848206#2848206

I don’t care whether the root cause is VMFS 6 or Windows 10, but VMware and Microsoft need to get this fixed fast! Just to make this clear: you will face the same issues regardless of whether you run Windows 10 in a VM, use Windows 10 with Horizon View, or use Windows 10 with Citrix. When VMFS 6 and snapshots come into play, you will run into this performance issue.

I will update this blog post when I get some news.

Vembu BDR Essentials – Now up to 10 CPU Sockets

It is pretty common that vendors offer their products in special editions for SMB customers. VMware offers VMware vSphere Essentials and Essentials Plus, Veeam offers Veeam Backup Essentials, and Vembu has Vembu BDR Essentials.

Now Vembu has extended their Vembu BDR Essentials package significantly to address the needs of mid-sized businesses.


Affordable backup for SMB customers

Most SMB virtualization deployments consist of two or three hosts, which means 4 or 6 used CPU sockets. Because of this, Vembu BDR Essentials supported up to 6 sockets or 50 VMs. Yes, 6 sockets OR 50 VMs. Vembu has now raised this limit to 10 sockets OR 100 VMs! This allows customers to use up to five 2-socket hosts, or 100 VMs with fewer than 10 sockets.

Feature Highlights

Vembu BDR Essentials supports all the important features:

  • Agentless VMBackup to backup VMs
  • Continuous Data Protection with support for RPOs of less than 15 minutes
  • Quick VM Recovery to get failed VMs up and running in minutes
  • Vembu Universal Explorer to restore individual items from applications like Microsoft Exchange, SharePoint, SQL Server and Active Directory
  • Replication of VMs with Vembu OffsiteDR and Vembu CloudDR

Needless to say, Vembu BDR Essentials supports VMware vSphere and Microsoft Hyper-V. If necessary, customers can upgrade to the Standard or Enterprise edition.

Securing VMs – vTPM, VBS, KMS and why you should not simply add a vTPM

Yesterday, I got one of those mails from a customer that make you think “Ehm, no”.

Can you please enable the TPM on all VMs.

The customer

The short answer is “Ehm, no!”. But I’m a kind guy, so I added some explanation to my answer.

Let’s add some context around this topic. The Trusted Platform Module (TPM) is a cryptoprocessor that offers various functions. For example, BitLocker uses the TPM to protect encryption keys. But there is another pretty interesting Windows feature that requires a TPM: “Virtualization-based Security”, or VBS. In contrast to BitLocker, VBS might be a feature that you want to use inside a VM.

VBS uses virtualization features to create an isolated, secure region of memory that is separated from the normal operating system. VBS is required if you want to use Windows Defender Credential Guard, which protects secrets like NTLM password hashes or Kerberos ticket-granting tickets against pass-the-hash and pass-the-ticket (PtH) attacks. VBS is also required when you want to use Windows Defender Exploit Guard or Windows Defender Application Control.

Credential Guard, Exploit Guard, and Application Control require a TPM 2.0 (and some other things, like UEFI and certain CPU extensions).

So, just add the vTPM module to a VM and you are ready to go? Ehm… no.

Prerequisites – or pitfalls

There are some prerequisites that must be met to use a vTPM:

  • the guest OS must be Windows Server 2016, Windows Server 2019 or Windows 10,
  • the ESXi host must be at least ESXi 6.7, and
  • the virtual machine must use UEFI firmware

Okay, no big deal. But there is a fourth prerequisite that must be met:

  • your vSphere environment must be configured for virtual machine encryption

And now things might get complicated… or expensive… or both.

Why do you need VM encryption when you want to add a vTPM?

The TPM can be used to securely store encryption keys, so the vTPM must offer a similar feature. In case of the vTPM, the data is written to the “Non-Volatile Secure Storage” of the VM, which is the .nvram file in the VM directory. To protect this data, the .nvram file is encrypted using the vSphere VM Encryption feature. In addition to the .nvram file, parts of the VMX file, the swap file, the vmware.log, and some other files are also encrypted. The VMDKs are not, unless you decide to encrypt them.
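If you are curious, you can spot this file in the VM’s directory on the datastore (a sketch with placeholder names):

  # The .nvram file holds the Non-Volatile Secure Storage of the VM
  ls -lh /vmfs/volumes/datastore01/VMNAME/VMNAME.nvram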

Before you can start using VM encryption, you have to add a Key Management Server (KMS) to your vCenter. And you’d better add a KMS cluster to your vCenter, because you don’t want the KMS to be a single point of failure. The vCenter Server requests keys from the KMS. The KMS generates and stores the keys, and passes them to third-party systems, like the vCenter, using the Key Management Interoperability Protocol (KMIP) 1.1.

The KMS is not part of the vCenter or of the PSC. It is a separate solution you have to buy. The KMS must support KMIP 1.1. Take a look at the Key Management Server (KMS) compatibility documentation offered by VMware for supported KMS products.

Make sure that you think about administrator permissions, role-based access control (RBAC), and disaster recovery. When you have to deal with security, you don’t want users to work with a generic, high-privilege administrator account. And think about disaster recovery! You won’t be able to start encrypted VMs until you have re-established trust between your vCenter and your KMS (cluster). So be prepared, and do not implement a single KMS.

Summary

And this is why a vTPM is nothing you simply enable on all VMs. Because it’s security. And security has to be done right.

Mike Foley has written two awesome blog posts about this topic. Make sure that you read them.

vSphere 6.7 – Virtual Trusted Platform Modules
Introducing support for Virtualization Based Security and Credential Guard in vSphere 6.7

Vembu VMBackup Deployment Scenarios


Vembu was founded in 2002 and has over 60,000 customers worldwide. One of their core products is the Vembu BDR Suite, a one-stop solution for all your backup and DR needs. I wrote a longer blog post about the Vembu BDR Suite.

One part of this suite is Vembu VMBackup, a data protection solution designed to back up VMware and Microsoft Hyper-V virtual machines in a secure and simple way. The offered features are comparable to Veeam Backup & Replication.

The core component of Vembu VMBackup is the Vembu BDR Backup server, which can be deployed in two ways:

  • On-premises Deployment
  • Hybrid Deployment


On-premises Deployment

In this deployment setup, customers deploy the product in their local environment. I think this is the most typical deployment type: you install VMBackup on a physical server, in a VM, or as a virtual appliance. Backup data is transferred over LAN or SAN, and is written to the storage repositories. The Vembu BDR server acts as a centralized management point, where users can configure and manage backup and replication jobs.

In a simple deployment, the Vembu BDR Backup Server acts as both backup proxy and management server instance. It is perfect for a small number of VMs with little simultaneous backup traffic, and for VMBackup evaluation. The typical SMB environment.

If you separate the management server from the backup proxy, the deployment changes to a distributed deployment. If necessary, multiple backup proxies can be deployed on physical hosts or in virtual machines. Customers can also deploy multiple BDR backup servers, which allows load balancing across a cluster of BDR backup servers. Pretty cool for bigger and/or distributed environments, and it allows customers to scale their backup solution over time.


Hybrid Deployment

A backup is good, but having a backup copy offsite is better. Vembu OffsiteDR allows customers to create a copy of their backup data and transfer it to a DR location over LAN/WAN. OffsiteDR instantly transfers backup data from a BDR Backup Server to an OffsiteDR server. Customers can restore failed VMs or missing files and application data in their DR site, or they can rebuild a failed BDR Backup Server from an OffsiteDR server.


If customers don’t have a DR site, they can use Vembu CloudDR to push a backup copy to the Vembu Cloud. The data stored in the Vembu Cloud can easily be restored at any time and to any location. Vembu uses AWS across all continents to assure the availability of their cloud services.


Customers have the choice

It is obvious that customers have the freedom of choice in how they deploy Vembu VMBackup. I like the virtual appliance approach, which eliminates the need for additional Windows Server licenses. More and more vendors tend to offer appliances for their products; just think of the VMware vCenter Server Appliance, vRealize Orchestrator etc. So why not offer a backup server appliance? I wish other vendors would adopt this…

Another nice feature is the scale-out capability of Vembu VMBackup. Perfect for SMBs that want to start small and grow over time.

“Cannot execute upgrade script on host” during ESXi 6.5 upgrade


I was onsite at one of my customers to update a small VMware vSphere 6.0 U3 environment to 6.5 U2c. The environment consists of three hosts: two hosts in a cluster, and a third host that is only used to run an HPE StoreVirtual Failover Manager.

The update of the first host, using the Update Manager and an HPE custom ESXi 6.5 image, was pretty flawless. But the update of the second host failed with “Cannot execute upgrade script on host”.


I checked the host and found it with ESXi 6.5 installed, but one of the five iSCSI datastores was missing. Then I tried to patch the host with the latest patches and hit “Remediate”. The task failed with “Cannot execute upgrade script on host”. So I rolled back to ESXi 6.0 and tried the update again, this time using iLO and the HPE custom ISO. The result was the same: the host was running ESXi 6.5 after the update, but the upgrade failed with the “upgrade script” error. After this attempt, the host was unable to mount any of the iSCSI datastores. This was because the datastores were mounted ATS-only on the other hosts, and the failed host was unable to mount the datastores in this mode. Very strange…
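As a side note: you can check from the ESXi shell whether a datastore is mounted ATS-only. A sketch with a placeholder path; the exact output differs between ESXi versions:

  # Verbose datastore details; look for 'Mode: public ATS-only' in the output
  vmkfstools -Ph -v10 /vmfs/volumes/datastore01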

I checked the vua.log and found an error message: the upgrade script had failed due to an illegal character (0x80) in the output of esxcfg-info. First of all, I had to find out what this 0x80 character is. I checked the UTF-8 and Windows-1252 encodings and found out that 0x80 is the € (euro) symbol in Windows-1252. I searched the output of esxcfg-info for the € symbol, and found it.

But how to get rid of it? Where does it hide in the ESXi config? I scrolled a bit up and down around the € symbol. A bit above, I found a reference to HPE_SATP_LH. This immediately caught my attention, because the customer is using StoreVirtual VSA and StoreVirtual hardware appliances.

Now my second educated guess of the day came into play: I checked the installed VIBs and found the StoreVirtual Multipathing Extension installed on the failed host, but not on the host where the ESXi 6.5 update was successful.

I removed the VIB from the buggy host, did a reboot, and tried to update the host with the latest patches, this time with success! The cross-check showed that the € symbol was missing in the esxcfg-info output of the host that was upgraded first. I don’t have a clue why the StoreVirtual Multipathing Extension caused this error. The customer and I decided not to install the StoreVirtual Multipathing Extension again. A sketch of the involved commands follows below.
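For reference, the checks and the removal can be done from the ESXi shell. The VIB name is hypothetical; check the actual name in the output of the list command first:

  # Dump the host configuration to a file and search it for the offending symbol
  esxcfg-info > /tmp/esxcfg-info.out

  # List the installed VIBs and look for the StoreVirtual multipathing extension
  esxcli software vib list | grep -i -E 'lh|storevirtual'

  # Remove the VIB (hypothetical name) and reboot the host afterwards
  esxcli software vib remove -n hpe-storevirtual-mem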

High CPU usage on Citrix ADC VPX


While building a small Citrix NetScaler… ehm… ADC VPX (I really hate this name…) lab environment, I noticed that the fan of my Lenovo T480s was spinning up. I was wondering why, because the VPX VM had been running for just a couple of minutes, without any load. But the task manager told me that the VMware Workstation process was consuming 25% CPU (I have an Intel i5 quad-core CPU). So VMware Workstation was eating a whole CPU core without doing anything. I would not care, but the fan… And it reminded me that I have seen similar behaviour in various VPX deployments on VMware ESXi.


A quick search led me to this Citrix Support Knowledge Center article: High CPU Usage on NetScaler VPX Reported on VMware ESXi Version 6.0. That’s exactly what I had observed.

The solution is setting the parameter cpuyield to yes (a sketch follows below).
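Based on the KB article, this is done from the ADC CLI (not from the shell); don’t forget to save the configuration:

  > set ns vpxparam -cpuyield YES
  > save ns config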

The VPX does not need a reboot. Shortly after setting the parameter, the fan stopped spinning. Have I mentioned how much I love silence on my desk? I’m pretty happy that my T480s is a really quiet laptop.

But what is this parameter used for? In pretty simple words: to allocate CPU cycles that are not used by other VMs. Until ADC VPX 11.1, the VPX shared its CPU with other VMs. This changed with ADC VPX 12.0. Since that release, the VPX has been like a child playing with its favorite toy just to make sure that no other child can play with it. Not very polite…

This is a quote from the Support Knowledge Center article:

Set ns vpxparam parameters:
-cpuyield: Release or do not release of allocated but unused CPU resources.

YES: Allow allocated but unused CPU resources to be used by another VM.

NO: Reserve all CPU resources for the VM to which they have been allocated. This option shows higher percentage in hypervisor for VPX CPU usage.
DEFAULT: NO

I don’t think that I would change this in production. But for lab environments, especially if you run this on VMware Workstation, I would set -cpuyield to yes.

Powering on a VM with shared VMDK fails after extending an EagerZeroedThick VMDK


I hope you are not reading this blog post while searching for a solution for a failed cluster. If so, feel free to leave a comment if this blog post saved your evening or weekend. :)

Last Friday, a change at one of my customers went horribly wrong. I was not onsite, but they contacted me during the night from Friday to Saturday, because their most important Windows Server Failover Cluster was unable to start after extending a shared VMDK.


They tried something pretty simple: extending a virtual disk of a VM. That is something most of us do pretty often, and the customer had also done it pretty often. It was a well-known task… except for the fact that the VM was part of a Windows Server Failover Cluster. With shared VMDKs. And the disks were EagerZeroedThick, because this is a requirement for shared VMDKs.

They extended the disk using the vSphere Web Client, and at this point the change was doomed to fail. They tried to power on the VMs, but all they got was this error:

VMware ESX cannot open the virtual disk, “/vmfs/volumes/4c549ecd-66066010-e610-002354a2261b/VMNAME/VMDKNAME.vmdk” for clustering. Please verify that the virtual disk was created using the ‘thick’ option.

A shared VMDK is a VMDK in multi-writer mode. Such a VMDK has to be created as Thick Provision Eager Zeroed. And if you wish to extend this VMDK, you must use vmkfstools with the option -d eagerzeroedthick (see the sketch below). If you extend the VMDK using the Web Client, the extended portion of the disk will become lazy-zeroed!
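This is how the extension should have been done, as a sketch with placeholder paths and a target size of 100 GB (the cluster nodes must be powered off):

  # Extend the shared VMDK and keep the new blocks eager-zeroed
  vmkfstools -X 100G -d eagerzeroedthick /vmfs/volumes/datastore01/VMNAME/VMNAME.vmdk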

VMware has described this behaviour in KB1033570 (Powering on the virtual machine fails with the error: Thin/TBZ disks cannot be opened in multiwriter mode). There is also a blog post by Cormac Hogan at VMware describing this behaviour.

That’s a screenshot from the failed cluster. Check out the type of the disk (Thick-Provision Lazy-Zeroed).


You must use vmkfstools to extend a shared VMDK, but vmkfstools is also the solution if you have fallen into this pitfall: clone the VMDK with the option -d eagerzeroedthick, as sketched below.
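A sketch of the repair, again with placeholder paths. Clone the broken disk into a new eager-zeroed thick VMDK and attach the clone to the VMs:

  # Clone the lazy-zeroed disk into a fully eager-zeroed copy
  vmkfstools -i /vmfs/volumes/datastore01/VMNAME/VMNAME.vmdk -d eagerzeroedthick /vmfs/volumes/datastore01/VMNAME/VMNAME_ezt.vmdk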

Another solution, which was new to me, is to use Storage vMotion. You can migrate the “broken” VMDK to another datastore and change the disk format during the Storage vMotion. This solution is described in the “Notes” section of KB1033570.

Both ways will fix the problem. The result will be a Thick Provision Eager Zeroed VMDK, which will allow the VMs to be powered on successfully.