Category Archives: Virtualization

Exam prep & experience: VMware Certified Advanced Professional 6 – Data Center Virtualization Deployment Exam (VCAP6-DCV Deploy)

TL;DR: I have passed the VCAP6-DCV Deploy exam today. :) I want to thank Fred, Dominik, Frank and Jens-Henrik for kicking my ass. Without you, I would have taken the VCP 6.5 delta exam. Thank you!

As often, the whole thing started with a tweet. A tweet about my expiring VMware Certified Professional (VCP) certification.

To my surprise, several of my followers recommended to go for the VCAP6-DCV Deployment instead. Okay, so many smart people can’t be wrong.

I booked the exam, prepared for the exam, took the exam today – and passed!

27 questions in 205 minutes (25 minutes extension for non-native speaker) is a pretty challenging task. I was able to answer all questions in the given time. I left the test center with a good feeling, and after an hour I got the mail that I have passed the exam! Woohoo!

Preparation is everything

Preparation and time management. That’s all. Easier said than done. ;)

Make sure that you have read the exam guide. This document is intended to provide detailed information about the objectives covered by the exam. It was crucial for me to get a feeling about what I have to learn. I have been working with VMware vSphere since ESX 2.5, that’s a pretty long time, yet I do not know everything. Especially things like vSphere Data Protection, Auto Deploy or some certificate-related tasks are not day-to-day tasks.

I premillary worked with Kyle Jenners VCAP6-DCV Deployment Study Guide and and the VMware Hands-on-Lab. The VCAP6-DCV Deployment is not a MC test, like the VCP exams. You have to do real tasks. So experience is crucial to pass the exam.

Because I don’t have a lab, I used the VMware Hands-on Lab instead. I can recommend these three courses:

  • HOL-1911-01-SDC (What’s New in VMware vSphere 6.7)
  • HOL-1808-01-HCI (vSAN v6.6.1 – Getting Started), and
  • HOL-1827-01-HCI (VMware Storage – Virtual Volumes and Storage Policy Based Management)

Unfortunately, there is no course available that covers vSphere Data Protection and vSphere Replication.

But there was also another reason, why I have used the HOL: The VCAP exam environment is based on the interface of the VMware HOL. This was pretty helpful, because I was able to get in touch with the interface prior the exam.

Due to security restrictions, the exam environment does not support some keys and shortcuts, e.g. CRTL and ALT. To my surprise, the Backspace key worked in my enviornment. Many people stated, that the Backspace key isn’t working. Because of this, VMware has published an Interface Guide. Make sure to read it! And learn how to get around these limitations. There is also a pretty handy YouTube video with tips and tricks:

To test yourself, you can use this free VCAP-DCV simulator. The simulation provides scenarios that are equal to the scenarios from the exam. This is pretty handy to get a feeling of how good you are prepared for the exam.

VCAP6-DCV Deploy Exam Simulator – FREE

You have ~ 7 minutes per questions. If you don’t have an idea how to answer a question, move on! Write down the number and some keywords, then move onto the next question. Instead of waiting for tasks to finish, move onto the next question and come back later to check the task result.

I took the exam at Blue Consult in Krefeld (Germany). This was a recommendation of one of my followers (Thanks Dominik!). Fortunately, Blue Consult has keyboards with US layout in their test center, which makes it much easier for me. The performance of the exam environment was quite good. No lags or hanging sessions.

What’s next?

I will book the VMware Certified Advanced Professional 6.5 – Data Center Virtualization Design exam as soon as I passed the NetScaler CCP-N exam, which I have to take until end of December 2018 (Thank you Citrix… NOT!).

VCIX6.5-DCV FTW! :)

Veeam backups fails because of time differences

Last week I had an interesting incident at a customer. The customer reported that one of multiple Veeam backup jobs jobs constantly failed.

jarmoluk/ pixabay.com/ Creative Commons CC0

The backup job included two VMs, and the backup of one of these VMs failed with this error:

The verified the used credentials for that job, but re-entering the password does not solved the issue. I then checked the Veeam backup logs located under %ProgramData%\Veeam\Backup (look for the Agent.Job_Name.Source.VM_Name.vmdk.log) and found VDDK Error 3014:

The user, that was used to connect to the vCenter, was an Active Directory located account. The account were granted administrator privileges root of the vCenter. Switching from an AD located account to Administrator@vsphere.local solved the issue. Next stop: vmware-sts-idmd.log on the vCenter Server appliance. The error found in this log confirmed my theory, that there was an issue with the authentication itself, not an issue with the AD located account.

To make a long story short: Time differences. The vCenter, the ESXi hosts and some servers had the wrong time. vCenter and ESXi hosts were using the Domain Controllers as time source.

This is the ntpq  output of the vCenter. You might notice the jitter values on the right side, both noted in milliseconds.

After some investigation, the root cause seemed to be a bad DCF77 receiver, which was connected to the domain controller that was hosting the PDC Emulator role. The DCF77 receiver was connected using an USB-2-LAN converter. Instead of using a DCF77 receiver, the customer and I implemented a NTP hierarchy using a valid NTP source on the internet (pool.ntp.org).

vSphere Distributed Switch health check fails on HPE Comware switches

During the replacement of some VMware ESXi hosts at a customer, I discovered a recurrent failure of the vSphere Distributed Switch health checks. A VLAN and MTU mismatch was reported. On the physical side, the ESXi hosts were connected to two HPE 5820 switches, that were configured as an IRF stack. Inside the VMware bubble, the hosts were sharing a vSphere Distributed Switch.

cre8tive / pixelio.de

The switch ports of the old ESXi hosts were configured as Hybrid ports. The switch ports of the new hosts were configured as Trunk ports, to streamline the switch and port configuration.

Some words about port types

Comware knows three different port types:

  • Access
  • Hybrid
  • Trunk

If you were familiar with Cisco, you will know Access and Trunk ports. If you were familiar with HPE ProCurve or Alcatel-Lucent Enterprise, these two port types refer to untagged and tagged ports.

So what is a Hybrid port? A Hybrid port can belong to multiple VLANs where they can be untagged and tagged. Yes, multiple untagged VLANs on a port are possible, but the switch will need additional information to bridge the traffic into correct untagged VLANs. This additional information can be  MAC addresses, IP addresses, LLDP-MED etc. Typically, hybrid ports are used for in VoIP deployments.

The benefit of a Hybrid port is, that I can put the native VLAN of a specific port, which is often referred as Port VLAN identifier (PVID), as a tagged VLAN on that port. This configuration allows, that all dvPortGroups have a VLAN tag assigned, even if the VLAN tag represents the native VLAN of a switch port.

Failing health checks

A failed health check rises a vCenter alarm. In my case, a VLAN and MTU alarm was reported. In both cases, VLAN 1 was causing the error. According to VMware, the three main causes for failed health checks are:

  • Mismatched VLAN trunks between a vSphere distributed switch and physical switch
  • Mismatched MTU settings between physical network adapters, distributed switches, and physical switch ports
  • Mismatched virtual switch teaming policies for the physical switch port-channel settings.

Let’s take a look at the port configuration on the Comware switch:

As you can see, this is a normal trunk port. All VLANs will be passed to the host. This is an except from the  display interface Ten-GigabitEthernet1/0/9  output:

The native VLAN is 1, this is the default configuration. Traffic, that is received and sent from a trunk port, is always tagged with a VLAN id of the originating VLAN – except traffic from the default (native) VLAN! This traffic is sent without a VLAN tag, and if frames were received with a VLAN tag, this frames will be dropped!

If you have a dvPortGroup for the default (native) VLAN, and this dvPortGroup is sending tagged frames, the frames will be dropped if you use a “standard” trunk port. And this is why the health check fails!

Ways to resolve this issue

In my case, the dvPortGroup was configured for VLAN 1, which is the default (native) VLAN on the switch ports.

There are two ways to solve this issue:

  • Remove the VLAN tag from the dvPortGroup configuration
  • Change the PVID for the trunk port

To change the PVID for a trunk port, you have to enter the following command in the interface context:

You have to change the PVID on all ESXi facing switch ports. You can use a non-existing VLAN ID for this.

vSphere Distributed Switch health check will switch to green for VLAN and MTU immediately.

Please note, that this is not the solution for all VLAN-related problems. You should make sure that you are not getting any side effects.

Unsupported hardware family ‘vmx-06’

A customer of mine got an appliance from a software vendor. The appliance was delivered as ZIP file with a VMDK, a MF, and an OVF file. Unfortunately, the appliance was created with VMware Workstation 6.0 with virtual machine hardware version 6, which is incompatible with VMware ESXi (Virtual machine hardware versions). During deployment, my customer got this error:

The OVF file includes a line with the VM hardware version.

If you change this line from vmx-06 to vmx-07, the hash of the OVF changes, and you will get an error during the deployment of the appliance because of the wrong file hash.

Solution

You have to change the SHA256 hash of the OVF, which is included in the MF file.

To create the new SHA256 hash, you can use the PowerShell cmdlet Get-FileHash .

Replace the hash and save the MF file. Then re-deploy the appliance.

Andreas Lesslhumer wrote a similar blog post in 2015:
“Unsupported hardware family vmx-10” during OVF import

Workaround for broken Windows 10 Start Menus with floating desktops

Last month, I wrote about a very annoying issue, that I discovered during a Windows 10 VDI deployment: Roaming of the AppData\Local folder breaks the Start Menu of Windows 10 Enterprise (Roaming of AppData\Local breaks Windows 10 Start Menu). During research, I stumbled over dozens of threads about this issue.

Today, after hours and hours of testing, troubleshooting and reading, I might have found a solution.

The environment

Currently I don’t know if this is a workaround, a weird hack, or no solution at all. Maybe it was luck that none of my 2074203423 logins at different linked-clones resulted in a broken start menu. The customer is running:

  • Horizon View 7.1
  • Windows 10 Enterprise N LTSB 2016 (1607)
  • View Agent 7.1 with enabled Persona Management

Searching for a solution

During my tests, I tried to discover WHY the TileDataLayer breaks. As I wrote in my earlier blog post, it is sufficient to delete the TileDataLayer folder. The folder will be recreated during the next logon, and the start menu is working again. Today, I added path for path to “Files and folders excluded from roamin” GPO setting, and at some point I had a working start menu. With this in mind, I did some research and stumbled over a VMware Communities thread (Vmware Horizon View 7.0.3 – Linked clone – Persistent mode – Persona management – Windows 10 (1607) – -> Windows 10 Start Menu doesn’t work)

User oliober did the same: He roamed only a couple of folders, one of them is the TileDataLayer folder, but not the whole Appdata\Local folder.

The “solution”

To make a long story short: You have to enable the roaming of AppData\Local, but then you exclude AppData\Local, and add only necessary folders to the exclusion list of the exclusion. Sounds funny, but it seems to work.

Feedback is welcome!

I am very interested in feedback. It would be great if you have the chance to verify this behaviour. Please leave a comment with your results.

As I already said: I don’t know if this is a workaround, a hack, a solution, or no solution at all. But for now, it seems to work. Microsoft deprecated TileDataLayer in Windows 10 1703. So for this new Windows 10 build, we have to find another working solution. The above described “solution” only works for 1607. But if you are using the Long Term Service Branch, this solution will work for the next 10 years. ;)

Some thoughts about using Windows Server 2012 R2 instead of Windows 10 for VDI

Disclaimer: The information from this blog post is provided on an “AS IS” basis, without warranties, both express and implied.

Last week, I had an interesting discussion with a customer. Some months back, the customer has decided to kick-off a PoC for a VMware Horizon View based virtual desktop infrastructure (VDI). He is currently using fat-clients with Windows 8.1, and the new environment should run on Windows 10 Enterprise. Last week, we discussed the idea of using Windows Server 2012 R2 as desktop OS.

Horizon View with Windows Server as desktop OS?

My customer has planned to use VMware Horizon View. The latest release is VMware Horizon View 7.2. VMware KB article 2150295 (Supported Guest Operating Systems for Horizon Agent and Remote Experience) lists all supported (non-Windows 10) Microsoft operating systems for different Horizon VIew releases. This article  shows, that Windows Server 2012 R2 (Standard and Datacenter) are both supported with all Horizon View releases, starting with Horizon View 7.0. The installation of a View Agent is supported, and you can create full- and linked-clone desktop pools. But there is also another important KB article: 2150305 (Feature Support Matrix for Horizon Agent). This article lists all available features, and whether they are compatible with a specific OS or not. According to this artice, the

  • Windows Media MMR,
  • VMware Client IP Transparency, and the
  • Horizon Virtualization Pack for Skype for Business

are not supported with Windows Server 2012 R2 and 2016.

From the support perspective, it’s safe to use Windows Server 2012 R2, or 2016, as desktop OS for a VMware Horizon View based virtual desktop infrastructure.

Licensing

Licensing Microsoft Windows for VDI is PITA. It’s all about the virtual desktop access rights, that can be acquired on two different ways:

  • Software Assurance (SA), or
  • Windows Virtual Desktop Access (VDA)

SA and VDA are available per-user and per-device.

Source: Microsoft

You need a SA or a VDA for each accessing device or user. There is no need for additional licenses for your virtual desktops! You will get the right to install Windows 10 Enterprise on your virtual desktops. This includes the  LTSB (Long Term Servicing Branch). LTSP offers updates without delivery of new features for the duration of mainstream support (5 years), and extended support (5 years). Another side effect is, that LTSB does not include most of the annoying Windows apps.

Do yourself a favor, and do not try to setup a VDI with Windows 10 Professional…

Service providers, that offer Desktop-as-a-Service (DaaS), are explicitly excluded from this licensing! They must license their stuff according to Microsofts Services Provider Licensing Agreement (SPLA).

How do I have to license Windows Server 2012 R2, if I want to use Windows Server as desktop OS? Windows Server datacenter licensing allows you to run an unlimited number of server VMs on your licensed hardware. To be clear: Windows Server is licensed per physical server, and there is nothing like license mobility! To license the access to the server, your need two different licenses:

  • Windows Server CAL (device or user), and
  • Remote Desktop Services (RDS) CAL (device or user)

The Windows Server CAL is needed for any access to a Microsoft Windows Server from a client, regardless what service is used (even for DHCP). The RDS CAL must be asssigned to any user or device, that is directly or indirectly interacting with the Windows Server desktop, or using a remote desktop access technology (RDS, PCoIP, Blast Extreme etc.) to access the Windows Server desktop.

With this license setup, you have licensed the Windows Server VM itself, and also the access to this VM. There is no need to purchase a SA or VDA.

Do the math

With this in mind, you have to do the math. Compare the licensing costs for Windows 10 and Windows Server 2012 R2/ 2016 in your specific situation. Setup a PoC to verify your requirements, and the support of your software on Windows Server.

Windows Server can be an interesting alternative compared to Windows 10. Maybe some of you, that already use it with Horizon View, have time to add some comments to this blog post. It would be nice to get some feedback about this topic.

Roaming of AppData\Local breaks Windows 10 Start Menu

One of my customers has started a project to create a Windows 10 Enterprise (LTSB 2016) master for their VMware Horizon View environment. Beside the fact (okay, it is more a personal feeling), that Windows 10 is a real PITA for VDI, I noticed an interesting issue during tests.

The issue

For convenience, I adopted some settings of the current Persona Management GPO for Windows 7 for the new Windows 10 environment. During the tests, the customer and I noticed a strange behaviour: After login, the start menu won’t open. The only solution was to logoff and delete the persona folder (most folders are redirected using native Folder Redirections, not the redirection feature of the View Persona Management). While debugging this issue, I found this error in the eventlog.

If you google this, you will find many, many threads about this. Most solutions describe, that you have to delete the profile due to wrong permissions on profile folders and/ or registry hives. I used Microsofts Procmon to verify this, but I was unable to confirm that. After further investigations, I found hints, that the TileDataLayer database could be the problem. The database is located in AppData\Local\TileDataLayer\Database and stores the installed apps, programs, and tiles for the Start Menu. AFAIK it also includes the Start Menu layout.

The database is part of the local part of the profile. A quick check proved, that it’s sufficient to delete the TileDataLayer folder. It will be recreated during the next logon.

The solution

It’s simple: Don’t roam AppData\Local. It should not be necessary to roam the local part of a users profile. The View Persona Management offers an option to roam the local part the profile. You can configure this behaviour with a GPO setting.

You can find this setting under Computer Configuration > Administrative Templates > VMware View Agent Configuration > Persona Management > Roaming & Synchronization

I was able to reproduce the observed behavior in my lab with Windows 10 Enterprise (LTSB 2016) and Horizon View 7.0.3. Because of this, I tend to recommend not to roam AppData\Local.

Wrong iovDisableIR setting on ProLiant Gen8 might cause a PSOD

TL;DR: There’s a script at the bottom of the page that fixes the issue.

Some days ago, this HPE customer advisory caught my attention:

Advisory: (Revision) VMware – HPE ProLiant Gen8 Servers running VMware ESXi 5.5 Patch 10, VMware ESXi 6.0 Patch 4, Or VMware ESXi 6.5 May Experience Purple Screen Of Death (PSOD): LINT1 Motherboard Interrupt

And there is also a corrosponding VMware KB article:

ESXi host fails with intermittent NMI PSOD on HP ProLiant Gen8 servers

It isn’t clear WHY this setting was changed, but in VMware ESXi 5.5 patch 10, 6.0  patch 4, 6.0 U3 and, 6.5 the Intel IOMMU’s interrupt remapper functionality was disabled. So if you are running these ESXi versions on a HPE ProLiant Gen8, you might want to check if you are affected.

To make it clear again, only HPE ProLiant Gen8 models are affected. No newer (Gen9) or older (G6, G7) models.

Currently there is no resolution, only a workaround. The iovDisableIR setting must set to FALSE. If it’s set to TRUE, the Intel IOMMU’s interrupt remapper functionality is disabled.

To check this setting, you have to SSH to each host, and use esxcli  to check the current setting:

I have written a small PowerCLI script that uses the Get-EsxCli cmdlet to check all hosts in a cluster. The script only checks the setting, it doesn’t change the iovDisableIR setting.

Here’s another script, that analyzes and fixes the issue.

Horizon View: Server certificate does not match the external url

Certificates are always fun… or should I say PITA?  Whatever… During a small Horizon View PoC, I noticed an error message for the View Connection Server.

That’s right, Mr. Connection Server. The certificate subject name does not match the servers external URL, as this screenshot clearly shows.

But both settings are unused, because a VMware Access Point appliance is in place. If I remove the certificate, that was issued from a public certificate authority, I get an error message because of an invalid, self signed certificate.

I want to use the certificate on the Horizon View Connection Server, but I also want to get rid of the error message, caused by the wrong subject name. The customer uses split DNS, so he is using the same URL internally and externally, and the certificate uses the external URL as subject name.

Change the URLs

The solution is easy:

  1. Enable the checkboxes for the Secure Tunnel connection and the Blast Secure Gateway
  2. Change the hostname to the name, that matches the subject name of the certificate
  3. Uncheck the checkboxes again, and apply the settings

After a couple of secons, and a refresh of the dashboard, the error for the Connection Server should be gone.

VMware EUC Access Point appliance – Name resolution not working after deployment

As part of a project, I had to deploy a VMware EUC Access Point appliance. Nothing fancy, because the awesome VMware Access Point Deployment Utility makes it easy to deploy.

Unfortunately, the deployed Access Point appliance was not working as expected. When I tried to access my Horizon View infrastructure behind the Access Point appliance, I got a HTTP 504 error. The REST API interface was working. I was able to exclude invalid certificates, routing, or firewall policies. I re-deployed the appliance using the the IP address of the connection server, instead of the FQDN. And this worked… I checked the name resolution with nslookup and the name resolution failed. So that was probably the problem.

One per line

To make a long story short: The DNS server, I entered in the VMware Access Point Deployment Utility, were added in a single line to the /etc/resolv.conf

This is wrong, even if the VMware Access Point Deployment Utility claims something different.

euc_deployment_dnsThere must be a single “nameserver” entry for each DNS server.

You can easily change this after the deployment. Add only one DNS server during the deployment, and then add the second DNS server after the deployment.

I would like to highlight, that Chris Halstead mentioned this behaviour a year ago in his blog post “VMware Access Point Deployment Utility“. Chris is the author of the Deployment Utility.