Category Archives: Virtualization

Roaming of AppData-Local breaks Windows 10 Start Menu

This posting is ~2 years years old. You should keep this in mind. IT is a short living business. This information might be outdated.

One of my customers has started a project to create a Windows 10 Enterprise (LTSB 2016) master for their VMware Horizon View environment. Beside the fact (okay, it is more a personal feeling), that Windows 10 is a real PITA for VDI, I noticed an interesting issue during tests.

The issue

For convenience, I adopted some settings of the current Persona Management GPO for Windows 7 for the new Windows 10 environment. During the tests, the customer and I noticed a strange behaviour: After login, the start menu won’t open. The only solution was to logoff and delete the persona folder (most folders are redirected using native Folder Redirections, not the redirection feature of the View Persona Management). While debugging this issue, I found this error in the eventlog.

If you google this, you will find many, many threads about this. Most solutions describe, that you have to delete the profile due to wrong permissions on profile folders and/ or registry hives. I used Microsofts Procmon to verify this, but I was unable to confirm that. After further investigations, I found hints, that the TileDataLayer database could be the problem. The database is located in AppData\Local\TileDataLayer\Database and stores the installed apps, programs, and tiles for the Start Menu. AFAIK it also includes the Start Menu layout.

Windows 10 Tile Data Layer Files

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

The database is part of the local part of the profile. A quick check proved, that it’s sufficient to delete the TileDataLayer folder. It will be recreated during the next logon.

The solution

It’s simple: Don’t roam AppData\Local. It should not be necessary to roam the local part of a users profile. The View Persona Management offers an option to roam the local part the profile. You can configure this behaviour with a GPO setting.

Horizon View Persona Management Roaming GPO Settings

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

You can find this setting under Computer Configuration > Administrative Templates > VMware View Agent Configuration > Persona Management > Roaming & Synchronization

I was able to reproduce the observed behavior in my lab with Windows 10 Enterprise (LTSB 2016) and Horizon View 7.0.3. Because of this, I tend to recommend not to roam AppData\Local.

Wrong iovDisableIR setting on ProLiant Gen8 might cause a PSOD

This posting is ~3 years years old. You should keep this in mind. IT is a short living business. This information might be outdated.

TL;DR: There’s a script at the bottom of the page that fixes the issue.

Some days ago, this HPE customer advisory caught my attention:

Advisory: (Revision) VMware – HPE ProLiant Gen8 Servers running VMware ESXi 5.5 Patch 10, VMware ESXi 6.0 Patch 4, Or VMware ESXi 6.5 May Experience Purple Screen Of Death (PSOD): LINT1 Motherboard Interrupt

And there is also a corrosponding VMware KB article:

ESXi host fails with intermittent NMI PSOD on HP ProLiant Gen8 servers

It isn’t clear WHY this setting was changed, but in VMware ESXi 5.5 patch 10, 6.0  patch 4, 6.0 U3 and, 6.5 the Intel IOMMU’s interrupt remapper functionality was disabled. So if you are running these ESXi versions on a HPE ProLiant Gen8, you might want to check if you are affected.

To make it clear again, only HPE ProLiant Gen8 models are affected. No newer (Gen9) or older (G6, G7) models.

Currently there is no resolution, only a workaround. The iovDisableIR setting must set to FALSE. If it’s set to TRUE, the Intel IOMMU’s interrupt remapper functionality is disabled.

To check this setting, you have to SSH to each host, and use esxcli to check the current setting:

I have written a small PowerCLI script that uses the Get-EsxCli cmdlet to check all hosts in a cluster. The script only checks the setting, it doesn’t change the iovDisableIR setting.

Here’s another script, that analyzes and fixes the issue.

Horizon View: Server certificate does not match the external url

This posting is ~3 years years old. You should keep this in mind. IT is a short living business. This information might be outdated.
Horizon View Certificate Error

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

Certificates are always fun… or should I say PITA?  Whatever… During a small Horizon View PoC, I noticed an error message for the View Connection Server.

That’s right, Mr. Connection Server. The certificate subject name does not match the servers external URL, as this screenshot clearly shows.

Horizon View Secure Tunnel Settings

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

But both settings are unused, because a VMware Access Point appliance is in place. If I remove the certificate, that was issued from a public certificate authority, I get an error message because of an invalid, self signed certificate.

I want to use the certificate on the Horizon View Connection Server, but I also want to get rid of the error message, caused by the wrong subject name. The customer uses split DNS, so he is using the same URL internally and externally, and the certificate uses the external URL as subject name.

Change the URLs

The solution is easy:

  1. Enable the checkboxes for the Secure Tunnel connection and the Blast Secure Gateway
  2. Change the hostname to the name, that matches the subject name of the certificate
  3. Uncheck the checkboxes again, and apply the settings
Horizon View Secure Tunnel Settings

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

After a couple of secons, and a refresh of the dashboard, the error for the Connection Server should be gone.

Horizon View Connection Server Details

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

VMware EUC Access Point appliance – Name resolution not working after deployment

This posting is ~3 years years old. You should keep this in mind. IT is a short living business. This information might be outdated.

As part of a project, I had to deploy a VMware EUC Access Point appliance. Nothing fancy, because the awesome VMware Access Point Deployment Utility makes it easy to deploy.

Unfortunately, the deployed Access Point appliance was not working as expected. When I tried to access my Horizon View infrastructure behind the Access Point appliance, I got a HTTP 504 error. The REST API interface was working. I was able to exclude invalid certificates, routing, or firewall policies. I re-deployed the appliance using the the IP address of the connection server, instead of the FQDN. And this worked… I checked the name resolution with nslookup and the name resolution failed. So that was probably the problem.

One per line

To make a long story short: The DNS server, I entered in the VMware Access Point Deployment Utility, were added in a single line to the /etc/resolv.conf

This is wrong, even if the VMware Access Point Deployment Utility claims something different.

euc_deployment_dns

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

There must be a single “nameserver” entry for each DNS server.

You can easily change this after the deployment. Add only one DNS server during the deployment, and then add the second DNS server after the deployment.

I would like to highlight, that Chris Halstead mentioned this behaviour a year ago in his blog post “VMware Access Point Deployment Utility“. Chris is the author of the Deployment Utility.

VCP7-DTM certification beta exam experience

This posting is ~3 years years old. You should keep this in mind. IT is a short living business. This information might be outdated.

Nearly a month ago, a tweet caught my attention:

These beta exams are a cost-effective way to achieve certifications. The last beta exam I took, was the VCP6-DCV beta. Because I already had the VCP6-DTM on my to-do list, the new VCP7-DTM beta exam was released just in the right moment.

As already mentioned in the blog post of the VMware Education and Certification Blog, there are primarly three reasons to take this beta exam:

  • get certified
  • low costs (only 50 USD)
  • identify strengths and weaknesses

Beside of this, VMware can test the questions and is getting feedback to increase the quality of their exams.

Exam preparation

The beta exam preparation guide is quite comprehensive.  Desktop and Mobility (DTM) is not only about VMWare Horizon View. VMware Horizon Mirage, App Volumes, User Environment Manager, Thin App, IDM/ Workspace are also part of the exam.

Section 1 – Install and Configure Horizon Server Components

  • Objective 1.1 – Describe techniques to prepare environment for Horizon
  • Objective 1.2 Determine procedures to install Horizon Components
  • Objective 1.3 – Determine steps to configure Horizon Components
  • Objective 1.4 – Analyze End User Requirements for Display Protocol Performance Knowledge
  • Objective 1.5 – Diagnose and solve issues related to connectivity between Horizon Server Components

Section 2 – Create and Configure Pools

  • Objective 2.1 – Configure and Manage Horizon Pools
  • Objective 2.2 – Build and Customize RDSH Server and Desktop Images

Section 3 – Configure and Administer VMware Mirage

  • Objective 3.1 – Install and Configure Mirage Components
  • Objective 3.2 – Configure and Manage Mirage layers
  • Objective 3.3 – Configure and Manage Mirage Endpoints

Section 4 – Configure and Manage Identity Manager

  • Objective 4.1 – Install and Configure VMware Identity Manager
  • Objective 4.2 – Manage VMware Identity Manager

Section 5 – Configure and Manage User Environment Manager

  • Objective 5.1 – Install and Configure VMware User Environment Manager
  • Objective 5.2 – Manage VMware User Environment Manager

Section 6 – Configure and Manager App Volumes

  • Objective 6.1 – Install and Configure VMware App Volumes
  • Objective 6.2 – Manage VMware AppStacks and writeable Volumes

Section 7 – Configure vRealize Operations for Horizon

  • Objective 7.1 – Manage VMware Workspace Portal

The preparation guide outlines some documents, which can be used to preapre for the exam. Although I’m working with Horizon View on a regular base, I had some “blind spots”. I used the official documentation and my lab to prepare for the exam.

The exam

The exam contained 175 questions, and I had 245 minutes to answer all the questions. I arrived early at the test center, because I had booked the first available slot for that day. I did not expect to be able to answer all the questions. View, Mirage, App Volumes, Workspace and IDM were the main topics, only a few questions about ThinApp and vROps for Horizon. Many questions were about administrative topics, where to click to achieve something, or where a specific option is located. There were also some questions about requirements, supported databases etc. As far as I can judge, these were all fair questions. If you have intensivly studied the documentation, you have do not have to fear this exam. Experience in administration is a great plus.

I really do not know if I have passed it. It will take some time. The results will be available after the beta phase. If I don not passed, I have at least gained experience.

Why I moved from NFS to vSAN… and why it went wrong

This posting is ~3 years years old. You should keep this in mind. IT is a short living business. This information might be outdated.

I wanted to retire my Synology DS414slim, and switch completely to vSAN. Okay, no big deal. Many folks use vSAN in their lab. But I’d like to explain why I moved to vSAN and why this move failed. I think some of my thoughts are also applicable for customer environments.

So far, I used a Synology DS414slim with three Crucial M550 480 GB SSDs (RAID 5) as my main lab storage. The Synology was connected with two 1 GbE uplinks (LAG) to my  network, and each host was connected with 4x 1 GbE uplinks (single distributed vSwitch). The Synology was okay from the capacity perspective, but the performance was horrible. RAID 5, SSDs and NFS were not the best team, or to be precise, the  CPU of the Synology was the main bottleneck.

nas_ds414slim

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

1,2 GHz is not enough, if you want to use NFS or iSCSI. I never got more than 60 MB/s (sequential). The random IO performance was okay, but as soon as the IO increased, the latencies went through the roof. Not because the SSDs were to slow, but because the CPU of the Synology was not powerful enough to handle the NFS requests.

Workaround: Add more flash storage

The workaround for the poor random IO performance was adding more flash storage. This time, the flash storage was added to the hosts. I used PernixData FVP to boost my lab. FVP was a quite cool product (unfortunately it was a cool product.) PernixData granted me, as a PernixPro, some licenses for my lab.

End of an era

The acquisition of PernixData by Nutanix, the missing support für vSphere 6.5, and the end of availability of all PernixData products led to the decision to remove PernixData FVP from my lab. Without PernixData FVP, my lab was again a slow train crawling up a hill. Four HPE ProLiant, with enough CPU (40 cores) and memory resources (384 GB RAM) were tied down by slow IO.

Redistribution of resources

I had

  • three 480 GB SSDs, and
  • three 40 GB SSDs

in stock. The 40 GB SSDs were to small and slow, so I replaced them with 120 GB SSDs. I was able to equip three of my four hosts with SSDs. Three hosts with flash storage were enough to try VMware vSAN.

Fortunately, not all hosts have to add capacity to a vSAN cluster. Hosts can also only consume storage from a vSAN cluster. With this in mind, vSAN appeared to be a way out of my IO dilemma. In addition, using the 480 GB SSDs as capacity tier, a vSAN all-flash config was possible.

Migration

It took me a little time to move around VMs to temporary locations, while keeping my DC and my VCSA available. I had to remove my datastore on the Synology to free up the 480 GB SSDs. The necessary vSAN licenses were granted by VMware (vExpert licenses).

The creation of the vSAN cluster itself was easy. Fortunately, wiping partitions from disks is easy. You can use the vSphere Web Client to do this.

vsan_wipe_partitions

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

The initial performance was quite good, much better than expected and much better than the NFS performance of the old Synology NAS. I enabled deduplication and compression, but as soon as I moved VMs to the vSAN datastore, the throughput dropped and latencies went through the roof. It was totally unusable. Furthermore, I got health alarms:

vsan_congestion_error

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

As the load increased, the errors became more severe.

vsan_performance_error

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0


I was able to solve this with a blog post of Cormac Hogan (VSAN 6.1 New Feature – Handling of Problematic Disks). Even without compression and deduplication, the performance was not as expected and most times to low to work with. At this point, I got an idea what was causing my vSAN problems.

Do not use consumer-grade hardware with vSAN

To be honest: The budget is the problem. I had to take consumer-grade SSDs.

This is a screenshot from the vSAN Observer. esx1 to esx3 are equipped with SSDs, esx4 is only consuming storage from the vSAN cluster.

vsan_observer_perf

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

Red is not the color to highlight good things…

An explanation attempt

This blog post of Duncan Epping (Why Queue Depth matters!) is a bit older, but still valid in my case. The controller I use  (HPE Smart Array P410i) has a a deep queue (1011), the RAID device has a queue length of 1024, but the SATA SSDs have only a queue length of 32. Here’s the disk adapter and disk device view of ESXTOP.

The consumer-grade SSDs drowned in IOs, unable to handle parallel read and write operations. There’s nothing much that I can do. Currently there are two options:

  • Replacing the SSDs with devices, that have a deeper queue depth
  • Replace the Synology NAS with a more powerful NAS and move back to NFS

I don’t know which way I will go. To get this clear:

  • This is my lab, not a customer environment
  • It is not a vSAN related problem
  • It is because of consumer-grade hardware

Do not try this at production kids. Go vSAN, but please use the right hardware.

Replacing an expired lookup service SSL certificate on a vSphere PSC

This posting is ~3 years years old. You should keep this in mind. IT is a short living business. This information might be outdated.

A few days ago, I ran into a very nasty problem. Fortunately, it was in my lab. Some months ago, I replaced the certificates of my vCenter Server Appliance (VCSA), and I’ve chosen to use the VMware Certificate Authority (VMCA) as a subordinate of my AD-based enterprise CA. The VMCA was used as intermediate CA. The certificates were replaced using the  vSphere 6.0 Certificate Manager (/usr/lib/vmware-vmca/bin/certificate-manager), and I followed the instructions of KB2112016 (Configuring VMware vSphere 6.0 VMware Certificate Authority as a subordinate Certificate Authority).

The VCSA was migrated from vSphere 5.5, and with vSphere 5.5 I was also using custom certificates. These certificates were also issued by my AD-based enterprise CA, and these certificates were migration during the vSphere 5.5 > 6.0 migration. So at the end, I replaced custom certificates with VMCA (as an intermediate CA) certificates.

Everything was fine, until a power outage. After powering-on my VMs, I noticed several errors. After logging into the vSphere Web Client, I got an error message at the top of the page:

While searching for the cause, I checked the URL of the Platform Services Controller (https://vcsa1.lab.local/psc/login) and got this:

psc_error_1

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0


This error led me to KB2144086 (Updating certificates using certificate manager on vCenter Server or PSC 6.0 Update 1b fails), but was able to proof, that I have used different subject names for the different solution user certificates.

While digging in the PSC logs, I found this error in the /var/log/vmware/psc-client/psc-client.log:

Finally, I found Aaron Smiths blog post “Troubleshooting Expired PSC Certificates with vSphere 6“, who had the same problem. I checked the certificate of the Lookup Service and there it was:

psc_error_2

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

This was the original custom certificate, issued by my AD-based enterprise CA, and installed on my vSphere 5.5 VCSA.

Aaron also offered the solution by referencing KB2118939 (Replacing the Lookup Service SSL certificate on a Platform Services Controller 6.0). I followed the instructions in KB2118939 and replaced the certificate of the Lookup Service with a certificate of the VMCA.

Take care of your certificates

With vSphere 6.0, the Lookup Service should be accessed through the HTTP Reverse Proxy. This proxy uses the machine certificate. Therefore, an expired Lookup Certificate is not obvious. If you connect directly to the Lookup Service using port 7444, you will see the expired certificate. The Lookup Service certificate is not replaced with a custom certificate, if you replace the different solution user certificates.

If you have a vSphere 6.0 VCSA, which was migrated from vSphere 5.5, and you have replaced the certificates on that vSphere 5.5 VCSA with custom certificates, you should check your Lookup Service certificate immidiately! Follow KB2118939 for further instructions.

Credit to Aaron Smith for this blog post. Thank you!

Monitoring hardware status with Python and vSphere API calls

This posting is ~3 years years old. You should keep this in mind. IT is a short living business. This information might be outdated.

Apparently it’s “how to monitor hardware status” week on vcloudnine.de. Some days ago, I wrote an article about using SNMP for hardware monitoring. You can also use the vSphere Web Client to get the status of the host hardware. A third way is through the vSphere API. I just want to share a short example how to use vSphere API calls and pyVmomi. pyVmomi is the Python SDK for the VMware vSphere API.

Get hardware status with vSphere API calls

I just want to share a small example, that shows the basic principle. The script gathers the temperature sensor data of a ProLiant DL360 G7 running ESXi 6.0 U2 using vSphere API calls.

The output of the script looks like this:

Nothing fancy. You can easily loop through numericSensorInfo to gather data from other sensors. Use the Managed Object Browser (MOB) to navigate through the API. This is handy if you search for specific sensors. If you need accurate data, the vSphere API is the way to go. If you focus on something lightweight, try SNMP.

Missing hardware status tab in the vSphere Client

This posting is ~3 years years old. You should keep this in mind. IT is a short living business. This information might be outdated.

I thought, everyone knows it, but I’m always being asked “Where’s the hardware status tab?” after an update from vSphere 5.x to 6. Many customers still use the vSphere Client (C # client), and steer clear of the vSphere Web Client. To be honest: Me too. I often use the C# client, especially if I do mass operations, or for a quick look at something.

This is really nothing new, the answer is clear. But I think it’s a good idea to write it down. At least for myself. As a reminder to use the vSphere Web Client.

The hardware status tab

Many customers used the hardware status tab to get a quick overview about the health of the ESXi host hardware.

vsphere_client_hw_status_tab

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

But after an update to vSphere 6 the hardware status tab is missing in the vSphere Client. This is an expected behaviour! VMware has published an knowledge base article about this (The Hardware Status Tab is no longer available in the vSphere Client after upgrading to vCenter Server 6.0). The only solution is to use the vSphere Web Client.

Use the vSphere Web Client

Meanwhile, the old vSphere Client has many downsides. All features introduced in vSphere 5.5 and later are only available through the vSphere Web Client. This also applies to the hardware status tab.

vsphere_web_client_hw_status_tab

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

You find the hardware status on the “Monitor” tab of a host. It offers the same information as the legacy hardware status tab from the vSphere Client.

Do yourself a favor and use the vSphere Web Client. Always, any time.

VMware Update Manager reports “error code 99” during scan operation

This posting is ~3 years years old. You should keep this in mind. IT is a short living business. This information might be outdated.

After updating my lab to VMware vSphere 6.0 U2, one of my hosts continuously thrown an error during an update scan.

The first thing I’ve checked was the esxupdate.log on the affected ESXi host. This is the output, that was logged during a scan operation.

You might notice the “Unrecognized file vendor-index.xml in Metadata file” error. I also found this error message on the other hosts, so I excluded it from further research. It was unlikely, that this error was related to the observed problem. I started searching differences between the hosts and found out, that the output of “esxcli software vib list” was different on the faulty host.

This is the output on the faulty host:

This is the output on a working host. You see the difference?

Doesn’t look right… I investigated further, still searching for differences. And then I found two empty directories under /var/db/esximg.

The same directory was populated on other, working hosts.

One possible solution was therefore to copy the missing files to the faulty host. I used SCP for this. To get SCP working, you have to enable the SSH Client in the ESXi firewall.

After that, I’ve copied the files from a working host to the faulty host. Please make sure that the hosts have the same build! In my case, both hosts had the same build. Don’t try to copy files from an older or newer build to the host!!

And because we are pros, we disable the SSH Client after using it.

As expected, “esxcli software vib list” was working again.

A rescan operation in the vSphere Client was also successful. It seems that the root cause for the problem were missing files under /var/db/esximg.

Please don’t ask why this has happened. I really have no idea. But VMware KB2043170 (Initializing the VMware vCenter Update Manager database without reinstalling it) isn’t always the solution for “error code 99”, as sometimes written somewhere in the internet. Always try to analyze the problem and try to filter out unlikely and likely solutions.