Tag Archives: bug

“Cannot execute upgrade script on host” during ESXi 6.5 upgrade

I was onsite at one of my customers to update a small VMware vSphere 6.0 U3 environment to 6.5 U2c. The environment consists of three hosts: two hosts in a cluster, and a third host that is only used to run an HPE StoreVirtual Failover Manager.

The update of the first host, using the Update Manager and an HPE custom ESXi 6.5 image, was pretty flawless. But the update of the second host failed with “Cannot execute upgrade script on host”.


I checked the host and found ESXi 6.5 installed, but one of the five iSCSI datastores was missing. Then I tried to patch the host with the latest patches and hit “Remediate”. The task failed with “Cannot execute upgrade script on host”. So I did a rollback to ESXi 6.0 and tried the update again, but this time using iLO and the HPE custom ISO. The result was the same: the host was running ESXi 6.5 after the update, but the upgrade failed with the “Upgrade Script” error. After this attempt, the host was unable to mount any of the iSCSI datastores. This was because the datastores were mounted ATS-only on the other host, and the failed host was unable to mount them in this mode. Very strange…

I checked the vua.log and found this error message:

Focus on this part of the error message:

The upgrade script failed due to an illegal character in the output of esxcfg-info. First of all, I had to find out what this 0x80 character is. I checked the UTF-8 and Windows-1252 encodings and found out that 0x80 is the € (Euro) symbol in Windows-1252. I searched the output of esxcfg-info for the € symbol – and found it.

But how to get rid of it? Where does it hide in the ESXi config? I scrolled a bit up and down around the € symbol. A bit above, I found a reference to HPE_SATP_LH. This immediately caught my attention, because the customer is using StoreVirtual VSA and StoreVirtual HW appliances.

Now my second educated guess of the day came into play. I checked the installed VIBs and found the StoreVirtual Multipathing Extension installed on the failed host – but not on the host where the ESXi 6.5 update was successful.

I removed the VIB from the buggy host, did a reboot, and tried to update the host with the latest patches – with success! Cross-checking showed that the € symbol was missing in the esxcfg-info output of the host that was upgraded first. I don’t have a clue why the StoreVirtual Multipathing Extension caused this error. The customer and I decided not to install the StoreVirtual Multipathing Extension again.
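For reference, here is a minimal PowerCLI sketch of the check and the removal – not the exact commands used back then. The host name is a placeholder, and the VIB name has to be taken from the output of the list command.

# List HPE VIBs on the affected host (host name is a placeholder)
$esxcli = Get-EsxCli -VMHost "esx02.lab.local" -V2
$esxcli.software.vib.list.Invoke() | Where-Object { $_.Vendor -match "HPE|Hewlett" } | Select-Object Name, Version, Vendor

# Remove the StoreVirtual Multipathing Extension (use the name shown by the list command),
# then reboot the host
$esxcli.software.vib.remove.Invoke(@{vibname = "<storevirtual-mem-vib-name>"})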

Wrong iovDisableIR setting on ProLiant Gen8 might cause a PSOD

TL;DR: There’s a script at the bottom of the page that fixes the issue.

Some days ago, this HPE customer advisory caught my attention:

Advisory: (Revision) VMware – HPE ProLiant Gen8 Servers running VMware ESXi 5.5 Patch 10, VMware ESXi 6.0 Patch 4, Or VMware ESXi 6.5 May Experience Purple Screen Of Death (PSOD): LINT1 Motherboard Interrupt

And there is also a corresponding VMware KB article:

ESXi host fails with intermittent NMI PSOD on HP ProLiant Gen8 servers

It isn’t clear WHY this setting was changed, but in VMware ESXi 5.5 Patch 10, 6.0 Patch 4, 6.0 U3, and 6.5, the Intel IOMMU’s interrupt remapper functionality was disabled. So if you are running one of these ESXi versions on an HPE ProLiant Gen8, you might want to check if you are affected.

To make it clear again, only HPE ProLiant Gen8 models are affected. No newer (Gen9) or older (G6, G7) models.

Currently there is no resolution, only a workaround: the iovDisableIR setting must be set to FALSE. If it’s set to TRUE, the Intel IOMMU’s interrupt remapper functionality is disabled.

To check this setting, you have to SSH to each host and use esxcli to query the current value:

I have written a small PowerCLI script that uses the Get-EsxCli cmdlet to check all hosts in a cluster. The script only checks the setting; it doesn’t change the iovDisableIR setting.
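A minimal sketch of such a check could look like this (the cluster name is a placeholder, and an existing Connect-VIServer session is assumed):

# Report the configured and runtime value of iovDisableIR for every host in a cluster;
# equivalent to "esxcli system settings kernel list -o iovDisableIR" in the ESXi shell
foreach ($vmhost in Get-Cluster -Name "Gen8-Cluster" | Get-VMHost) {
    $esxcli = Get-EsxCli -VMHost $vmhost -V2
    $iov = $esxcli.system.settings.kernel.list.Invoke(@{option = "iovDisableIR"})
    "{0}: Configured={1} / Runtime={2}" -f $vmhost.Name, $iov.Configured, $iov.Runtime
}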

Here’s another script that analyzes and fixes the issue.
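Again, only a hedged sketch of how such a fix could look, not the linked script itself. Keep in mind that a host has to be rebooted before the new value becomes active.

# Set iovDisableIR to FALSE on every host in the cluster where it is currently TRUE;
# equivalent to "esxcli system settings kernel set -s iovDisableIR -v FALSE"
foreach ($vmhost in Get-Cluster -Name "Gen8-Cluster" | Get-VMHost) {
    $esxcli = Get-EsxCli -VMHost $vmhost -V2
    $iov = $esxcli.system.settings.kernel.list.Invoke(@{option = "iovDisableIR"})
    if ($iov.Configured -eq "TRUE") {
        $esxcli.system.settings.kernel.set.Invoke(@{setting = "iovDisableIR"; value = "FALSE"}) | Out-Null
        "{0}: iovDisableIR set to FALSE, reboot required" -f $vmhost.Name
    }
}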

HPE 3PAR OS updates that fix VMware VAAI ATS Heartbeat issue

Customers that use HPE 3PAR StoreServs with 3PAR OS 3.2.1 or 3.2.2 and VMware ESXi 5.5 U2 or later might notice one or more of the following symptoms:

  • hosts lose connectivity to a VMFS5 datastore
  • hosts disconnect from the vCenter
  • VMs hang during I/O operations
  • you see messages like these in the vobd.log or in the vCenter Events tab

  • you see the following messages in the vmkernel.log

Interestingly, HPE is not the only vendor affected by this; multiple vendors have the same issue. VMware described this issue in KB2113956. HPE has published a customer advisory about this.

Workaround

If you are experiencing these symptoms and can’t update yet, you can use this workaround: disable the ATS heartbeat for VMFS5 datastores. VMFS3 datastores are not affected by this issue. To disable the ATS heartbeat, you can use this PowerCLI one-liner:
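A common form of this one-liner (the advanced setting name is the one from VMware KB2113956; test it against a single host before running it against all hosts):

# Set VMFS3.UseATSForHBOnVMFS5 to 0 on all hosts connected to the current vCenter session,
# so the VMFS heartbeat uses normal SCSI reads/writes instead of ATS
Get-VMHost | Get-AdvancedSetting -Name "VMFS3.UseATSForHBOnVMFS5" | Set-AdvancedSetting -Value 0 -Confirm:$false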

Solution

But there is also a solution. Most vendors have published firmware updates for their products. HPE has released

  • 3PAR OS 3.2.2 MU3
  • 3PAR OS 3.2.2 EMU2 P33, and
  • 3PAR OS 3.2.1 EMU3 P45

All three releases of 3PAR OS include enhancements to improve the ATS heartbeat handling. Because 3PAR OS 3.2.2 also includes some nice enhancements for Adaptive Optimization, I recommend updating to 3PAR OS 3.2.2.

Data Protector: Copy sessions to encrypted devices fail after update to 9.07

Recently, a customer informed me that copy sessions to encrypted devices failed after he had updated to Data Protector 9.07. The copy sessions failed with this error:

The customer uses tape encryption. The destination for the backups is an HPE StoreOnce, and a post-backup copy creates a copy of the data on tape. Backup to disk was running fine, but the copy to tape failed immediately.

The customer opened a ticket with HPE support and instantly got a hotfix to resolve this issue. HPE has documented this error in QCCR2A69192. If you run into the same issue, please request hotfix QCCR2A69802. This hotfix consolidates QCCR2A69192 and QCCR2A69318 (the BMA ends abnormally during backup/copy to tape).

Thanks to Stefan for the hint!

Receive Connector role not selectable in Exchange 2016 CU2

Another bug in Exchange 2016 CU2: the role of a new receive connector is greyed out, so only “Front-End-Transport” can be selected. This is a screenshot from a German Exchange 2016 CU2.

receive_connect_role_not_selectable (Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0)

Solution

Use the Exchange Management Shell to create a new receive connector. Afterwards, you can modify it with the Exchange Control Panel (ECP).
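A hedged example of how such a connector could be created in the Exchange Management Shell (server name, binding port, and remote IP range are placeholders; the role is set explicitly with -TransportRole):

# Create a custom receive connector with an explicitly chosen transport role
New-ReceiveConnector -Name "Relay EX01" -Server "EX01" -TransportRole HubTransport `
    -Custom -Bindings "0.0.0.0:2525" -RemoteIPRanges "192.168.1.0/24"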

Microsoft has confirmed that this is a bug in Exchange 2016 CU2.

WSUS on Windows 2012 (R2) and KB3159706 – WSUS console fails to connect

Like any other environment, my lab needs some maintenance from time to time. I use a Windows 2012 R2 VM with the Windows Server Update Services (WSUS) role to keep my Windows VMs up to date. Like many others, I was surprised by KB3148812 (Update enables ESD decryption provision in WSUS in Windows Server 2012 and Windows Server 2012 R2), which broke my WSUS. But the fix was easy: uninstall KB3148812 and reboot the server. The WSUS product team published an article about this known issue in their blog: Known Issues with KB3148812. In the meantime, Microsoft has published a new update, which supersedes KB3148812: KB3159706.

WSUS dead again?

Today I wanted to check the update status of my VMs. Unfortunately, the WSUS console was unable to connect to the WSUS server.

wsus_2 (Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0)

I checked the status of the service and found the WSUS service stopped. But even after I had started the service, the WSUS console was unable to connect to the server. I found an error in the event logs (ID 507, source Windows Server Update Services), but the message “Update Services failed its initialization and stopped” wasn’t helpful. More helpful was a log entry:

After some searching and examination of the recently installed updates, I came across KB3159706.

Manual steps required to complete the installation of KB3159706

Open an elevated CMD and run this command:
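For reference, the post-install servicing step documented for KB3159706 looks like this (the path assumes a default WSUS installation):

"C:\Program Files\Update Services\Tools\wsusutil.exe" postinstall /servicing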

The output should look similar to this:

Then you have to install the “HTTP Activation” feature under “.NET Framework 4.5” features.

wsus_3 (Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0)
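If you prefer PowerShell over Server Manager, the same feature can be added like this (feature and service names as used on Windows Server 2012 R2):

# Install the "HTTP Activation" feature under .NET Framework 4.5 features, then restart WSUS
Install-WindowsFeature -Name NET-WCF-HTTP-Activation45
Restart-Service -Name WsusService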

After a restart of the WSUS service, WSUS should work again.

Summary

The installation of KB3148812 on a WSUS server will break the WSUS installation. Because of this, Microsoft has published KB3159706. If you install this update (in my case it was installed automatically over WSUS…), you have to execute some manual steps to ensure that WSUS works as expected. The WSUS product team is aware of this, and they pointed it out in their blog article “The long-term fix for KB3148812 issues” (you will find a hint directly at the top of the blog article).

Guest customization fails after upgrade to VMware vSphere 6

VMware vSphere 6 is now a year old, and it was time to update my lab to vSphere 6. The update went smoothly, and everything worked as expected. Some days later, I updated the master VM of a small automated desktop pool. I’m using VMware Horizon 6.2.1 in my lab to deploy a small number of Windows 8.1 VMs for tests, administration etc. The recompose of the pool failed during the guest customization.

view_error_decrypt_password (Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0)

I checked the customization specification immediately and got an error in the vSphere C# client.

vcsa_error_decrypt_password (Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0)

Interestingly, I got no error in the vSphere Web Client:

vcsa_error_decrypt_password_web_client (Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0)

After re-entering the Administrator password, the customization specification was usable again. No errors so far.

A quick search in the VMware KB led me to the article “Virtual machines with customizations fail to deploy when using Custom SSL Certificates (1019893)“. But this article doesn’t apply to vCenter 6.0. For the record: I’m using CA-signed certificates in my environment. It seems to be a good idea to re-enter the passwords in customization specifications after a vCenter migration or upgrade (5.x to 6.x, or from VCSA 5.x to 6.x).
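If several specifications are affected, this can probably also be done with PowerCLI. A minimal sketch, assuming the specification name below (a placeholder) and that Set-OSCustomizationSpec accepts -AdminPassword in the PowerCLI version at hand:

# Re-enter the local Administrator password of a Windows customization specification
$newPassword = Read-Host -Prompt "New Administrator password"
Get-OSCustomizationSpec -Name "Win81-Pool" | Set-OSCustomizationSpec -AdminPassword $newPassword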

Screen resolution scaling has stopped working after Horizon View agent update

Another inconvenience that I noticed during the update process from VMware Horizon View 6.1.1 to 6.2 was that the automatic screen resizing stopped working. When I connected to a desktop pool with the VMware Horizon client, I only got the screen resolution of the VM (the resolution that is used when connecting to the VM with the vSphere console), not 1920×1200 as expected. This issue only occurred with PCoIP, not with RDP. I had this issue with a static desktop and a dynamic desktop pool, and it occurred after updating the Horizon View agent. The resolution scaling worked with a Windows 2012 R2 RDS host when I connected to the RDS host with PCoIP.

VMware KB1018158 (Configuring PCoIP for use with View Manager) did not solve the problem. I checked the VMX version, the video RAM config etc. Nothing had changed, everything was configured as expected. At this point it was clear to me that this must be an issue with the Horizon View agent. I took some snapshots and tried to reinstall the Horizon View agent. I removed the Horizon View agent and the VMware Tools from one of my static desktops. After a reboot, I installed the VMware Tools and then the Horizon agent. To my surprise, this first attempt solved the problem. I tried the same with my second static desktop pool VM and with the master VM of my dynamic desktop pool (don’t forget to recompose the VMs…). This workaround fixed the problem in each case.

I don’t know if this is a bug. I haven’t found any hints in the VMware Community forum or blogs. Maybe someone knows the answer.

VMware Horizon View agent update on RDS host fails with “Internal Error 25030”

I’m running a small VMware Horizon View environment in my lab. Nothing fancy, but all you need to show what Horizon View can do for you. This environment includes a Windows Server 2012 R2 RDS host. During the update process from Horizon View 6.1.1 to 6.2, I had to update the View agent on this RDS host. This update installation failed with an “Internal Error 25030”, followed by a rollback. Fortunately I had a snapshot, so I went back to the previous state and tried the update again. This attempt also went awry.

To make a long story short: Read the fscking release notes! This quote is taken from the Horizon View 6.2 release notes:

When you upgrade View Agent 6.1.1 to View Agent 6.2 on an RDS host running on Windows Server 2012 or 2012 R2, the upgrade fails with an “Internal Error 25030” message.
Workaround: Uninstall View Agent 6.1.1, restart the RDS host, and install View Agent 6.2.

And this is not the first time that this error occurred. I found this quote in the Horizon View 6.1.1 release notes:

When you upgrade View Agent 6.1 to View Agent 6.1.1 on an RDS host running on Windows Server 2012 or 2012 R2, the upgrade fails with an “Internal Error 25030” message.
Workaround: Uninstall View Agent 6.1, restart the RDS host, and install View Agent 6.1.1

If you take a closer look at these two statements, you might notice some similarities… But I do not want to be spiteful. The workaround did the trick. Simply uninstall the View agent (if it’s still installed after the rollback – it wasn’t in my case), reboot, and reinstall the View agent.

Error 1325: VBRCatalog is not a valid short file name

While upgrading a rather old (but very stable) Veeam Backup & Replication 6.1 installation to 8.0 Update 3 (with an intermediate step to 6.5), I ran into a curious error. Right after the welcome screen, this error message

veeam_error_1325 (Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0)

appeared. A closer look into the BackupSetup.log (you can find this log in the %temp% directory; just enter %temp% into the Explorer address bar) revealed this very interesting log entry:

First, the VBRCatalog folder was located under D:\Veeam, so why the hell was the CATALOGPATH property changed to E:\VBRCatalog? I searched the registry for E:\VBRCatalog and found multiple entries for it. One of the entries was located under “HKLM\SOFTWARE\Wow6432Node\Veeam\Veeam Backup Catalog”. The entry under “HKLM\SOFTWARE\Veeam\Veeam Backup Catalog” pointed to the correct path. I found some other entries, e.g. in connection with the Windows Installer.

After changing all found entries to the correct path, the update went smoothly. The reason for this error was that the VBRCatalog was moved after the installation. I did this more than three years ago and followed Veeam KB1453. But this article only describes the change of the CatalogPath entry under “HKLM\SOFTWARE\Veeam\Veeam Backup Catalog”. You have to change all references to the old VBRCatalog path! Otherwise you will run into the same error as I did.
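A quick way to find those stale references is a recursive registry search, for example (the old path is the one from this article, adjust it to your environment):

# Search HKLM\SOFTWARE recursively for values that still contain the old catalog path
reg query HKLM\SOFTWARE /s /f "E:\VBRCatalog" /d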