Category Archives: Virtualization

Vembu VMBackup Deployment Scenarios

Vembu was founded in 2002 and has over 60,000 customers worldwide. One of their core products is the Vembu BDR Suite, a one-stop solution for all your backup and DR needs. I wrote a longer blog post about the Vembu BDR Suite.

One part of this suite is Vembu VMBackup, a data protection solution that is designed to back up VMware and Microsoft Hyper-V virtual machines in a secure and simple way. The offered features are comparable to Veeam Backup & Replication.

The core component of Vembu VMBackup is the Vembu BDR Backup server, which can be deployed in two ways:

  • On-premises Deployment
  • Hybrid Deployment


On-premises Deployment

In this deployment setup, customers deploy the product in their local environment. I think this is the most typical deployment type: you install VMBackup on a physical server, in a VM, or as a virtual appliance. Backup data is transferred over LAN or SAN and written to the storage repositories. The Vembu BDR server acts as a centralized management point, where users can configure and manage backup and replication jobs.

In a simple deployment, the Vembu BDR Backup Server acts as backup proxy and management server instance. It is perfect for a small number of VMs with little simultaneous backup traffic, and for VMBackup evaluation. The typical SMB environment.

If you separate the management server from the backup proxy, the deployment changes to a distributed deployment. If necessary, multiple backup proxies can be deployed on physical hosts or in virtual machines. Customers can also deploy multiple BDR backup servers, which allows load balancing across a cluster of BDR backup servers. Pretty cool for bigger and/or distributed environments. It allows customers to scale their backup solution over time.

On-Premises Deployment/ Vembu Technologies/ Copyright by Vembu Technologies

Hybrid Deployment

Backup is good, but having a backup copy offsite is better. Vembu OffsiteDR allows customers to create a copy of their backup data and transfer it to a DR location over LAN/WAN. OffsiteDR instantly transfers backup data from a BDR Backup Server to an OffsiteDR server. Customers can restore failed VMs or missing files and application data in their DR site, or they can rebuild a failed BDR Backup Server from an OffsiteDR server.

Vembu Technologies/ OffsiteDR/ Copyright by Vembu Technologies

If customers don’t have a DR site, they can use Vembu CloudDR to push a backup copy to the Vembu Cloud. The data stored in the Vembu Cloud can easily be restored at any time and to any location. Vembu uses AWS across all continents to ensure the availability of their cloud services.

Vembu Technologies/ CloudDR/ Copyright by Vembu Technologies

Customers have the choice

It is obvious that customers have the freedom of choice in how they deploy Vembu VMBackup. I like the virtual appliance approach, which eliminates the need for additional Windows Server licenses. More and more vendors tend to offer appliances for their products – just think about the VMware vCenter Server Appliance, vRealize Orchestrator etc. So why not offer a backup server appliance? I wish other vendors would adopt this…

Another nice feature is the scale-out capability of Vembu: start small and grow over time. Perfect for SMBs.

“Cannot execute upgrade script on host” during ESXi 6.5 upgrade

I was onsite at one of my customers to update a small VMware vSphere 6.0 U3 environment to 6.5 U2c. The environment consists of three hosts: two hosts in a cluster, and a third host that is only used to run an HPE StoreVirtual Failover Manager.

The update of the first host, using the Update Manager and an HPE custom ESXi 6.5 image, was pretty flawless. But the update of the second host failed with “Cannot execute upgrade script on host”.


I checked the host and found it with ESXi 6.5 installed, but one of the five iSCSI datastores was missing. Then I tried to patch the host with the latest patches and hit “Remediate”. The task failed with “Cannot execute upgrade script on host”. So I did a rollback to ESXi 6.0 and tried the update again, this time using iLO and the HPE custom ISO. But the result was the same: the host was running ESXi 6.5 after the update, but the upgrade failed with the “Upgrade Script” error. After this attempt, the host was unable to mount any of the iSCSI datastores. This was because the datastores were mounted ATS-only on the other host, and the failed host was unable to mount the datastores in this mode. Very strange…

I checked the vua.log and found the reason: the upgrade script had failed due to an illegal character in the output of esxcfg-info. First of all, I had to find out what this 0x80 character is. I checked the UTF-8 and Windows-1252 encodings, and found out that 0x80 is the € (Euro) symbol in the Windows-1252 encoding. I searched the output of esxcfg-info for the € symbol – and found it.
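In case you want to reproduce such a search on the ESXi shell, here is a rough, untested sketch – printf emits the raw 0x80 byte (octal 200) that grep then looks for:

    # Dump the (very large) host configuration to a file
    esxcfg-info > /tmp/esxcfg-info.txt
    # Search for the raw 0x80 byte; -n prints matching line numbers
    grep -n "$(printf '\200')" /tmp/esxcfg-info.txt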

But how to get rid of it? Where does it hide in the ESXi config? I scrolled a bit up and down around the € symbol. A bit above, I found a reference to HPE_SATP_LH. This immediately got my attention, because the customer is using StoreVirtual VSA and StoreVirtual HW appliances.

Now my second educated guess of the day came into play. I checked the installed VIBs, and found the StoreVirtual Multipathing Extension installed on the failed host – but not on the host where the ESXi 6.5 update was successful.

I removed the VIB from the buggy host, did a reboot, and tried to update the host with the latest patches – with success! Cross-checking showed that the € symbol was missing in the esxcfg-info output of the host that was upgraded first. I don’t have a clue why the StoreVirtual Multipathing Extension caused this error. The customer and I decided not to install the StoreVirtual Multipathing Extension again.
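For reference, listing and removing a VIB is done with esxcli. A sketch – the exact name of the StoreVirtual Multipathing Extension VIB differs between releases, so take it from the list output first:

    # Look for the StoreVirtual/LeftHand multipathing entry
    esxcli software vib list
    # Remove the VIB (replace <vib-name> with the name from the list output), then reboot the host
    esxcli software vib remove -n <vib-name>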

High CPU usage on Citrix ADC VPX

While building a small Citrix NetScaler… ehm… ADC VPX (I really hate this name…) lab environment, I noticed that the fan of my Lenovo T480s was spinning up. I was wondering why, because the VPX VM had been running for only a couple of minutes – without any load. But the Task Manager told me that the VMware Workstation process was consuming 25% CPU (I have an Intel i5 quad-core CPU). So VMware Workstation was eating a whole CPU core without doing anything. I would not care, but the fan… And it reminded me that I had seen similar behaviour in various VPX deployments on VMware ESXi.


A quick search led me to this Citrix Support Knowledge Center article: High CPU Usage on NetScaler VPX Reported on VMware ESXi Version 6.0. That’s exactly what I had observed.

The solution is setting the parameter cpuyield to yes.
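On the VPX CLI, this looks like the following sketch (syntax as documented in the article quoted below; show ns vpxparam displays the current value):

    set ns vpxparam -cpuyield YES
    show ns vpxparam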

The VPX does not need a reboot. Shortly after setting the parameter, the fan stopped spinning. Have I mentioned how much I love silence on my desk? I’m pretty happy that my T480s is a really quiet laptop.

But what is this parameter used for? In pretty simple words: it controls whether the VPX releases CPU cycles that it does not use. Until ADC VPX 11.1, the VPX shared the CPU with other VMs. This changed with ADC VPX 12.0. Since this release, the VPX is like a child that plays with its favorite toy just to make sure that no other child can play with it. Not very polite…

This is a quote from the Support Knowledge Center article:

Set ns vpxparam parameters:
-cpuyield: Release or do not release of allocated but unused CPU resources.

YES: Allow allocated but unused CPU resources to be used by another VM.

NO: Reserve all CPU resources for the VM to which they have been allocated. This option shows higher percentage in hypervisor for VPX CPU usage.
DEFAULT: NO

I don’t think that I would change this in production. But for lab environments, especially if you run this on VMware Workstation, I would set -cpuyield to yes.

Powering on a VM with shared VMDK fails after extending an EagerZeroedThick VMDK

I hope that you are not reading this blog post while searching for a solution for a failed cluster. If so, feel free to leave a comment if this blog post saved your evening or weekend. :)

Last Friday, a change at one of my customers went horribly wrong. I was not onsite, but they contacted me during the night from Friday to Saturday, because their most important Windows Server Failover Cluster was unable to start after extending a shared VMDK.


They tried something pretty simple: extending a virtual disk of a VM. That is something most of us do pretty often. The customer had done it pretty often, too. It was a well-known task… except for the fact that the VM was part of a Windows Server Failover Cluster. With shared VMDKs. And the disks were EagerZeroedThick, because this is a requirement for shared VMDKs.

They extended the disk using the vSphere Web Client. And at this point, the change was doomed to fail. They tried to power on the VMs, but all they got was this error:

VMware ESX cannot open the virtual disk, “/vmfs/volumes/4c549ecd-66066010-e610-002354a2261b/VMNAME/VMDKNAME.vmdk” for clustering. Please verify that the virtual disk was created using the ‘thick’ option.

A shared VMDK is a VMDK in multiwriter mode. This VMDK has to be created as Thick Provision Eager Zeroed. And if you wish to extend this VMDK, you must use vmkfstools with the option -d eagerzeroedthick. If you extend the VMDK using the Web Client, the extended portion of the disk will become LazyZeroed!
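A minimal sketch of a safe extend, assuming the clustered VMs are powered off and path and size are adjusted to your environment:

    # Extend the shared VMDK to 100 GB; -d eagerzeroedthick keeps the new blocks eager-zeroed
    vmkfstools -X 100G -d eagerzeroedthick /vmfs/volumes/<datastore>/<vm>/<disk>.vmdk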

VMware has described this behaviour in KB1033570 (Powering on the virtual machine fails with the error: Thin/TBZ disks cannot be opened in multiwriter mode). There is also a blog post by Cormac Hogan at VMware, who has described this behaviour.

That’s a screenshot from the failed cluster. Check out the type of the disk (Thick-Provision Lazy-Zeroed).


You must use vmkfstools to extend a shared VMDK – but vmkfstools is also the solution if you have fallen into this pitfall: clone the VMDK with the option -d eagerzeroedthick.
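A sketch of the repair, assuming there is enough free space on the datastore for a full clone:

    # Clone the broken disk into a new, fully eager-zeroed VMDK
    vmkfstools -i broken.vmdk -d eagerzeroedthick fixed.vmdk
    # Afterwards, attach the cloned disk to the cluster VMs in place of the old one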

Another solution, which was new to me, is to use Storage vMotion. You can migrate the “broken” VMDK to another datastore and change the disk format during the Storage vMotion. This solution is described in the “Notes” section of KB1033570.

Both ways will fix the problem. The result will be a Thick Provision Eager Zeroed VMDK, which will allow the VMs to be successfully powered on.

Exam prep & experience: VMware Certified Advanced Professional 6 – Data Center Virtualization Deployment Exam (VCAP6-DCV Deploy)

TL;DR: I have passed the VCAP6-DCV Deploy exam today. :) I want to thank Fred, Dominik, Frank and Jens-Henrik for kicking my ass. Without you, I would have taken the VCP 6.5 delta exam. Thank you!

As often, the whole thing started with a tweet. A tweet about my expiring VMware Certified Professional (VCP) certification.

To my surprise, several of my followers recommended going for the VCAP6-DCV Deploy exam instead. Okay – so many smart people can’t be wrong.

I booked the exam, prepared for the exam, took the exam today – and passed!

27 questions in 205 minutes (including a 25-minute extension for non-native speakers) is a pretty challenging task. I was able to answer all questions in the given time. I left the test center with a good feeling, and after an hour I got the mail that I had passed the exam! Woohoo!

Preparation is everything

Preparation and time management. That’s all. Easier said than done. ;)

Make sure that you have read the exam guide. This document provides detailed information about the objectives covered by the exam, and it was crucial for me to get a feeling for what I had to learn. I have been working with VMware vSphere since ESX 2.5 – a pretty long time – yet I do not know everything. Especially things like vSphere Data Protection, Auto Deploy or some certificate-related tasks are not day-to-day tasks.

I primarily worked with Kyle Jenner's VCAP6-DCV Deployment Study Guide and the VMware Hands-on Labs. The VCAP6-DCV Deploy exam is not a multiple-choice test like the VCP exams. You have to do real tasks. So experience is crucial to pass the exam.

Because I don’t have a lab, I used the VMware Hands-on Labs instead. I can recommend these three labs:

  • HOL-1911-01-SDC (What’s New in VMware vSphere 6.7)
  • HOL-1808-01-HCI (vSAN v6.6.1 – Getting Started), and
  • HOL-1827-01-HCI (VMware Storage – Virtual Volumes and Storage Policy Based Management)

Unfortunately, there is no course available that covers vSphere Data Protection and vSphere Replication.

But there was also another reason why I used the HOL: the VCAP exam environment is based on the interface of the VMware HOL. This was pretty helpful, because I was able to get familiar with the interface prior to the exam.

Due to security restrictions, the exam environment does not support some keys and shortcuts, e.g. CTRL and ALT. To my surprise, the Backspace key worked in my environment; many people have stated that the Backspace key isn’t working. Because of this, VMware has published an Interface Guide. Make sure to read it, and learn how to get around these limitations! There is also a pretty handy YouTube video with tips and tricks:

To test yourself, you can use this free VCAP6-DCV Deploy simulator. The simulation provides scenarios similar to the scenarios from the exam. This is pretty handy to get a feeling for how well you are prepared for the exam.

VCAP6-DCV Deploy Exam Simulator – FREE

You have roughly 7 minutes per question. If you don’t have an idea how to answer a question, move on! Write down the number and some keywords, then move on to the next question. Instead of waiting for tasks to finish, move on to the next question and come back later to check the task result.

I took the exam at Blue Consult in Krefeld (Germany). This was a recommendation from one of my followers (thanks, Dominik!). Fortunately, Blue Consult has keyboards with US layout in their test center, which made it much easier for me. The performance of the exam environment was quite good. No lags or hanging sessions.

What’s next?

I will book the VMware Certified Advanced Professional 6.5 – Data Center Virtualization Design exam as soon as I have passed the NetScaler CCP-N exam, which I have to take by the end of December 2018 (thank you, Citrix… NOT!).

VCIX6.5-DCV FTW! :)

Veeam backups fail because of time differences

Last week I had an interesting incident at a customer. The customer reported that one of multiple Veeam backup jobs constantly failed.


The backup job included two VMs, and the backup of one of these VMs failed with this error:

They verified the credentials used for that job, but re-entering the password did not solve the issue. I then checked the Veeam backup logs located under %ProgramData%\Veeam\Backup (look for the Agent.Job_Name.Source.VM_Name.vmdk.log) and found VDDK error 3014.

The user that was used to connect to the vCenter was an Active Directory account. The account was granted administrator privileges at the root of the vCenter. Switching from the AD account to administrator@vsphere.local solved the issue. Next stop: the vmware-sts-idmd.log on the vCenter Server Appliance. The error found in this log confirmed my theory that there was an issue with the authentication itself, not an issue with the AD account.

To make a long story short: time differences. The vCenter, the ESXi hosts and some servers had the wrong time. vCenter and ESXi hosts were using the domain controllers as their time source.

This is the ntpq output of the vCenter. You might notice the offset and jitter values on the right side, both noted in milliseconds.

After some investigation, the root cause seemed to be a bad DCF77 receiver, which was connected to the domain controller that hosted the PDC emulator role. The DCF77 receiver was attached using a USB-to-LAN converter. Instead of using a DCF77 receiver, the customer and I implemented an NTP hierarchy using a valid NTP source on the internet (pool.ntp.org).
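On the domain controller holding the PDC emulator role, such an NTP hierarchy is typically configured with w32tm. A sketch – the peer list is an example:

    REM Sync the PDC emulator from external NTP servers instead of the local hardware clock
    w32tm /config /manualpeerlist:"0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org" /syncfromflags:manual /reliable:yes /update
    REM Restart the time service and verify the status
    net stop w32time && net start w32time
    w32tm /query /status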

vSphere Distributed Switch health check fails on HPE Comware switches

During the replacement of some VMware ESXi hosts at a customer, I discovered recurrent failures of the vSphere Distributed Switch health checks: a VLAN and MTU mismatch was reported. On the physical side, the ESXi hosts were connected to two HPE 5820 switches, which were configured as an IRF stack. Inside the VMware bubble, the hosts were sharing a vSphere Distributed Switch.


The switch ports of the old ESXi hosts were configured as Hybrid ports. The switch ports of the new hosts were configured as Trunk ports, to streamline the switch and port configuration.

Some words about port types

Comware knows three different port types:

  • Access
  • Hybrid
  • Trunk

If you are familiar with Cisco, you will know Access and Trunk ports. If you are familiar with HPE ProCurve or Alcatel-Lucent Enterprise, these two port types correspond to untagged and tagged ports.

So what is a Hybrid port? A Hybrid port can belong to multiple VLANs, both untagged and tagged. Yes, multiple untagged VLANs on a port are possible, but the switch will need additional information to bridge the traffic into the correct untagged VLANs. This additional information can be MAC addresses, IP addresses, LLDP-MED etc. Typically, Hybrid ports are used in VoIP deployments.

The benefit of a Hybrid port is that I can carry the native VLAN of a specific port, which is often referred to as the port VLAN identifier (PVID), as a tagged VLAN on that port. This configuration allows all dvPortGroups to have a VLAN tag assigned, even if the VLAN tag represents the native VLAN of a switch port.
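Configured as a Hybrid port, the relevant part of the interface configuration might look like this sketch (VLAN IDs are examples): the dvPortGroup VLANs, including the former native VLAN, are carried tagged, while the PVID is moved to an otherwise unused VLAN.

    interface Ten-GigabitEthernet1/0/9
     port link-type hybrid
     port hybrid vlan 1 10 20 tagged
     port hybrid pvid vlan 999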

Failing health checks

A failed health check raises a vCenter alarm. In my case, a VLAN and MTU alarm was reported. In both cases, VLAN 1 was causing the error. According to VMware, the three main causes for failed health checks are:

  • Mismatched VLAN trunks between a vSphere distributed switch and physical switch
  • Mismatched MTU settings between physical network adapters, distributed switches, and physical switch ports
  • Mismatched virtual switch teaming policies for the physical switch port-channel settings.

Let’s take a look at the port configuration on the Comware switch:
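The relevant part of such a trunk port configuration on Comware looks like this (sketch; the interface name matches the output referenced below):

    interface Ten-GigabitEthernet1/0/9
     port link-type trunk
     port trunk permit vlan all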

As you can see, this is a normal trunk port. All VLANs will be passed to the host. This is an excerpt from the display interface Ten-GigabitEthernet1/0/9 output:

The native VLAN is 1, which is the default configuration. Traffic that is received and sent on a trunk port is always tagged with the VLAN ID of the originating VLAN – except traffic from the default (native) VLAN! This traffic is sent without a VLAN tag, and if frames of this VLAN are received with a VLAN tag, these frames will be dropped!

If you have a dvPortGroup for the default (native) VLAN, and this dvPortGroup is sending tagged frames, the frames will be dropped if you use a “standard” trunk port. And this is why the health check fails!

Ways to resolve this issue

In my case, the dvPortGroup was configured for VLAN 1, which is the default (native) VLAN on the switch ports.

There are two ways to solve this issue:

  • Remove the VLAN tag from the dvPortGroup configuration
  • Change the PVID for the trunk port

To change the PVID for a trunk port, you have to enter the following command in the interface context (VLAN 999 serves as an example):
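    port trunk pvid vlan 999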

You have to change the PVID on all ESXi-facing switch ports. You can use a non-existing VLAN ID for this.

The vSphere Distributed Switch health check will switch to green for VLAN and MTU immediately.

Please note that this is not the solution for all VLAN-related problems. You should make sure that you are not getting any side effects.

Unsupported hardware family ‘vmx-06’

A customer of mine got an appliance from a software vendor. The appliance was delivered as a ZIP file with a VMDK, an MF, and an OVF file. Unfortunately, the appliance was created with VMware Workstation 6.0 with virtual machine hardware version 6, which is incompatible with VMware ESXi (Virtual machine hardware versions). During deployment, my customer got the error “Unsupported hardware family ‘vmx-06’”.

The OVF file includes a line with the VM hardware version.
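The hardware version is stored in the VirtualSystemType element of the OVF XML; for this appliance, the line looked similar to this:

    <vssd:VirtualSystemType>vmx-06</vssd:VirtualSystemType>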

If you change this line from vmx-06 to vmx-07, the hash of the OVF changes, and you will get an error during the deployment of the appliance because of the wrong file hash.

Solution

You have to update the SHA256 hash of the OVF file, which is stored in the MF file.

To create the new SHA256 hash, you can use the PowerShell cmdlet Get-FileHash.
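A quick sketch in PowerShell (the file name is an example):

    # Compute the SHA256 hash of the edited OVF file
    Get-FileHash -Algorithm SHA256 .\appliance.ovf

    # The MF file references the OVF with a line like:
    # SHA256(appliance.ovf)= <hash value from Get-FileHash>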

Replace the hash and save the MF file. Then re-deploy the appliance.

Andreas Lesslhumer wrote a similar blog post in 2015:
“Unsupported hardware family vmx-10” during OVF import

Workaround for broken Windows 10 Start Menus with floating desktops

Last month, I wrote about a very annoying issue that I discovered during a Windows 10 VDI deployment: roaming of the AppData\Local folder breaks the Start Menu of Windows 10 Enterprise (Roaming of AppData\Local breaks Windows 10 Start Menu). During my research, I stumbled over dozens of threads about this issue.

Today, after hours and hours of testing, troubleshooting and reading, I might have found a solution.

The environment

Currently I don’t know if this is a workaround, a weird hack, or no solution at all. Maybe it was luck that none of my 2074203423 logins at different linked clones resulted in a broken Start Menu. The customer is running:

  • Horizon View 7.1
  • Windows 10 Enterprise N LTSB 2016 (1607)
  • View Agent 7.1 with Persona Management enabled

Searching for a solution

During my tests, I tried to discover WHY the TileDataLayer breaks. As I wrote in my earlier blog post, it is sufficient to delete the TileDataLayer folder; the folder will be recreated during the next logon, and the Start Menu works again. Today, I added path after path to the “Files and folders excluded from roaming” GPO setting, and at some point I had a working Start Menu. With this in mind, I did some research and stumbled over a VMware Communities thread (Vmware Horizon View 7.0.3 – Linked clone – Persistent mode – Persona management – Windows 10 (1607) -> Windows 10 Start Menu doesn’t work).

User oliober did the same: he roamed only a couple of folders, one of them being the TileDataLayer folder, but not the whole AppData\Local folder.

The “solution”

To make a long story short: you have to enable the roaming of AppData\Local, then exclude AppData\Local from roaming, and add only the necessary folders to the exceptions list of that exclusion. Sounds funny, but it seems to work.
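Expressed as View Persona Management GPO settings, the whole construct looks roughly like this sketch (setting names as found in the Horizon ADMX templates; the exceptions list is an example based on the thread above and will likely need more folders):

    Roam local settings folders:                          Enabled
    Files and folders excluded from roaming:              AppData\Local
    Files and folders excluded from roaming (exceptions): AppData\Local\TileDataLayer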

Horizon View GPO AppData Roaming


Feedback is welcome!

I am very interested in feedback. It would be great if you have the chance to verify this behaviour. Please leave a comment with your results.

As I already said: I don’t know if this is a workaround, a hack, a solution, or no solution at all. But for now, it seems to work. Microsoft deprecated the TileDataLayer in Windows 10 1703, so for this new Windows 10 build we will have to find another solution. The “solution” described above only works for 1607. But if you are using the Long Term Servicing Branch, it will work for the next 10 years. ;)

Some thoughts about using Windows Server 2012 R2 instead of Windows 10 for VDI

Disclaimer: The information from this blog post is provided on an “AS IS” basis, without warranties, both express and implied.

Last week, I had an interesting discussion with a customer. Some months back, the customer decided to kick off a PoC for a VMware Horizon View based virtual desktop infrastructure (VDI). He is currently using fat clients with Windows 8.1, and the new environment should run on Windows 10 Enterprise. Last week, we discussed the idea of using Windows Server 2012 R2 as the desktop OS.

Horizon View with Windows Server as desktop OS?

My customer has planned to use VMware Horizon View. The latest release is VMware Horizon View 7.2. VMware KB article 2150295 (Supported Guest Operating Systems for Horizon Agent and Remote Experience) lists all supported (non-Windows 10) Microsoft operating systems for the different Horizon View releases. This article shows that Windows Server 2012 R2 (Standard and Datacenter) is supported with all Horizon View releases, starting with Horizon View 7.0. The installation of a View Agent is supported, and you can create full- and linked-clone desktop pools. But there is also another important KB article: 2150305 (Feature Support Matrix for Horizon Agent). This article lists all available features and whether they are compatible with a specific OS or not. According to this article, the

  • Windows Media MMR,
  • VMware Client IP Transparency, and the
  • Horizon Virtualization Pack for Skype for Business

are not supported with Windows Server 2012 R2 and 2016.

From the support perspective, it’s safe to use Windows Server 2012 R2, or 2016, as a desktop OS for a VMware Horizon View based virtual desktop infrastructure.

Licensing

Licensing Microsoft Windows for VDI is a PITA. It’s all about the virtual desktop access rights, which can be acquired in two different ways:

  • Software Assurance (SA), or
  • Windows Virtual Desktop Access (VDA)

SA and VDA are available per-user and per-device.

Windows VDI Licensing


You need an SA or a VDA license for each accessing device or user. There is no need for additional licenses for your virtual desktops! You get the right to install Windows 10 Enterprise on your virtual desktops. This includes the LTSB (Long Term Servicing Branch). LTSB offers updates, without delivery of new features, for the duration of mainstream support (5 years) and extended support (5 years). Another side effect is that LTSB does not include most of the annoying Windows apps.

Do yourself a favor, and do not try to set up a VDI with Windows 10 Professional…

Service providers that offer Desktop-as-a-Service (DaaS) are explicitly excluded from this licensing! They must license their stuff according to Microsoft’s Services Provider License Agreement (SPLA).

How do I have to license Windows Server 2012 R2 if I want to use Windows Server as a desktop OS? Windows Server Datacenter licensing allows you to run an unlimited number of server VMs on the licensed hardware. To be clear: Windows Server is licensed per physical server, and there is nothing like license mobility! To license the access to the server, you need two different licenses:

  • Windows Server CAL (device or user), and
  • Remote Desktop Services (RDS) CAL (device or user)

The Windows Server CAL is needed for any access to a Microsoft Windows Server from a client, regardless of what service is used (even for DHCP). The RDS CAL must be assigned to any user or device that is directly or indirectly interacting with the Windows Server desktop, or that uses a remote desktop technology (RDS, PCoIP, Blast Extreme etc.) to access the Windows Server desktop.

With this license setup, you have licensed the Windows Server VM itself, and also the access to this VM. There is no need to purchase SA or VDA licenses.

Do the math

With this in mind, you have to do the math: compare the licensing costs for Windows 10 and Windows Server 2012 R2/2016 in your specific situation. Set up a PoC to verify your requirements and the support of your software on Windows Server.

Windows Server can be an interesting alternative to Windows 10. Maybe some of you who already use it with Horizon View have time to add some comments to this blog post. It would be nice to get some feedback on this topic.