Tag Archives: virtualization

HPE Data Protector 9.05: SAN backups failing back to NBDSSL

 

Last year in December, I updated the first customer from HPE Data Protector 9.04 to 9.05. Immediately after the first tests I noticed that backups were made using the NBDSSL transport. I expected the SAN transport to be used, because the prerequisites were met and it had worked until the update. I opened a case with HPE support and was advised to install the hotfix QCIM2A65619. With this hotfix, several files were replaced:

x8664\A.09.00\VEPA\DP_HOME_DIR\bin\components\DpSessionLogger.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\ViAPI.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\vCloudAPI.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\DPComServer.exe
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\components\vepalib_vmware.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\vepa_util.exe
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\vepa_bar.exe
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\components\vepalib_vcd.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\components\DPHostingEnvironmentComponent.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\components\CDpDataMoverComponent.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\components\vepalib_hyperv.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\components\DpBackendService.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\lib\vddk

The hotfix solved the issue. And to be honest: I didn’t care why it worked after applying the hotfix. I had the same issue at multiple customers, and applying the hotfix solved it in each case.

Today, I was reading through the HPE Data Protector 9.06 Integration Guide and the HPE Data Protector 9.0x Virtualization Support Matrix and stumbled upon this table:

Data Protector versions | VMware VDDK component | Supported backup / mount proxy operating systems
9.00, 9.01 | VDDK 5.5.0 | Windows Server 2003 R2 (x64); Windows Server 2008, 2008 R2 (x64); Windows Server 2012 (x64); RHEL 5.9 (x64); RHEL 6.2, 6.3 (x64); SLES 10.4 (x64); SLES 11 (x64)
9.02, 9.03 | VDDK 5.5.3 | Windows Server 2003 R2 (x64); Windows Server 2008, 2008 R2 (x64); Windows Server 2012 (x64); RHEL 5.9 (x64); RHEL 6.2, 6.3, 6.4 (x64); SLES 10.4 (x64); SLES 11 (x64)
9.04 | VDDK 6.0 | Windows Server 2008 R2 (x64); Windows Server 2012, 2012 R2 (x64); RHEL 6.6, 7.0 (x64); SLES 11, 12 (x64)
9.05 | VDDK 6.0 U1 | Windows Server 2008 R2 (x64); Windows Server 2012, 2012 R2 (x64); RHEL 6.6, 7.0 (x64); SLES 11, 12 (x64)
9.06 | VDDK 6.0 U2 | Windows Server 2008 R2 (x64); Windows Server 2012, 2012 R2 (x64); RHEL 6.6, 7.0 (x64); SLES 11, 12 (x64)

There was a footnote for VDDK 6.0 U1:

The VM backups does not use SAN transport mode on vSphere 5.1, 5.5 (and its updates) environment and falls back to NBDSSL/NBD. This is because of VDDK 6.0 U1 issue. For more information, see VMware Knowledge Base.

Oops… that’s my issue! The footnote included a link to VMware KB2135621 (Virtual Disk Development Kit 6.0 U1 Backup and Restore commands fail using SAN transport mode on ESXi 5.5.x hosts on both Windows and Linux proxies). The described symptoms are:

  • Virtual Disk Development Kit 6.0 Update 1 backup and restore commands fail using SAN transport mode on ESXi 5.5.x hosts.
  • This issue occurs on both Windows and Linux proxies.

Yep, that’s my issue. The customers that observed this issue were running vSphere 5.5, not 6.0. With this knowledge, I checked the version of the vixDiskLib.dll on one of the patched Data Protector hosts. And there it was:

vixDiskLib

The vixDiskLib.dll had the build version 6.0.0 build-2498720, which is the build version of the Virtual Disk Development Kit 6.0. So it seems that the Data Protector hotfix QCIM2A65619 downgrades the VDDK used by Data Protector.
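
If you want to verify this on your own backup host, the file version of the library can be read with a short PowerShell snippet. The path below is only an assumption derived from the file list of the hotfix (DP_HOME_DIR\lib\vddk, with DP_HOME_DIR typically resolving to C:\Program Files\OmniBack); adjust it to your installation.

# Minimal sketch: read the version of the VDDK library used by Data Protector
# The path is an assumption; adjust it to your Data Protector installation
$vddkDll = 'C:\Program Files\OmniBack\lib\vddk\vixDiskLib.dll'
(Get-Item $vddkDll).VersionInfo | Select-Object FileVersion, ProductVersion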

KB2135621 describes that this issue is resolved in VMware vCenter Server 6.0 Update 2. This also implies that it is fixed for VDDK 6.0 U2 and therefore for Data Protector 9.06.

I’m sorry Data Protector. It was not your fault!

Guest customization fails after upgrade to VMware vSphere 6

VMware vSphere 6 is now a year old and it was time to update my lab to vSphere 6. The update went smoothly, and everything worked as expected. Some days later, I updated the master VM of a small automated desktop pool. I’m using VMware Horizon 6.2.1 in my lab to deploy a small number of Windows 8.1 VMs for tests, administration etc. The recompose of the pool failed during the guest customization.

view_error_decrypt_password

I checked the customization specification immediately and got an error in the vSphere C# client.

vcsa_error_decrypt_password

Interestingly, I got no error in the vSphere Web Client:

vcsa_error_decrypt_password_web_client

After re-entering the Administrator password, the customization specification was usable again. No errors so far.

A quick search in the VMware KB led me to the article “Virtual machines with customizations fail to deploy when using Custom SSL Certificates (1019893)”. But this article doesn’t apply to vCenter 6.0. For the record: I’m using CA-signed certificates in my environment. It seems to be a good idea to re-enter the passwords in customization specifications after a vCenter migration/upgrade (5.x to 6.x or from VCSA 5.x to 6.x).

VMware vCenter Storage Monitoring Service & Auto Deploy plug-in failed after upgrade to vSphere 6.0

Yesterday I did an upgrade of my vCenter Server Appliance 5.5 U3 to 6.0 U1. This was the first step to update my lab infrastructure to vSphere 6.0. A bit late, but better late than never. The update of the VCSA itself went smoothly. No problems with certificates, hosts, VMs or PernixData FVP. But then I discovered two errors in the old vSphere C# client (I know that I should use the Web Client…).

vsphere_client_plug-in_error

The first message indicates a problem with the VMware vCenter Storage Monitoring Service.

The other error message indicates a problem with Auto Deploy.

Both error messages have harmless reasons.

VMware vCenter Storage Monitoring Service

The error message regarding the VMware vCenter Storage Monitoring Service is expected behaviour. The explanation can be found in the VMware vCenter Server 6.0 Release Notes:

  • vSphere Web Client. The Storage Reports selection from an object’s Monitor tab is no longer available in the vSphere 6.0 Web Client.
  • vSphere Client. The Storage Views tab is no longer available in the vSphere 6.0 Client.

To get rid of this message, remove the old vSphere C# client; newer releases of the C# client no longer include this plug-in. If you still need older C# client releases, you can simply ignore the error message.

Auto Deploy

The error message regarding Auto Deploy is caused by a stopped Auto Deploy service. The Auto Deploy service is stopped by default.

vcenter_6_auto_deploy

You can find the Auto Deploy service in the vSphere Web Client under Administration > System Configuration > Services > Auto Deploy. The error message will be gone once you start the service. If you don’t need Auto Deploy, you can ignore the error message.

Marco van Baggum wrote a blog post about the same messages some months ago. He highlighted an alternative way to get rid of these messages.

Considerations when using Microsoft NLB with VMware Horizon View

A load balancer is an integral component of (nearly) every VMware Horizon View design. Not only to distribute the connections among a number of connection or security servers, but also to provide high availability in case of a connection or security server failure. Without a load balancer, connection attempts will fail if a connection or security server isn’t available. Craig Kilborn wrote an excellent article about the different possible load balancing designs. Craig highlighted Microsoft Network Load Balancing (NLB) as one of the possible ways to implement load balancing. Jason Langer also mentioned Microsoft NLB in his article “The Good, The Bad, and The Ugly of VMware View Load Balancing”, which is worth reading.

Why Microsoft NLB?

Why should I use Microsoft NLB to load balance connections in my VMware Horizon View environment? It’s a question of requirements. If you already have a (hopefully redundant) load balancer, then there is no reason to use Microsoft NLB for load balancing. Really no reason? A single load balancer is a single point of failure, and therefore you should avoid it. Instead of using a single load balancer, you could use Microsoft NLB. Microsoft NLB is free, because it’s part of Windows Server. Two or more servers can form a highly available load balancer, and you can install the NLB feature directly on your Horizon View connection or security servers.

How does it work?

Microsoft Windows NLB has been part of the operating system since Windows NT Server. Two or more Windows servers can form a cluster with one or more virtual IP addresses. Microsoft NLB offers three different operating modes:

  • Unicast
  • Multicast
  • Multicast (IGMP)

Two years ago I wrote an article about why unicast mode sucks: Flooded network due HP Networking Switches & Windows NLB. This leads to the recommendation: always use multicast (IGMP) mode!

Nearly all switches support IGMP snooping. If not, spend some money on new switches. Let me be clear: if your switches support IGMP snooping, enable it for the VLAN to which the cluster nodes are connected. There is no need to configure static multicast MAC addresses or dedicated VLANs to avoid flooding.

If you select the multicast (IGMP) mode, each cluster node will periodically send an IGMP join message to the multicast address of the group. This address is always 239.255.x.y, where x and y correspond to the last two octets of the virtual IP address (a virtual IP of 192.168.1.21, for example, maps to the multicast group 239.255.1.21). Upon receiving these multicast group join messages, the switch sends the multicast traffic only to the ports of the group members. This avoids network flooding. Multicast (IGMP) also simplifies the configuration of a Microsoft NLB cluster (a short PowerShell sketch follows the list below):

  • Enable IGMP Snooping for the VLAN of the cluster nodes
  • Install the NLB feature on each server that should participate in the cluster
  • Create a cluster with multicast (IGMP) mode on the first node
  • Join the other nodes
  • That’s it!
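
For reference, here is a minimal PowerShell sketch of these steps. The interface name, cluster name, node name and IP addresses are made-up example values; the NLB cmdlets become available once the NLB feature and its management tools are installed.

# Install the NLB feature and its management tools (run on each node)
Install-WindowsFeature NLB -IncludeManagementTools

# Create the cluster on the first node, using multicast (IGMP) mode
Import-Module NetworkLoadBalancingClusters
New-NlbCluster -InterfaceName 'Ethernet' -ClusterName 'view-nlb' -ClusterPrimaryIP 192.168.1.21 -SubnetMask 255.255.255.0 -OperationMode IgmpMulticast

# Join an additional node (run on the first node)
Add-NlbClusterNode -NewNodeName 'viewcs02' -NewNodeInterface 'Ethernet'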

The installation process of a Microsoft NLB cluster is not particularly complex, and once the NLB cluster is running, there is not much maintenance to do. As already mentioned, you can put the NLB feature on each connection or security server.

What are the caveats?

Sounds pretty good, right? But there are some caveats when using Microsoft NLB. Microsoft NLB does not support sticky connections, and it does not support service awareness. Why is this a problem? Let’s assume that you have enabled “HTTP(S) Secure Tunnel”, “PCoIP Secure Gateway” and “Blast Secure Gateway”.

connection_server_settings

In this case, all connections are proxied through the connection or security servers.

The initial connection from the Horizon View client to the connection or security server is used for authentication and selection of the desired desktop pool or application. This is an HTTPS connection. At this point, the user has no connection to a pool or an application. When the user connects to a desktop pool or application, the client will open a second HTTPS connection. This connection is used to provide a secure tunnel for RDP. Because it’s the same protocol, the connection will be directed to the same connection or security server as before. The same applies to Blast connections. But if the user connects to a pool via PCoIP, the View client will open a new connection using PCoIP with destination port 4172. If the PCoIP External URL refers to the load-balanced URL, the connection can be directed to another connection or security server. If this is the case, the PCoIP connection will fail. This is because the source IP address might be the same, but another destination port is used. VMware describes this behaviour in KB1036376 (Unable to connect to the PCoIP Secure Gateway when using Microsoft NLB Clustering).

Another big caveat is the missing service awareness. Microsoft NLB does not check whether the load-balanced service is available. If the load-balanced service fails, Microsoft NLB will keep directing requests to the cluster node that is running the failed service. In this case, the users’ connection requests will fail.

Still the ugly?

So is Microsoft NLB still the ugly option? I don’t think so. Especially for small deployments, where the customer does not have a load balancer, Microsoft NLB can be a good option. If you want to load balance connection servers, Microsoft NLB can do a great job. In case of load balancing security servers, you should take a look at KB1036376, because you might need at least 3 public IP addresses for NAT. The missing service awareness can be a problem, but you can work around it with responsive monitoring.

In the end, it is a question of requirements. If you plan to implement other services that might require a load balancer, like Microsoft Exchange, you should take a look at a redundant, highly available load balancer appliance.

PowerCLI: Get-LunPathState

Careful preparation is a key element of success. If you restart a storage controller, or even the whole storage system, you should be very sure that all ESXi hosts have enough paths to every datastore. Sure, you can use the VMware vSphere C# client or the Web Client to check every host and every datastore. But if you have a large cluster with a dozen datastores and some Raw Device Mappings (RDMs), this can take a looooong time. Checking the path state of each LUN is a task that can be perfectly automated: get a list of all hosts, loop through every host and every LUN, and output a list of all hosts with all LUNs and all paths for each LUN. Sounds easy, right?

For a long time, I used this PowerCLI script for checking the LUN path state. But now I decided to give something back and I tweaked it a bit for my needs.

Feel free to use and/or modify it.
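
The idea behind the script is simple. The following is not the original script, just a minimal sketch of the approach; it assumes PowerCLI is loaded and an existing Connect-VIServer session.

# Sketch: report the path state of every disk LUN on every host
foreach ($vmhost in Get-VMHost | Sort-Object Name) {
    foreach ($lun in Get-ScsiLun -VmHost $vmhost -LunType disk) {
        $paths = Get-ScsiLunPath -ScsiLun $lun
        [PSCustomObject]@{
            Host          = $vmhost.Name
            CanonicalName = $lun.CanonicalName
            Paths         = $paths.Count
            Active        = @($paths | Where-Object { $_.State -eq 'Active' }).Count
            Dead          = @($paths | Where-Object { $_.State -eq 'Dead' }).Count
        }
    }
}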

Screen resolution scaling has stopped working after Horizon View agent update

Another inconvenience that I noticed during the update process from VMware Horizon View 6.1.1 to 6.2 was that the automatic screen resizing stopped working. When I connected to a desktop pool with the VMware Horizon client, I only got the screen resolution of the VM (the resolution that is used when connecting to the VM with the vSphere console), not 1920×1200 as expected. This issue only occurred with PCoIP, not with RDP. I had this issue with a static desktop and a dynamic desktop pool, and it occurred after updating the Horizon View agent. The resolution scaling worked with a Windows Server 2012 R2 RDS host when I connected to the RDS host via PCoIP.

VMware KB1018158 (Configuring PCoIP for use with View Manager) did not solve the problem. I checked the VMX version, the video RAM config etc. Nothing had changed, everything was configured as expected. At this point it was clear to me that this must be an issue with the Horizon View agent. I took some snapshots and tried to reinstall the Horizon View agent. I removed the Horizon View agent and the VMware Tools from one of my static desktops. After a reboot, I installed the VMware Tools and then the Horizon agent. To my surprise, this first attempt solved the problem. I tried the same with my second static desktop pool VM and with the master VM of my dynamic desktop pool (don’t forget to recompose the VMs…). This workaround fixed the problem in each case.

I don’t know if this is a bug. I haven’t found any hints in the VMware Community forum or blogs. Maybe someone knows the answer.

VMware Horizon View agent update on RDS host fails with “Internal Error 25030”

I’m running a small VMware Horizon View environment in my lab. Nothing fancy, but all you need to show what Horizon View can do for you. This environment includes a Windows Server 2012 R2 RDS host. During the update process from Horizon View 6.1.1 to 6.2, I had to update the View agent on this RDS host. This update installation failed with an “Internal Error 25030”, followed by a rollback. Fortunately I had a snapshot, so I went back to the previous state and tried the update again. This attempt also went awry.

To make a long story short: Read the fscking release notes! This quote is taken from the Horizon View 6.2 release notes:

When you upgrade View Agent 6.1.1 to View Agent 6.2 on an RDS host running on Windows Server 2012 or 2012 R2, the upgrade fails with an “Internal Error 25030” message.
Workaround: Uninstall View Agent 6.1.1, restart the RDS host, and install View Agent 6.2.

And this is not the first time that this error has occurred. I found this quote in the Horizon View 6.1.1 release notes:

When you upgrade View Agent 6.1 to View Agent 6.1.1 on an RDS host running on Windows Server 2012 or 2012 R2, the upgrade fails with an “Internal Error 25030” message.
Workaround: Uninstall View Agent 6.1, restart the RDS host, and install View Agent 6.1.1

If you take a closer look at these two statements, you might notice some similarities… But I do not want to be spiteful. The workaround did the trick: simply uninstall the View agent (if it’s still installed after the rollback… which was not the case for me), reboot, and reinstall the View agent.

PernixData Architect Software

With the general availability of PernixData FVP 3.1, PernixData released the first version of PernixData Architect.

One of the biggest problems today is that management tools are often focused on deployment and monitoring of applications or infrastructure. This doesn’t lead to a holistic view of applications and the related data center infrastructure. You have to monitor at several points within the application stack, and even then you won’t get a holistic view. Without proper information, you can’t make proper decisions. At this point, PernixData Architect comes into play.

PernixData Architect is a software platform that supports the complete IT life cycle, from design and deployment to operation and optimization. It supports the decision-making process with data gathering and big data analytics. PernixData Architect continuously generates information and recommendations based on data gathered from VMs, storage devices, vCenter, the network etc. This information pool can be analysed with big data techniques: data is gathered, data is set into context (this is what information is), and information is linked and combined with recommendations. Here are some examples of what PernixData Architect can do for you (source):

  • Descriptive Analytics – Identify and profile the top 10 VMs on latency, throughput and IOPS.
  • Predictive Analytics – Calculate server-side resources needed to run a VM in Write Through versus Write Back mode, ensuring optimal hardware is allocated before a problem arises.
  • Prescriptive Analytics – Recommend ideal server-side resources based on application patterns.

PernixData Architect is a software-only solution and can be deployed with or without PernixData FVP. Without FVP, Architect can be used as a monitoring tool and gives you visibility, management and recommendations. Architect works with any server and storage platform that is compatible with VMware vSphere!

I’ve installed the latest PernixData FVP 3.1 release in my lab and enabled the 30-day trial period for PernixData Architect. You can access Architect through the web UI.

prnx_architect_1

As you can see, I have two clusters in my lab and both are accelerated using PernixData FVP. One cluster uses Distributed Fault Tolerant Memory (DFTM), the other cluster uses SSDs as acceleration resources. If Architect is enabled, FVP doesn’t display any stats and refers to the Architect UI. Below is a screenshot of the summary screen, which gives you a good overview at first glance.

prnx_architect_2

Architect includes much more stats than FVP.

prnx_architect_3

On the “Intelligence” page, you get values for the working set of each ESXi host in the cluster. This is an important value for the right sizing of your acceleration resources.

prnx_architect_4

As mentioned, PernixData Architect uses the gathered data to give you recommendations in real time. Even in my lab cluster, there are things to improve. ;)

prnx_architect_5

This is only a short overview of PernixData Architect, but you might see now what insight Architect can give you. If you are curious to see what PernixData FVP and Architect can do for you, you can simply install both products as part of a proof of concept and test them for 30 days. Even if you don’t want to install FVP, Architect can be used without FVP. And even FVP can be used without acceleration resources, in a monitoring mode.

Using VCSA as remote syslog – Don’t forget the log rotation!

Important note: It seems that vCenter Server Appliance updates revert the changes. Please check the settings after each update!

The VMware vCenter Server Appliance (VCSA) can act as a remote syslog destination for ESXi hosts. This is very handy for troubleshooting and I really recommend using this feature. But VMware ESXi hosts can be really chatty, and therefore it’s a good idea to keep an eye on the free disk space of the VCSA.

Yesterday, a colleague had an interesting support case. A customer reported that his Veeam Backup & Replication jobs failed and that he was unable to log in to the vCenter with the vSphere Client and vSphere Web Client. My colleague checked the VCSA VM and noticed that the VPXD failed to start (“Waiting for vpxd to initialize: ….failed”). Together we checked the appliance and the log files. The vpxd.log (/var/log/vmware/vpx) had last been updated weeks ago, but its last entry was interesting: No space left on device. But there was free disk space on /storage/log. I immediately checked the inode count with df -i and there it was: no free inodes.

Why is this a problem? Each name entry in the file system consumes an inode. If there are no free inodes, no new directories and files can be created, and the error message is the same as for missing disk space. Something had to have created a lot of files on /storage/log. Because /var/log/vmware is a symbolic link to /storage/log/vmware, it had to be something on the /storage/log partition. We checked the remote syslog location under /storage/log/remote and found gigabytes of data and an incredible number of log files. After removing the logs, the VPXD was able to start and the inode count was back at a normal level.
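
For reference, these checks boil down to a few standard commands on the appliance (paths as described above):

df -i /storage/log                         # free inodes on the log partition
find /storage/log/remote -type f | wc -l   # number of remote syslog files
du -sh /storage/log/remote                 # total size of the remote syslog files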

But why were there so many logs? We checked the logrotate config and found a faulty config for the remote syslog files. Instead of rotating logs and removing old ones, this config rotated all logs every day and multiplied the number of logs. Please note that there is no logrotate config for the remote syslog files by default! This one had been added manually.

The default config of the remote syslog collector of the VCSA creates a folder for each host and each month. Following this VMTN posting, we changed the syslog-collector config a bit, so that only a single file per host is created. We also modified /etc/logrotate.d/syslog and added a rotation rule for the remote syslog files at the end.
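
The exact logrotate stanza depends on where the syslog collector writes the per-host files. A minimal sketch, assuming the logs end up as one file per host under /storage/log/remote, could look like this:

/storage/log/remote/*/messages {
    missingok
    notifempty
    compress
    weekly
    rotate 30
}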

With this configuration, 30 rotated log files per host are preserved. The number of log files and the rotation interval (weekly or daily) can easily be adjusted, but these settings should be sufficient for small environments.

It’s important to understand that the VCSA uses several virtual disks, which are mounted to different mount points within the root filesystem. On a vSphere 5.5 VCSA, /var/log/vmware and /var/log/remote are links to /storage/log/vmware and /storage/log/remote. Make sure that there is always enough free disk space on ALL disks! I also want to highlight VMware KB2092127 (After upgrading to vCenter Server Appliance 5.5 Update 2, pg_log file reports this error: WARNING: there is already a transaction in progress). This error has hit me a couple of times…

HP offers 1TB StoreOnce VSA for free

A free StoreOnce VSA, like the well-known 1 TB StoreVirtual VSA? That would be too cool to be real. But it is real! Since February, HP has offered a free 1 TB version of their StoreOnce VSA. I totally missed this announcement, but thanks to Calvin Zito I noticed it today.

He linked to a blog post from Ashwin Shetty (Can you protect your data for free? Introducing the new free 1TB StoreOnce VSA), in which Ashwin provides more information about the free 1 TB StoreOnce VSA.

HP StoreOnce VSA

HP StoreOnce VSA runs the same software as the hardware-based StoreOnce appliances, but it’s delivered as a VM. You can run the VM on top of VMware ESXi, Microsoft Hyper-V or KVM. Besides the free 1 TB license, the StoreOnce VSA can be purchased with 4 TB, 10 TB or 50 TB capacity (usable, non-deduplicated). In contrast to the hardware-based appliances, the StoreOnce VSA comes with licenses for replication and StoreOnce Catalyst. This makes the StoreOnce VSA a perfect fit for remote and branch offices. You can quickly deploy the StoreOnce VSA and replicate the backed-up data to the central data center. But you can also deploy the VSA with the 4 TB, 10 TB or 50 TB license in your central data center and use it as a replication target for StoreOnce VSAs in the remote and branch offices (the replication target needs the replication license). A single VSA can act as a replication target for up to 8 StoreOnce VSAs and/or StoreOnce appliances. You can scale the free 1 TB license with license upgrades to 4 TB, 10 TB and 50 TB. The StoreOnce VSA supports Catalyst, VTL (iSCSI) and NAS (CIFS or NFS) backup targets. Take a look at the QuickSpecs for more information. I also recommend reading the two blog posts from Ashwin Shetty on Around the Storage Block.

Last year I published several posts about the StoreOnce VSA. I recommend downloading the free 1 TB StoreOnce VSA and playing with it. Some of my blog posts should help you get started.