Author Archives: Patrick Terlisten

About Patrick Terlisten

vcloudnine.de is the personal blog of Patrick Terlisten. Patrick has a strong focus on virtualization & cloud solutions, but also on storage, networking, and IT infrastructure in general. He is a fan of Lean Management and agile methods, and practices continuous improvement wherever possible. Feel free to follow him on Twitter and/or leave a comment.

Veeam Backup & Replication: Backup of Microsoft Active Directory Domain Controller VMs

To back up a virtual machine, Veeam Backup & Replication needs two permissions:

  • permission to access and back up the VM, as well as the
  • permission to perform specific tasks inside the VM

to guarantee a consistent backup. The former permission is granted by the user account that is used to access the VMware vCenter server (sorry for the VMware focus at this point). Usually, this account has the Administrator role granted at the vCenter Server level. The latter permission is granted by a user account that has permissions inside the guest operating system.

geralt / pixabay.com/ Creative Commons CC0

Something I often see in customer environments is the usage of the Domain Administrator account. But why? Because everything works when this account is used!

There are two reasons for this:

  • This account is part of the local Administrators group on every server and client
  • Customers tend to grant the Administrator role to the Domain Admins group at the vCenter Server level

In simple words: Many customers use the same account to connect to the vCenter and for the application-aware processing of Veeam Backup & Replication, at least for Windows server backups.

Houston, we have a problem!

Everything is fine until customers have to secure their environments. One of the very first things customers do is to protect the Administrator account. And at this point, things might go wrong.

Using a service account to connect to the vCenter server is easy. This can be any account from Active Directory, or from the embedded VMware SSO domain. I tend to create a dedicated AD-based service account. For the necessary permissions in the vCenter, you can grant this account Administrator permissions, or you can create a new user role in the vCenter. Veeam offers a PDF document which describes the necessary permissions for the different Veeam tasks.
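If you go the least-privilege route, the role and the permission can also be created with PowerCLI. This is only a minimal sketch: the vCenter name, the role name, the service account, and the three privileges are examples, the complete privilege list belongs in the Veeam permissions document mentioned above.

    # PowerCLI sketch - names and privilege list are examples only
    Connect-VIServer -Server vcenter.lab.local

    $privileges = Get-VIPrivilege -Id 'VirtualMachine.State.CreateSnapshot',
                                      'VirtualMachine.State.RemoveSnapshot',
                                      'VirtualMachine.Provisioning.DiskRandomRead'
    $role = New-VIRole -Name 'Veeam Backup' -Privilege $privileges

    # Assign the role to the dedicated service account at the vCenter root
    New-VIPermission -Entity (Get-Folder -NoRecursion) -Principal 'LAB\svc-veeam' -Role $role -Propagate:$true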

The next challenge is the application-aware processing. For Microsoft SQL Server, the user account must have sysadmin privileges on the Microsoft SQL Server. For Microsoft Exchange, the user must be a member of the local Administrators group. But in the case of an Active Directory Domain Controller, things get complicated.

A Domain Controller does not have a local user database (SAM). So what user account or group membership is needed to back up a domain controller using application-aware processing?

This statement is from a great Veeam blog post:

Permissions: Administrative rights for target Active Directory. Account of an enterprise administrator or domain administrator.

So the service account used to back up a domain controller is one of the most powerful accounts in the Active Directory.

There is no other way. You need a Domain or Enterprise Administrator account. I tend to create a dedicated account for this task.

My recommendation: create a service account that is used to connect to the vCenter and that is added to the local Administrators group on the servers to back up, and create a dedicated Domain/Enterprise Administrator account to back up the virtual Domain Controllers.

The advantage is that I can apply different fine-grained password policies to these accounts. Sure, you can add more security by creating more accounts for different servers and applications, adding a dedicated role to the vCenter for Veeam, etc. But this approach is easy to implement, and it adds a significant amount of user account security to every environment that is still using DOMAIN\Administrator to back up its VMs.
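Fine-grained password policies can be created with the ActiveDirectory PowerShell module. A minimal sketch; the policy settings and account names are examples:

    # Create a strict password policy for the backup service accounts
    # (policy values and account names are examples)
    Import-Module ActiveDirectory

    New-ADFineGrainedPasswordPolicy -Name 'PSO-BackupServiceAccounts' `
        -Precedence 10 `
        -MinPasswordLength 30 `
        -ComplexityEnabled $true `
        -MaxPasswordAge (New-TimeSpan -Days 180)

    # Assign the policy to the dedicated accounts
    Add-ADFineGrainedPasswordPolicySubject -Identity 'PSO-BackupServiceAccounts' `
        -Subjects 'svc-veeam','svc-veeam-dc'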

Veeam and StoreOnce: Wrong FC-HBA driver/ firmware causes Windows BSoD

One of my customers bought a very nice new backup solution, which consists of a

  • HPE StoreOnce 5100 with ~ 144 TB usable capacity,
  • and a new HPE ProLiant DL380 Gen10 with Windows Server 2016

as the new backup server. StoreOnce and backup server will be connected with 8 Gb Fibre-Channel and 10 GbE to the existing SAN and network. Veeam Backup & Replication 9.5 U3a is already in use, as well as VMware vSphere 6.5 Enterprise Plus. The backend storage is a HPE 3PAR 8200.

This setup allows the usage of Catalyst over Fibre-Channel together with Veeam Storage Snapshots, and this is what the customer intended to use.

I wrote about a similar setup some months ago: Backup from a secondary HPE 3PAR StoreServ array with Veeam Backup & Replication.

The OS on the StoreOnce was up-to-date (3.16.7), and Windows Server 2016 was installed using HPE Intelligent Provisioning. Afterwards, drivers and firmware were updated using the latest SPP 2018.11. So all drivers and firmware were also up-to-date.

After doing the zoning and some other configuration tasks, I installed Veeam Backup & Replication 9.5 U3 and configured my Catalyst over Fibre-Channel repository. I configured a test backup… and the server failed with a Blue Screen of Death… which has become pretty rare since Server 2008 R2.

geralt / pixabay.com/ Creative Commons CC0

I did some tests:

  • backup from 3PAR Storage Snapshots to Catalyst over FC repository – BSoD
  • backup without 3PAR Storage Snapshots to Catalyst over FC repository – BSoD
  • backup from 3PAR Storage Snapshots to Catalyst over LAN repository – works fine
  • backup without 3PAR Storage Snapshots to Catalyst over LAN repository – works fine
  • backup from 3PAR Storage Snapshots to default repository – works fine
  • backup without 3PAR Storage Snapshots to default repository – works fine

So the error had to be caused by the usage of Catalyst over Fibre-Channel. I filed a case with HPE, uploaded gigabytes of memory dumps, and heard pretty little during the next week.

HPE StoreOnce Support Matrix FTW!

After a week, I got an email from the HPE support with a question about the installed HBA driver and firmware. I told them the version numbers, and a day later I was requested to downgrade (!) drivers and firmware.

The customer has a SN1100Q (P9D93A & P9D94A) HBA in his backup server, and I was requested to downgrade the firmware to version 8.05.61, as well as the driver to 9.2.5.20. And with this firmware and driver version, the backup was running fine (~ 750 MB/s throughput).
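If you need to check which driver and firmware your FC HBA is actually running, the FC WMI classes are a quick way to do that on Windows. A small sketch, run in an elevated PowerShell on the backup server:

    # Query model, driver and firmware version of the installed FC HBAs
    Get-CimInstance -Namespace root\wmi -ClassName MSFC_FCAdapterHBAAttributes |
        Select-Object Manufacturer, Model, DriverVersion, FirmwareVersion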

I found the HPE StoreOnce Support Matrix on the HPE SPOCK website. The matrix confirmed the firmware and driver version requirement.

Fun fact: None of the listed HBAs (except the Synergy HBAs) is supported with the latest StoreOnce G2 products.

Lessons learned

You should always take a look at those support matrices! HPE confirmed that the first-level recommendation “Have you tried updating to the latest firmware?” can cause exactly this kind of problem. The fact that the factory ships the server with the latest firmware does not make this easier.

Out-of-Office replies are dropped due to empty MAIL FROM

Today I had an interesting support call. A customer noticed that Out-of-Office replies were not received by external recipients, even though the OoO option was enabled for internal and external recipients. Internal recipients got the OoO reply, but none of the external recipients did.

cattu/ pixabay.com/ Creative Commons CC0

The Message Tracking Log is a good point to start. I quickly discovered that the Exchange server was unable to send the OoO mails. You can use the event ID FAIL to get a list of all failed messages.

Very interesting was the RecipientStatus of a failed mail.
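To pull the failed messages, including the RecipientStatus, from the tracking log, something like this will do. A sketch; adjust the time range to your case:

    # Exchange Management Shell: list failed messages incl. RecipientStatus
    Get-MessageTrackingLog -EventId FAIL -Start (Get-Date).AddDays(-1) |
        Select-Object Timestamp, Sender, Recipients, RecipientStatus, MessageSubject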

550 Requested action not taken: mailbox unavailable is a pretty interesting error when sending mails over the mail relay of your ISP, especially when other mails were successfully sent over the same mail relay.

Next stop: Protocol log of the send connector

I enabled the logging on the send connector using the EAC. This option is disabled by default. Depending on the amount of mail sent over the connector, you should make sure to disable the logging after your troubleshooting session. To enable the logging, follow these steps (a PowerShell alternative is shown after the list):

  • Open the EAC and navigate to Mail flow > Send connectors
  • Select the connector you want to configure, and then click Edit
  • On the General tab in the Protocol logging level section, select the Verbose option
  • When you’re finished, click Save
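If you prefer PowerShell over the EAC, the same setting can be changed from the Exchange Management Shell. A short sketch; the connector name is an example:

    # Enable verbose protocol logging on a send connector (name is an example)
    Set-SendConnector -Identity 'Internet Mail' -ProtocolLoggingLevel Verbose

    # Set it back to None after the troubleshooting session
    Set-SendConnector -Identity 'Internet Mail' -ProtocolLoggingLevel None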

The protocol log can be found under %ExchangeInstallPath%TransportRoles\Logs\Hub\ProtocolLog\SmtpSend.

After enabling the logging and sending another test mail, the log contained the necessary details to find the root cause: the error occurred right after the Exchange server issued MAIL FROM:<>. But why is the MAIL FROM empty?

RFC 2298 is the key

An Out-of-Office reply is an automatically generated message that is sent, just like a Message Disposition Notification (MDN), with an empty envelope sender. And RFC 2298 clearly states:

The envelope sender address (i.e., SMTP MAIL FROM) of the MDN MUST be null (<>), specifying that no Delivery Status Notification messages or other messages indicating successful or unsuccessful delivery are to be sent in response to an MDN.

So an empty MAIL FROM is something that a mail relay should expect. In the case of my customer, the mail relay seems to act differently. Maybe some kind of spam protection.

Database Availability Group (DAG) witness is in a failed state

As part of a maintenance job I had to update a 2-node Exchange Database Availability Group and a file-share witness server.

After the installation of Windows updates on the witness server and the obligatory reboot, the witness was left in a failed state.

In my opinion, the re-creation of the witness server and the witness directory cannot be the correct way to solve this. There must be another way. In addition: the server was not dead, only a reboot had occurred.

Check the basics

Both DAG nodes were online and working. A good starting point is a check of the cluster resources using PowerShell.

In my case, the cluster resource for the File Share Witness was in a failed state. A simple Start-ClusterResource solved my issue immediately.
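This is roughly how the check and the fix look; a sketch, run on one of the DAG members (the FailoverClusters module must be available, and the exact witness resource name depends on your witness share):

    # Check all cluster resources on a DAG member
    Get-ClusterResource

    # Bring failed resources (here: the File Share Witness) back online
    Get-ClusterResource | Where-Object { $_.State -eq 'Failed' } | Start-ClusterResource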

It seems that the cluster had marked the file share witness as unreliable, and thus the resource was not started after the file share witness was back online again. I managed to bring it back online manually by running Start-ClusterResource on one of the DAG members.

Vembu BDR Suite v4.0 is now generally available

Vembu Technologies was founded in 2002, and with 60,000 customers and more than 4,000 partners, Vembu is a leading provider of a comprehensive portfolio of software products and cloud services for small and medium businesses.

Last week, Vembu announced the availability of Vembu BDR Suite v4.0! Vembu’s new release is all about maintaining business continuity and ensuring high availability. Apart from new features, this release contains significant enhancements and bug fixes that are geared towards performance improvement.

Vembu Technologies/ Vembu BDR Essentials/ Copyright by Vembu Technologies

The Vembu BDR Suite

The Vembu BDR Suite is a one-stop solution to all your backup and disaster recovery needs. At least, that is what Vembu says about their own product. The BDR Suite covers:

  • Backup and replication of VMs running on VMware vSphere and Microsoft Hyper-V
  • Backup and bare-metal recovery for physical servers and workstations (Windows Server and Desktop)
  • File and application backups of Microsoft Exchange, Microsoft SQL Server, Microsoft SharePoint, Microsoft Active Directory, Microsoft Outlook, and MySQL
  • Creation of backup copies and their transfer to a DR site

More blog posts about Vembu:

Vembu BDR Essentials – affordable backup for SMB customers
The one stop solution for backup and DR: Vembu BDR Suite

What’s new in 4.0?

Vembu BDR Suite v4.0 has got some pretty nice new features. IMHO, there are four highlights:

  • Hyper-V Failover Cluster Support for Backup & Recovery
  • Shared VHDX Backup
  • Hyper-V Checksum Based Incremental, and the
  • Credential Manager

There is a significant chance that you use a Hyper-V Failover Cluster if you have more than one Hyper-V host. With v4.0, Vembu added support for backup and recovery of VMs residing in a Hyper-V Failover Cluster. Even if VMs running on a Hyper-V cluster move from one host to another, the backups will continue to run without any interruption.

A feature that I’m really missing in VMware and Veeam is support for backing up shared VHDX files. v4.0 added support for this.

Vembu BDR Suite v4.0 also added support for performing incremental backups with Hyper-V. They call it the checksum-based incremental method, but it is in fact Changed Block Tracking. An important feature for Hyper-V customers!

The Vembu Credential Manager allows you to store the necessary credentials in one place and use them everywhere inside the Vembu BDR Suite v4.0.

But there are also other very nice enhancements:

  • Handling of newly added disks for VMware ESXi and Hyper-V, which allows the backup of newly added disks at the next backup. In prior releases, newly added disks were only backed up during the next full backup.
  • Reconnection for VMware ESXi and Hyper-V jobs in case of a dropped network connection
  • Application-aware processing for Hyper-V VMs can now be enabled on a per-VM basis
  • API for VM list with storage utilization report, which allows you to generate detailed reports whenever you need one

Interested in trying Vembu BDR Suite? Try a 30-day free trial now! For any questions, simply send an e-mail to vembu-support@vembu.com or follow them on Twitter.

Exam prep & experience: Citrix NetScaler Advanced Topics: Security, Management, and Optimization (1Y0-340)

In May 2018, Citrix released their new Citrix Certified Expert – Networking certification, which completed the networking certification path at the upper end (blog post on training.citrix.com). The track starts with the Associate (CCA-N), continues with the Professional (CCP-N), and ends with the Expert (CCE-N) certification; the lower-level certification is always a requirement for achieving the next higher-level certification. This is pretty cool, and I’m very happy that Citrix now offers the CCE-N, because the expert-level certification was missing all the time.

kmicican/ pixabay.com/ Creative Commons CC0

Everything is cool… unless you passed exam 1Y0-351 to gain your CCP-N. In this case, you have to pass 1Y0-340 by Dec 31, 2018. Otherwise you have to start again with the CCA-N after the validity period of your CCP-N is over (3 years after passing the exam).

Bad move, Citrix, bad move. I’m really disappointed. I passed 1Y0-351 in Nov 2017, and now, 12 months later, I have to book, pay for, and pass 1Y0-340 if I don’t want to start over with a CCA-N in Nov 2020. Bad move, Citrix, bad move!

Exam 1Y0-340 is titled “Citrix NetScaler Advanced Topics: Security, Management, and Optimization”, whereas 1Y0-351 was titled “Citrix NetScaler 10.5 Essentials and Networking”. You can assume that more in-depth knowledge is needed to pass this exam than was necessary for 1Y0-351. Note the “Advanced Topics” in the exam title.

But what are these “advanced topics”? According to the exam prep guide, the perfect candidate for the 1Y0-340 exam can deploy and/or manage

  • Citrix NetScaler Application Firewall (AppFirewall) to secure application access in a Citrix NetScaler 12 environment, as well as
  • NetScaler Management and Analytics System (NMAS) to administer a Citrix NetScaler environment, or
  • optimize NetScaler-managed application delivery traffic

Citrix NetScaler Application Firewall (AppFirewall)

You should take an in-depth look at these topics:

  • Application Firewall Overview
  • Application Firewall Profiles and Policies
  • Regular Expression
  • Attacks and Protections
  • Monitoring and Troubleshooting
  • Security and Filtering

NetScaler Management and Analytics System (NMAS)

  • NetScaler MAS: Introduction and Configuration
  • Managing and Monitoring NetScaler Instances
  • Managing NetScaler Configurations
  • NetScaler Web Logging

Optimize NetScaler-managed application delivery traffic

  • Integrated Caching
  • Front-End Optimization
  • Tuning and Optimizations

How to prep?

The exam prep guide refers to the NetScaler documentation, as well as to training material. Unfortunately, I don’t have access to the newer training material, only to the training material from my CNS-220 course. But hey: at least we have tons of publicly available NetScaler 12.0 documentation!

The exam prep guide has a section in which Citrix outlines sections, objectives, and references. You will find links to the NetScaler 12.0 documentation, as well as to knowledge base articles and blog posts. Go through it. Read it carefully!

The exam prep guide also outlines the section titles and weights. Two areas stand out:

  • Section 4: Attacks and Protections, and
  • Section 8: Managing and Monitoring NetScaler Instances

The section weights map directly to the number of questions in the exam. If the exam has 60 questions, and section 4 has a weight of 21%, roughly 13 questions (60 × 0.21 = 12.6) will relate to “Attacks and Protections”.

How did it go?

First things first: I passed with a good score. The exam had 62 questions, and I needed at least 62% to pass; I finished with 82%. As a non-native English speaker who took the exam in a country where English is a foreign language, I got 30 minutes extra, resulting in 120 minutes for 62 questions. Plenty of time…

What should I say? It was a multiple-choice test. Read the questions carefully. The exam guide did not lie to me: the exam came pretty close to the topics described in the guide. For most questions, my first “educated guess” was right. Sometimes, the least dumb answer seemed to be the correct one. ;)

It was a bit frustrating that Citrix has changed its product names. The NetScaler is now the “Citrix ADC” (Application Delivery Controller), and MAS is now known as “Citrix Application Delivery Management”. There was a button which showed a mapping table (old name – new name).

If you are experienced with Citrix ADC deployments and configuration, I think the exam prep guide is enough to pass the exam.

Good luck!

Vembu VMBackup Deployment Scenarios

Vembu was founded in 2002 and has over 60,000 customers worldwide. One of their core products is the Vembu BDR Suite, a one-stop solution to all your backup and DR needs. I wrote a longer blog post about the Vembu BDR Suite.

One part of this suite is Vembu VMBackup, a data protection solution that is designed to back up VMware and Microsoft Hyper-V virtual machines in a secure and simple way. The offered features are comparable to Veeam Backup & Replication.

The core component of Vembu VMBackup is the Vembu BDR Backup server, which can be deployed in two ways:

  • On-premises Deployment
  • Hybrid Deployment

virnuls/ pixabay.com/ Creative Commons CC0

On-premises Deployment

In this deployment setup, customers deploy the product in their local environment. I think this is the most typical deployment type: you install VMBackup on a physical server, in a VM, or deploy it as a virtual appliance. Backup data is transferred over LAN or SAN, and is written to the storage repositories. The Vembu BDR server acts as a centralized management point, where users can configure and manage backup and replication jobs.

In a simple deployment, the Vembu BDR Backup Server acts as both backup proxy and management server instance. It is perfect for a small number of VMs with little simultaneous backup traffic, and for VMBackup evaluation: the typical SMB environment.

If you separate the management server from the backup proxy, the deployment changes to a distributed deployment. If necessary, multiple backup proxies can be deployed on physical hosts or in virtual machines. Customers can also deploy multiple BDR backup servers, which allows load balancing across a cluster of BDR backup servers. Pretty cool for bigger and/or distributed environments, and it allows customers to scale their backup solution over time.

On-Premises Deployment/ Vembu Technologies/ Copyright by Vembu Technologies

Hybrid Deployment

Backup is good, but having a backup copy offsite is better. Vembu OffsiteDR allows customers to create a copy of their backup data and transfer it to a DR location over LAN/WAN. OffsiteDR instantly transfers backup data from a BDR Backup Server to an OffsiteDR server. Customers can restore failed VMs or missing files and application data at their DR site, or they can rebuild a failed BDR Backup Server from an OffsiteDR server.

Vembu Technologies/ OffsiteDR/ Copyright by Vembu Technologies

If customers don’t have a DR site, they can use Vembu CloudDR to push a backup copy to the Vembu Cloud. The data stored in the Vembu Cloud can easily be restored at any time and to any location. Vembu uses AWS across all continents to ensure the availability of their cloud services.

Vembu Technologies/ CloudDR/ Copyright by Vembu Technologies

Customers have the choice

It is obvious that customers have the freedom of choice in how they deploy Vembu VMBackup. I like the virtual appliance approach, which eliminates the need for additional Windows Server licenses. More and more vendors tend to offer appliances for their products; just think about the VMware vCenter Server Appliance, vRealize Orchestrator, etc. So why not offer a backup server appliance? I wish other vendors would adopt this…

Another nice feature is the scale-out capability: start small and grow over time. Perfect for SMBs.

“Cannot execute upgrade script on host” during ESXi 6.5 upgrade

I was onsite at one of my customers to update a small VMware vSphere 6.0 U3 environment to 6.5 U2c. The environment consists of three hosts: two hosts in a cluster, and a third host that is only used to run a HPE StoreVirtual Failover Manager.

The update of the first host, using the Update Manager and a HPE custom ESXi 6.5 image, was pretty flawless. But the update of the second host failed with “Cannot execute upgrade script on host”.

typographyimages/ pixabay.com/ Creative Commons CC0

I checked the host and found it with ESXi 6.5 installed, but one of the five iSCSI datastores was missing. Then I tried to patch the host with the latest patches and hit “Remediate”. The task failed with “Cannot execute upgrade script on host”. So I did a rollback to ESXi 6.0 and tried the update again, this time using iLO and the HPE custom ISO. But the result was the same: the host was running ESXi 6.5 after the update, but the upgrade had failed with the “upgrade script” error. After this attempt, the host was unable to mount any of the iSCSI datastores. This was because the datastores were mounted ATS-only on the other host, and the failed host was unable to mount the datastores in this mode. Very strange…

I checked the vua.log: the upgrade script had failed due to an illegal character in the output of esxcfg-info. First of all, I had to find out what this 0x80 character is. I checked the UTF-8 and the Windows-1252 encodings, and found out that 0x80 is the € (euro) symbol in the Windows-1252 encoding. I searched the output of esxcfg-info for the € symbol, and found it.

But how to get rid of it? Where does it hide in the ESXi config? I scrolled a bit up and down around the € symbol. A bit above, I found a reference to HPE_SATP_LH. This immediately caught my attention, because the customer is using StoreVirtual VSA and StoreVirtual HW appliances.

Now my second educated guess of the day came into play: I checked the installed VIBs, and found the StoreVirtual Multipathing Extension installed on the failed host, but not on the host where the ESXi 6.5 update was successful.

I removed the VIB from the buggy host, did a reboot, and tried to update the host with the latest patches, this time with success! Cross-checking showed that the € symbol was missing in the esxcfg-info output of the host that was upgraded first. I don’t have a clue why the StoreVirtual Multipathing Extension caused this error. The customer and I decided not to install the StoreVirtual Multipathing Extension again.
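For reference, listing and removing a VIB from the ESXi shell looks roughly like this. A sketch: the VIB name below is an example, take the exact name from the output of the list command.

    # List installed VIBs and look for the StoreVirtual multipathing extension
    esxcli software vib list | grep -i lh

    # Remove the VIB (name is an example; use the name shown by the list command)
    esxcli software vib remove --vibname=hpe-lh-satp

    # Reboot the host afterwards
    reboot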

High CPU usage on Citrix ADC VPX

While building a small Citrix NetScaler… ehm… ADC VPX (I really hate this name…) lab environment, I noticed that the fan of my Lenovo T480s was spinning up. I was wondering why, because the VPX VM had been running for just a couple of minutes, without any load. But the task manager told me that the VMware Workstation process was consuming 25% CPU (I have an Intel i5 quad-core CPU). So VMware Workstation was eating a whole CPU core without doing anything. I would not care, but the fan… And it reminded me that I’ve seen a similar behaviour in various VPX deployments on VMware ESXi.

Fifaliana/ pixabay.com/ Creative Commons CC0

A quick search led me to this Citrix Support Knowledge Center article: High CPU Usage on NetScaler VPX Reported on VMware ESXi Version 6.0. That’s exactly what I’ve observed.

The solution is setting the parameter cpuyield to yes.
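On the command line, this is a one-liner. A sketch, assuming the vpxparam commands from the KB article (set ns vpxparam is documented there; the matching show command should display the current value). Run this in the NetScaler CLI, not in the underlying shell:

    set ns vpxparam -cpuyield YES
    show ns vpxparam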

The VPX does not need a reboot. Shortly after setting the parameter, the fan stopped spinning. Have I mentioned how much I love silence on my desk? I’m pretty happy that my T480s is a really quiet laptop.

But what is this parameter used for? In pretty simple words: it releases allocated but unused CPU cycles, so that other VMs can use them. Until ADC VPX 11.1, the VPX shared its CPU with other VMs. This changed with ADC VPX 12.0. Since that release, the VPX has been like a child playing with its favorite toy just to make sure that no other child can play with it. Not very polite…

This is a quote from the Support Knowledge Center article:

Set ns vpxparam parameters:

-cpuyield: Release or do not release of allocated but unused CPU resources.

  • YES: Allow allocated but unused CPU resources to be used by another VM.
  • NO: Reserve all CPU resources for the VM to which they have been allocated. This option shows higher percentage in hypervisor for VPX CPU usage.
  • DEFAULT: NO

I don’t think that I would change this in production. But for lab environments, especially if you run the VPX on VMware Workstation, I would set -cpuyield to yes.

Using Let’s Encrypt DNS-01 challenge validation with local BIND instance

I’ve been using Let’s Encrypt certificates for a while now. In the past, I used the standalone plugin (TLS-SNI-01) to get or renew my certificates. But now I have switched to the DNS plugin. I run my own name servers with BIND, so it was a very low-hanging fruit to get this plugin to work.

Clker-Free-Vector-Images/ pixabay.com/ Creative Commons CC0

To get or renew a certificate, you need to provide some kind of proof that you are requesting the certificate for a domain that is under your control. No certificate authority (CA) wants to be the CA that hands out a certificate for google.com or amazon.com…

The DNS-01 challenge uses TXT records in order to validate your ownership of a certain domain. During the challenge, the Automatic Certificate Management Environment (ACME) server of Let’s Encrypt will give you a value that uniquely identifies the challenge. This value has to be added as a TXT record to the zone of the domain for which you are requesting a certificate. The record will look like this (the token below is just a placeholder):
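    ; placeholder token - the real value is handed out by the ACME server
    _acme-challenge.vcloudnine.de. 300 IN TXT "PLACEHOLDER-CHALLENGE-TOKEN"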

This record is for a wildcard certificate. If you want to get a certificate for a specific host, you can add one or more TXT records like this (again with a placeholder value):
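    ; one record per requested host name, again with a placeholder value
    _acme-challenge.www.vcloudnine.de. 300 IN TXT "PLACEHOLDER-CHALLENGE-TOKEN"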

There is an IETF draft about the ACME protocol. Pretty interesting read!

Configure BIND for DNS-01 challenges

I run my own name servers with BIND on FreeBSD. The RFC 2136 plugin for certbot automates the whole DNS-01 challenge process by creating, and subsequently removing, the necessary TXT records in the zone using RFC 2136 dynamic updates.

First of all, we need a new TSIG (Transaction SIGnature) key. This key is used to authorize the updates.
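A key can be generated with dnssec-keygen, which ships with BIND (newer BIND versions use tsig-keygen for this instead). The key name certbot. is just an example:

    # Generate a TSIG key pair (creates a .key and a .private file)
    # The key name "certbot." is an example
    dnssec-keygen -a HMAC-SHA512 -b 512 -n HOST certbot.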

This key has to be added to the named.conf. The secret itself can be found in the .key file.
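The key statement in named.conf looks like this (key name and secret are examples):

    key "certbot." {
        algorithm hmac-sha512;
        secret "PASTE-THE-BASE64-SECRET-FROM-THE-.key-FILE-HERE==";
    };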

The key is used to authorize the update of certain records. To allow the update of the TXT records which are needed for the challenge, add an update-policy statement to the zone definition in your named.conf.
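Something like this; the zone name and the file path are examples from my setup:

    zone "vcloudnine.de" {
        type master;
        file "/usr/local/etc/namedb/master/vcloudnine.de";
        update-policy {
            // allow the key to update only the challenge TXT records
            grant certbot. name _acme-challenge.vcloudnine.de. TXT;
        };
    };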

The records always start with _acme-challenge.domainname.

Now you need to create a config file for the RFC 2136 plugin. This file includes the key, but also the IP address of the name server. If the name server is running on the same server that requests the DNS-01 challenge, you can use 127.0.0.1 as the name server address.
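The credentials file for certbot’s dns-rfc2136 plugin looks like this (the path, the key name, and the secret are examples):

    # /usr/local/etc/letsencrypt/rfc2136.ini (example path)
    dns_rfc2136_server = 127.0.0.1
    dns_rfc2136_port = 53
    dns_rfc2136_name = certbot.
    dns_rfc2136_secret = PASTE-THE-BASE64-SECRET-HERE==
    dns_rfc2136_algorithm = HMAC-SHA512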

Now we have everything in place. This is a --dry-run from one of my FreeBSD machines.
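The invocation looks roughly like this (the domain names and the credentials path are examples):

    certbot certonly --dns-rfc2136 \
      --dns-rfc2136-credentials /usr/local/etc/letsencrypt/rfc2136.ini \
      -d vcloudnine.de -d www.vcloudnine.de \
      --dry-run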

The name server log file shows the dynamic updates at the time of the challenge.

You might need to modify the permissions of the directory which contains the zone files. Usually, the name server is not running as root. In my case, I had to grant write permissions to the “bind” group. Otherwise you might get a “permission denied” error.