Author Archives: Patrick Terlisten

About Patrick Terlisten

vcloudnine.de is the personal blog of Patrick Terlisten. Patrick has a strong focus on virtualization & cloud solutions, but also on storage, networking, and IT infrastructure in general. He is a fan of Lean Management and agile methods, and practices continuous improvement wherever possible. Feel free to follow him on Twitter and/ or leave a comment.

The one stop solution for backup and DR: Vembu BDR Suite

I have worked with a lot of backup software products during my career, but in the last few years I have primarily worked with MicroFocus Data Protector (formerly HP OmniBack, HP Data Protector, or HPE Data Protector) and Veeam Backup & Replication. Data Protector was a great solution for traditional server environments, or when UNIX (HP-UX, AIX, Solaris etc.) compatibility was required. Features like Zero Downtime Backup, LAN-free or Direct SAN backups have been available for many years. But its code quality has suffered severely in recent years, and the product no longer seemed like a one-stop shop. Some months ago, HPE sold its software division to MicroFocus and started to sell Veeam Backup & Replication through its channel. Some months prior to selling the complete software division, HPE acquired Trilead, which many of us know well because of their VM Explorer. Sad but true: Data Protector is dead to me.

I think I don’t have to say much about Veeam. Veeam is unbeaten when it comes to virtualized server environments, and they constantly add features and extend their product portfolio. Think about their solution for Office 365, or the Veeam Agent for Windows and Linux.

Why Vembu?

It is always good to have more than one product in the portfolio, simply to give customers a choice between different products. If your only tool is a hammer, everything looks like a nail. That is why I took a closer look at Vembu. I became aware of Vembu because they asked to place an ad on vcloudnine. This was a year ago, so it was obvious to take a look at their products. Furthermore, Vembu and its products were mentioned many times in my Twitter timeline. Two good reasons to take a look at them!

Vembu Technologies was founded in 2002, and with 60,000 customers and more than 4,000 partners, Vembu is a leading provider of a comprehensive portfolio of software products and cloud services for small and medium businesses. We are not talking about a newcomer!

The Vembu BDR Suite

The Vembu BDR Suite is a one-stop solution for all your backup and disaster recovery needs. That is what Vembu says about their own product. The BDR Suite covers:

  • Backup and replication of VMs running on VMware vSphere and Microsoft Hyper-V
  • Backup and bare-metal recovery for physical servers and workstations (Windows Server and Desktop)
  • File and application backups of Microsoft Exchange, Microsoft SQL Server, Microsoft SharePoint, Microsoft Active Directory, Microsoft Outlook, and MySQL
  • Creation of backup copies and their transfer to a DR site

Let’s have a more detailed look at the Vembu BDR Suite. This is a picture of the overall architecture.

Vembu Technologies/ Vembu BDR Suite architecture/ Copyright by Vembu Technologies

VMBackup

VMBackup provides fast, efficient and agentless backup for VMs hosted on VMware ESXi and on Microsoft Hyper-V. It also provides the capability to replicate virtual machines from one ESXi host to another ESXi host (VMreplication). You might guess it – this feature is only available for VMware ESXi. In case of Microsoft Hyper-V, you have to use the built-in Hyper-V replication. The failover and failback of replicated VMs is managed by the BDR Backup Server. VMBackup offers instant VM recovery, recovery of single files and folders from image-level backups, and recovery of application items from Microsoft Exchange, Microsoft SQL Server, Microsoft SharePoint, and Microsoft Active Directory. The functionality is similar to what you know from other products, like Veeam Backup & Replication or MicroFocus Data Protector. VMBackup is licensed per socket, and it is available in a Standard (~ 150 $ per socket) and an Enterprise (~ 250 $ per socket) edition.

ImageBackup

ImageBackup addresses something that might be extinct for some of us: physical servers, like physical database or file servers. It can take image backups of Windows servers and workstations. This allows customers to restore the entire server or workstation from scratch to the same, or to new, hardware. ImageBackup utilizes the Volume Shadow Copy Service (VSS) to create a consistent backup of a physical machine. Customers can restore a backup to bare metal, restore single files and folders, as well as application items from Microsoft Exchange, Microsoft SQL Server, Microsoft SharePoint, and Microsoft Active Directory. If necessary, the backup can be restored to a supported hypervisor. In other words: P2V migration. ImageBackup is licensed per host, or per application server if you wish to take backups of applications like Microsoft Exchange or SQL Server. ImageBackup for servers costs ~ 150 $, and it is free for workstations.

NetworkBackup

NetworkBackup addresses the backup of files, folders and application data from Windows, Mac and Linux clients. It is designed to protect business data across file servers, application servers, workstations and other endpoints. It does not take an image backup, but full and incremental backups. The feature set and use case of NetworkBackup is similar to “traditional” backup software like MicroFocus Data Protector or ARCServe. NetworkBackup offers intelligent scheduling policies, bandwidth management and flexible retention policies. Clients are not always onsite; to address this, NetworkBackup can store its data in the Vembu Cloud (Vembu Cloud Services). NetworkBackup is licensed per file server (~ 60 $ per server), application server (~ 150 $), or workstation (free).

OffsiteDR

OffsiteDR creates and transfers backup copies to a DR site. Data is immediately transferred when it arrives at the backup server. The data is encrypted in flight using industry-standard AES 256 encryption. WAN optimization is included, which means that data is compressed, encrypted and deduplicated before being replicated to the OffsiteDR server. The recovery of VMs and files can be done directly from the OffsiteDR server, so there is no need to set up a new backup server in case of a disaster recovery. OffsiteDR covers different recovery scenarios, like instant recovery of machines directly from the Vembu OffsiteDR server, bare-metal restore using the Vembu Recovery CD, or restoring a virtual machine to a VMware ESXi or Microsoft Hyper-V host directly from the Vembu OffsiteDR server. OffsiteDR is an add-on to VMBackup, and it is licensed per CPU socket (~ 90 $).

Universal Explorer

The Universal Explorer is used to restore items from various Microsoft applications, like Microsoft Exchange, SQL Server, SharePoint, or Active Directory. An item can be an email, a mailbox, complete databases, user or group objects etc. These items are sourced from image-level backups of physical and virtual machines. You might see some similarities to Veeam Explorer. Both products are comparable.

Recovery CD

The Vembu Recovery CD can be used to recover physical or virtual machines. Drivers for the target platform will be injected during the restore. This is pretty handy in case of P2P and V2P migrations.

Licensing & Editions

Vembu offers a subscription and a perpetual license model. The subscription model can be purchased on a monthly or yearly basis, such as 1, 2, 3 or 5 years. It includes 24/ 7 standard technical support, updates and upgrades throughout the licensed period. The perpetual licensing model allows you to purchase and use the Vembu BDR suite by paying a single fee. This includes free maintenance and support for the first year.

Visit the pricing page for more detailed information. A Vembu BDR Suite edition comparison is also available.

Final thoughts

With 60,000 customers and 4,000 partners, Vembu is not a newcomer in the backup business. The product portfolio is quite comprehensive. The Vembu BDR Suite offers industry-standard features at a very sweet price. I can’t see any feature that an SMB customer might require which is not available. In sum, the Vembu BDR Suite seems to me to be a very good alternative to the top dogs in the backup business, especially if we are talking about SMB customers.

Backup from a secondary HPE 3PAR StoreServ array with Veeam Backup & Replication

When taking a backup with Veeam Backup & Replication, a VM snapshot is created to get a consistent state of the VM. The snapshot is taken prior to the backup, and it is removed after the successful backup of the VM. The snapshot grows during its lifetime, and you should keep in mind that you need some free space in the datastore for snapshots. This can be a problem, especially in case of multiple VM backups at a time, and if the VMs share the same datastore.

Benefit of storage snapshots

If your underlying storage supports the creation of storage snapshots, Veeam offers an additional way to create a consistent state of the VMs. In this case, a storage snapshot is taken, which is presented to the backup proxy and is then used to back up the data. As you can see: no VM snapshot is taken.

Now one more thing: if you have a replication or synchronous mirror between two storage systems, Veeam can do this operation on the secondary array. This is pretty cool, because it takes load off your primary storage!

Backup from a secondary HPE 3PAR StoreServ array

Last week I was able to try something new: backup from a secondary HPE 3PAR StoreServ array. A customer has two HPE 3PAR StoreServ 8200 arrays in a Peer Persistence setup, an HPE StoreOnce, and a physical Veeam backup server, which also acts as Veeam proxy. Everything is attached to a pretty nice 16 Gb dual-fabric SAN. The customer uses Veeam Backup & Replication 9.5 U3a. The data was taken from the secondary 3PAR StoreServ and was pushed via FC into a Catalyst store on the StoreOnce. Using the Catalyst API allows my customer to use synthetic full backups, because their creation is offloaded to the StoreOnce. This setup is dramatically faster and better than the prior solution based on MicroFocus Data Protector. Okay, that backup solution was designed at another time with other priorities and requirements; it was a perfect fit at the time it was designed.

This blog post from Veeam pointed me to this new feature: backup from a secondary HPE 3PAR StoreServ array. Until I found this post, the plan was to use “traditional” storage snapshots, taken from the primary 3PAR StoreServ.

With this feature enabled, Veeam takes the snapshot on the 3PAR StoreServ that is hosting the synchronously mirrored virtual volume. This graphic was created by Veeam and shows the backup workflow.

Veeam/ Backup from secondary array/ Copyright by Veeam

My tests showed that it’s blazing fast, pretty easy to set up, and it takes unnecessary load off the primary storage.

In essence, there are only three steps to do:

  • add both 3PARs to Veeam
  • add the registry value and restart the Veeam Backup Server Service
  • enable the usage of storage snapshots in the backup job

How to enable this feature?

To enable this feature, you have to add a single registry value on the Veeam backup server, and afterwards restart the Veeam Backup Server service.

  • Location: HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication\
  • Name: Hp3PARPeerPersistentUseSecondary
  • Type: REG_DWORD (0 False, 1 True)
  • Default value: 0 (disabled)
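If you prefer to script it, the registry value can be set from an elevated PowerShell session on the Veeam backup server. This is a minimal sketch; the service name VeeamBackupSvc is an assumption based on a default Veeam Backup & Replication installation, so verify it with Get-Service before restarting.

  # Create the DWORD value (1 = take the storage snapshot on the secondary 3PAR StoreServ)
  New-ItemProperty -Path 'HKLM:\SOFTWARE\Veeam\Veeam Backup and Replication' -Name 'Hp3PARPeerPersistentUseSecondary' -PropertyType DWord -Value 1
  # Restart the Veeam Backup Service so the new value is picked up (service name assumed)
  Restart-Service -Name 'VeeamBackupSvc'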

Thanks to Pierre-Francois from Veeam for sharing his knowledge with the community. Read his blog post Backup from a secondary HPE 3PAR StoreServ array for additional information.

CloudFlare API v4 and Fail2ban: Fixing the unban action

In January 2017, I wrote an article about how to protect your WordPress blog using the WP Fail2Ban plugin, fail2ban on your Linux/ FreeBSD host, and CloudFlare. Back then, the fail2ban action was using the CloudFlare API V1, which had already been deprecated since November 2016.

Free-Photos/ pixabay.com/ Creative Commons CC0

Although the actions were later updated to use the CloudFlare API V4, I still had problems with the unbanning of IP addresses. IP addresses were banned, but the unban action failed.

This is the unban action, which is included in fail2ban (taken from fail2ban-0.10.3.1 which is shipped with FreeBSD 11.1-RELEASE-p10):

And this is the unban action, which finally solved this issue:

I found the solution at serverfault.com. The only difference is an additional tr -d '\n' in the last line of the statement. Kudos to Jake for fixing this!
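For reference, the corrected actionunban looks roughly like this. This is a paraphrased sketch, not the verbatim action file: the API v4 endpoint and the <cfuser>/<cftoken> tags come from the stock cloudflare.conf action, but line breaks and parameter order may differ in your version.

  # unban: look up the access rule ID for <ip> via the API, strip the trailing newline, then delete the rule
  actionunban = curl -s -o /dev/null -X DELETE -H 'X-Auth-Email: <cfuser>' -H 'X-Auth-Key: <cftoken>' \
                https://api.cloudflare.com/client/v4/user/firewall/access_rules/rules/$(curl -s -X GET -H 'X-Auth-Email: <cfuser>' -H 'X-Auth-Key: <cftoken>' \
                'https://api.cloudflare.com/client/v4/user/firewall/access_rules/rules?mode=block&configuration_target=ip&configuration_value=<ip>&page=1&per_page=1' \
                | cut -d'"' -f6 | tr -d '\n')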

To prevent the action file from being overwritten, you should copy the original cloudflare.conf located in the action.d directory, e.g. to mycloudflare.conf, and use the copied action file in your jail definition.
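On FreeBSD with fail2ban from ports, that might look like the following sketch (the path is the default of the py-fail2ban port; adjust it to your installation). The jail then references the action as mycloudflare instead of cloudflare.

  # Copy the action file so package updates do not overwrite the fix
  cp /usr/local/etc/fail2ban/action.d/cloudflare.conf /usr/local/etc/fail2ban/action.d/mycloudflare.conf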

Windows Network Policy Server (NPS) server won’t log failed login attempts

This is just a short, but interesting blog post. When you have to troubleshoot authentication failures in a network that uses Windows Network Policy Server (NPS), the Windows event log is absolutely indispensable. The event log offers everything you need. The success and failure event log entries include all necessary information to get you back on track. That is, if failure events are being logged at all…

geralt/ pixabay.com/ Creative Commons CC0

Today, I was playing with Alcatel-Lucent Enterprise OmniSwitches and Access Guardian in my lab. Access Guardian refers to a set of OmniSwitch security functions that work together to provide a dynamic, proactive network security solution:

  • Universal Network Profile (UNP)
  • Authentication, Authorization, and Accounting (AAA)
  • Bring Your Own Device (BYOD)
  • Captive Portal
  • Quarantine Manager and Remediation (QMR)

I have planned to publish some blog posts about Access Guardian in the future, because it is a pretty interesting topic. So stay tuned. :)

802.1x was no big deal, but mac-based authentication failed. Okay, let’s take a look into the event log of the NPS… okay, there are the success events for my 802.1x authentication… but where are the failed login attempts? Not a single one was logged. A short Google search pointed me in the right direction.

Failed logon/ logoff events were not logged

In this case, the NPS role was installed on a Windows Server 2016 domain controller. And it was a German installation, so the output of the commands is also in German. If you have an OS installed in English, you must replace “Netzwerkrichtlinienserver” with “Network Policy Server”.

Right-click the PowerShell icon and open it as Administrator. Check the current settings:
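A minimal sketch using auditpol, shown with the English subcategory name (on a German installation use “Netzwerkrichtlinienserver” as described above):

  auditpol /get /subcategory:"Network Policy Server"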

As you can see, only successful logon and logoff events were logged.
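Enabling the logging of failed attempts is again an auditpol one-liner; the same note about the localized subcategory name applies:

  auditpol /set /subcategory:"Network Policy Server" /success:enable /failure:enable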

The option /success:enable /failure:enable activates the logging of both successful and failed logon and logoff attempts.

Replace SSL certificates on Citrix NetScaler using the CLI

Sometimes you have to replace SSL certificates instead of updating them, e.g. if you switch from a web server SSL certificate to a wildcard certificate. The latter was my job today. In my case, the SSL certificate was used in a Microsoft Exchange 2016 deployment, and the NetScaler configuration was using multiple virtual servers. I’m using this little script for my NetScaler/ Exchange deployments.

skylarvision/ pixabay.com/ Creative Commons CC0

When using multiple virtual servers, replacing an SSL certificate using the GUI can be challenging, because you have to navigate multiple sites, click here, click there etc. Using the CLI, the same task is much easier and faster. I like the Lean mindset, so I’m trying to avoid “waste”, in this case “waste of time”.

Update or replace?

There is a difference between updating and replacing certificates. When using the same CSR and key as for the expired certificate, you can update the certificate. If you use a new certificate/ key pair, you have to replace it. Replacing a certificate includes unbinding the old certificate and binding the new one.

Replacing a certificate

The new certificate usually comes as a PFX (PKCS#12) file. After importing it, you have to install (create) a new certificate/ key pair.
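A sketch of these two steps on the NetScaler CLI; the file names, the PFX password and the certificate/ key pair name are examples, not the values from the original deployment:

  convert ssl pkcs12 wildcard2019.pem -import -pkcs12File /nsconfig/ssl/wildcard.pfx -password MyPfxPassword
  add ssl certKey wildcard_exp2019-06 -cert wildcard2019.pem -key wildcard2019.pem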

Do yourself a favor and add the expiration date to the name of the certificate/ key pair.

Now you can unbind the old, and bind the new certificate. Please note that this causes a short outage of your service!
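Again a sketch; the virtual server name and the certificate/ key pair names are placeholders:

  unbind ssl vserver lb_vsrv_owa -certkeyName webserver_exp2018-07
  bind ssl vserver lb_vsrv_owa -certkeyName wildcard_exp2019-06
  save ns config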

SSL Cert Unbind Causing NetScaler Crash

You should check which NetScaler software release you are running. There is a bug, fixed in 12.0 build 57.x, that causes the NetScaler appliance to crash if an SSL certificate is unbound while an SSL transaction is running. Check CTX230965 for more details.

Bypass stateful firewall on a Sophos XG

Usually, bypassing a firewall is not the best idea. But sometimes you have to. One case where you might want to bypass a firewall is asymmetric routing.

MichaelGaida/ pixabay.com/ Creative Commons CC0

What is asymmetric routing? Imagine a scenario with two routers on the same network. One router offers access to the internet, the other router provides access to other sites via site-2-site VPN tunnels.

Asymmetric Routing

Host 1 uses R1 as its default gateway. R1 has static routes configured to the networks reachable over the VPN, or it has learned them dynamically from R2 using a routing protocol. A packet from host 1 arrives at R1, is routed to R2, and is sent over the VPN tunnel. The answer to this packet arrives at R2 and is sent directly to host 1, because host 1 is the destination. This works because R2 and host 1 are on the same network. This is asymmetric routing, because request and answer take different paths.

In case of plain routing, this is not a problem. But if R1 is a firewall, this firewall might be stubborn, because it does not see the whole traffic flow.

Bypass the stateful firewall

I recently had such a setup due to some technical debt. The firewall dropped that traffic as “Invalid Traffic”. Fortunately, there is a way to bypass the stateful firewall. You can create advanced firewall rules using the CLI; there is no way to create these rules using the GUI. And this only applies to the Sophos XG (former Cyberoam products).

Log in to the device console and select option 4. Then enter the following command on the console, one per destination:
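The rules look roughly like this sketch, with example networks; replace them with your local network and the remote VPN networks, one rule per direction/destination:

  console> set advanced-firewall bypass-stateful-firewall-config add source_network 192.168.1.0 source_netmask 255.255.255.0 dest_network 10.10.20.0 dest_netmask 255.255.255.0
  console> set advanced-firewall bypass-stateful-firewall-config add source_network 10.10.20.0 source_netmask 255.255.255.0 dest_network 192.168.1.0 dest_netmask 255.255.255.0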

Make sure that you have a static or dynamically learned route to the networks. This is not a routing entry; it only tells the firewall which traffic should bypass the stateful firewall.

Veeam backups fails because of time differences

Last week I had an interesting incident at a customer. The customer reported that one of multiple Veeam backup jobs constantly failed.

jarmoluk/ pixabay.com/ Creative Commons CC0

The backup job included two VMs, and the backup of one of these VMs failed with this error:

The customer verified the credentials used for that job, but re-entering the password did not solve the issue. I then checked the Veeam backup logs located under %ProgramData%\Veeam\Backup (look for the Agent.Job_Name.Source.VM_Name.vmdk.log) and found VDDK Error 3014:

The user that was used to connect to the vCenter was an Active Directory account. The account was granted administrator privileges at the root of the vCenter. Switching from the AD account to Administrator@vsphere.local solved the issue. Next stop: vmware-sts-idmd.log on the vCenter Server Appliance. The error found in this log confirmed my theory that there was an issue with the authentication itself, not with the AD account.

To make a long story short: time differences. The vCenter, the ESXi hosts and some servers had the wrong time. The vCenter and the ESXi hosts were using the domain controllers as their time source.

This is the ntpq output of the vCenter. You might notice the offset and jitter values on the right side, both noted in milliseconds.

After some investigation, the root cause seemed to be a bad DCF77 receiver, which was connected to the domain controller that was hosting the PDC emulator role. The DCF77 receiver was connected using a USB-to-LAN converter. Instead of using a DCF77 receiver, the customer and I implemented an NTP hierarchy using a valid NTP source on the internet (pool.ntp.org).
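On the domain controller holding the PDC emulator role, such a configuration might look like the following sketch, run from an elevated PowerShell prompt; the chosen pool servers are just examples and are not taken from the original setup:

  # Point the PDC emulator at external NTP servers and mark it as a reliable time source
  w32tm /config /manualpeerlist:"0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org" /syncfromflags:manual /reliable:yes /update
  # Force a resync
  w32tm /resync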

DOT1X authentication failed on HPE OfficeConnect 1920 switches

Over the last two days, I have supported a customer during the implementation of 802.1x. His network consisted of HPE/ Aruba and some HPE Comware switches. Two RADIUS servers with appropriate policies were already in place. The configuration and testing with the ProVision-based switches was pretty simple. The Comware-based switches, in this case OfficeConnect 1920, gave me more of a headache.

blickpixel/ pixabay.com/ Creative Commons CC0

The customer already had mac authentication running, so all I had to do was enable 802.1x on the desired ports of the OfficeConnect 1920. The laptop, which I used to test the connection, was already configured and worked flawlessly when I plugged it into an 802.1x-enabled port on a ProVision-based switch. The OfficeConnect 1920 simply wrote a failure to its log and the authentication failed. The RADIUS server did not log any failure, so I was quite sure that the switch caused the problem.

After double-checking all settings using the web interface of the switch, I used the CLI to check some more settings. Unfortunately, the OfficeConnect 1920 is a smart-managed switch and provides only a very, very limited CLI. Fortunately, there is a developer access that enables the full Comware CLI. You can enable the full CLI by entering the command shown below after logging into the limited CLI. You can find the password using your favorite internet search engine. ;)
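For reference, this is the command that switches the 1920 into developer mode; the password it asks for is deliberately not reproduced here:

  _cmdline-mode on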

Solution

While poking around in the CLI, I stumbled over this option, which is entered in the interface context:
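Sketched from memory, the option looks like this; RADIUS is the name of the authentication domain used in this setup, so replace it with your own domain name:

  dot1x mandatory-domain RADIUS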

RADIUS is the authentication domain that was used on this switch. The command specifies that the authentication domain RADIUS has to be used for 802.1x authentication requests. Otherwise, the switch would use the default authentication domain SYSTEM, which causes the switch to try to authenticate the user against the local user database.

I have not found any way to specify this setting using the web GUI! If you know how, or if you can provide additional information about this “issue”, please leave a comment.

HPE Networking expert level certifications

A couple of days ago, I took the HP0-Y47 exam “Deploying HP FlexNetwork Core Technologies”. It was one of two required exams to achieve the HPE ASE – Data Center Network Integrator V1 and the HP ASE – FlexNetwork Integrator V1 certifications. It was a long-planned upgrade to my HP ATP certification, and it is a necessary certification for the HPE partner status of my employer.

You might find it confusing that I’m talking about an HP ASE and an HPE ASE. That is not a typo. The HP ASE was released prior to the HP/ HPE split. The HPE ASE was released after the split into HP and HPE.

The HP/ HPE ATP is a professional-level certification, comparable to the Cisco Certified Network Associate (CCNA). The HP/ HPE ASE is an expert-level certification, so the typical candidate for an HP/ HPE ASE certification is a professional with three to five years of experience in designing and architecting complex enterprise-level networks.

Requirements

There are different ways to achieve this certification. Regardless of the way you choose, you need a certification from which you can upgrade. This does not have to be an HP/ HPE certification! If you hold a valid CCNA/ CCNP or JNCIP-ENT, you can upgrade from this certification without the need for a valid HP/ HPE ATP Networking certification.

If you want to earn the HPE ASE – Data Center Network Integrator V1 and the HP ASE – FlexNetwork Integrator V1 certifications in a single step, you need at least one of these certifications:

  • HP ATP – FlexNetwork Solutions V3
  • HPE ATP – Data Center Solutions V1

Or if you want to upgrade from a non-HP/ HPE certification:

  • Cisco – CCNP (any CCNP regardless of technology)
  • Cisco – Certified Design Professional (CCDP)
  • Juniper – JNCIP-ENT

Now you need to pass two exams:

HP2-Z34 (Building HP FlexFabric Data Centers)

The HP2-Z34 exam focuses on the deployment and implementation of HPE FlexFabric Data Center solutions. Therefore, the exam covers topics like

  • Multitenant Device Context (MDC)
  • Data Center Bridging (DCB)
  • Multiprotocol Label Switching (MPLS)
  • Fibre Channel over Ethernet (FCoE)
  • Ethernet Virtual Interconnect (EVI)
  • Multi-Customer Edge (MCE)
  • Transparent Interconnection of Lots of Links (TRILL)
  • Shortest Path Bridging MAC-in-MAC mode (SPBM)

HPE offers a study guide to prepare for this exam: Building HP FlexFabric Data Centers (HP2-Z34 and HP0-Y51). I used this guide (as an eBook) to prepare for the exam. The guide was of average quality: sufficient to prepare for the exam, but I used other materials to get a better understanding of some topics.

HP2 exams are web-based exams. To pass the HP2-Z34 exam, I had to answer 60 questions in 105 minutes, with a passing score of 70%. The exam was quite demanding, especially if you don’t have much real-world experience with some of the covered topics.

HP0-Y47 (Deploying HP FlexNetwork Core Technologies)

The HP0-Y47 exam covers the configuration, implementation, and troubleshooting of enterprise-level HPE FlexNetwork solutions. The exam covers different topics, e.g.

  • Quality of Service (QoS)
  • redundancy (VRRP, Stacking)
  • multicast routing (IGMP, PIM)
  • dynamic routing (OSPF, BGP)
  • ACLs, and
  • port authentication/ port security (Mac-auth, Web-auth, 802.1x)

I used the HP ASE FlexNetwork Solutions Integrator (HP0-Y47) study guide to prepare for the exam. Unfortunately, it had the same average quality as the HP2-Z34 guide: good enough to pass the exam, but don’t expect too much.

HP0-Y47 is a proctored exam. I had to answer 55 questions in 150 minutes, with a passing score of 65%. The exam is not very hard if you are familiar with the covered topics. Experience with ProVision and Comware is absolutely necessary, because both platforms have their peculiarities, e.g. processing of ACLs, differences in stacking technologies, commands, STP support etc.

It took me some time to prepare for both exams, despite the fact that I work with ProVision and Comware Switches every day. So I’m pretty happy that I passed both exams on the first try.

vSphere Distributed Switch health check fails on HPE Comware switches

During the replacement of some VMware ESXi hosts at a customer, I discovered a recurring failure of the vSphere Distributed Switch health checks. A VLAN and MTU mismatch was reported. On the physical side, the ESXi hosts were connected to two HPE 5820 switches that were configured as an IRF stack. Inside the VMware bubble, the hosts were sharing a vSphere Distributed Switch.

cre8tive / pixelio.de

The switch ports of the old ESXi hosts were configured as Hybrid ports. The switch ports of the new hosts were configured as Trunk ports, to streamline the switch and port configuration.

Some words about port types

Comware knows three different port types:

  • Access
  • Hybrid
  • Trunk

If you are familiar with Cisco, you will know Access and Trunk ports. If you are familiar with HPE ProCurve or Alcatel-Lucent Enterprise, these two port types refer to untagged and tagged ports.

So what is a Hybrid port? A Hybrid port can belong to multiple VLANs, where it can be untagged and tagged. Yes, multiple untagged VLANs on a port are possible, but the switch will need additional information to bridge the traffic into the correct untagged VLANs. This additional information can be MAC addresses, IP addresses, LLDP-MED etc. Typically, hybrid ports are used in VoIP deployments.

The benefit of a Hybrid port is that I can put the native VLAN of a specific port, which is often referred to as the Port VLAN identifier (PVID), as a tagged VLAN on that port. This configuration allows all dvPortGroups to have a VLAN tag assigned, even if the VLAN tag represents the native VLAN of the switch port.

Failing health checks

A failed health check raises a vCenter alarm. In my case, a VLAN and an MTU alarm were reported. In both cases, VLAN 1 was causing the error. According to VMware, the three main causes for failed health checks are:

  • Mismatched VLAN trunks between a vSphere distributed switch and physical switch
  • Mismatched MTU settings between physical network adapters, distributed switches, and physical switch ports
  • Mismatched virtual switch teaming policies for the physical switch port-channel settings.

Let’s take a look at the port configuration on the Comware switch:
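The relevant part of the interface configuration looked roughly like this (reconstructed, with the permitted VLANs simplified):

  interface Ten-GigabitEthernet1/0/9
   port link-type trunk
   port trunk permit vlan all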

As you can see, this is a normal trunk port. All VLANs will be passed to the host. This is an excerpt from the display interface Ten-GigabitEthernet1/0/9 output:
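Abridged to the two lines that matter here; the exact layout of the display interface output varies between Comware releases:

  PVID: 1
  Port link-type: trunk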

The native VLAN is 1, which is the default configuration. Traffic that is received and sent on a trunk port is always tagged with the VLAN ID of the originating VLAN – except traffic from the default (native) VLAN! This traffic is sent without a VLAN tag, and if frames are received with the tag of the native VLAN, these frames will be dropped!

If you have a dvPortGroup for the default (native) VLAN, and this dvPortGroup is sending tagged frames, the frames will be dropped if you use a “standard” trunk port. And this is why the health check fails!

Ways to resolve this issue

In my case, the dvPortGroup was configured for VLAN 1, which is the default (native) VLAN on the switch ports.

There are two ways to solve this issue:

  • Remove the VLAN tag from the dvPortGroup configuration
  • Change the PVID for the trunk port

To change the PVID for a trunk port, you have to enter the following command in the interface context:
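The command is port trunk pvid vlan; VLAN 999 in this sketch is just an example of an otherwise unused VLAN ID:

  port trunk pvid vlan 999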

You have to change the PVID on all ESXi-facing switch ports. You can use a non-existent VLAN ID for this.

vSphere Distributed Switch health check will switch to green for VLAN and MTU immediately.

Please note that this is not the solution for all VLAN-related problems. You should make sure that you are not getting any side effects.