Category Archives: Networking

High CPU usage on Citrix ADC VPX

While building a small Citrix NetScaler… ehm… ADC VPX (I really hate this name…) lab environment, I noticed that the fan of my Lenovo T480s was spinning up. I was wondering why, because the VPX VM was just running for a couple of minutes – without any load. But the task manager told me, that the VMware Workstation Process was consuming 25% (I have a Intel i5 Quad Core CPU) CPU. So VMware Workstation was just eating a whole CPU core without doing anything. I would not care, but the fan… And it reminded me, that I’ve seen an similar behaviour in various VPX deployments on VMWare ESXi.

Fifaliana/ pixabay.com/ Creative Commons CC0

A quick search lead me to this Citrix Support Knowledge Center article: High CPU Usage on NetScaler VPX Reported on VMware ESXi Version 6.0. That’s exactly what I’ve observed.

The solution is setting the parameter  cpuyield  to yes.

The VPX does not need a reboot. Short after setting the parameter, the fan stopped spinning. Have I mentioned how I love silence on my desk? I’m pretty happy that my T480s is a really quiet laptop.

But what does this parameter is used for? In pretty simple words: To allocate CPU cycles, that are not used by other VMs. Until ADC VPX 11.1, the VPX was sharing CPU with other VMs. This changed with ADC VPX 12.0. Since this release, the VPX was like a child, that was playing with their favorite toy just to make sure, that no other child can play with it. Not very polite…

This is a quote from the Support Knowledge Center article:

Set ns vpxparam parameters:
-cpuyield: Release or do not release of allocated but unused CPU resources.

YES: Allow allocated but unused CPU resources to be used by another VM.

NO: Reserve all CPU resources for the VM to which they have been allocated. This option shows higher percentage in hypervisor for VPX CPU usage.
DEFAULT: NO

I don’t think that I would change this in production. But for lab environments, especially if you run this on VMware Workstation, I would set  -cpuyield  to yes .

EAPoL forwarding on NEC VoIP phones

A customer is running their PCs behind their VoIP phones. Nothing unusual, most VoIP phones I know have an embedded ethernet switch, so that you only need one cable to connect PC and VoIP phone to your network.

Martinelle/ pixabay.com/ Creative Commons CC0

As part of a network security project, my colleague and I implemented IEEE 802.1X port-based Network access control at one of our customers networks. The setup consists of multiple Alcatel-Lucent Enterprise OmniSwitches (6450-P10 and 6860/E) and Aruba ClearPass.

We noticed, that mac-address based authentication worked all the time, but 802.1x fails constantly if the client was connected to a VoIP phone (NEC DT700). The phones does not do any port authentication. We use a device classification rule and User Network Profiles to get them to their correct VLAN. But the connected PCs should do a 802.1x based port authentication.

Wireshark FTW!

We used Wireshark to take a look at the communication. We created a packet trace on a client behind a VoIP phone, and we mirrored the traffic of the port, to which the phone was connected. Our assumption was that the VoIP phones drop the EAP packets from the connected PC.

This is a packet trace from my ThinkPad X250 which was connected to the phone.

Patrick Terlisten/ vcloudnine.de/ Creative Commons CC0

You can see the repeating “Request, Identity” from the switch, and the answer from my laptop (Response, Identity). The destination for the response is a multicast mac-address. But this frame was not captured behind the VoIP phone! It was missing. On the packet trace, that was created my mirroring the switch port to which the phone was connected, the “Request, Identity” was seen, but not the “Response, Identity”. The phone was dropping the EAP packets of my laptop!

RTFM!

The customer called the company who was maintaining the phones. But they did not understood our problem, so they enabled 802.1x on the phones. We disabled this instantly again.

I decided to take a look into the manual of the NEC DT700 and I found a point called “EAPoL forwarding” in the advanced network settings. After enabling this setting, EAP started working instantly.

Patrick Terlisten/ vcloudnine.de/ Creative Commons CC0

This is again a packet trace from my laptop, taken while it was connected to a VoIP phone. As you can see, the last EAP packet is “Success”!

EAPoL forwarding did the trick. :)

Replace SSL certificates on Citrix NetScaler using the CLI

Sometimes you have to replace SSL certificates instead of updating them, e.g. if you switch from a web server SSL certificate to a wildcard certificate. The latter was my job today. In my case, the SSL certificate was used in a Microsoft Exchange 2016 deployment, and the NetScaler configuration was using multiple virtual servers. I’m using this little script for my NetScaler/ Exchange deployments.

skylarvision/ pixabay.com/ Creative Commons CC0

When using multiple virtual servers, replacing a SSL certificate using the GUI can be challenging, because you have to navigate multiple sites, click here, click there etc. Using the CLI, the same task is much easier und faster. I like the Lean mindset, so I’m trying to avoid “waste”, in this case, “waste of time”.

Update or replace?

There is a difference between updating or replacing of certificates. When using the same CSR and key as for the expired certificate, you can update the certificate. If you use a new certificate/ key pair, you have to replace it. Replacing a certificate  includes the unbinding of the old, and binding the new certificate.

Replacing a certificate

The new certificate usually comes as a PFX (PKCS#12) file. After importing it, you have to install (create) a new certificate/ key pair.

Do yourself a favor and add the expiration date to the name of the certificate/ key pair.

Now you can unbind the old, and bind the new certificate. Please note, that this causes a short outage of your service!

SSL Cert Unbind Causing NetScaler Crash

You should check what NetScaler software release you are running. There is a bug, which is fixed in 12.0 build 57.X, which causes the NetScaler appliance to crash if a SSL certificate is unbound and a SSL transaction is running. Check CTX230965 for more details.

Bypass stateful firewall on a Sophos XG

Usually, bypassing a firewall is not the best idea. But sometimes you have to. One case, where you want to bypass a firewall, is asymmetric routing.

MichaelGaida/ pixabay.com/ Creative Commons CC0

What is asymmetric routing? Imagine a scenario with two routers on the same network. One router offeres access to the internet, the other router provides access to other sites with site-2-site VPN tunnels.

Asymmetric Routing

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

Host 1 uses R1 as default gateway. R1 has static routes configured to the networks reachable over the VPN, or it has learned them dynamically using a routing protocol from R2. A packet from host 1 arrives at R1, is routed to R2, and is sent over the VPN tunnel. The answer to this packet arrives at R2, and is sent directly to host 1, because host 1 is the destination. This works because R2 and host 1 are on the same network. This is asymmetric routing, because request and answer go different ways.

In case of routing, this is not a problem. But if R1 is a firewall, this firewall might be stubborn, because it does not see the whole traffic.

Bypass the stateful firewall

I recently had such a setup due to some technical debts. The firewall dropped that “Invalid Traffic”. Fortunately, there is a way to bypass the statefull firewall. You can create advanced firewall rules using the CLI. There is no way to create these rules using the GUI. And this only applies to the Sophos XG (former Cyberoam products).

Login to the device console and select option 4. Then enter on the console the following commands, one per destination:

Make sure that you have a static or dynamically learned route to the networks. This is not a routing entry, it only tells the firewall what traffic should bypass the stateful firewall.

DOT1X authentication failed on HPE OfficeConnect 1920 switches

The last two days, I have supported a customer during the implementation of 802.1x. His network consisted of HPE/ Aruba and some HPE Comware switches. Two RADIUS server with appropriate policies was already in place. The configuration and test with the ProVision based switches was pretty simple. The Comware based switches, in this case OfficeConnect 1920, made me more headache.

blickpixel/ pixabay.com/ Creative Commons CC0

The customer had already mac authentication running, so all I had to do, was to enable 802.1x on the desired ports of the OfficeConnect 1920. The laptop, which I used to test the connection, was already configured and worked flawless if I plugged it into a 802.1x enabled port on a ProVision based switch. The OfficeConnect 1920 simply wrote a failure to its log and the authentication failed. The RADIUS server does not logged any failure, so I was quite sure, that the switch caused the problem.

After double-checking all settings using the web interface of the switch, I used the CLI to check some more settings. Unfortunately, the OfficeConnect 1920 is a smart-managed switch and provides only a very, very limited CLI. Fortunately, there is a developer access, enabling the full Comware CLI. You can enable the full CLI by entering

after logging into the limited CLI. You can find the password using your favorite internet search engine. ;)

Solution

While poking around in the CLI, I stumbled over this option, which is entered in the interface context:

RADIUS is the authentication domain, which was used on this switch. The command specifies, that the authentication domain RADIUS has to be for 802.1x authentication requests. Otherwise the switch would use the default authentication domain SYSTEM, which causes, that the switch tries to authenticate the user against the local user database.

I have not found any way to specify this setting using the web GUI! If you know how, of if you can provide additional information about this “issue”, please leave a comment.

HPE Networking expert level certifications

A couple of days ago, I took the HP0-Y47 exam “Deploying HP FlexNetwork Core Technologies”. It was one of two required exams to achive the HPE ASE – Data Center Network Integrator V1, and the HP ASE – FlexNetwork Integrator V1 certification. It was a long planned upgrade to my HP ATP certification, and it is a necessary certification for the HPE partner status of my employer.

You might find it confusing that I’m talking about an HP ASE and a HPE ASE. That is not a typo. The HP ASE was released prior the HP/ HPE split. The HPE ASE was released after the split in HP and HPE.

The HP/ HPE ATP is a professional level certification, comparable to the Cisco Certified Network Associate (CCNA). The HP/ HPE ASE is an expert level certification, so the typical candidate for a HP/ HPE ASE certification is a professional with three to five years experience in designing and architecting complex enterprise-level networks.

Requirements

There are different ways to achieve this certification. Regardless of the way you chose, you need a certification from which you can upgrade. This does not have to be a HP/ HPE certification! If you hold a valid CCNA/ CCNP or JNCIP-ENT, you can upgrade from this certification without the need of a valid HP/ HPE ATP Networking certification.

If you want to earn the HPE ASE – Data Center Network Integrator V1, and the HP ASE – FlexNetwork Integrator V1 certification in a single step, you need at least one of these certifications:

  • HP ATP – FlexNetwork Solutions V3
  • HPE ATP – Data Center Solutions V1

Or if you want to upgrade from a non-HP/ HPE certification:

  • Cisco – CCNP (any CCNP regardless of technology)
  • Cisco – Certified Design Professional (CCDP)
  • Juniper – JNCIP-ENT

Now you need to pass two exams:

HP2-Z34 (Building HP FlexFabric Data Centers)

The HP2-Z34 exam focuses on deployment and implementation of HPE FlexFabric Data Center solutions. Therefore, the exams covers topics like

  • Multitenant Device Context (MDC)
  • Datacenter Bridging (DCB)
  • Multiprotocol Label Switching (MPLS)
  • Fibre Channel over Ethernet (FCoE)
  • Ethernet Virtual Interconnect (EVI),
  • Multi-Customer Edge (MCE),
  • Transparent Interconnection of Lots of Links (TRILL), and
  • Shortest Path Bridging Mac-in-Mac mode (SPBM).

HPE offers a study guide to prepare for this exam: Building HP FlexFabric Data Centers (HP2-Z34 and HP0-Y51). I used this guide to prepare for the exam (eBook). The guide was of an average quality. Its sufficient to prepare for the exam, but I used other materials to get a better understanding of some topics.

HP2 exams are web-based exams. To pass the HP2-Z34 exam, I had to answer 60 questions in 105 minutes, with a passing score of 70%. The exam was quite demanding, especially if you don’t have much real-world experience with some of the covered topics.

HP0-Y47 (Deploying HP FlexNetwork Core Technologies)

The HP0-Y47 exam covers the configuration, implementation, and the troubleshoot enterprise level HPE FlexNetwork solutions. The exam covers different topics, e.g.

  • Quality of Service (QoS)
  • redundancy (VRRP, Stacking)
  • multicast routing (IGMP, PIM)
  • dynamic routing (OSPF, BGP)
  • ACLs, and
  • port authentication/ port security (Mac-auth, Web-auth, 802.1x)

I used the HP ASE FlexNetwork Solutions Integrator (HP0-Y47) study guide to prepre myself for the exam. Unfortunately, it had the same average quality as the HP2 Z34 guide: Good enough to pass the exam, but don’t expect to much.

HP0-Y47 is a proctored exam. I had to answer 55 questions in 150 minutes, with a passing score of 65%. The exam is not very hard, if you were familiar with the covered topics. Experience with ProVision and Comware is absolutely necessary, because both platforms have their peculiarities, e.g. processing of ACLs, differences in Stacking technologies, commands, STP support etc.

It took me some time to prepare for both exams, despite the fact that I work with ProVision and Comware Switches every day. So I’m pretty happy that I passed both exams on the first try.

vSphere Distributed Switch health check fails on HPE Comware switches

During the replacement of some VMware ESXi hosts at a customer, I discovered a recurrent failure of the vSphere Distributed Switch health checks. A VLAN and MTU mismatch was reported. On the physical side, the ESXi hosts were connected to two HPE 5820 switches, that were configured as an IRF stack. Inside the VMware bubble, the hosts were sharing a vSphere Distributed Switch.

cre8tive / pixelio.de

The switch ports of the old ESXi hosts were configured as Hybrid ports. The switch ports of the new hosts were configured as Trunk ports, to streamline the switch and port configuration.

Some words about port types

Comware knows three different port types:

  • Access
  • Hybrid
  • Trunk

If you were familiar with Cisco, you will know Access and Trunk ports. If you were familiar with HPE ProCurve or Alcatel-Lucent Enterprise, these two port types refer to untagged and tagged ports.

So what is a Hybrid port? A Hybrid port can belong to multiple VLANs where they can be untagged and tagged. Yes, multiple untagged VLANs on a port are possible, but the switch will need additional information to bridge the traffic into correct untagged VLANs. This additional information can be  MAC addresses, IP addresses, LLDP-MED etc. Typically, hybrid ports are used for in VoIP deployments.

The benefit of a Hybrid port is, that I can put the native VLAN of a specific port, which is often referred as Port VLAN identifier (PVID), as a tagged VLAN on that port. This configuration allows, that all dvPortGroups have a VLAN tag assigned, even if the VLAN tag represents the native VLAN of a switch port.

Failing health checks

A failed health check rises a vCenter alarm. In my case, a VLAN and MTU alarm was reported. In both cases, VLAN 1 was causing the error. According to VMware, the three main causes for failed health checks are:

  • Mismatched VLAN trunks between a vSphere distributed switch and physical switch
  • Mismatched MTU settings between physical network adapters, distributed switches, and physical switch ports
  • Mismatched virtual switch teaming policies for the physical switch port-channel settings.

Let’s take a look at the port configuration on the Comware switch:

As you can see, this is a normal trunk port. All VLANs will be passed to the host. This is an except from the  display interface Ten-GigabitEthernet1/0/9  output:

The native VLAN is 1, this is the default configuration. Traffic, that is received and sent from a trunk port, is always tagged with a VLAN id of the originating VLAN – except traffic from the default (native) VLAN! This traffic is sent without a VLAN tag, and if frames were received with a VLAN tag, this frames will be dropped!

If you have a dvPortGroup for the default (native) VLAN, and this dvPortGroup is sending tagged frames, the frames will be dropped if you use a “standard” trunk port. And this is why the health check fails!

Ways to resolve this issue

In my case, the dvPortGroup was configured for VLAN 1, which is the default (native) VLAN on the switch ports.

There are two ways to solve this issue:

  • Remove the VLAN tag from the dvPortGroup configuration
  • Change the PVID for the trunk port

To change the PVID for a trunk port, you have to enter the following command in the interface context:

You have to change the PVID on all ESXi facing switch ports. You can use a non-existing VLAN ID for this.

vSphere Distributed Switch health check will switch to green for VLAN and MTU immediately.

Please note, that this is not the solution for all VLAN-related problems. You should make sure that you are not getting any side effects.

Demystifying “Interfaces on which heartbeats are not seen”

By accident, I found a heartbeat/ VLAN issue on a NetScaler cluster at one of my customers. The NetScaler ADC appliances have three interfaces connected to a switch stack. Two of the three interfaces were configured as a channel (LAG). This is a snippet from the config:

On the switch stack, the port to which interface 1/3 is connected, is configured as an access port. The ports, to which the channel is connected, is configured as a trunk port with some permitted VLANs. The customer is using HPE Comware based switches. The terminology is the same for Cisco. If you use HPE ProVision or Alcatel Lucent Enterprise, translate “access” to “untagged” and “trunk” to “tagged”. Because the channel is configured as a trunk port on the switch, the tagall option was set.

Issue

While examining the output of  show ha node I saw this:

Because interface 1/3 was not affected, this had to be a VLAN issue. During the initial troubleshooting, I was able to discover heartbeat packets in VLAN 1 and in VLAN 10.

Solution

The solution was easy: Remove the tagged option for VLAN 10 on LA/1.

instead of

Because of the configured tagall  option, all packets sourced by LA/1 are tagged with the corrosponding VLAN ID. But because it’s now explicitly configured without a tag for VLAN 10, VLAN 10 is now also the native VLAN for LA/1.

Now the NetScaler was sending heartbeat packets with a tag for VLAN 10, and the issue was solved.

Explanation

Heartbeat packets are always send without a VLAN tag (untagged). There are two exceptions:

  • The NSVLAN is configured with a specific VLAN ID, or
  • an interface used for hearbeats is configured with the tagall

In this case, the heartbeat packets are tagged with the ID of the native VLAN ID of the interface. A show interface of the channel showed, that the channel was using VLAN 1 as the native VLAN.

How does the NetScaler determine the native VLAN for an interface? The native VLAN is the VLAN, to which an interface is bound untagged. An interface can only be bound untagged to a single VLAN. But it can be bound tagged to multiple VLANs.

If you take a look at the config snippet at the top of this blog post, you might notice, that interface 1/3 is bound untagged to VLAN 10. So this is the native VLAN for interface 1/3. But this interface is not using the tagall  option. Therefore, heartbeat packets are not tagged. The channel LA/1 is bound tagged to VLAN 10. But it was also bound to VLAN 1, without the tagged  option. This caused, that VLAN 1 was used as the native VLAN for channel LA/1. And because LA/1 is configured with the tagall  option, the heartbeats were tagged with a tag for VLAN 1. That’s why I was able to see the heartbeats, that were send over channel LA/1, in VLAN 1.

In the end, the NetScaler appliances were sending heartbearts from interface 1/3 to VLAN 10, and from channel LA/1 to VLAN 1. This caused the message “Interfaces on which heartbeats are not seen: LA/1”.

Citrix Certified Professional – Networking (CCP-N) exam experience

Last friday I passed the 1Y0-351 (Citrix NetScaler 10.5 Essentails and Networking) exam with a pretty good score. The exam was necessary, not only because I will do much more NetScaler projects in the future, but also because Citrix has made it mandatory to have a CCP-N in your company to to sell Citrix NetScaler.

Preparation

My employer booked me a 5-day course (CNS-220 Citrix NetScaler Essentials and Traffic Management). Very nice, although I already had experience with NetScaler deployments. This training was designed for NetScaler 12.0, not for 10.5.

A training might be recommended to prepare for an exam, but usually it is not sufficient to pass it. But I want to pass the exam in the first try, so I took a closer look into the Citrix NetScaler 10.5 Essentials and Networking Preparation Guide.

In addition to the student and lab material, I deployed three NetScaler VPX (10.5,11.1 and 12.0) in my lab. I really recommend this! Especially to learn the CLI and how to read the log files.

The exam

S. Hofschlaeger / pixelio.de

The exam 1Y0-351 is focused on NetScaler 10.5, and will be not available after January 19, 2018. The sucessor of this exam is 1Y0-340, which is based on NetScaler 12.0. It is available since October 20, 2017. You might have noticed that my course was designed for 12.0, but I took the 10.5 exam. Well, I could not identify a question that would have had to be answered differently for NetScaler 12.0. But I really recommend to take the exam matching your course.

You have to answer 72 questions in 120 minutes. I got 30 minutes extra, because I’m a non-native english speaker. I had to answer two survey before the exam. One of them was a self-assessment about my NetScaler skills.

The questions were pretty fair, no trick questions, or questions were multiple answers seemed to be correct. The exam met the exam objectives from the prep guide. And because I already wrote it: You really should work with the CLI, and you really should know the important logs.

In sum: A challenging, but pretty fair exam. No marketing, no factual knowledge from spec sheets etc. When you are quite familiar with NetScalers, there is a good chance to pass the exam in the first attempt.

Notes about 802.1x and MAC authentication

Open network ports in offices, waiting rooms and entrance halls make me curious. Sometimes I  want to plugin a network cable, just to see if I get an IP address. I know many companies that does not care about network access control. Anybody can plugin any device to the network. When talking with customers about network access control, or port security, I often hear their complains about complexity. It’s too complex to implement, to hard to administrate. But it is not sooo complex. In the easiest setup (with mac authentication), you need a switch, that can act as authenticator, and a authentication server. But IEEE 802.1x is not much more complicated.

A brief overview over IEEE 802.1x

IEEE 802.1X offers authentication and authorization in wired or wireless networks. The supplicant (client) requests access to the network by providing a username/ password, or a digital certificate to the authenticator (switch). The authenticator forwards the provided credentials to the authentication server (mostly RADIUS or DIAMETER). The authentication server verifies the credentials and decides, if the supplicant is allowed to access the network.

802.1x uses the Extensible Authentication Protocol (EAP RFC5247) for authentication. Because EAP is a framework, there are different implementations, like EAP Transport Layer Security (EAP-TLS), or EAP with pre-shared key (EAP-PSK). Because it is only a framework, each protocol, that uses EAP, has to encapsulate it. Typical encapsulations are EAP over LAN (that is what 802.1x uses), RADIUS/ DIAMETER can use also use EAP. Protected EAP (PEAP) encapsulates EAP traffic into a TLS tunnel. PEAP is typically used as a replacement for EAP in EAPOL, or with with RADIUS or DIAMETER.

IEEE 802.1x EAP

Wikipedia/ wikipedia.org/ Public domain image resources

So far nothing special. It’s more a security thing, but an important one, if you ask me. But many customers avoid 802.1x, because of complexity. It’s perfect to keep you out of your own network, if something fails. And not all devices can act as supplicant.

But there is another benefit of 802.1x: RADIUS-Access-Accept messages can be used to dynamically assign VLAN memberships (RADIUS Extensions, RFC6929). To assign a VLAN membership to a port, to which a supplicant is connected, the RADIUS server adds three attributes to the Access-Accept message:

  • Tunnel-Type (VLAN)
  • Tunnel-Medium-Type (802)
  • Tunnel-Private-Group-Id (VLAN ID)

The authenticator uses these attributes to dynamically assign a VLAN to the port, to which the supplicant is connected.

MAC authentication

How does MAC authentication fit into this? If a client does not support 802.1x, the authenticator can use the mac-address of the connected device as username and password. The RADIUS server can use these credentials to authenticate the connected device. If you use a windows-based NAP (Windows Server NPS role), you have to create a user object in your Active Directory or local user database, that uses the mac-address as username and password. Depending on the switch configuration, the format of the username differes (xx:xx:xx:xx:xx:xx or xxxxxx-xxxxxx etc.). It’s a security fail, right? Yes, it is. So please:

  • Use MAC authentication only when needed, and
  • make sure that your authenticator uses PEAP

PEAP uses a TLS tunnel to protect the CHAP messages.

Another important part is your authentication server, mostly a RADIUS or DIAMETER server. Make sure that it is highly available. You should have at least two authentication server. I would not load balance them through a load balancer (Citrix NetScaler etc.). Simply add two authentication servers to your switch configuration. If your authentication server uses a user database, like Microsoft Active Directory, make sure that this database is also highly available. As I said: It is perfect to keep you out of your own network.

Sample config for ArubaOS (HPE ProVision based switches)

Here’s a sample config for a Aruba 2920 switch, running ArubaOS WB.16.04. 802.1x and MAC authentication are configured for the ports 1 to 5. If the authentication failes, VLAN 999 will be assigned to the port. VLAN 999 is used as unauth VLAN, which is used for unauthenticated clients.

If 802.1x fails, the authenticator, will try MAC authentication. If this fails too, VLAN 999 is assigned to the switch port.

In this case, the client was authenticated by 802.1x.

This is the output for MAC authentication.

In both cases, VLAN 1 was dynamically assigned by RADIUS-Access-Accept messages.