Author Archives: Patrick Terlisten

About Patrick Terlisten

vcloudnine.de is the personal blog of Patrick Terlisten. Patrick has a strong focus on virtualization & cloud solutions, but also storage, networking, and IT infrastructure in general. He is a fan of Lean Management and agile methods, and practices continuous improvement wherever possible. Feel free to follow him on Twitter and/or leave a comment.

Fan health sensors report false alarms on HPE Gen10 Servers with ESXi 6.7

I’ve received several emails and comments about this topic. It looks like the latest ESXi 6.7 updates are causing some trouble on HPE ProLiant Gen10 servers.

I blogged about recurring host hardware sensor state alarm messages some weeks ago. A customer noticed them after an update. Last week, I got the first comments under this blog post about fan failure messages after applying the latest ESXi 6.7 updates. Then more and more customers asked me about this, because they got these messages in their environments too after applying the latest updates.

Last Saturday I tweeted my blog post to give a hint to my followers who may be experiencing the same problem.

Fortunately one of my followers (Thanks Markus!) pointed me to a VMware KB article with a workaround: Fan health sensors report false alarms on HPE Gen10 Servers with ESXi 6.7 (78989).

This is NOT a solution, but a workaround. Keep that in mind.

Thanks again to Markus. Make sure to visit his awesome blog (MY CLOUD-(R)EVOLUTION), especially if you are interested in vSphere, Veeam and automation!

Missing Microsoft Teams calendar tab with on-premise Exchange

Microsoft Teams got a big push due to the current COVID-19 crisis, and many of my customers deployed it in the past weeks. At ML Network, we have been using Microsoft Teams for more than a year, and we don’t want to miss it anymore.

Source: Microsoft

We are running Exchange 2016 on-premises, currently CU16. We have been missing the calendar tab in Teams since we started with Microsoft Teams. When you do some research about this issue, you will find many threads and blog posts, but these are the two key facts:

  • it is supported with on-premises hybrid Exchange deployments
  • it works flawlessly with Exchange Online

Our Exchange is configured as a full hybrid deployment. I set this up when we deployed Office 365 at our organization.

Let’s summarize:

  • Exchange 2016 CU16
  • Hybrid Deployment
  • Office 365 with Teams enabled
  • no calendar tab when the Exchange mailbox is hosted on-premises

OAuth FTW!

While doing an Exchange Hybrid deployment for one of my customers some weeks ago, I stumbled over an OAuth error message at the end of the Hybrid Configuration Wizard (HCW). The message was HCW8064:

“HCW has completed, but was not able to perform the OAuth portion of your Hybrid configuration”

We were not able to fix this. Microsoft offers two solutions:

Yesterday I did the upgrade from CU15 to CU16 on our Exchange server, and while watching the progress bar I did some research on this issue again. I found strong evidence that Microsoft Teams needs working OAuth to display the calendar tab and access the on-premises hosted mailbox. So I gave it a try and used the latest version of the HCW.

What should I say? No OAuth configuration error and after a restart of Microsoft Teams, the calendar tab appeared.

Lessons Learned:

  • always use the latest CU for Exchange
  • always use the latest HCW

Connecting to Exchange Online with PowerShell

The task was simple: Change the alias and the primary SMTP address of a Microsoft Teams team. This can be done by changing the alias and the SMTP address of the underlying Office 365 group. But how? With a PowerShell connection to Exchange Online.

All you need is PowerShell on your local computer and Office 365 credentials with the necessary privileges.

First we need to provide the necessary credentials.

 $cred = Get-Credential

A window will come up and you must enter your Office 365 credentials.

The next step is to create a PowerShell remote session with Exchange Online.

$Session = New-PSSession -ConfigurationName Microsoft.Exchange -ConnectionUri https://outlook.office365.com/powershell-liveid/ -Credential $cred -Authentication Basic -AllowRedirection

Please note that basic auth will be disabled in October 2020!

To connect to this remote session, use Import-PSSession.

Import-PSSession $Session -DisableNameChecking

When you have finished your work, make sure to remove the remote session with Remove-PSSession!

Remove-PSSession $Session 

Space reclamation of VMFS 5 Datastores using esxcli

It was a bit quiet here in January because of a new “private project”, which has taken up some resources and will take up more in the future.

But this will not stop me from documenting useful stuff. This one is nothing new, but customers ask it regularly: How do I get my storage capacity back after deleting VMs?!

The outlined steps are all done using esxcli. You need to execute them on a single ESXi host, not on each host in the cluster.

Connect to one of your ESXi hosts using SSH. You can use this small PowerCLI command to enable SSH on a specific host.

Get-VMHost esx1.lab.local | Get-VMHostService | Where Key -EQ "TSM-SSH" | Start-VMHostService 

The first step is to identify the datastore(s) from which you want to reclaim storage.

[[email protected]:~] esxcli storage vmfs extent list
 Volume Name    VMFS UUID                            Extent Number  Device Name                           Partition
 -------------  -----------------------------------  -------------  ------------------------------------  ---------
 VMDS01         55dc0522-c72eebec-3780-d89d672d7a3c              0  naa.60030d90eca17602ce5c5a54a083e31c          1

We will need the device name, and later the UUID. The next step is to check whether the device is detected as a thin-provisioned disk, and whether it is VAAI-capable. I’ve shortened the esxcli output to the relevant lines.

[[email protected]:~] esxcli storage core device list -d naa.60030d90eca17602ce5c5a54a083e31c
    Thin Provisioning Status: yes
    VAAI Status: supported

Now we have to verify that all necessary VAAI primitives are supported.

[[email protected]:~] esxcli storage core device vaai status get -d naa.60030d90eca17602ce5c5a54a083e31c
 naa.60030d90eca17602ce5c5a54a083e31c
    VAAI Plugin Name:
    ATS Status: supported
    Clone Status: supported
    Zero Status: supported
    Delete Status: supported

Important for us is the “Delete” primitive. If this is supported, we can use UNMAP to reclaim storage.

[[email protected]:~] esxcli storage vmfs unmap -u 55dc0522-c72eebec-3780-d89d672d7a3c

This process will take some time, depending on the amount of storage that has to be reclaimed. It will also put some load on your storage, so you might want to run it during off-peak hours.
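If you want to script the check for the “Delete” primitive before running UNMAP, the text output of esxcli can be parsed. The following is only a minimal Python sketch and not part of the procedure above: the function names parse_vaai_status and unmap_supported are made up for illustration, and the sample string is the output shown in this post.

```python
def parse_vaai_status(output: str) -> dict:
    """Parse the 'Key: value' lines of the esxcli VAAI status output into a dict."""
    status = {}
    for line in output.splitlines():
        # The first line only holds the device name and contains no colon.
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def unmap_supported(output: str) -> bool:
    """Return True if the Delete primitive (UNMAP) is reported as supported."""
    return parse_vaai_status(output).get("Delete Status") == "supported"


# Sample output, copied from the esxcli command shown in this post.
SAMPLE = """naa.60030d90eca17602ce5c5a54a083e31c
   VAAI Plugin Name:
   ATS Status: supported
   Clone Status: supported
   Zero Status: supported
   Delete Status: supported
"""

if __name__ == "__main__":
    print("UNMAP possible:", unmap_supported(SAMPLE))
```

How you get the output into the script (SSH, a saved file, etc.) is up to you; the sketch only covers the parsing.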

VCAP6.5-DCV Design – Objective 2.4 Build manageability requirements into a vSphere 6.x logical design

This seems to be my last blog post for 2019, and it covers objective 2.4 (Build manageability requirements into a vSphere 6.x logical design) of the VCAP6.5-DCV Design exam. It is based on the VMware Certified Advanced Professional 6.5 in Data Center Virtualization Design (3V0-624) Exam Preparation Guide (last update August 2017).

The necessary skills and abilities are documented in the exam prep guide for the older VCAP6-DCV Design exam (3V0-622). I think they also apply to the current version of the exam:

  • Evaluate which management services can be used with a given vSphere Solution
  • Differentiate infrastructure qualities related to management
  • Differentiate available command line-based management tools (PowerCLI, vMA etc.)
  • Evaluate VMware Management solutions based on customer requirements
  • Build interfaces into the logical design for existing operations practices
  • Address identified operational readiness deficiencies
  • Define Event, Incident and Problem Management practices
  • Analyze Release Management practices
  • Determine request fulfillment and release management processes
  • Determine requirements for Configuration Management
  • Define change management processes based on business requirements
  • Based on customer requirements, identify required reporting assets and processes

While the last blog post covered the availability requirements, this blog post focuses on the manageability requirements of a logical design. It’s all about how to manage the proposed solution.

Evaluate which management services can be used with a given vSphere Solution

You can use different “services” to manage a vSphere environment.

  • vCenter and vMA

Both appliances offer you different services to connect to in order to manage your environment, like

  • vSphere Client (Web Client, C# Client)
  • SSH
  • APIs
  • PowerCLI

The different tools help you to manage the different vSphere components, like

  • HA
  • DRS
  • Networking (vDS, vSS)
  • Auto Deploy
  • Host Profiles
  • etc.

Differentiate infrastructure qualities related to management

The different infrastructure qualities are

  • Availability
  • Manageability
  • Performance
  • Recoverability
  • Security

Depending on which infrastructure quality you consider, it affects the manageability of the proposed solution. For example: A single vCenter might not offer the required availability. Or a single datastore might not meet the required performance. But a highly available vCenter or an SDRS cluster affects the way you manage the solution.

Differentiate available command line-based management tools (PowerCLI, vMA etc.)

You should be able to differentiate between PowerCLI (PowerShell) and vMA (Appliance) or vCLI (command-line tools for ESXi).

Evaluate VMware Management solutions based on customer requirements

Depending on the customer’s requirements, some solutions might be out of scope. If the customer doesn’t have a vSphere Enterprise Plus license, there’s no way to use Storage DRS.

Build interfaces into the logical design for existing operations practices

This topic is about which existing interfaces (in terms of systems) the customer is already using and how to build them into the design. Think about Syslog servers, Active Directory for authentication (infrastructure quality design), Public Key Infrastructure (PKI) for certificates etc.

Address identified operational readiness deficiencies

Operational Readiness (OR) is the capability of an organization to (efficiently) deploy, operate, and maintain a system and/or its processes. Before the proposed solution goes into production, any deficits with regard to OR have to be identified and addressed.

Define Event, Incident and Problem Management practices

This sounds like ITIL, and I assume that the ITIL definitions of event, incident and problem are meant. ITIL defines:

  • Event: An event can be defined as any detectable or discernible occurrence that has significance for the management of the IT Infrastructure or the delivery of IT service and evaluation of the impact a deviation might cause to the services. Events are typically notifications created by an IT service, Configuration Item (CI) or monitoring tool. (Wikipedia)
  • Incident: An incident is an event that could lead to loss of, or disruption to, an organization’s operations, services or functions. (Wikipedia)
  • Problem: The Information Technology Infrastructure Library defines a problem as the cause of one or more incidents. (Wikipedia)

The design should include practices for event, incident and problem management. Most customers will already have practices for this, but they might need to be adjusted for the proposed solution.

Analyze Release Management practices

Release management is the process of managing, planning, scheduling and controlling the deployment of new or modified services. This topic covers the currently deployed Release Management processes of the customer.

Determine request fulfillment and release management processes

This topic is related to the prior topic. You should determine if the customer has already deployed request fulfillment and release management processes, and if so, you should check whether they are suitable for the proposed solution.

The request fulfillment will allow users to request and receive standardized services. Think about the automated deployment of VMs after requesting a new VM using a portal web site.

Determine requirements for Configuration Management

Changes to the proposed solution will be required over time. Configuration Management covers the management of all Configuration Items (CI). Even if it’s not mentioned in this topic, Configuration Management is related to Change Management, because all changes to CIs have to be documented.

Define change management processes based on business requirements

The objective of change management in this context is to ensure that standardized methods and procedures are used for efficient and prompt handling of all changes to control IT infrastructure, in order to minimize the number and impact of any related incidents upon service. (Wikipedia)

If a customer already has ITSM processes in place, they most likely will have a change management process. This process has to be defined to fulfill the requirements of the proposed solution.

Based on customer requirements, identify required reporting assets and processes

Especially when it comes down to security, it’s important to talk about monitoring and logging. This topic is about

  • What CIs have to be monitored?
  • What events have to be logged/ tracked?
  • How to keep track of changes to configuration items?
  • How to keep documentation up-to-date?

Summary

This objective is full of ITSM/ITIL. It’s pretty helpful if you are familiar with the concepts of ITSM/ITIL. You should have a good understanding of the different management tools, management solutions and services of a vSphere design.

VCAP6.5-DCV Design – Objective 2.3 Build availability requirements into a vSphere 6.x logical design

This blog post covers objective 2.3 (Build availability requirements into a vSphere 6.x logical design) of the VCAP6.5-DCV Design exam. It is based on the VMware Certified Advanced Professional 6.5 in Data Center Virtualization Design (3V0-624) Exam Preparation Guide (last update August 2017).

The necessary skills and abilities are documented in the exam prep guide for the older VCAP6-DCV Design exam (3V0-622). I think they also apply to the current version of the exam:

  • Evaluate which logical availability services can be used with a given vSphere solution
  • Differentiate infrastructure qualities related to availability
  • Describe the concept of redundancy and the risks associated with single points of failure
  • Explain class of nines methodology
  • Determine availability component of service level agreements (SLAs) and service level management processes
  • Determine potential availability solutions for a logical design based on customer requirements
  • Create an availability plan, including maintenance processes
  • Balance availability requirements with other infrastructure qualities
  • Analyze a vSphere design and determine possible single points of failure

Let’s start with…

Evaluate which logical availability services can be used with a given vSphere solution

VMware vSphere offers a broad range of features that allow you to create highly available solutions. When we take a look at the infrastructure, features like VMware HA, FT, or even multiple NICs at a distributed vSwitch help to increase availability. When we look at the application layer, other techniques can help us to increase availability, for example using DRS anti-affinity rules to place VMs on different hosts.

Differentiate infrastructure qualities related to availability

The infrastructure qualities are:

  • Availability
  • Manageability
  • Performance
  • Recoverability
  • Security

Availability and Recoverability are tied together. René van den Bedem has written a very good blog post about how recoverability affects availability.

Describe the concept of redundancy and the risks associated with single points of failure

This topic is pretty clear and should be easy to explain. You should be able to identify what a single point of failure is, and how you can avoid it. Examples of a single point of failure are:

  • only a single-port HBA in a server
  • only one network uplink from a Top-of-Rack switch to a Core-Switch
  • use of RAID 0
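One simple way to hunt for single points of failure is to count how many redundant instances exist for each component role. The following Python sketch does just that. It is an illustration only: the inventory and its role names are made up and do not come from any VMware tool.

```python
def find_single_points_of_failure(inventory: dict) -> list:
    """Return the roles that have fewer than two redundant instances."""
    return sorted(role for role, count in inventory.items() if count < 2)


# Hypothetical inventory: role -> number of redundant instances.
inventory = {
    "HBA ports per host": 1,       # only a single-port HBA
    "ToR-to-core uplinks": 1,      # only one uplink to the core switch
    "NICs per vSwitch": 2,
    "power supplies per host": 2,
}

if __name__ == "__main__":
    for role in find_single_points_of_failure(inventory):
        print(f"Single point of failure: {role}")
```

A plain count does not catch everything. RAID 0, for example, has several disks but no redundancy, so it needs a separate check.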

Explain class of nines methodology

This is also easy:

  • Two Nines – 99% – 3.65 days downtime per year
  • Three Nines – 99.9% – 8.76 hours downtime per year
  • Four Nines – 99.99% – 52.6 minutes downtime per year
  • Five Nines – 99.999% – 5.26 minutes downtime per year
  • Six Nines – 99.9999% – 31.56 seconds downtime per year

Important note: “Downtime” means “unplanned downtime”, not planned downtime, like in maintenance windows.
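The numbers in such a table are simple arithmetic: allowed downtime = (1 − availability) × hours per year. Here is a small Python sketch of that calculation. It assumes a year of 365 days; tables that assume 365.25 days give slightly different values, which is why some entries in published lists differ in the last digits.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year with 365 days


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the allowed downtime in hours per year for a given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for nines in ("99", "99.9", "99.99", "99.999", "99.9999"):
        hours = downtime_hours_per_year(float(nines))
        print(f"{nines}% -> {hours:.4f} hours ({hours * 60:.2f} minutes) per year")
```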

Determine availability component of service level agreements (SLAs) and service level management processes

A Service Level Agreement (SLA) is a contract between two parties, usually a supplier and a customer. The SLA describes targets that should be met. This can be an availability expressed using the “class of nines methodology”. If this target is missed, the supplier often has to pay a penalty to the customer.

So it is pretty important to build a design that can fulfill the availability requirements. Depending on the requirements you may have to use VMware FT. If the availability requirements are lower, VMware HA may be sufficient. It is important that you can choose the best technique for the given SLA.

Determine potential availability solutions for a logical design based on customer requirements

Now it’s time to put things together. You know the different techniques that are offered by VMware vSphere, and you know the customer requirements. This allows you to determine the potential availability solutions for a logical design.

Create an availability plan, including maintenance processes

Again, I’d like to recommend the blog post of René van den Bedem. It’s all about RPO, RTO, MTD and how much unplanned downtime costs (result of a Business Impact Analysis).

Balance availability requirements with other infrastructure qualities

At some point you need to look at your design holistically and ensure that a decision that was made does not impact other requirements or other decisions.

Analyze a vSphere design and determine possible single points of failure

This is pretty self-explanatory and can be done together with the preceding step.

Summary

Availability is the main theme of this objective. Do not lose sight of the customer’s requirements. Increasing availability is often associated with immense additional costs.

Read the mentioned blog post from René. I also really recommend this vBrownBag video with Rebecca Fitzhugh.

Why we need a vSAN licensing for SMB customers

Not every customer is running a full-blown vSphere Enterprise Plus license. To be honest, when I look at the number of sold licenses, most of my customers are running vSphere Essentials Plus. Not Essentials, nor Standard or Enterprise (Plus), but two or three hosts with Essentials Plus. And that’s perfectly fine!

Two or three hosts with 10 GbE and pretty often 12G SAS. Some of them with Fibre-Channel, nearly no one with iSCSI. My colleagues and I developed a pretty rock solid setup over the last years, which we sell like some kind of building block: HPE ProLiant, HPE MSA, Aruba Switches, vSphere Essentials Plus. A perfect setup for most of our customers, which run something between 10 and 30 VMs on it. Some of them also add Horizon View (Add-On) to it.

But requirements change. More customers ask for more hosts. When customers break out of the Essentials Plus licensing, it is often because of the host limitation. Fewer of them do this because they need DRS or even Storage vMotion.

Some of my customers have heard about vSAN and they like the idea behind it, especially when you take into account that hardware costs decrease and flash storage is getting cheaper. But when you discuss the idea of combining vSAN and Essentials licensing, you will hit the host limitation early.

VMware itself states in the vSAN licensing guide:

The 2-node vSAN deployment model is not restricted to a specific vSAN license edition. In other words, any of the licensing editions can be used with a 2-host configuration. vSphere Essentials Kit or vSphere Essentials Plus Kit licensing limits the number of hosts managed by vCenter Server Essentials to three. The vSAN witness host – virtual appliance or physical – is considered a host in these Essentials licensing bundles.

Source: VMware vSAN Licensing Guide

When you take a look at the Horizon Desktop licensing, or at the RoBo licensing, you will see another kind of limitation: Limiting the number of VMs, not the number of hosts. This is pretty interesting when you think about combining vSAN and Essentials licensing.

Why not offer an “HCI Essentials Kit” limited to 25 VMs, with the features offered by Essentials Plus and vSAN Standard? This would allow customers to run four or five hosts with vSAN. By limiting the number of VMs instead of the number of hosts, customers can scale out their infrastructure in terms of capacity.

Hey VMware, you might think about this over the Christmas holiday. ;) There is a customer segment that is not yet sufficiently addressed by your sales team. This is a chance for more YoY growth. ;)

VMware ESXi 6.7: Recurring host hardware sensor state alarm

This posting is ~12 months old. You should keep this in mind. IT is a short-lived business. This information might be outdated.

If you found this blog post because you are searching for a solution for a FAN FAILURE on your ProLiant Gen10 HW after applying the latest ESXi 6.7 patches, then use this shortcut for the workaround: Fan health sensors report false alarms on HPE Gen10 Servers with ESXi 6.7


I had a really annoying problem at one of my customers. After deploying new VMware ESXi hosts (HPE ProLiant DL380 Gen10) along with an upgrade of the vCenter Server Appliance to 6.7 U2, the customer reported recurring host hardware sensor state alarm messages in the vCenter for all hosts.

After acknowledging the alarm, it recurred after a couple of minutes or hours. The hardware was fine, no errors or warnings were noticed in the iLO Management Log. But the vCenter periodically reported a Sensor -1 type error in the Events window. The /var/log/syslog.log contained messages like this:

2019-11-29T04:39:48Z sfcb-vmw_ipmi[4263212]: IpmiIfcSelGetInfo: IPMI_CMD_GET_SEL_INFO cc=0xc1
 2019-11-29T04:39:49Z sfcb-vmw_ipmi[4263212]: IpmiIfcSelGetInfo: IPMI_CMD_GET_SEL_INFO cc=0xc1
 2019-11-29T04:39:50Z sfcb-vmw_ipmi[4263212]: IpmiIfcSelGetInfo: IPMI_CMD_GET_SEL_INFO cc=0xc1
 2019-11-29T04:39:51Z sfcb-vmw_ipmi[4263212]: IpmiIfcSelGetInfo: IPMI_CMD_GET_SEL_INFO cc=0xc1
 2019-11-29T04:39:52Z sfcb-vmw_ipmi[4263212]: IpmiIfcSelGetInfo: IPMI_CMD_GET_SEL_INFO cc=0xc1

Sure, you can ignore this. But you shouldn’t ignore this, because these events can result in the vCenter database increasing in size. vCenter can crash once the SEAT partition size goes above the 95% threshold. So you better fix this!

Long story short: This bug is fixed with the latest November updates for ESXi 6.7 U3. A workaround is to disable the WBEM service. The WBEM service might be re-enabled after a reboot. In this case you have to disable the sfcbd-watchdog service.

But the best way to solve this is to install the latest patches (VMware ESXi 6.7, Patch Release ESXi670-201911001).

VCAP6.5-DCV Design – Objective 2.2 Map service dependencies

This blog post covers objective 2.2 (Map service dependencies) of the VCAP6.5-DCV Design exam. It is based on the VMware Certified Advanced Professional 6.5 in Data Center Virtualization Design (3V0-624) Exam Preparation Guide (last update August 2017).

The necessary skills and abilities are documented in the exam prep guide for the older VCAP6-DCV Design exam (3V0-622). I think they also apply to the current version of the exam:

  • Evaluate dependencies for infrastructure and application services that will be included in a vSphere design
  • Create Entity Relationship Diagrams that map service relationships and dependencies
  • Analyze interfaces to be used with new and existing business processes
  • Determine service dependencies for logical components
  • Include service dependencies in a vSphere 6.x Logical Design
  • Analyze services to identify upstream and downstream service dependencies
  • Navigate logical components and their interdependencies and make decisions based upon all service relationships

Let’s start with the second topic of this objective.

Evaluate dependencies for infrastructure and application services that will be included in a vSphere design

This topic covers two different parts of our vSphere design:

  • infrastructure, and
  • application services

You should clarify which components of your design depend on each other, or whether they depend on components that are not part of your design. For example, VMware HA needs shared storage, and VMware ESXi needs NTP and DNS to work properly.

The same applies to the application services (or applications) that are part of your design. What dependencies do they have? Imagine a three-tier application with database, application logic and web frontend.

You must be able to identify and describe these dependencies.

Create Entity Relationship Diagrams that map service relationships and dependencies

If you are able to identify and describe the dependencies, you must also be able to create an Entity Relationship Diagram (ER diagram) to visualize these dependencies.

Do your homework and try to identify these dependencies at the beginning. Tools like the vRealize Infrastructure Navigator can help you to identify them.

Analyze interfaces to be used with new and existing business processes

It is pretty important to understand how systems interact. To gain this knowledge, you have to analyze the interfaces of business processes. This doesn’t mean that you have to click through ERP applications, but you should get familiar with how processes are tied together.

Determine service dependencies for logical components

You also have to identify the service dependencies for the logical components in your design. You can use tools like vRealize Operations Manager or the Infrastructure Navigator to get the necessary information.

Include service dependencies in a vSphere 6.x Logical Design

The identified service dependencies have to be included into the logical design. This is a pretty important step and you should pay it the necessary attention. Tables and ER diagrams will help you at this step.

Analyze services to identify upstream and downstream service dependencies

An upstream service is a service that another service relies on. Downstream services need their upstream services to work properly. For example: DNS is an upstream service for Active Directory.

Understanding upstream and downstream services is important for things like startup/shutdown plans.
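A startup plan is essentially a topological sort of the dependency graph: every service starts after its upstream services, and the shutdown plan is the reverse. The following Python sketch shows the idea using graphlib from the standard library (Python 3.9 or later). The dependency map is a made-up example for illustration, not the output of any VMware tool.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: service -> set of upstream services it relies on.
dependencies = {
    "DNS": set(),
    "NTP": set(),
    "Active Directory": {"DNS"},
    "ESXi host": {"DNS", "NTP"},
    "vCenter": {"DNS", "Active Directory"},
}


def startup_order(deps: dict) -> list:
    """Return an order in which every service starts after its upstream services."""
    return list(TopologicalSorter(deps).static_order())


def shutdown_order(deps: dict) -> list:
    """Shut down in reverse: downstream services first, upstream services last."""
    return list(reversed(startup_order(deps)))


if __name__ == "__main__":
    print("Startup: ", " -> ".join(startup_order(dependencies)))
    print("Shutdown:", " -> ".join(shutdown_order(dependencies)))
```

Services without a dependency between them (DNS and NTP here) can appear in either order. If the map contains a circular dependency, TopologicalSorter raises a CycleError, which is a useful finding for a design review in itself.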

Navigate logical components and their interdependencies and make decisions based upon all service relationships

You should visualize the service dependencies. This will help you to evaluate the impact if a service fails, and to see how services interact with each other.

Summary

Most of the topics in this objective overlap. Basically, everything is about understanding how things are connected and interact. This will help you to get a better understanding of dependencies and of which services are crucial for the business or your solution.

Think again about DNS. None of us will ever build a solution with a single DNS server, because nearly everything will melt down if DNS is not available. DNS is a perfect example of an upstream service.

Load balancing ADFS and ADFS Proxy using Citrix ADC

Last week I had to set up a small Active Directory Federation Services (ADFS) farm that will be used to allow Single Sign-On (SSO) with Office 365.

Active Directory Federation Services (ADFS) is a solution developed by Microsoft to provide users with authenticated access to applications that are not capable of using Integrated Windows Authentication (IWA).

The customer required a two-node ADFS farm located on the internal network, and a two-node ADFS Proxy farm located in the DMZ.

An ADFS Proxy server acts as a reverse proxy and is typically located in your organization’s perimeter network (DMZ).

This picture shows a typical ADFS/ ADFS Proxy setup:

ADFS/ WAP Design/ Citrix/ citrix.com

My customer decided to use Citrix ADC (formerly NetScaler) to load balance the requests for the ADFS farm and the ADFS Proxy farm. In addition to load balancing, this offers high availability in case of a failed ADFS server or ADFS Proxy server. Please note that Citrix ADC can act as an ADFS Proxy, but this requires the Advanced Edition license. My customer “only” had a Standard License, so we had to set up dedicated ADFS Proxy servers on the DMZ network.

Citrix ADC setup

The ADFS service name is typically something like adfs.customer.tld. This farm name has to be the same for internal and external access. For internal access, the ADFS service name must resolve to the VIP of the Citrix ADC. The same applies to external access. So you have to set up split DNS.

ADFS uses HTTP and HTTPS, so my first attempt was to use this Citrix ADC Content Switch based setup:

add server srv_adfs1 x.x.x.x
add server srv_adfs2 x.x.x.y

add cs vserver cs_vsrv_adfs SSL x.x.x.x 443 -cltTimeout 180 -caseSensitive OFF
add lb vserver lb_vsrv_adfs SSL 0.0.0.0 0 -persistenceType SSLSESSION -cltTimeout 180

add cs action cs_action_adfs -targetLBVserver lb_vsrv_adfs
add cs policy cs_pol_adfs -rule "HTTP.REQ.URL.SET_TEXT_MODE(IGNORECASE).CONTAINS(\"adfs.customer.tld\")" -action cs_action_adfs
bind cs vserver cs_vsrv_adfs -policyName cs_pol_adfs -priority 100

add serviceGroup svcgrp_adfs SSL -maxClient 0 -maxReq 0 -cip ENABLED X-MS-Forwarded-Client-IP -usip NO -useproxyport YES -cltTimeout 180 -svrTimeout 360 -CKA NO -TCPB NO -CMP YES -appflowLog DISABLED

add lb monitor mon_adfs HTTP-ECV -send "GET /federationmetadata/2007-06/federationmetadata.xml" -recv "adfs.customer.tld/adfs/services/trust" -LRTM ENABLED -secure YES

bind serviceGroup svcgrp_adfs srv_adfs1 443 -CustomServerID "\"None\""
bind serviceGroup svcgrp_adfs srv_adfs2 443 -CustomServerID "\"None\""
bind serviceGroup svcgrp_adfs -monitorName mon_adfs

bind lb vserver lb_vsrv_adfs svcgrp_adfs

bind ssl vserver lb_vsrv_adfs -certkeyName cert-key-pair
bind ssl vserver cs_vsrv_adfs -certkeyName cert-key-pair

set ssl vserver lb_vsrv_adfs -ssl3 DISABLED
set ssl vserver cs_vsrv_adfs -ssl3 DISABLED

This is a pretty common setup for HTTP/HTTPS-based services. But it doesn’t work, mainly because the monitor was not getting the required response. So the ADC considered the monitored service down, and therefore the service group, the load balancing virtual server and the content switch didn’t come up.

The reason for this is Server Name Indication (SNI), an extension to Transport Layer Security (TLS). SNI has been enabled and required since ADFS 3.0. The monitor tries to access the URL http://x.x.x.x/federationmetadata/2007-06/federationmetadata.xml, but the ADFS service won’t answer those requests, because the request includes the IP address, and not the ADFS service name.

But there is a workaround for everything on the Internet! You can change the binding on the ADFS server nodes using netsh.

netsh http add sslcert ipport=<IPAddress:port> certhash=<certhash> appid=<appid> certstorename=MY

I will not add the necessary options to this command, because: DON’T DO THIS!

Yes, the service group, the load balancing virtual server and the content switch will come up after this change. But you will not be able to enable a trust between your ADFS Proxy servers and the ADFS farm.

Microsoft’s requirements on Load Balancing ADFS

Microsoft offers a nice overview of the requirements when deploying ADFS. There is a section about the network requirements. In this section, Microsoft clearly documents the requirements when load balancing ADFS servers and ADFS Proxy servers.

The load balancer MUST NOT terminate SSL. AD FS supports multiple use cases with certificate authentication which will break when terminating SSL. Terminating SSL at the load balancer is not supported for any use case.

Requirements for deploying AD FS/ microsoft.com

Okay, with this in mind, you can’t use an ADC content switch as described above, because it will terminate SSL. You have to switch to a load balancing virtual server and a service group with SSL bridge. Citrix describes SSL bridge as follows:

A SSL bridge configured on the NetScaler appliance enables the appliance to bridge all secure traffic between the SSL client and the SSL server. The appliance does not offload or accelerate the bridged traffic, nor does it perform encryption or decryption. Only load balancing is done by the appliance. The SSL server must handle all SSL-related processing. Features such as content switching, SureConnect, and cache redirection do not work, because the traffic passing through the appliance is encrypted.

But there is a second, very interesting statement:

It is recommended to use the HTTP (not HTTPS) health probe endpoints to perform load balancer health checks for routing traffic. This avoids any issues relating to SNI. The response to these probe endpoints is an HTTP 200 OK and is served locally with no dependence on back-end services. The HTTP probe can be accessed over HTTP using the path ‘/adfs/probe’

http://<Web Application Proxy name>/adfs/probe
http://<ADFS server name>/adfs/probe
http://<Web Application Proxy IP address>/adfs/probe
http://<ADFS IP address>/adfs/probe

Requirements for deploying AD FS/ microsoft.com

This is pretty interesting, because it addresses the monitor issue described above. The solution is an HTTP monitor on port 80 that sends a GET to “/adfs/probe” and checks for an HTTP 200 response.
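If you want to check the probe endpoint by hand before touching the ADC, something like the following Python sketch should do. It is a rough illustration under a few assumptions: the function names are made up for this example, the timeout is arbitrary, and port 80 must be reachable from the machine you run it on.

```python
from urllib.error import URLError
from urllib.request import urlopen

PROBE_PATH = "/adfs/probe"


def probe_url(host: str) -> str:
    """Build the plain HTTP probe URL for an ADFS or ADFS Proxy node (name or IP)."""
    return f"http://{host}{PROBE_PATH}"


def is_healthy(status: int) -> bool:
    """Only an HTTP 200 counts as up, like a monitor that expects response code 200."""
    return status == 200


def check(host: str, timeout: float = 2.0) -> bool:
    """Request the probe endpoint and report whether the node answered with 200."""
    try:
        with urlopen(probe_url(host), timeout=timeout) as resp:
            return is_healthy(resp.status)
    except (URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer: treat the node as down.
        return False


print(probe_url("192.0.2.10"))
```

Calling `check()` with the name or the IP address of a node returns True only if the probe answers with a 200. Because the probe is plain HTTP, SNI is not involved at all.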

A working Citrix ADC setup

This setup is divided into two parts: One for the ADFS farm, and a second one for the ADFS Proxy farm. It uses SSL bridge and HTTP for the service monitor.

Load balancing the ADFS farm

add server srv_adfs1 x.x.x.x
add server srv_adfs2 x.x.x.y

add serviceGroup svcgrp_adfs SSL_BRIDGE -maxClient 0 -maxReq 0 -cip DISABLED -usip NO -useproxyport YES -cltTimeout 180 -svrTimeout 360 -CKA NO -TCPB NO -CMP NO
add lb vserver lb_vsrv_adfs SSL_BRIDGE x.x.x.z 443 -persistenceType SSLSESSION -cltTimeout 180
add lb monitor mon_adfs_http HTTP -respCode 200 -httpRequest "GET /adfs/probe" -LRTM ENABLED -destPort 80

bind serviceGroup svcgrp_adfs srv_adfs1 443
bind serviceGroup svcgrp_adfs srv_adfs2 443
bind serviceGroup svcgrp_adfs -monitorName mon_adfs_http
bind lb vserver lb_vsrv_adfs svcgrp_adfs
set ssl vserver lb_vsrv_adfs -ssl3 DISABLED

Load balancing the ADFS Proxy farm

add server srv_adfsproxy1 y.y.y.y
add server srv_adfsproxy2 y.y.y.x

add serviceGroup svcgrp_adfsproxy SSL_BRIDGE -maxClient 0 -maxReq 0 -cip DISABLED -usip NO -useproxyport YES -cltTimeout 180 -svrTimeout 360 -CKA NO -TCPB NO -CMP NO
add lb vserver lb_vsrv_adfsproxy SSL_BRIDGE y.y.y.z 443 -persistenceType SSLSESSION -cltTimeout 180
add lb monitor mon_adfs_proxy_http HTTP -respCode 200 -httpRequest "GET /adfs/probe" -LRTM ENABLED -destPort 80

bind serviceGroup svcgrp_adfsproxy srv_adfsproxy1 443
bind serviceGroup svcgrp_adfsproxy srv_adfsproxy2 443
bind serviceGroup svcgrp_adfsproxy -monitorName mon_adfs_proxy_http
bind lb vserver lb_vsrv_adfsproxy svcgrp_adfsproxy
set ssl vserver lb_vsrv_adfsproxy -ssl3 DISABLED

I have implemented it on a NetScaler 12.1 with a Standard license. If you have feedback or questions, please leave a comment. :)