Author Archives: Patrick Terlisten

About Patrick Terlisten

vcloudnine.de is the personal blog of Patrick Terlisten. Patrick has a strong focus on virtualization & cloud solutions, but also on storage, networking, and IT infrastructure in general. He is a fan of Lean Management and agile methods, and practices continuous improvement wherever possible. Feel free to follow him on Twitter and/or leave a comment.

Two registry changes to improve physical Horizon View Agent experience

Using physical clients as Horizon View agents is pretty common for me. I often access my office PC, as well as my Lenovo X250, through the Horizon View Client and the Blast protocol. But as good as the performance is, there were a couple of things that bugged me.

Image by Mediamodifier on Pixabay

On my office PC, I quite often encountered a black screen, either on first connect or on reconnect. This is the typical issue caused by misconfigured firewall policies, but that was completely out of scope in this case, because my colleagues never had issues with black screens. The problem occurred with different versions of the View Agent.

I finally fixed it after I tried to connect using the HTML5 client. I got an error that the connection server was unable to connect to 172.28.208.1. Huh? I don’t know this address… I checked my office PC and found out that this IP was assigned to the Hyper-V virtual switch.

I fixed this by adding this registry change:

HKLM\Software\VMware, Inc.\VMware VDM\IpPrefix = n.n.n.n/m (REG_SZ)

n.n.n.n/m is the subnet on which your View Agent should be contacted by the Connection Server.
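If you prefer to script the change, here is a minimal PowerShell sketch; the subnet 192.168.1.0/24 is a hypothetical example, replace it with your own:

# Create the key if it does not exist, then set IpPrefix as REG_SZ.
New-Item -Path 'HKLM:\SOFTWARE\VMware, Inc.\VMware VDM' -Force | Out-Null
New-ItemProperty -Path 'HKLM:\SOFTWARE\VMware, Inc.\VMware VDM' -Name 'IpPrefix' -Value '192.168.1.0/24' -PropertyType String -Force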

The second issue was that my client constantly failed to reboot. I clicked reboot, and the machine was gone: no TeamViewer, no RDP, no ping. Only a hard power-off got it working again.

I fixed this by adding a few more registry changes:

HKCU\Control Panel\Desktop\AutoEndTasks = 1 (REG_SZ)
HKLM\SYSTEM\CurrentControlSet\Control\WaitToKillAppTimeout = 2000 (REG_SZ)
HKU\.DEFAULT\Control Panel\Desktop\AutoEndTasks = 1 (REG_SZ)
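The same changes as a PowerShell sketch. Run the HKCU part in the context of the affected user; HKU is not mapped as a PowerShell drive by default, hence the New-PSDrive call:

New-ItemProperty -Path 'HKCU:\Control Panel\Desktop' -Name 'AutoEndTasks' -Value '1' -PropertyType String -Force
New-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control' -Name 'WaitToKillAppTimeout' -Value '2000' -PropertyType String -Force
# Map HKEY_USERS so the .DEFAULT hive is reachable.
New-PSDrive -Name HKU -PSProvider Registry -Root HKEY_USERS | Out-Null
New-ItemProperty -Path 'HKU:\.DEFAULT\Control Panel\Desktop' -Name 'AutoEndTasks' -Value '1' -PropertyType String -Force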

This fixed both issues for me: the black screen and the hang on reboot.

MFA disabled, but Azure asks for second factor?!

I just had a Teams call with a customer to resolve a strange mystery about Azure MFA.

The customer called me and explained that he had a user with Azure Multi-Factor Authentication (MFA) disabled, but when this user logged in, he was asked to set up MFA. He set up MFA and was able to log in, in line with their Conditional Access policies.

Image by Lalmch on Pixabay

The customer and I took a look into their tenant and checked a couple of things. The first thing the customer showed me was this screen:

As you can see, the MFA state for this user is “disabled” (German-language screenshot). Then we took a look using the MSOnline PowerShell module.

PS C:\Users\p.terlisten> $x = Get-MsolUser -UserPrincipalName user@domain.tld
PS C:\Users\p.terlisten> $x.StrongAuthenticationMethods

ExtensionData                                    IsDefault MethodType
-------------                                    --------- ----------
System.Runtime.Serialization.ExtensionDataObject     False OneWaySMS
System.Runtime.Serialization.ExtensionDataObject     False TwoWayVoiceMobile
System.Runtime.Serialization.ExtensionDataObject      True PhoneAppOTP
System.Runtime.Serialization.ExtensionDataObject     False PhoneAppNotification

The user has MFA enabled and the second factor is an authenticator app on his phone.

Schrödinger’s MFA

The mystery is not a mystery anymore if you take into account that the first screenshot shows the per-user MFA status.

The customer is using Conditional Access, therefore Security Defaults are disabled for his tenant. Microsoft states:

If your organization is a previous user of per-user based Azure AD Multi-Factor Authentication, do not be alarmed to not see users in an Enabled or Enforced status if you look at the Multi-Factor Auth status page. Disabled is the appropriate status for users who are using security defaults or Conditional Access based Azure AD Multi-Factor Authentication.

What are security defaults?

Conditional Access, or enabled Security Defaults, will force a user to enroll in MFA, even if the per-user MFA setting is set to “disabled”!

You have to disable Security Defaults, and you have to disable Conditional Access, in order to make per-user MFA reflect the current MFA state for a specific user.
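If you want to see both states side by side, here is a small sketch using the MSOnline module (user@domain.tld is a placeholder). StrongAuthenticationRequirements reflects the per-user MFA state, while StrongAuthenticationMethods lists the registered second factors:

PS C:\> $x = Get-MsolUser -UserPrincipalName user@domain.tld
PS C:\> $x.StrongAuthenticationRequirements   # empty = per-user MFA "disabled"
PS C:\> $x.StrongAuthenticationMethods        # still populated after CA/Security Defaults enrollment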

vCenter Server migration to 6.7 fails with “Failed to check VMware STS. The SSL certificate of STS service cannot be verified”

This posting is ~12 months old. You should keep this in mind. IT is a short-living business. This information might be outdated.

There are still customers out there running vCenter Server on a Windows host. This year, despite the fact that most customers have put projects on hold, I convinced some of them to migrate to a vCenter Server Appliance.

Some days ago I had a meeting with one of my favorite customers to migrate their vCenter Server 6.5 to a vCenter Server Appliance 6.7 U3l. They were still on 6.5 because of some legacy ESXi 5.5 hosts, but they managed to remove them from their vCenter and we were able to start the migration.

Healthcheck & Stage 1

We did a health check the day before, so we were pretty sure that everything would go smoothly. Stage 1 was easy. We deployed the appliance (X-Large… *cough* *cough*) and moved on to stage 2. We had ~20 GB of data to migrate. IMHO nothing fancy, I have migrated vCenters with more data.

Stage 2… and FAIL

To make a long story short: Stage 2 failed pretty hard with an unrecoverable error.

Encountered an internal error.

Traceback (most recent call last):
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 1641, in main
    vmidentityFB.boot()
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 345, in boot
    self.checkSTS(self.__stsRetryCount, self.__stsRetryInterval)
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 1179, in checkSTS
    raise Exception('Failed to initialize Secure Token Server.')
Exception: Failed to initialize Secure Token Server.

Deep dive into the log files

The message indicated a problem with the Secure Token Server (STS). Without any troubleshooting we did a second try, which also failed with the same message. Time to dive into the logs…

The log file of vmidentity-firstboot.py was pretty helpful, because the end of the log file pointed us in the right direction.

Failed to check VMware STS.
com.vmware.vim.sso.client.exception.CertificateValidationException: The SSL certificate of STS service cannot be verified

This message led us to VMware KB76144 (“Failed to check VMware STS. The SSL certificate of STS service cannot be verified” while upgrading VCSA from 6.5 to 6.7/7.0)

According to the KB article, the cause of our problem was a certificate in the STS_INTERNAL_SSL_CERT store, which is used by the STS. For sure: this vCenter had been upgraded from 5.5 at some point in the past.

So we checked the certificate stores and found further evidence that a certificate seemed to be our main problem. As you can see, the certificate from the STS_INTERNAL_SSL_CERT store had expired some days earlier.
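If you want to check this on your own vCenter, the expiration date can be read directly from the store. On a Windows-based vCenter it looks like this (the --text switch prints the certificates in plain text; check the Not After field):

C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry list --store STS_INTERNAL_SSL_CERT --text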

Fortunately, KB76144 offers a simple solution to this problem. In short:

  • remove certificates from the STS_INTERNAL_SSL_CERT store, and
  • re-import the certificate from the MACHINE_SSL_CERT store

It’s DNS… or NTP… or an expired certificate

Because we had a Windows-based vCenter, we had to modify the commands from the KB article for Windows.

C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry getcert --store MACHINE_SSL_CERT --alias __MACHINE_CERT --output c:\temp\machine_ssl.crt

C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry getkey --store MACHINE_SSL_CERT --alias __MACHINE_CERT --output c:\temp\machine_ssl.key

C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry getcert --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT --output c:\temp\STS_INTERNAL_SSL_CERT-__MACHINE_CERT.crt

C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry getkey --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT --output c:\temp\STS_INTERNAL_SSL_CERT-__MACHINE_CERT.key

C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry delete --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT -y
Deleted entry with alias [__MACHINE_CERT] in store [STS_INTERNAL_SSL_CERT] successfully

C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry create --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT --cert c:\temp\machine_ssl.crt --key c:\temp\machine_ssl.key
Entry with alias [__MACHINE_CERT] in store [STS_INTERNAL_SSL_CERT] was created successfully

The last step was to restart the VMware STS service. After this, we tried the migration again, and this time it went smoothly.
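For reference, on a Windows-based vCenter the restart looks roughly like this. A sketch; the service name may differ between versions, and you can also restart the “VMware Security Token Service” from the Services console:

C:\Program Files\VMware\vCenter Server\bin>service-control --stop VMwareSTS
C:\Program Files\VMware\vCenter Server\bin>service-control --start VMwareSTS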

Details on Windows 10 E3/ E5 Subscription Activation

This posting is ~12 months old. You should keep this in mind. IT is a short-living business. This information might be outdated.

One of my customers purchased a bunch of Microsoft 365 subscriptions in order to use them with Office 365 and Windows 10 Enterprise. The customer called me because he had trouble activating the Windows 10 Enterprise license.

Source: Microsoft

I would like to summarize some of the requirements for successfully activating Windows 10 Enterprise subscriptions.

License

First of all, there is a licensing requirement. You need at least a Windows 10 Pro or Windows 10 Pro Education license. You need one of these licenses! There is no way to use the Windows 10 Enterprise subscription without a base license, because it’s an upgrade!

Source: Microsoft

In the case of my customer, the Pro license was missing. After adding and activating a Pro key, the key and edition were automatically updated to Windows 10 Enterprise.

AzureAD

In order to activate the license, the devices must be Azure AD joined or Hybrid Azure AD joined. Workgroup-joined or Azure AD registered devices are not supported!

The Windows 10 Enterprise license must be assigned to the user. The license can’t be assigned to a device. Without an assigned license, the device can’t upgrade from a Pro to an Enterprise license.
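For illustration, the assignment can also be done with the MSOnline PowerShell module. A minimal sketch; the SKU part number WIN10_VDA_E3 is an assumption, so list the actual SKUs in your tenant with Get-MsolAccountSku first:

PS C:\> Get-MsolAccountSku
PS C:\> Set-MsolUserLicense -UserPrincipalName user@domain.tld -AddLicenses "tenant:WIN10_VDA_E3"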

Horizon View – Why Automated Desktop Pools with Full Clones are still a thing

This posting is ~12 months old. You should keep this in mind. IT is a short-living business. This information might be outdated.

We have had to deal with COVID-19 for a year now, and from the IT perspective, 2020 was a pretty strange year. Many projects were not cancelled, but placed on hold. But two kinds of projects went through the roof:

  • Microsoft 365, and
  • Horizon View

As you might have noticed, I blogged a lot about Exchange, Exchange Online and Horizon this year. The reason for this is pretty simple: that was what drove my business this year.

In early 2020, when we decided to move into our home offices, we deployed Horizon View on physical PCs at ML Network (my employer). This was a simple solution, and it has worked for us to this day.

Some of my customers also deployed Horizon View for the same reason: A secure and easy way to get a desktop. For some of them, the tech was new and they struggled with DEM, Linked Clones, customization etc. The solution in this case was easy: Full Clones with dedicated assignment.

One customer moved from Windows 7 with floating assignment and Linked Clones to Windows 10 with Full Clones and dedicated assignment (not my project).

Another customer started to implement Horizon View with Horizon 2006, and he began with Instant Clones, dedicated assignment and DEM. I told him to go with Full Clones, but his IT company moved on with Instant Clones. Now he’s complaining about the added complexity.

My 2 cents

Many customers struggle with Windows 10 and its customization. Tools like the Dynamic Environment Manager (DEM) are powerful, but they can be quite complex, especially when it comes down to small IT orgs with 50, 100 or 200 desktops, where each member of the IT team has to be a jack of all trades.

I always recommend starting with Full Clones, just to get in touch with the technology. And I always recommend getting the requirements clear with the stakeholders and the users. Things like broken software, missing settings after a logoff/logon, or slow response times are the main difficulties that will cause a VDI project to fail.

When you are familiar with the technology, proceed further with DEM, Instant Clones and floating assignment. But you should learn to walk before you start to run.

Maybe I’m getting old. :D I’m not against modern technology and new features. I’m not a grumpy old senior consultant. But I think I’ve learned the hard way why it’s a bad idea to overburden IT orgs and their users with new tech, especially in times like these.

Adobe Flash will die, and how this affects VMware

This posting is ~1 year old. You should keep this in mind. IT is a short-living business. This information might be outdated.

December 31, 2020 will not only be the end of the miserable year 2020, it will also be the end of an era – the era of Adobe Flash! Adobe has announced that they will stop supporting Adobe Flash after December 31, 2020. Furthermore, Adobe will block Flash from running in Flash Player on January 12, 2021. Adobe strongly recommends that all users immediately uninstall Flash Player. I got a popup a couple of times asking me if I wanted to uninstall Adobe Flash. It’s still installed… :/

Adobe Flash isn’t a big thing in web development anymore, but there is a reason why I still have Adobe Flash installed – Admin Interfaces!

Source: 9GAG

We all had to deal with Flash after VMware started with the vSphere Web Client. It was slow and at times painfully buggy. The newer HTML5-based Web Client was much better, but not feature-complete until vSphere 6.7.

But the vSphere Web Client was not the only Flash-based admin interface in a VMware product. The Horizon Administrator, which was the main administration interface until Horizon 7.8, is also based on Flash. And vRealize Operations used Flash until version 6.6.

Update now!

If you want to remove Adobe Flash from your computer, you have to update your whole VMware infrastructure, or at least parts of it.

The simple rule is: update to the latest release and everything will be fine. If you are running vSphere 6.7 U3, the HTML5-based Web Client is feature-complete. The same applies to Horizon View: if you are running 7.10 or a newer release, everything is fine.

VMware has published a KB article which summarizes the update paths: VMware Flash End of Life and Supportability (78589).

But what if I can’t, or I’m unwilling to, update?

In this case, there is an easy approach: disconnect your systems from the internet, or at least block internet access for them. The alternative approach is not recommended! Stop the automatic updates of your web browser and use the Flash-based user interfaces in a browser which still supports Flash. Again: this is really not recommended!

Exchange HCW8078 – Migration Endpoint could not be created

This posting is ~1 year old. You should keep this in mind. IT is a short-living business. This information might be outdated.

While migrating a customer from Exchange 2010 to Exchange 2016, I had to create an Exchange Hybrid Deployment, because the customer wanted to use Microsoft Teams. Nothing fancy, and I’ve done this a couple of times.

Unfortunately, the Hybrid Configuration Wizard failed to create the migration endpoint. A quick check of the logs showed this error:

Microsoft.Exchange.MailboxReplicationService.MRSRemotePermanentException: The Mailbox Replication Service could not connect to the remote server because the certificate is invalid. The call to 'https://mail.contoso.com/EWS/mrsproxy.svc' failed. Error details: Could not establish trust relationship for the SSL/TLS secure channel with authority 'mail.contoso.com'. -->The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel. --> The remote certificate is invalid according to the validation procedure

The customer had no plans to move mailboxes to Exchange Online, so we didn’t care about this error. But the Calendar tab in Teams was not visible, and the Teams logs stated that Teams was unable to discover the mailbox: a typical sign of a broken EWS connection.
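By the way, you can reproduce the check the HCW performs without re-running the wizard. A sketch using Exchange Online PowerShell; mail.contoso.com and the credentials are placeholders:

PS C:\> Test-MigrationServerAvailability -ExchangeRemoteMove -RemoteServer mail.contoso.com -Credentials (Get-Credential contoso\administrator)

If the certificate on the MRS proxy endpoint is not trusted, this cmdlet returns the same certificate validation error as shown above.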

It’s always TLS… or DNS… or NTP

The customer used a certificate from their own PKI, so it was not trusted by Microsoft. In addition, the Exchange server was located behind a Sophos XG running Webserver Protection (reverse proxy). But this was not the main cause of the problems.

The root cause was the certificate from the customer’s PKI.

You should therefore make sure to use a proper certificate from a 3rd-party CA for Exchange Hybrid Deployments. I really urge every customer to stop using self-signed certificates, or certificates from their own PKI, for external connections.

The customer switched to a Let’s Encrypt certificate for testing purposes and the problems went away, without running the HCW again. He will now purchase a certificate from a 3rd-party CA.

VMware ESXi 6.7 memory health warnings after ProLiant SPP

This posting is ~1 year old. You should keep this in mind. IT is a short-living business. This information might be outdated.

During the deployment of a vSAN cluster consisting of multiple HPE ProLiant DL380 Gen10 hosts, I noticed a memory health warning after updating the firmware using the Service Pack for ProLiant (SPP). The error was definitely not shown before the update, so it was clear that this was not a real issue with the hardware. Furthermore: all hosts showed this error.

Memory health status after SPP

The same day, a customer called me and asked about a strange memory health error after he had updated all of his hosts with the latest SPP…

My first guess, that this was not caused by a hardware malfunction, was correct. HPE published an advisory about this issue:

The Memory Sensor Status Reported in the vSphere Web Client Is Not Accurate For HPE ProLiant Gen10 and Gen10 Plus Servers Running VMware ESXi 6.5/6.7/7.0 With HPE Integrated Lights-Out 5 (iLO 5) Firmware Version 2.30

To fix this issue, you have to update the iLO 5 firmware to version 2.31. You can do this manually using the iLO 5 web interface, or you can add the file to the SPP. I’ve added the BIN file to the USB stick with the latest SPP.

If you want to update the firmware manually, simply upload the BIN file using the built-in firmware update function.

  1. Navigate to Firmware & OS Software in the navigation tree, and then click Update Firmware
  2. Select the Local file option and browse to the BIN file
  3. To save a copy of the component to the iLO Repository, select the Also store in iLO Repository check box
  4. To start the update process, click Flash

You can download the latest iLO 5 firmware 2.31 from HPE using this link. After the firmware update, the error will resolve itself.
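If you have to flash a whole cluster, clicking through each iLO web interface gets tedious. Here is a sketch using the HPEiLOCmdlets PowerShell module; the module, the IP address, the credentials and the file path are assumptions, not part of the advisory:

# Connect to the iLO (add -DisableCertificateAuthentication for self-signed iLO certificates),
# flash the BIN file, then disconnect.
$conn = Connect-HPEiLO -IP 192.168.0.120 -Username Administrator -Password 'password'
Update-HPEiLOFirmware -Connection $conn -Location 'C:\temp\ilo5_231.bin'
Disconnect-HPEiLO -Connection $conn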

Only ESXi 6.7 is affected, and only ESXi 6.7 running on HPE ProLiant hosts, regardless of whether it’s the ML, DL or BL series.

Moving a small on-prem environment to Azure/ O365 – Part 2

This posting is ~1 year old. You should keep this in mind. IT is a short-living business. This information might be outdated.

A couple of days ago, I wrote about our first steps to move our on-prem stuff to Azure. This post will cover how we adopted Office 365 and how we have started with our Azure deployment.

Our first step into Office 365 was Microsoft Teams. We needed a solution for calls (audio/ video) and chat. We skipped Skype 4 Business and started with Microsoft Teams.

Microsoft 2020 (c)

Our Microsoft Teams deployment was pretty simple: we used our Microsoft IUR Office 365 E3 plans. Microsoft Azure AD Connect was quickly deployed, and the Microsoft Exchange Hybrid Configuration Wizard did the rest. Some weeks later we deployed ADFS/ADFS Proxy. We used this setup over several months; it was pretty slick and worked flawlessly. At this point, we only used Teams, Planner and OneDrive 4 Business (SharePoint).

Some months went by until we decided to move to Azure.

Resource groups in Azure

You can imagine a resource group (RG) as a container that contains one or more resources, like VMs, NICs, SQL instances etc. The resource group can contain all the resources for the solution, or only those resources that you want to manage as a group.

First question: What do we need to deploy?

The answer was easy:

  • 9 VMs in total
  • VPN gateway
  • Recovery Services Vault
  • Automation Account
  • Log Analytics Workspace

Second question: One or multiple resource groups?

An easy rule of thumb is that a resource group should contain only resources that share the same life cycle and sponsor.

Third question: Who needs delegated privileges to manage this stuff?

In our case there was no need for fine-grained RBAC. All of our technical staff have a personalized admin account and should be able to do whatever is necessary.

So we went for a single resource group.

Do yourself a favor and keep the recommended naming conventions in mind!
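Creating the resource group itself is a one-liner with the Az PowerShell module. A sketch; the name and region are examples that follow the recommended conventions:

PS C:\> New-AzResourceGroup -Name 'rg-infra-prod-001' -Location 'westeurope'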

Site-2-Site VPN

To connect our on-prem network to Azure, we had to set up a Site-2-Site VPN. This was the first thing we did after creating our first resource group. We used a Gen 1 Basic VPN Gateway, which was sufficient for our needs (max 100 Mbit, no OpenVPN, no BGP).

Keep in mind to choose your networks and subnets wisely. If you need to deploy 9 VMs, don’t use 10.0.0.0/8. ;) In our case we added two address ranges, each with a single subnet: one for our server VMs, and a second as the gateway subnet.
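As a sketch, this is roughly how such a network and a Basic VPN gateway can be deployed with the Az PowerShell module. All names and address ranges are examples, and the gateway deployment itself can take up to 45 minutes:

# Virtual network with a server subnet and the mandatory GatewaySubnet.
$vnet = New-AzVirtualNetwork -ResourceGroupName 'rg-infra-prod-001' -Name 'vnet-azure' `
  -Location 'westeurope' -AddressPrefix '10.1.0.0/16'
Add-AzVirtualNetworkSubnetConfig -Name 'snet-server' -VirtualNetwork $vnet -AddressPrefix '10.1.0.0/24' | Out-Null
Add-AzVirtualNetworkSubnetConfig -Name 'GatewaySubnet' -VirtualNetwork $vnet -AddressPrefix '10.1.255.0/27' | Out-Null
$vnet = $vnet | Set-AzVirtualNetwork

# Public IP and the Basic VPN gateway.
$pip = New-AzPublicIpAddress -Name 'pip-vpngw' -ResourceGroupName 'rg-infra-prod-001' `
  -Location 'westeurope' -AllocationMethod Dynamic
$gwsubnet = Get-AzVirtualNetworkSubnetConfig -Name 'GatewaySubnet' -VirtualNetwork $vnet
$ipconf = New-AzVirtualNetworkGatewayIpConfig -Name 'gwipconf' -SubnetId $gwsubnet.Id -PublicIpAddressId $pip.Id
New-AzVirtualNetworkGateway -Name 'vpngw' -ResourceGroupName 'rg-infra-prod-001' -Location 'westeurope' `
  -IpConfigurations $ipconf -GatewayType Vpn -VpnType RouteBased -GatewaySku Basic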

VM Deployment

We deployed our VMs as B-series VMs. A common mistake is to use the wrong VM size: start small and right-size a VM if necessary. Most of our VMs are B2s (2 CPUs, 4 GB RAM). Only the Exchange (B4ms), the management (B2ms) and the RDS server (B2ms) VMs differ from this. This looks pretty small for Server 2019, but it is working pretty well.
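For illustration, deploying one of these VMs with the Az PowerShell module takes only a few lines. A sketch using the simplified New-AzVM parameter set; all names are examples:

# Local administrator credentials for the new VM.
$cred = Get-Credential
# B2s VM attached to the existing vnet/subnet.
New-AzVM -ResourceGroupName 'rg-infra-prod-001' -Name 'vm-file-001' -Location 'westeurope' `
  -VirtualNetworkName 'vnet-azure' -SubnetName 'snet-server' `
  -Image 'Win2019Datacenter' -Size 'Standard_B2s' -Credential $cred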

After deploying the VMs, we assigned static IP addresses to them. To our surprise, most things in Azure are lacking proper IPv6 support. :( That hurt a lot.

For most VMs we used Standard HDDs instead of SSDs, even for our file server, because the bottleneck is not the disk, it is the connection between clients and server. Besides this, we used managed disks for all VMs, and we deployed a second disk for data where necessary (Exchange, domain controller, file server etc.).

If a server had a DNAT in our on-prem network, we deployed a public IP and secured the access to it.

All VMs are connected to the same Network Security Group (NSG), which we use to get control over what a VM can reach, and who can access a VM.

Server Migration

Over a couple of days we moved more and more services to Azure, starting with our domain controllers, PKI and file services. These were low-hanging fruits. The file server was easy because we already had a DFS namespace in place, so all we had to do was change the DFS links and point them to the new file server. The data was copied using DFS replication.

DHCP was moved to our on-prem firewall. A print server was no longer necessary. Windows Updates were switched back to downloading from Microsoft, using Delivery Optimization.

The applications that were running on our Linux and Windows application servers were also easy to migrate. After a couple of days we had our server workloads running on Azure.

To get our ERP running, we deployed a single RDS host (quick deployment) and published our ERP as a RemoteApp, because it was too slow to use over the VPN. Unfortunately, the application lacks a proper database backend. :/ But as a RemoteApp, it is working pretty well.

A bigger challenge was Exchange, but not because of the mailbox migrations.

Exchange Online

The migration to Exchange Online was pretty simple. Since our first HCW run, we had used centralized mail transport, so that all mails are received and sent by our on-prem mail gateway.

The mailbox migration was pretty easy and we had zero issues. Then we tried to switch the mail transport from centralized transport to Exchange Online. This was flawless too… except for the fact that our ticket system was unable to send e-mails.

Our ticket system relays its mail over our Exchange server. After switching the mail server in our ticket system to the new Azure-based VM, the mails got stuck in the outbound queue, even though the server tried to send them to our on-prem mail gateway. This quote from Microsoft explains the whole problem:

Starting on November 15, 2017, outbound email messages that are sent directly to external domains (such as outlook.com and gmail.com) from a virtual machine (VM) are made available only to certain subscription types in Microsoft Azure. Outbound SMTP connections that use TCP port 25 were blocked. (Port 25 is primarily used for unauthenticated email delivery.)

This change in behavior applies only to new subscriptions and new deployments since November 15, 2017.

Source: Microsoft

This is the case for MSDN, Azure Pass, Azure in Open, Education, BizSpark, and Free Trial subscriptions!

If you created an MSDN, Azure Pass, Azure in Open, Education, BizSpark, Azure Sponsorship, Azure Student, Free Trial, or any Visual Studio subscription after November 15, 2017, you’ll have technical restrictions that block email that’s sent from VMs within these subscriptions directly to email providers. The restrictions are done to prevent abuse. No requests to remove this restriction will be granted.

If you’re using these subscription types, you’re encouraged to use SMTP relay services, as outlined earlier in this article or change your subscription type.

Source: Microsoft

We accelerated our migration and disabled the centralized mail transport earlier than planned. Then we configured our Linux application server to authenticate against Exchange Online using SMTP AUTH and SMTP submission (587/tcp). For incoming mails, the mails are routed to the application server using an Exchange Online connector and a transport rule which matches specific mail addresses.
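If you have to test SMTP submission against Exchange Online, a quick PowerShell sketch can help; it assumes the sending mailbox has SMTP AUTH enabled, and all addresses are placeholders:

# Send a test mail via SMTP submission (587/tcp) with SMTP AUTH.
Send-MailMessage -SmtpServer 'smtp.office365.com' -Port 587 -UseSsl `
  -From 'ticket@domain.tld' -To 'user@domain.tld' `
  -Subject 'SMTP AUTH test' -Body 'It works.' `
  -Credential (Get-Credential 'ticket@domain.tld')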

The Azure-based Exchange VM is only needed because we still have Azure AD Connect running. Microsoft has plans to replace this with a new solution, and until then we will run this Exchange 2016 in Azure. But it is not part of our mail flow.

Moving Azure AD Connect & decommissioning ADFS

Because we had to get rid of the ADFS server and ADFS Proxy, we deployed Pass-Through Authentication and Seamless SSO. Then we decommissioned the ADFS setup.

Moving Azure AD Connect was a bit quirky. We already had Conditional Access in place, and the Azure AD Connect setup was unable to handle this: the synchronization account was unable to sync because it ran into an MFA request. We optimized our policies and got this sorted out.

Decommissioning old stuff

Whenever we successfully moved a service to Azure, we switched off the on-prem server and modified our documentation to reflect the changes. In the end, we were able to switch off three of our four ESXi hosts. The last ESXi host is still running for our Horizon View deployment and our firewall.

Next steps

The next post will cover how we automated this, how we do backups and whatever you’re interested in. Leave a comment! :)

Exchange Control Panel /ecp broken after certificate replacement

This posting is ~1 year old. You should keep this in mind. IT is a short-living business. This information might be outdated.

As part of an ongoing Exchange 2010 to 2016 migration, I had to replace the self-signed certificate with a certificate from the customer’s PKI. Everything went fine: the customer had a suitable template, we added the necessary hostnames and bound IIS and SMTP to the certificate. The mess started with an iisreset /noforce.

Image by Oskars Zvejs on Pixabay

The iisreset took longer than expected. After that, I tried to log in to the ECP, entered username and password, and got an error.

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
  <Provider Name="MSExchange Front End HTTP Proxy" />
  <EventID Qualifiers="49152">1003</EventID>
  <Level>2</Level>
  <Task>1</Task>
  <Keywords>0x80000000000000</Keywords>
  <TimeCreated SystemTime="2020-10-22T12:16:38.934123400Z" />
  <EventRecordID>368718</EventRecordID>
  <Channel>Application</Channel>
  <Computer>server.domain.tld</Computer>
  <Security />
</System>
<EventData>
  <Data>Owa</Data>
  <Data>System.NullReferenceException: Object reference not set to an instance of an object. at 
Microsoft.Exchange.HttpProxy.FbaModule.ParseCadataCookies(HttpApplication httpApplication) at 
Microsoft.Exchange.HttpProxy.FbaModule.OnBeginRequestInternal(HttpApplication httpApplication) at 
Microsoft.Exchange.HttpProxy.ProxyModule.<>c__DisplayClass16_0.<OnBeginRequest>b__0() at 
Microsoft.Exchange.Common.IL.ILUtil.DoTryFilterCatch(Action tryDelegate, Func`2 filterDelegate, Action`1 catchDelegate)
</Data>
</EventData>
</Event>

Pretty strange. We switched back to the self-signed certificate, did an iisreset, and everything was fine again. So it was pretty obvious that the error was related to the certificate, or to be more precise, to the certificate template.

Some quick research confirmed this. The template was a modified v3 web server template from an Enterprise CA running Windows Server 2008 R2.

With Windows Server 2008, Microsoft introduced a new cryptographic API called Cryptography Next Generation (CNG), which separates cryptographic providers (algorithm implementations) from key storage providers (which create, delete, export, import, open and store keys). The older CryptoAPI does not make this distinction and implements both the cryptographic algorithms and the key storage.

The modified template used CNG instead of CryptoAPI. We noticed this when we checked the certificate with certutil -store my <thumbprint>.

If the listed provider for the certificate is the Microsoft Software Key Storage Provider, you will have to re-import the certificate. If the Microsoft RSA SChannel Cryptographic Provider is used, everything is fine.

You have to remove the certificate, then re-import it using

certutil -csp "Microsoft RSA SChannel Cryptographic Provider" -importpfx <CertificateFilename>

You need a PKCS#12 file (PFX) and the password. Re-import the certificate, and then you can use it for Exchange: bind the services to it and restart the IIS.
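The binding can be scripted in the Exchange Management Shell. A short sketch; the <thumbprint> placeholder follows the same convention as above:

# Bind the re-imported certificate to IIS and SMTP, then restart IIS.
Enable-ExchangeCertificate -Thumbprint <thumbprint> -Services IIS,SMTP
iisreset /noforce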