Category Archives: Server

Notes for a 2-Tier Microsoft Windows PKI

Implementing a public key infrastructure (PKI) is a recurring task for me. More and more customers are implementing a PKI in their environment, mostly not to increase security, but rather to get rid of browser warnings caused by self-signed certificates, to secure intra-org email communication with S/MIME, or to sign Microsoft Office macros.


What is a 2-tier PKI?

Why is a multi-tier PKI hierarchy a good idea? Such a hierarchy typically consists of a root Certificate Authority (CA) and an issuing CA. Sometimes you see a 3-tier hierarchy, in which a root CA, a sub CA, and an issuing CA are tied together in a chain of trust.

A root CA issues, stores and signs the digital certificates for sub CAs. A sub CA issues, stores and signs the digital certificates for issuing CAs. Only an issuing CA issues, stores and signs the digital certificates for users and devices.

In a 2-tier hierarchy, a root CA issues the certificate for an issuing CA.

In case of a security breach, in which the issuing CA might become compromised, only the CA certificate of the issuing CA needs to be revoked. But what if the root CA becomes compromised? Because of this, a root CA is typically installed on a secured and powered-off (offline) VM or computer. It is only powered on to publish new Certificate Revocation Lists (CRLs), or to sign/ renew a new sub or issuing CA certificate.

Lessons learned

Think about the processes! Creating a PKI is more than provisioning a couple of VMs. You need to think about processes to

  • request,
  • sign, and
  • revoke certificates

Be aware of what a digital certificate is. You, or your CA, confirm the identity of a party by handing out a digital certificate. Make sure that no one can get a certificate issued without proving their identity.

Think about the lifetimes of certificates! Customers tend to create root CA certificates with lifetimes of 10, 20 or even 40 years. Think about the typical lifetime of the VM or server that is necessary to run an offline root CA: a server OS typically has a lifetime of 10 to 12 years. This should determine the lifetime of the root CA certificate. IMHO 10 years is a good compromise.

For a sub or issuing CA, a lifespan of 5 years is a good compromise. Using the same lifetime as for the root CA is not a good idea, because an issued certificate can't be valid longer than the CA certificate of the CA that issued it.

A lifespan of 1 to 3 years for things like computer or web server certificates is okay. If a certificate is used for S/MIME or code signing, you should go for a lifetime of 1 year.

But to be honest: At the end of the day, YOU decide how long your certificates will be valid.

Publish CRLs and make them accessible! You can't tell from a certificate alone whether it has been revoked by a CA. But you can use a CRL to check whether a certificate is revoked. Because of this, the CA must publish CRLs regularly. Use split DNS so that the same URL works for internal and external requests, and make sure that the CRL is available to external users.

This applies not only to certificates for users or computers, but also to the certificates of sub and issuing CAs. So there must be a CRL from each of your CAs!

I recommend publishing CRLs to a webserver and making this webserver reachable over HTTP. An issued certificate includes the URL or path to the CRL of the CA that issued it.

Make sure that the CRL has a meaningful validity period. For an offline root CA, which issues only a few certificates during its lifetime, this can be 1 year or more. For an issuing CA, the validity period should be only a few days.

Publish AIA (Authority Information Access) information and make it accessible! AIA is a certificate extension that is used to offer two types of information:

  • How to get the certificates of the issuing or upper CAs, and
  • who the OCSP responder is, from which the revocation status of this certificate can be checked

I tend to use the same location for the AIA as for the CDP (CRL Distribution Point). Make sure that you configure the AIA extension before you issue the first certificates; in particular, configure the AIA and CDP extensions before you issue intermediate and issuing CA certificates.

Use a secure hash algorithm and key length! Please stop using SHA1! I recommend at least SHA256 and a 4096-bit key length. Depending on the CPUs used, SHA512 can even be faster than SHA256.

Create a CApolicy.inf! The CApolicy.inf is located under C:\Windows and is used during the creation of the CA certificate. I often use CApolicy.inf files along the lines of the following two examples. They are reconstructed from memory, not my original files, so adjust key length, validity, and CRL periods to your needs.

For the root CA:
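
    [Version]
    Signature="$Windows NT$"

    [Certsrv_Server]
    ; Reconstructed example - adjust the values to your needs
    RenewalKeyLength=4096
    RenewalValidityPeriod=Years
    RenewalValidityPeriodUnits=10
    CRLPeriod=Years
    CRLPeriodUnits=1
    CRLDeltaPeriod=Days
    CRLDeltaPeriodUnits=0
    LoadDefaultTemplates=0
    AlternateSignatureAlgorithm=0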

For the issuing CA:
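
    [Version]
    Signature="$Windows NT$"

    [Certsrv_Server]
    ; Reconstructed example - adjust the values to your needs
    RenewalKeyLength=4096
    RenewalValidityPeriod=Years
    RenewalValidityPeriodUnits=5
    CRLPeriod=Days
    CRLPeriodUnits=7
    CRLDeltaPeriod=Days
    CRLDeltaPeriodUnits=0
    LoadDefaultTemplates=0
    AlternateSignatureAlgorithm=0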

Final words

I do not claim that this blog post covers all necessary aspects of such a complex thing as a PKI. But I hope that I have mentioned some of the important parts. And at least I now have a reference from which I can copy and paste the CApolicy.inf files. :D

Out-of-Office replies are dropped due to empty MAIL FROM

Today I had an interesting support call. A customer noticed that Out-of-Office replies were not received by recipients, even though the OoO option was enabled for internal and external recipients. Internal recipients got the OoO reply, but none of the external recipients did.


The Message Tracking Log is a good place to start. I quickly discovered that the Exchange server was unable to send the OoO mails. You can use the event ID FAIL to get a list of all failed messages.
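
From the Exchange Management Shell, a query along these lines lists the failed messages together with their RecipientStatus (the time window is just an example):

    Get-MessageTrackingLog -EventId FAIL -Start (Get-Date).AddDays(-1) |
        Select-Object Timestamp, Sender, Recipients, RecipientStatus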

Very interesting was the RecipientStatus of a failed mail.

550 Requested action not taken: mailbox unavailable is a pretty interesting error when sending mails over a mail relay of your ISP, especially when other mails were successfully sent over the same mail relay.

Next stop: Protocol log of the send connector

I enabled the logging on the send connector using the EAC. This option is disabled by default. Depending on the number of mails sent over the connector, you should make sure to disable the logging after your troubleshooting session. To enable the logging, follow these steps:

  • Open the EAC and navigate to Mail flow > Send connectors
  • Select the connector you want to configure, and then click Edit
  • On the General tab in the Protocol logging level section, select the Verbose option
  • When you’re finished, click Save

The protocol log can be found under %ExchangeInstallPath%TransportRoles\Logs\Hub\ProtocolLog\SmtpSend.

After enabling the logging and sending another test mail, the log contained the necessary details to find the root cause. This is the interesting part of the SMTP communication:
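
Reconstructed and heavily shortened (the real protocol log is CSV-formatted), the decisive part of the conversation looked like this:

    > MAIL FROM:<>
    < 550 Requested action not taken: mailbox unavailable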

The error occurred right after the Exchange server issued MAIL FROM:<>. But why is the MAIL FROM empty?

RFC 2298 is the key

An Out-of-Office reply is a Delivery Status Notification message. And RFC 2298 clearly states:

The envelope sender address (i.e., SMTP MAIL FROM) of the MDN MUST be
null (<>), specifying that no Delivery Status Notification messages
or other messages indicating successful or unsuccessful delivery are
to be sent in response to an MDN.

So the empty MAIL FROM is something that a mail relay should expect. In the case of my customer, the mail relay seems to act differently. Maybe it is some kind of spam protection.

Database Availability Group (DAG) witness is in a failed state

As part of a maintenance job I had to update a 2-node Exchange Database Availability Group and a file-share witness server.

After the installation of Windows updates on the witness server and the obligatory reboot, the witness was left in a failed state.

In my opinion, re-creating the witness server and the witness directory cannot be the correct way to solve this. There must be another way. In addition: the server was not dead, only a reboot had occurred.

Check the basics

Both DAG nodes were online and working. A good starting point is a check of the cluster resources using PowerShell.
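
Run on one of the DAG members, Get-ClusterResource shows the state of all cluster resources (output reconstructed, witness share name hypothetical):

    Get-ClusterResource

    Name                              State   OwnerGroup     ResourceType
    ----                              -----   ----------     ------------
    File Share Witness (\\fsw\dag01)  Failed  Cluster Group  File Share Witness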

In my case the cluster resource for the File Share Witness was in a failed state. A simple Start-ClusterResource solved my issue immediately.
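
Something like this, with the resource name taken from the Get-ClusterResource output above:

    Get-ClusterResource -Name "File Share Witness (\\fsw\dag01)" | Start-ClusterResource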

It seems that the cluster had marked the file share witness as unreliable, so the resource was not started after the file share witness was back online again. I managed to bring it back online manually by running Start-ClusterResource on one of the DAG members.

Using Let’s Encrypt DNS-01 challenge validation with local BIND instance

This posting is ~1 year old. You should keep this in mind. IT is a short-living business. This information might be outdated.

I’ve been using Let’s Encrypt certificates for a while now. In the past, I used the standalone plugin (TLS-SNI-01) to get or renew my certificates. But now I’ve switched to the DNS plugin. I run my own name servers with BIND, so getting this plugin to work was a very low-hanging fruit.


To get or renew a certificate, you need to provide some kind of proof that you are requesting the certificate for a domain that is under your control. No certificate authority (CA) wants to be the CA that hands out a certificate for google.com or amazon.com…

The DNS-01 challenge uses TXT records in order to validate your ownership of a certain domain. During the challenge, the Automatic Certificate Management Environment (ACME) server of Let’s Encrypt will give you a value that uniquely identifies the challenge. This value has to be added as a TXT record to the zone of the domain for which you are requesting a certificate. The record will look like this:
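
A hypothetical example for a wildcard certificate for *.example.com (the token is a placeholder):

    _acme-challenge.example.com. 300 IN TXT "<challenge token>"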

This record is for a wildcard certificate. If you want to get a certificate for a host, you can add one or more TXT records like this:
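
    _acme-challenge.www.example.com. 300 IN TXT "<challenge token>"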

There is an IETF draft about the ACME protocol. A pretty interesting read!

Configure BIND for DNS-01 challenges

I run my own name servers with BIND on FreeBSD. The plugin for certbot automates the whole DNS-01 challenge process by creating, and subsequently removing, the necessary TXT records from the zone file using RFC 2136 dynamic updates.

First of all, we need a new TSIG (Transaction SIGnature) key. This key is used to authorize the updates.
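
On a host with BIND installed, such a key can be generated like this (the key name "certbot." is an assumption; pick your own):

    dnssec-keygen -a HMAC-SHA512 -b 512 -n HOST certbot.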

This key has to be added to the named.conf. The key is in the .key file.
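
The corresponding statement would look roughly like this (secret replaced by a placeholder):

    key "certbot." {
            algorithm hmac-sha512;
            secret "<base64 secret from the .key file>";
    };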

The key is used to authorize the update of certain records. To allow the update of the TXT records that are needed for the challenge, add this to the zone section of your named.conf.
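
A sketch for a hypothetical zone example.com (paths and names are assumptions):

    zone "example.com" {
            type master;
            file "master/example.com";
            update-policy {
                    grant certbot. name _acme-challenge.example.com. txt;
            };
    };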

The records always start with _acme-challenge.domainname.

Now you need to create a config file for the RFC 2136 plugin. This file includes the key, but also the IP address of the name server. If the name server is running on the same server as the DNS-01 challenge, you can use 127.0.0.1 as the name server address.
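
A minimal credentials file for the certbot-dns-rfc2136 plugin could look like this (secret replaced by a placeholder):

    dns_rfc2136_server = 127.0.0.1
    dns_rfc2136_port = 53
    dns_rfc2136_name = certbot.
    dns_rfc2136_secret = <base64 secret>
    dns_rfc2136_algorithm = HMAC-SHA512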

Now we have everything in place. This is a --dry-run from one of my FreeBSD machines.
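
The invocation looked roughly like this (domain and credentials path are placeholders):

    certbot certonly --dry-run --dns-rfc2136 \
        --dns-rfc2136-credentials /usr/local/etc/letsencrypt/rfc2136.ini \
        -d example.com -d www.example.com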

This is a snippet from the name server log file at the time of the challenge.
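
Reconstructed from memory, the dynamic updates show up roughly like this:

    client 127.0.0.1#12345/key certbot: signer "certbot" approved
    client 127.0.0.1#12345/key certbot: updating zone 'example.com/IN': adding an RR at '_acme-challenge.example.com' TXT
    client 127.0.0.1#12345/key certbot: updating zone 'example.com/IN': deleting rrset at '_acme-challenge.example.com' TXT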

You might need to modify the permissions of the directory which contains the zone files. Usually the name server is not running as root. In my case, I had to grant write permissions to the “bind” group. Otherwise you might get a “permission denied” error.

 

Powering on a VM with shared VMDK fails after extending an EagerZeroedThick VMDK

This posting is ~1 year old. You should keep this in mind. IT is a short-living business. This information might be outdated.

I hope that you are not reading this blog post while searching for a solution for a failed cluster. If so, feel free to leave a comment if this blog post saved your evening or weekend. :)

Last Friday, a change at one of my customers went horribly wrong. I was not onsite, but they contacted me during the night from Friday to Saturday, because their most important Windows Server Failover Cluster was unable to start after extending a shared VMDK.


They tried something pretty simple: extending a virtual disk of a VM. That is something most of us do pretty often, and the customer had also done it pretty often before. It was a well-known task… except for the fact that the VM was part of a Windows Server Failover Cluster. With shared VMDKs. And the disks were EagerZeroedThick, because this is a requirement for shared VMDKs.

They extended the disk using the vSphere Web Client. And at this point, the change was doomed to fail. They tried to power on the VMs, but all they got was this error:

VMware ESX cannot open the virtual disk, “/vmfs/volumes/4c549ecd-66066010-e610-002354a2261b/VMNAME/VMDKNAME.vmdk” for clustering. Please verify that the virtual disk was created using the ‘thick’ option.

A shared VMDK is a VMDK in multiwriter mode. This VMDK has to be created as Thick Provision Eager Zeroed. And if you wish to extend this VMDK, you must use vmkfstools with the option -d eagerzeroedthick. If you extend the VMDK using the Web Client, the extended portion of the disk will become LazyZeroed!
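
Extending a hypothetical disk to a new size of 200 GB would look like this:

    vmkfstools -X 200G -d eagerzeroedthick /vmfs/volumes/datastore1/VMNAME/VMNAME.vmdk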

VMware has described this behaviour in KB1033570 (Powering on the virtual machine fails with the error: Thin/TBZ disks cannot be opened in multiwriter mode). There is also a blog post about it by Cormac Hogan of VMware.

That’s a screenshot from the failed cluster. Check out the type of the disk (Thick-Provision Lazy-Zeroed).


You must use vmkfstools to extend a shared VMDK. But vmkfstools is also the solution if you have fallen into this pitfall. Clone the VMDK with the option -d eagerzeroedthick.
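
A sketch with hypothetical paths:

    vmkfstools -i /vmfs/volumes/datastore1/VMNAME/VMNAME.vmdk \
        -d eagerzeroedthick /vmfs/volumes/datastore1/VMNAME/VMNAME_fixed.vmdk

Afterwards, attach the cloned VMDK to the VMs in place of the broken one.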

Another solution, which was new to me, is to use Storage vMotion. You can migrate the “broken” VMDK to another datastore and change the disk format during the Storage vMotion. This solution is described in the “Notes” section of KB1033570.

Both ways will fix the problem. The result will be a Thick Provision Eager Zeroed VMDK, which will allow the VMs to be successfully powered on.

CloudFlare API v4 and Fail2ban: Fixing the unban action

This posting is ~1 year old. You should keep this in mind. IT is a short-living business. This information might be outdated.

In January 2017, I wrote an article about how to protect your WordPress blog using the WP Fail2Ban plugin, fail2ban on your Linux/ FreeBSD host, and CloudFlare. Back then, fail2ban was using the CloudFlare API V1, which had already been deprecated since November 2016.


Although the actions were later updated to use the CloudFlare API V4, I still had problems with the unbanning of IP addresses. IP addresses were banned, but the unban action failed.

This is the unban action, which is included in fail2ban (taken from fail2ban-0.10.3.1 which is shipped with FreeBSD 11.1-RELEASE-p10):
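
Reconstructed from the 0.10.x action file (the exact cut field number may differ in your version):

    actionunban = curl -s -o /dev/null -X DELETE -H 'X-Auth-Email: <cfuser>' -H 'X-Auth-Key: <cftoken>' \
                https://api.cloudflare.com/client/v4/user/firewall/access_rules/rules/$( \
                curl -s -X GET -H 'X-Auth-Email: <cfuser>' -H 'X-Auth-Key: <cftoken>' \
                'https://api.cloudflare.com/client/v4/user/firewall/access_rules/rules?mode=block&configuration_target=ip&configuration_value=<ip>&page=1&per_page=1' \
                | cut -d'"' -f6)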

And this is the unban action, which finally solved this issue:
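
The same action, with the additional tr -d '\n' appended to the nested curl statement:

    actionunban = curl -s -o /dev/null -X DELETE -H 'X-Auth-Email: <cfuser>' -H 'X-Auth-Key: <cftoken>' \
                https://api.cloudflare.com/client/v4/user/firewall/access_rules/rules/$( \
                curl -s -X GET -H 'X-Auth-Email: <cfuser>' -H 'X-Auth-Key: <cftoken>' \
                'https://api.cloudflare.com/client/v4/user/firewall/access_rules/rules?mode=block&configuration_target=ip&configuration_value=<ip>&page=1&per_page=1' \
                | cut -d'"' -f6 | tr -d '\n')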

I found the solution at serverfault.com. The only difference is an additional tr -d '\n' in the last line of the statement. Kudos to Jake for fixing this!

To prevent the action file from being overwritten, you should copy the original cloudflare.conf located in the action.d directory, e.g. to mycloudflare.conf, and use the copied action file in your jail definition.

Windows Network Policy Server (NPS) server won’t log failed login attempts

This posting is ~1 year old. You should keep this in mind. IT is a short-living business. This information might be outdated.

This is just a short, but interesting blog post. When you have to troubleshoot authentication failures in a network that uses Windows Network Policy Server (NPS), the Windows event log is absolutely indispensable. The event log offers everything you need; the success and failure event log entries include all necessary information to get you back on track. If failure events were logged at all…


Today, I was playing with Alcatel-Lucent Enterprise OmniSwitches and Access Guardian in my lab. Access Guardian refers to a set of OmniSwitch security functions that work together to provide a dynamic, proactive network security solution:

  • Universal Network Profile (UNP)
  • Authentication, Authorization, and Accounting (AAA)
  • Bring Your Own Device (BYOD)
  • Captive Portal
  • Quarantine Manager and Remediation (QMR)

I have planned to publish some blog posts about Access Guardian in the future, because it is a pretty interesting topic. So stay tuned. :)

802.1x was no big deal, but MAC-based authentication failed. Okay, let’s take a look into the event log of the NPS… okay, there are the success events for my 802.1x authentication… but where are the failed login attempts? Not a single one was logged. A short Google search pointed me in the right direction.

Failed logon/ logoff events were not logged

In this case, the NPS role was installed on a Windows Server 2016 domain controller. And it was a German installation, so the output of the commands is also in German. If you have an OS installed in English, you must replace “Netzwerkrichtlinienserver” with “Network Policy Server”.

Right-click the PowerShell icon and open it as Administrator. Check the current settings:
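
On an English OS, the check looks like this:

    auditpol /get /subcategory:"Network Policy Server"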

As you can see, only successful logon and logoff events were logged.

The option /success:enable /failure:enable activates the logging of successful and failed logon and logoff attempts.
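
The full command:

    auditpol /set /subcategory:"Network Policy Server" /success:enable /failure:enable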

Single Sign On (SSO) with RemoteApps on Windows Server 2012 (R2)

This posting is ~3 years old. You should keep this in mind. IT is a short-living business. This information might be outdated.

A RemoteApp is an application that runs on a Remote Desktop Session Host (RDSH); only the display output is sent to the client. Because the application runs on an RDSH, you can easily deliver applications to end users. Another benefit is that data does not leave the datacenter: software and data are kept inside the datacenter. RemoteApps can be used and deployed in various ways:

  • Users can start RemoteApps through the Remote Desktop Web Access
  • Users can start RemoteApps using a special RDP file
  • Users can simply start a link on the desktop or from the start menu (RemoteApps and Desktop connections deployed by an MSI or a GPO)
  • or they can click on a file that is associated with a RemoteApp

Even in times of VDI (LOL…), RemoteApps can be quite handy. You can deploy virtual desktops without any installed applications; applications can then be delivered using RemoteApps. This can be handy if you migrate from RDSH/ Citrix published desktops to VMware Horizon View, or if you are already using RDSH and want to try VMware Horizon View.

But three things can really spoil the usage of RemoteApps:

  • certificate warnings
  • warnings about an untrusted publisher
  • asking for credentials (no Single Sign On)

Avoid certificate warnings

As part of the RDS deployment, the assistant kindly asks for certificates. Sure, you can deploy self-signed certificates, but that’s not a good idea. You should deploy certificates from your internal certificate authority.

RDS Certificate Settings


This is a screenshot from my tiny single-server RDS farm. Make sure that you use the correct names for the certificates! If you are using an RDS farm, make sure that you include the DNS name of the RD Connection Broker HA cluster.

If you want to make the RD Web Access publicly available, make sure that you include the public DNS name in the certificate.

Untrusted Publisher

When you try to open a RemoteApp, you might get this message:

RDS Connection Warning


Annoying, isn’t it? But easy to fix. Remember the certificates you deployed during the RDS deployment? You need the certificate thumbprint of the publisher certificate (check the screenshot from the deployment properties > “RD Connection Broker – Publishing”). This is a screenshot from my lab:

RDS Certificate Thumbprint


Take this thumbprint, open a PowerShell window, and convert the thumbprint into a format that can be used with the GPO we have to build.
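
A one-liner like this does the trick (the thumbprint value is a placeholder):

    ("f5 52 09 0c 32 ..." -replace '\s','').ToUpper()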

The result is a string without spaces, containing only uppercase letters. Now we need to create a GPO. This GPO has to be linked to the OU in which the computers or users that should use the RemoteApp reside. In my example, I use the user part of a GPO, so it has to be linked to the OU in which the users reside. The necessary GPO setting can be found here:

User Configuration > Policies > Administrative Templates > Windows Components > Remote Desktop Services > Remote Desktop Connection Client > Specify SHA1 thumbprints of certificates representing trusted .rdp publishers

RemoteApp GPO Settings SHA1 Thumbprint


I use the same GPO to publish the default connection URL. With this setting configured, the users automatically get the published RemoteApps in their start menu. You can find the setting here:

User Configuration > Policies > Administrative Templates > Windows Components > Remote Desktop Services > RemoteApp and Desktop Connections > Specify default connection URL

RemoteApp Connection URL


Credentials dialog

At this point, you will still get an “asking for credentials” dialog. To allow the client to pass the current user’s login information to the RDS host, we need to configure an additional setting. Create a new GPO and link it to the OU in which the computers reside, on which the RemoteApps should be used. The setting can be found here:

Computer Configuration > Policies > Administrative Templates > System > Credentials Delegation > Allow delegating default credentials

RemoteApp Credential Delegation


You have to add the FQDN of your RD Connection Broker server or farm. Please make sure that you add the “TERMSRV” prefix! Because I use a single-server deployment, my RD Connection Broker is also my RDS host.
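
With a hypothetical farm name, the entry would look like this:

    TERMSRV/rdsfarm.example.local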

The final test

Make sure that all group policies have been applied. Open the Remote Desktop Connection client and enter the RDS farm name. If everything is configured properly, you should be connected without being asked for credentials.

RDS Connection


The same should happen if you try to start a RemoteApp.

RDS Connection Notepad++


Troubleshooting

If you are still getting asked for credentials, something is wrong with the credentials delegation. Check the GPO and whether it is linked to the correct OU. If you are getting certificate warnings, check the names that you have included in the certificates. Warnings about untrusted publishers may be caused by a wrong SHA1 thumbprint (or a wrong format).

How to monitor ESXi host hardware with SNMP

This posting is ~3 years old. You should keep this in mind. IT is a short-living business. This information might be outdated.

The Simple Network Management Protocol (SNMP) is a protocol for monitoring and configuring network-attached devices. SNMP exposes data in the form of variables and values. These variables can be queried or set: a query retrieves the value of a variable, a set operation assigns a value to a variable. The variables are organized in a hierarchy and each variable is identified by an object identifier (OID). The management information base (MIB) describes this hierarchy. MIB files (simple text files) contain metadata for each OID; these are necessary for the translation of a numeric OID into a human-readable format. SNMP knows two device types:

  • the managed device which runs the SNMP agent
  • the network management station (NMS) which runs the management software

The NMS queries the SNMP agent with GET requests. Configuration changes are made using SET requests. The SNMP agent can inform the NMS about state changes using an SNMP trap message. The easiest way of authentication is the SNMP community string.

SNMP is pretty handy and it’s still used a lot, especially for monitoring and managing networking components. SNMP has the benefit of being very lightweight: monitoring a system with WBEM or using an API can cause slightly more load compared to SNMP. Furthermore, SNMP is an internet-protocol standard, and nearly every device supports it.

Monitoring host hardware with SNMP

Why should I monitor my ESXi host hardware with SNMP? The vCenter Server can trigger an alarm, and most customers use applications like VMware vRealize Operations, Microsoft System Center Operations Manager, or HPE Systems Insight Manager (SIM); these are better ways to monitor the overall health of an ESXi host. But sometimes you want to get some stats about the network interfaces (throughput), or you have a script that should do something if a NIC goes down or something else happens. Again, SNMP is very resource-friendly and widely supported.

Configure SNMP on ESXi

I focus on ESXi 5.1 and beyond. The ESXi host is called “the SNMP agent”. We don’t configure traps or trap destinations; we just want to poll the SNMP agent using SNMP GET requests. The configuration is done using esxcli. First of all, we need to set a community string and enable SNMP.
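
Something like this should do it (the community string "public" is just an example; pick your own):

    esxcli system snmp set --communities public
    esxcli system snmp set --enable true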

That’s it! The necessary firewall ports and services are opened and started automatically.

Querying the SNMP agent

I use a CentOS VM to show you some queries. The Net-SNMP package contains the tools snmpwalk and snmpget. To install the Net-SNMP utilities, simply use yum:
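
    yum install net-snmp-utils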

Download the VMware SNMP MIB files, extract the ZIP file, and copy the content to /usr/share/snmp/mibs.

Now we can use snmpwalk to “walk down the hierarchy”. The complete snmpwalk output has more than 4000 lines, so I only show the command here (hostname and community string are placeholders):
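
    snmpwalk -v 2c -c public -m ALL esxi1.example.com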

Now we can search for interesting parts. If you want to monitor the link status of the NICs, try this:
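
For example (hostname hypothetical, output reconstructed and shortened):

    snmpwalk -v 2c -c public esxi1.example.com IF-MIB::ifDescr

    IF-MIB::ifDescr.1 = STRING: Device vmnic0 at ...
    IF-MIB::ifDescr.2 = STRING: Device vmnic1 at ...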

As you can see, I used a subtree of the whole hierarchy (IF-MIB::ifDescr). This is the “translated” OID. To get the numeric OID, you have to add the option -O fn to snmpwalk .

You can use snmptranslate to translate an OID.
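
For example:

    snmptranslate -On IF-MIB::ifDescr
    .1.3.6.1.2.1.2.2.1.2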

So far, we only have the descriptions of the interfaces. With a little searching, we find the status of the interfaces (I stripped the output).
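
Again with reconstructed, shortened output:

    snmpwalk -v 2c -c public esxi1.example.com IF-MIB::ifOperStatus

    IF-MIB::ifOperStatus.1 = INTEGER: up(1)
    IF-MIB::ifOperStatus.2 = INTEGER: down(2)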

ifOperStatus.1 corresponds with ifDescr.1, ifOperStatus.2 corresponds with ifDescr.2, and so on. The ifOperStatus corresponds with the status of the NICs in the vSphere Web Client.

NIC status in the vSphere Web Client

If you want to monitor the fans or power supplies, use these OIDs.

Many possibilities

SNMP offers a simple and lightweight way to monitor a managed device. It’s not a replacement for vCenter, vROps or SCOM. But it can be an addition, especially because SNMP is an internet-protocol standard.

How to dramatically improve website load times

This posting is ~3 years old. You should keep this in mind. IT is a short-living business. This information might be outdated.

Over the last weeks, I’ve tried to improve the performance of my blog. The site was very slow, and the page load times varied between 5 and 10 seconds. Much too long! I reduced time-consuming plugins, checked the size of pictures, checked CSS and HTML for misconfigurations/ slow code, and tuned the database. The page load times did not really improve.

Yesterday, I checked the httpd.conf on my webserver and found a little typo (an accidentally commented line). After a restart of the Apache webserver, the page load times improved dramatically (down to 2 – 3 seconds). What had happened?

HTTP keep-alive

HTTP keep-alive, sometimes also called “HTTP persistent connection”, was designed to transfer multiple HTTP requests and responses over a single TCP connection. This is much better than opening a new connection for every single request/ response pair. The benefits of HTTP keep-alive are:

  • lower CPU usage
  • lower memory usage
  • reduced latency due to reduced requests/ handshaking

These benefits are even more important if you use HTTPS connections (and vcloudnine.de is HTTPS-only…), because each new HTTPS connection needs much more CPU time and more round-trips compared to an insecure HTTP connection. This little picture clarifies the differences.

HTTP persistent connection


If you’re using Apache, you can enable HTTP keep-alive with a single line in the httpd.conf.
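
The directive in question:

    KeepAlive On

MaxKeepAliveRequests and KeepAliveTimeout can additionally be used to control how many requests a single connection may serve and how long an idle connection is kept open.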

Further information can be found in the documentation of Apache (Apache webserver 2.2 and 2.4).