Tag Archives: vExpert

Escaping special characters in proxy auth passwords in vCenter

EDIT: It seems that this was fixed in vCenter 7.0 U3.

While debugging a vCenter Lifecycle Manager that was unable to download updates, I stumbled over a weird behaviour, which is (IMHO) by design.

Some of you might use a proxy server. And some of you might use a proxy server which requires credentials. In my case, my customer uses a Sophos SG appliance as a web proxy server with authentication. The customer created a user with a complex password. But I was unable to get a working internet connection.

I played a bit with curl in the Bash shell of the vCenter. The proxy settings are stored in /etc/sysconfig/proxy. These settings are used to populate the http_proxy and https_proxy environment variables. It’s important to know that the credentials stored in /etc/sysconfig/proxy are encoded with percent-encoding, also known as URL encoding. This is an encoding, not encryption, so someone with root access can grab the credentials from this file.
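To illustrate how percent-encoding treats special characters in credentials, here is a minimal Python sketch. The helper names are made up for this example, the host and user are the placeholders used in this post, and the code does not reproduce the exact layout of /etc/sysconfig/proxy. It only shows how a password such as Passw0rd! looks once it is percent-encoded inside a proxy URL, and that anyone who can read the URL can decode it again.

```python
from urllib.parse import quote, unquote, urlsplit


def build_proxy_url(user: str, password: str, host: str, port: int) -> str:
    """Return an http proxy URL with percent-encoded credentials (hypothetical helper)."""
    # safe="" makes quote() encode every reserved character, including ! and \
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


def decode_proxy_password(proxy_url: str) -> str:
    """Extract the password from a proxy URL and percent-decode it."""
    encoded = urlsplit(proxy_url).password or ""
    return unquote(encoded)


url = build_proxy_url("username", "Passw0rd!", "proxy.domain.tld", 8080)
print(url)                         # the ! appears as %21
print(decode_proxy_password(url))  # decoding gives the clear-text password back

# A backslash-escaped password is encoded character by character as well.
print(quote("Passw0rd\\!", safe=""))
```

The last line shows why an escaped \! is visible after decoding the stored string: the backslash is simply another character that gets encoded (as %5C) and decoded again.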

But then I noticed something weird. I set the http_proxy variable manually with

http://username:password@proxy.domain.tld:8080

and I got this error:

-bash: !": event not found

Okay… there was a ! in the password, and Bash treated it as the start of a history expansion, so it tried to look up the “event” after the !. But it was part of the password, so I had to tell Bash to take it literally.

I escaped the ! in the password with a \. And to my surprise: The vCenter was able to download updates. I decoded the percent-encoded string in /etc/sysconfig/proxy and found the escaped ! (\!). For example: instead of Passw0rd! I had to enter Passw0rd\! in the password field.

Long story short: Use a password without special characters, or escape them, because the password ends up in Bash variables.

Configure VMware Horizon View client device certificate authentication

Adding a second factor to your authentication is always a good idea. Typically the second factor is a One-Time Password (OTP) or a push notification. But what if you want to allow the login into your Horizon View environment only from specific devices? This implies that you need some kind of second factor that also identifies the device. At this point the archenemy of many of us comes into play: Certificates!

To be honest: It is not so hard to get client device certificate authentication to work. All you need is:

  • Unified Access Gateway 2.6 or later
  • Horizon 7 version 7.5 or later
  • A certificate installed on the client device that Unified Access Gateway accepts

Configure X.509 authentication settings

The first step is to configure the UAG to accept a device certificate. To do so, log into the UAG admin interface, expand the authentication settings and open the X.509 settings.

You need to upload the Root CA certificate, which is used to sign the device certificates, as a Base64-encoded file. I always recommend enabling “Cert Revocation”. You can enable “Use CRL from Certificates” if the certificates include the URL to the CRL. Otherwise you can add the CRL location. This location must be accessible for the UAG! Click “Save” and you are ready to configure the Horizon settings.

Configure Horizon settings to use X.509 authentication

After you have configured the X.509 authentication, you have to enable the device certificate authentication for Horizon View. Expand “Horizon Settings” and enter the configuration settings.

It is important to select “Device X.509 Certificate AND Passthrough”.

Save the settings and you are ready to go. At this point a user must use a device with a valid device certificate.

Device Certificate

It is important to know that you have to create a new certificate template. The computer certificate template, which is included in a standard Microsoft PKI, cannot be used! It is mandatory to use the “Microsoft Enhanced RSA and AES Cryptographic Provider” in the template. It only works with this Cryptographic Service Provider (CSP)!

The easiest way is to duplicate the “Computer” template and change the necessary settings. First of all: The CSP must be changed to “Microsoft Enhanced RSA and AES Cryptographic Provider” and it must be the only provider.

The subject name of the certificate should automatically be populated with information from the Active Directory, in this case the computer name.

Because the certificate is only for authentication purposes, you should remove “Server Authentication” from the Application Policies. Otherwise this certificate could be used to run a webserver.

Depending on your policies, you should mark the private key as “not exportable”!

The last step is important. After you have enrolled the certificate to your computer, you need to add permissions for the user that should be able to use the certificate for authentication! This is necessary because it is a device certificate, and only SYSTEM and the local administrators group have permissions to access the private key of the certificate.

That’s it. If you open the View Client and try to connect to your View environment, then you should get a certificate selection dialog. After choosing the correct certificate, you need to enter user credentials.

A connection to your View environment is only possible with a valid certificate and valid credentials.

VMware vCenter 7.0 U2 deployment fails at stage 2

Today I had to deploy a new vCenter appliance. Nothing fancy, new deployment. Stage 1 was easy, but stage 2 failed several times. I re-deployed the vCenter appliance two times, but when the deployment failed for the third time, I took a look into the logs.

The deployment failed without any error message, but it didn’t finish. It stopped while starting the different services, without reporting an error.

First of all: Log into the appliance using SSH or the console. Use the root account and the root password you have entered during the setup.

A good place to start are the logs under /var/log/firstboot. I used ls -lt to get the most recently written logs. Most services will write two logs: One log ends with _stdout.log, and the second one ends with _stderr.log. The _stdout.log contains the log messages of the service. The _stderr.log contains the errors. I searched for a service that had written to a _stderr.log – and I found it: scafirstboot.py_10507_stderr.log.
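The same search can be scripted. The following Python sketch is only an illustration of the idea (the function name is made up, and it assumes that a service which hit an error leaves a non-empty _stderr.log behind): it lists all non-empty *_stderr.log files in a directory, newest first, similar to what ls -lt shows.

```python
from pathlib import Path
from typing import List


def nonempty_stderr_logs(log_dir: str) -> List[Path]:
    """Return non-empty *_stderr.log files, most recently modified first."""
    logs = [p for p in Path(log_dir).glob("*_stderr.log") if p.stat().st_size > 0]
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Path taken from this post; adjust it if your logs live elsewhere.
    for log in nonempty_stderr_logs("/var/log/firstboot"):
        print(log.name)
```

An empty list does not prove that everything is fine, it only means that no service wrote to its error log.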

And this log gave me a hint what the root cause was. One of the last log entries was:

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate is not yet valid

Wait, what? A certificate not only has an end date, but also a date before which it is not valid: a start date. And this often indicates a problem with… NTP. And it was NTP. I had configured NTP for the vCenter, but not for the ESXi host on which I deployed the vCenter. -.- If it is not DNS, it’s NTP. Or an invalid certificate. Or both.
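To make the effect of a wrong clock visible, here is a small Python sketch. All dates are invented for the example, and the function is a simplified stand-in for what a TLS library checks, not the code the vCenter installer runs. It compares a certificate’s validity window against the clock of the system that verifies it.

```python
from datetime import datetime, timedelta, timezone


def validity_state(not_before: datetime, not_after: datetime, now: datetime) -> str:
    """Classify a certificate's validity window relative to the given clock."""
    if now < not_before:
        return "not yet valid"
    if now > not_after:
        return "expired"
    return "valid"


# Hypothetical certificate: issued at 12:00 UTC and valid for two years.
issued = datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc)
expires = issued + timedelta(days=730)

# A verifier whose clock runs two hours behind sees a start date in the future.
skewed_clock = issued - timedelta(hours=2)
print(validity_state(issued, expires, skewed_clock))

# With a synchronized clock shortly after issuance, the same certificate is fine.
synced_clock = issued + timedelta(minutes=5)
print(validity_state(issued, expires, synced_clock))
```

Nothing is wrong with the certificate in the first case. Only the clock disagrees, which is why fixing NTP solves the problem.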

VCAP-DCV Design 2021 – Objective 1.1 Gather and analyze business requirements

This blog post covers objective 1.1 (Gather and analyze business requirements) of the VCAP-DCV Design 2021 exam. It is based on the VMware Certified Advanced Professional 6.5 in Data Center Virtualization Design (3V0-624) Exam Preparation Guide (last update December 2019).

When you get the task to design something, you will instinctively start gathering information about the requirements that have to be fulfilled. Everything IT is doing should support the business in some way.

The necessary skills and abilities are documented in the exam prep guide for the older VCAP6-DCV Design exam (3V0-622). I think they also apply to the current version of the exam:

  • Associate a stakeholder with the information that needs to be collected
  • Utilize inventory and assessment data from a current environment to define a baseline state
  • Analyze customer interview data to explicitly define customer objectives for a conceptual design
  • Determine customer priorities for defined objectives
  • Ensure that Availability, Manageability, Performance, Recoverability and Security (AMPRS) considerations are applied during the requirements gathering process
  • Given results of the requirements gathering process, identify requirements for a conceptual design
  • Categorize requirements by infrastructure qualities to prepare for logical design requirements

Associate a stakeholder with the information that needs to be collected

Let’s start with the stakeholders and why they are important for us. But what is a stakeholder? A stakeholder is a person with an interest or concern in something, especially a business (Oxford). Stakeholders can be internal or external parties. An internal stakeholder is someone with a direct relationship to the company. An external stakeholder has no direct connection to the company, but is affected by it in some way. These can be suppliers, the government, or other groups. A stakeholder can be anyone, but in our context stakeholders are typically

  • C-Level Executives (CEO, CFO, CIO etc.)
  • Vice Presidents
  • Managers, but also
  • Engineers and end users

As always: It depends. :)

Utilize inventory and assessment data from a current environment to define a baseline state

We also need to understand the current environment and what is currently deployed at the company. Interviews with the stakeholders are important, but in most cases they will not answer all questions. Depending on what is currently deployed, different tools can be used to gain the necessary data. Some examples:

  • RVTools, PowerCLI, vSphere Web Client, vROps etc
  • Custom scripts
  • Windows Server Manager
  • Network Monitoring Tools, like HPE Intelligent Management Center
  • Asset Management

It is important to document the results of the assessment. This is the baseline state of the current environment.

Analyze customer interview data to explicitly define customer objectives for a conceptual design

Now we need to get back to the results of the interviews that we did with the stakeholders to define the goals and the scope of the design. We also need to understand the

  • Constraints,
  • Assumptions,
  • Requirements, and
  • Risks

When we talk about requirements, we have to differentiate between functional (WHAT) and non-functional (HOW) requirements.

This information will allow us to create a conceptual design, which is written down in a workbook document.

Determine customer priorities for defined objectives

The next step is to prioritize the defined objectives. It is important to weigh, for example, requirements and risks against each other. Milestones have to be defined. They will help us to measure the success of the project and keep it on track.

Ensure that AMPRS considerations are applied during the requirements gathering process

AMPRS stands for

  • Availability
  • Manageability
  • Performance
  • Recoverability, and
  • Security

It is important to understand the meaning of each of these terms.

Availability considerations address the availability requirements of our design. These are typically expressed as percent uptime of a specific system. For example: 99.5% availability for file services.
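A percentage becomes more tangible when it is converted into a downtime budget. The following Python sketch does this conversion. It assumes a year of 365 days and counts every hour of the year as service time, which is a simplification, because real SLAs often exclude maintenance windows.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 24x7 service time


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime budget in hours for an availability target."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for target in (99.0, 99.5, 99.9):
    print(f"{target}% -> {allowed_downtime_hours(target):.1f} h/year")
```

For the 99.5% example above, this works out to roughly 43.8 hours of tolerated downtime per year.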

Manageability considerations address the management and operational requirements of our design. This can be alerting, reports, access concepts etc.

Performance considerations express the required performance characteristics of the design. For example: mails per second at a given mail size.

Recoverability considerations cover the ability to recover from an unexpected incident or disaster. This topic typically addresses backup and recovery of our design.

Security considerations cover the requirements around data control, access management, governance, risk management etc.

Given results of the requirements gathering process, identify requirements for a conceptual design

Now we have collected information from the relevant stakeholders, including the goals, scope, and CARR (constraints, assumptions, requirements, risks), and we have collected details about the current environment. It is time to put this information together and create a conceptual design.

The conceptual design must be approved by the stakeholders. This assures that everything is covered. Creating a conceptual design is an iterative process. The conceptual design is finished when the relevant stakeholders have approved it.

Categorize requirements by infrastructure qualities to prepare for logical design requirements

Sounds simple, but it can be challenging: The documented requirements have to be grouped by infrastructure categories, e.g.

  • Networking
  • Storage
  • Recovery
  • Compute
  • VM
  • Security

Based on the CARR and the AMPRS considerations, we made design decisions. These decisions affect each of the infrastructure categories. At this point, we can review each of our decisions. Mapping the requirements to the infrastructure categories will ease the creation of a high-level logical design.
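The grouping itself is nothing more than a mapping from category to requirements. As a rough illustration, here is a Python sketch with made-up requirement IDs and texts (none of them come from the exam guide); the categories are the ones from the list above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical requirements: (ID, infrastructure category, short description)
requirements: List[Tuple[str, str, str]] = [
    ("R01", "Storage", "File services must reach 99.5% availability"),
    ("R02", "Networking", "Management traffic must be isolated"),
    ("R03", "Recovery", "Restore of a single VM within four hours"),
    ("R04", "Storage", "Capacity for three years of growth"),
]


def group_by_category(items: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group requirement IDs by their infrastructure category."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for req_id, category, _description in items:
        grouped[category].append(req_id)
    return dict(grouped)


for category, ids in group_by_category(requirements).items():
    print(f"{category}: {', '.join(ids)}")
```

In a real project a requirement can touch more than one category. The sketch assigns exactly one category per requirement to keep the example short.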

Summary

Let me try to simplify this complex process a bit.

We were asked to solve a problem for a company. To solve this problem, we have to design a solution. To create this design, we have to identify the relevant stakeholders. These stakeholders will help us to gather information about the goals, the scope, about constraints, assumptions, requirements and risks. Especially when it comes to the requirements, we have to take availability, manageability, performance, recoverability and security considerations into account.

We can use different tools to collect information about the current environment.

At this point we know WHAT the company wants, and we know WHAT they are currently running.

Now we can start with the creation of a conceptual design, which has to be approved by the relevant stakeholders.

To prepare the logical design, we need to map the documented requirements to the different categories of the infrastructure.

VMware Certified Advanced Professional 6.5 – Data Center Virtualization Design Exam (VCAP-DCV Design 2021)

In August 2018 I passed the VCAP6-DCV Deployment exam. After a busy first half of 2019 it was time to start preparing for the VMware Certified Advanced Professional – Data Center Virtualization Design 2019 exam. But I lost focus, and in 2020 I had a lot to do, just nothing VMware-related, so I also missed my goal to take the VCAP-DCV Design exam.

I have to push myself, so I decided to recap my half-finished blog series to get myself back on track.

There are many great study guides out there, but in most cases I need “my own study guide” to feel well prepared. I hope this blog series will keep me on track, and that I stay focused. This is my third try to prepare for this exam… :/

In contrast to the Deployment exam, the Design exam is a multiple-choice (MC) exam. 130 minutes for 60 questions. Sounds easy, but it’s said to be one of the hardest exams VMware offers.

The exam consists of three sections:

  • Section 1 – Create a vSphere 6.5 Conceptual Design
  • Section 2 – Create a vSphere 6.x Logical Design from an Existing Conceptual Design
  • Section 3 – Create a vSphere 6.x Physical Design from an Existing Logical Design

Each section contains several objectives.

  • Objective 3.1 – Transition from a logical design to a vSphere 6.x physical design
  • Objective 3.2 – Create a vSphere 6.x physical network design from an existing logical design
  • Objective 3.3 – Create a vSphere 6.x physical storage design from an existing logical design
  • Objective 3.4 – Determine appropriate compute resources for a vSphere 6.x physical design
  • Objective 3.5 – Determine virtual machine configuration for a vSphere 6.x physical design
  • Objective 3.6 – Determine data center management options for a vSphere 6.x physical design

I will try to cover each objective in a blog post and add a link here. Feel free to add comments, corrections and questions. :) The links that are already in place point to existing blog posts, but I will revise those posts.

Leave a comment if you have questions. :)

Two registry changes to improve physical Horizon View Agent experience

Using physical clients as Horizon View agents is pretty common for me. My office PC, as well as my Lenovo X250, are often accessed using the Horizon View Client and the Blast protocol. But as good as the performance is, there were a couple of things that bugged me.

On my office PC, I pretty often encountered a black screen, either on first connect or on reconnect. This is typically caused by misconfigured firewall policies, but that was completely ruled out in this case, because my colleagues never had issues with black screens. The problem occurred with different versions of the View Agent.

I finally fixed it after I tried to connect using the HTML5 client. I got an error saying that the Connection Server was unable to connect to 172.28.208.1. Huh? I don’t know this address… I checked my office PC and found out that this IP was assigned to the Hyper-V virtual switch.

I fixed this by adding this registry change:

HKLM\Software\VMware, Inc.\VMware VDM\IpPrefix = n.n.n.n/m (REG_SZ)

n.n.n.n/m is the subnet on which the Connection Server should reach your View Agent.
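The idea behind a prefix like this is simple address filtering. The following Python sketch illustrates it with the standard ipaddress module. The office subnet 192.168.10.0/24 and the address 192.168.10.57 are invented for the example, and the sketch only shows the selection logic, not how the View Agent implements it internally.

```python
import ipaddress
from typing import List


def addresses_in_prefix(addresses: List[str], prefix: str) -> List[str]:
    """Return only those addresses that belong to the given subnet."""
    network = ipaddress.ip_network(prefix)
    return [a for a in addresses if ipaddress.ip_address(a) in network]


# 172.28.208.1 is the Hyper-V virtual switch address from this post,
# 192.168.10.57 is a hypothetical address of the physical NIC.
local_addresses = ["172.28.208.1", "192.168.10.57"]

print(addresses_in_prefix(local_addresses, "192.168.10.0/24"))
```

With the prefix set to the office subnet, the Hyper-V address drops out and only the address of the physical NIC remains.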

The second issue was that my client constantly failed to reboot. I clicked reboot, and the machine was gone. No TeamViewer, no RDP, no PING. Only a hard power-off helped to get it back to work.

I fixed this by adding another registry change:

HKCU\Control Panel\Desktop\AutoEndTasks = 1 (REG_SZ)
HKLM\SYSTEM\CurrentControlSet\Control\WaitToKillAppTimeout = 2000 (REG_SZ)
HKU\.DEFAULT\Control Panel\Desktop\AutoEndTasks = 1 (REG_SZ)

This fixed both issues for me: the black screen and the hang on reboot.

vCenter Server migration to 6.7 fails with “Failed to check VMware STS. The SSL certificate of STS service cannot be verified”

This posting is ~12 months old. You should keep this in mind. IT is a short-lived business. This information might be outdated.

There are still customers out there that are running vCenter Server on a Windows host. This year, despite the fact that most customers have put projects on hold, I managed to migrate some of them to a vCenter Server Appliance.

Some days ago I had a meeting with one of my favorite customers to migrate their vCenter Server 6.5 to a vCenter Server Appliance 6.7 U3l. They were still on 6.5 because of some legacy ESXi 5.5 hosts, but they managed to remove them from their vCenter and we were able to start the migration.

Healthcheck & Stage 1

We did a healthcheck the day before, so we were pretty sure that everything would go smoothly. Stage 1 was easy. We deployed the appliance (X-Large… *cough* *cough*) and moved to stage 2. We had ~20 GB of data to migrate. IMHO nothing fancy, I have had vCenters with more data to migrate.

Stage 2… and FAIL

To make a long story short: Stage 2 failed pretty hard with an unrecoverable error.

Encountered an internal error.
Traceback (most recent call last):
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 1641, in main
    vmidentityFB.boot()
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 345, in boot
    self.checkSTS(self.__stsRetryCount, self.__stsRetryInterval)
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 1179, in checkSTS
    raise Exception('Failed to initialize Secure Token Server.')
Exception: Failed to initialize Secure Token Server.

Deep dive into the log files

The message indicated a problem with the Secure Token Server (STS). Without any troubleshooting we did a second try, which also failed with the same message. Time to dive into the logs…

The log file of vmidentity-firstboot.py was pretty helpful, because the end of the log file pointed us in the right direction.

Failed to check VMware STS.
com.vmware.vim.sso.client.exception.CertificateValidationException: The SSL certificate of STS service cannot be verified

This message led us to VMware KB76144 (“Failed to check VMware STS. The SSL certificate of STS service cannot be verified” while upgrading VCSA from 6.5 to 6.7/7.0)

According to the KB article, the cause of our problem is a certificate in the STS_INTERNAL_SSL_CERT store, which is used by the STS. And indeed: this vCenter had been upgraded from 5.5 at some point in the past.

So we checked the certificate stores and found further evidence that a certificate was our main problem: the certificate in the STS_INTERNAL_SSL_CERT store had expired a few days earlier.

Fortunately, KB76144 offers a simple solution to this problem. In short:

  • remove certificates from the STS_INTERNAL_SSL_CERT store, and
  • re-import the certificate from the MACHINE_SSL_CERT store

It’s DNS… or NTP… or an expired certificate

Because we had a Windows-based vCenter, we had to modify the commands from the KB article for Windows.

C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry getcert --store MACHINE_SSL_CERT --alias __MACHINE_CERT --output c:\temp\machine_ssl.crt

C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry getkey --store MACHINE_SSL_CERT --alias __MACHINE_CERT --output c:\temp\machine_ssl.key

C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry getcert --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT --output c:\temp\STS_INTERNAL_SSL_CERT-__MACHINE_CERT.crt

C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry getkey --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT --output c:\temp\STS_INTERNAL_SSL_CERT-__MACHINE_CERT.key

C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry delete --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT -y
Deleted entry with alias [__MACHINE_CERT] in store [STS_INTERNAL_SSL_CERT] successfully

C:\Program Files\VMware\vCenter Server\vmafdd>vecs-cli entry create --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT --cert c:\temp\machine_ssl.crt --key c:\temp\machine_ssl.key
Entry with alias [__MACHINE_CERT] in store [STS_INTERNAL_SSL_CERT] was created successfully

The last step was to restart the VMware STS service. After this, we tried the migration again and it went smoothly.

Horizon View – Why Automated Desktop Pools with Full Clones are still a thing

We have had to deal with COVID-19 for a year now, and from the IT perspective, 2020 was a pretty strange year. Many projects were not cancelled, but were placed on hold. But two kinds of projects went through the roof:

  • Microsoft 365, and
  • Horizon View

As you might have noticed, I blogged a lot about Exchange, Exchange Online and Horizon this year. The reason for this is pretty simple: That was driving my business this year.

In early 2020, when we decided to move into our home offices, we deployed Horizon View on physical PCs at ML Network (my employer). This was a simple solution and it still works for us today.

Some of my customers also deployed Horizon View for the same reason: A secure and easy way to get a desktop. For some of them, the tech was new and they struggled with DEM, Linked Clones, customization etc. The solution in this case was easy: Full Clones with dedicated assignment.

One customer moved from Windows 7 and floating-assignment and Linked Clones to Windows 10 and Full Clones and dedicated assignment (not my project).

Another customer started to implement Horizon View with Horizon 2006, and he started with Instant Clones, dedicated assignment and DEM. I told him to go with Full Clones, but his IT company moved on with Instant Clones. Now he’s complaining about the growing complexity.

My 2 cents

Many customers struggle with Windows 10 and the customization of Windows 10. Tools like Dynamic Environment Manager (DEM) are powerful, but they can be quite complex, especially when it comes down to small IT orgs with 50, 100 or 200 desktops, where each member of the IT team has to be a jack of all trades.

I always recommend starting with Full Clones, just to get in touch with the technology. And I always recommend getting the requirements clear with the stakeholders and the users. Things like software that does not work, missing settings after a logoff/logon, or slow response are the main difficulties that will cause a VDI project to fail.

When you are familiar with the technology, proceed further with DEM, Instant Clones, floating assignment. But you should learn to walk, before you start to run.

Maybe I’m getting old. :D I’m not against modern technology and new features. I’m not a grumpy old senior consultant. But I think I’ve learned the hard way why it’s a bad idea to overburden IT-orgs and their users with new tech, especially in times like these.

Adobe Flash will die, and how this affects VMware

December 31, 2020 will not only be the end of the miserable year 2020, it will also be the end of an era – the era of Adobe Flash! Adobe has announced that they will stop supporting Adobe Flash after December 31, 2020. Furthermore, Adobe will block Flash content from running in Flash Player beginning January 12, 2021. Adobe strongly recommends that all users immediately uninstall Flash Player. I got a popup a couple of times, asking me if I want to uninstall Adobe Flash. It’s still installed… :/

Adobe Flash isn’t a big thing in web development anymore, but there is a reason why I still have Adobe Flash installed – Admin Interfaces!

We all had to deal with Flash after VMware started with the vSphere Web Client. It was slow and, in places, painfully buggy. The newer HTML5-based Web Client was much better, but not feature complete until vSphere 6.7.

But the vSphere Web Client was not the only admin interface based on Flash used in a VMware product. The Horizon Administrator, which was the main administration interface until Horizon 7.8, is also based on Flash. vRealize Operations also used Flash until version 6.6.

Update now!

If you want to remove Adobe Flash from your computer, you have to update all, or at least parts, of your VMware infrastructure.

The simple rule is: Update to the latest release and everything will be fine. If you are running vSphere 6.7 U3, the HTML5 based Web Client is feature complete. The same applies to Horizon View. If you are running 7.10 or a newer release, everything is fine.

VMware has published a KB article which summarizes the update paths: VMware Flash End of Life and Supportability (78589).

But what if I can’t, or I’m unwilling to, update?

In this case, there is an easy approach: Disconnect your systems from the internet or at least block the internet access for them. The alternative approach is not recommended! Stop the automatic updates on your web browser and use the Flash-based User Interfaces on a browser which still supports Flash. Again: This is really not recommended!

VMware ESXi 6.7 memory health warnings after ProLiant SPP

During the deployment of a vSAN cluster consisting of multiple HPE ProLiant DL380 Gen10 hosts, I noticed a memory health warning after updating the firmware using the Support Pack for ProLiant (SPP). The error was definitely not shown before the update, so it was clear that this was not a real issue with the hardware. Furthermore: All hosts showed this error.

The same day, a customer called me and asked me about a strange memory health error after he had updated all of his hosts with the latest SPP…

My first guess, that this was not caused by a hardware malfunction, was correct. HPE published an advisory about this issue:

The Memory Sensor Status Reported in the vSphere Web Client Is Not Accurate For HPE ProLiant Gen10 and Gen10 Plus Servers Running VMware ESXi 6.5/6.7/7.0 With HPE Integrated Lights-Out 5 (iLO 5) Firmware Version 2.30

To fix this issue, you have to update the iLO 5 firmware to version 2.31. You can do this manually using the iLO 5 interface, or you can add the file to the SPP. I’ve added the BIN file to the USB stick with the latest SPP.

If you want to update the firmware manually, simply upload the BIN file using the built-in firmware update function.

  1. Navigate to Firmware & OS Software in the navigation tree, and then click Update Firmware
  2. Select the Local file option and browse to the BIN file
  3. To save a copy of the component to the iLO Repository, select the Also store in iLO Repository check box
  4. To start the update process, click Flash

You can download the latest iLO 5 2.31 firmware from HPE. After the firmware update, the error will resolve itself.

Only ESXi 6.7 is affected, and only when it is running on HPE ProLiant hosts, regardless of whether they are ML, DL or BL series.