Author Archives: Patrick Terlisten

About Patrick Terlisten

vcloudnine.de is the personal blog of Patrick Terlisten. Patrick has a strong focus on virtualization & cloud solutions, but also storage, networking, and IT infrastructure in general. He is a fan of Lean Management and agile methods, and practices continuous improvement wherever it is possible. Feel free to follow him on Twitter and/ or leave a comment.

VCAP6.5-DCV Design – Objective 1.2 – Gather and analyze application requirements

This blog post covers objective 1.2 (Gather and analyze application requirements) of the VCAP6.5-DCV Design exam. It is based on the VMware Certified Advanced Professional 6.5 in Data Center Virtualization Design (3V0-624) Exam Preparation Guide (last update August 2017).

The first objective of the exam prep guide covered the business requirements. Now we have to do the same for the affected applications.

The necessary skills and abilities are documented in the exam prep guide for the older VCAP6-DCV Design exam (3V0-622). I think they also apply to the current version of the exam:

  • Gather and analyze application requirements for a given scenario
  • Determine the requirements for a set of applications that will be included in the design
  • Collect information needed in order to identify application dependencies
  • Given one or more application requirements, determine the impact of the requirements on the design

Gather and analyze application requirements for a given scenario

As a result of the work already done, we should know which applications we have to deal with in our project. Now we have to gather the requirements of those applications. The necessary techniques are already known to us:

  • interviews with the relevant stakeholders and/ or developers or engineers
  • existing documentation about the deployment
  • our documented baseline from objective 1.1
  • vendor documentation/ support/ knowledge base articles

It is pretty important to understand what requirements these applications have. They depend on the workload and on the applications themselves. Tools like perfmon or capacity planning tools can help us to get solid knowledge about the current and planned capacity/ performance requirements.

But we should not only focus on performance. There is much more to take into account, to be more specific: AMPRS

It stands for

  • Availability
  • Manageability
  • Performance
  • Recoverability, and
  • Security

You can read a detailed explanation here.

Determine the requirements for a set of applications that will be included in the design

This is similar to what is written above. When we talk about a set of applications, we have to take the dependencies between these applications into account.

Collect information needed in order to identify application dependencies

To gain the necessary information, we have to talk to the right people, which means that we have to talk to developers, engineers and/ or end-users. We have to dive deep into existing customer and/ or vendor documentation. And we need to use the right tools to map the dependencies we find. This can be done with Microsoft Visio, OmniGraffle or similar.
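Before drawing the diagram, it can help to write the dependencies down in a form that can be queried. The following is only a minimal sketch in Python: the application names and the dependency map are invented for illustration, and the two helper functions are hypothetical, not part of any tool mentioned above.

```python
from collections import deque

# Hypothetical dependency map: each application lists what it depends on directly.
DEPENDENCIES = {
    "webshop": ["app-server", "dns"],
    "app-server": ["database", "dns"],
    "database": ["storage"],
    "dns": [],
    "storage": [],
}


def all_dependencies(app, dependencies):
    """Return every direct and indirect dependency of app (breadth-first walk)."""
    seen = set()
    queue = deque(dependencies.get(app, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited, also protects against cycles
        seen.add(current)
        queue.extend(dependencies.get(current, []))
    return seen


def unvirtualized_dependencies(app, dependencies, virtualized):
    """Return the dependencies of app that stay outside the virtualized scope."""
    return all_dependencies(app, dependencies) - set(virtualized)
```

Calling `unvirtualized_dependencies("webshop", DEPENDENCIES, {"webshop", "app-server", "database"})` lists the services that the virtualized applications still rely on from outside the design scope.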

Given one or more application requirements, determine the impact of the requirements on the design

With the knowledge about the applications and the dependencies between them, it is time to make some design decisions. These decisions must support the documented requirements, especially the requirements with regard to availability, manageability, performance, recoverability, and security.

The key is to understand the impact of these decisions on the rest of the design.

Summary

I will try to summarize this objective. The last blog post covered the business requirements and the process from gathering the required information, through documenting it, to the point at which we can start creating a design. This blog post covers the same process, not for the business requirements, but for the applications and their requirements.

We can gather the necessary information by talking to the relevant stakeholders, engineers, developers etc. Customer and/ or vendor documentation and other sources can be used to get a better understanding of the different application requirements. We also need to understand the dependencies between the different applications, especially if only a subset of applications is virtualized. Our work is supported by different tools, especially for performance analysis, capacity planning and documentation.

With the gathered information we will be able to make design decisions that fulfill the requirements (think about AMPRS).

Links

VCAP6.5-DCV Design – Objective 1.1 Gather and analyze business requirements

This blog post covers objective 1.1 (Gather and analyze business requirements) of the VCAP6.5-DCV Design exam. It is based on the VMware Certified Advanced Professional 6.5 in Data Center Virtualization Design (3V0-624) Exam Preparation Guide (last update August 2017).

When you get the task to design something, you will instinctively start gathering information about the requirements that have to be fulfilled. Everything IT is doing should support the business in some way.

The necessary skills and abilities are documented in the exam prep guide for the older VCAP6-DCV Design exam (3V0-622). I think they also apply to the current version of the exam:

  • Associate a stakeholder with the information that needs to be collected
  • Utilize inventory and assessment data from a current environment to define a baseline state
  • Analyze customer interview data to explicitly define customer objectives for a conceptual design
  • Determine customer priorities for defined objectives
  • Ensure that Availability, Manageability, Performance, Recoverability and Security (AMPRS) considerations are applied during the requirements gathering process
  • Given results of the requirements gathering process, identify requirements for a conceptual design
  • Categorize requirements by infrastructure qualities to prepare for logical design requirements

Associate a stakeholder with the information that needs to be collected

Let’s start with the stakeholders and why they are important for us. But what is a stakeholder? A stakeholder is a person with an interest or concern in something, especially a business (Oxford). Stakeholders can be internal or external parties. An internal stakeholder is someone with a direct relationship to the company. An external stakeholder has no direct connection to the company, but is affected by it in some way. These can be suppliers, the government, or other groups. A stakeholder can be anyone, but in our context stakeholders are typically

  • C-Level Executives (CEO, CFO, CIO etc.)
  • Vice Presidents
  • Managers, but also
  • Engineers and end users

As always: It depends. :)

Utilize inventory and assessment data from a current environment to define a baseline state

We also need to understand the current environment and what is currently deployed at the company. Interviews with the stakeholders are important, but in most cases they will not answer all questions. Depending on what is currently deployed, different tools can be used to gain the necessary data. Some examples:

  • RVTools, PowerCLI, vSphere Web Client, vROps etc.
  • Custom scripts
  • Windows Server Manager
  • Network Monitoring Tools, like HPE Intelligent Management Center
  • Asset Management

It is important to document the results of the assessment. This is the baseline state of the current environment.

Analyze customer interview data to explicitly define customer objectives for a conceptual design

Now we need to get back to the results of the interviews that we did with the stakeholders to define the goals and the scope of the design. We also need to understand the

  • Constraints,
  • Assumptions,
  • Requirements, and
  • Risks

When we talk about requirements, we have to distinguish between functional (WHAT) and non-functional (HOW) requirements.

This information will allow us to create a conceptual design, which is written down in a workbook document.

Determine customer priorities for defined objectives

The next step is to prioritize the defined objectives. It is important to weight, for example, requirements and risks. Milestones have to be defined. They will help us to measure the success of the project and keep it on track.

Ensure that AMPRS considerations are applied during the requirements gathering process

AMPRS stands for

  • Availability
  • Manageability
  • Performance
  • Recoverability, and
  • Security

It is important to understand the meaning of each of these terms.

Availability considerations address the availability requirements of our design. These are typically expressed as the percentage of uptime of a specific system. For example: 99.5% availability for file services.
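An availability percentage becomes easier to discuss with stakeholders when it is translated into allowed downtime. The following Python snippet is a small sketch of that arithmetic. It assumes the availability is measured over a full year of 365 days unless another period is passed in, and the function name is made up for this example.

```python
def allowed_downtime_hours(availability_percent, period_hours=365 * 24):
    """Return the downtime budget in hours for a given availability target.

    period_hours defaults to one year of 365 days (an assumption; SLAs often
    use a month or exclude maintenance windows).
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_hours * (100 - availability_percent) / 100


# The file services example from the text: 99.5% over one year.
file_services_budget = allowed_downtime_hours(99.5)
```

For the 99.5% example this works out to roughly 43.8 hours of downtime per year, which is a very different requirement than 99.9% (about 8.8 hours).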

Manageability considerations address the management and operational requirements of our design. This can be alerting, reports, access concepts etc.

Performance considerations express the required performance characteristics of the design. For example: mails per second at a given message size.

Recoverability considerations cover the ability to recover from an unexpected incident or disaster. This topic typically addresses backup and recovery of our design.

Security considerations cover the requirements around data control, access management, governance, risk management etc.

Given results of the requirements gathering process, identify requirements for a conceptual design

Now we have collected information from the relevant stakeholders, including the goals, scope, and CARR (constraints, assumptions, requirements, risks), and we have collected details about the current environment. Now it is time to put this information together and create a conceptual design.

The conceptual design must be approved by the stakeholders. This assures that everything is covered. Creating a conceptual design is an iterative process. The conceptual design is finished when the relevant stakeholders have approved it.

Categorize requirements by infrastructure qualities to prepare for logical design requirements

Sounds simple, but it can be challenging: The documented requirements have to be grouped by infrastructure categories, e.g.

  • Networking
  • Storage
  • Recovery
  • Compute
  • VM
  • Security

Based on the CARR and the AMPRS considerations, we made design decisions. These decisions affect each of the infrastructure categories. At this point, we can review each of our decisions. Mapping the requirements to the infrastructure categories will ease the creation of a high-level logical design.
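The grouping itself can be sketched in a few lines of Python. The requirement IDs and texts below are invented for illustration, and in a real project the list would come from the workbook document. Only the category names are taken from the list above.

```python
from collections import defaultdict

# Hypothetical requirements: (id, infrastructure category, text).
REQUIREMENTS = [
    ("R001", "Storage", "File services must reach 99.5% availability"),
    ("R002", "Networking", "Management traffic must be separated from VM traffic"),
    ("R003", "Recovery", "File services must be restorable from backup"),
    ("R004", "Storage", "Datastores must not be overprovisioned"),
]


def group_by_category(requirements):
    """Group requirement IDs by their infrastructure category."""
    grouped = defaultdict(list)
    for req_id, category, _text in requirements:
        grouped[category].append(req_id)
    return dict(grouped)
```

The result is one list of requirement IDs per category, which is a convenient starting point for the logical design of each area.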

Summary

Let me try to simplify this complex process a bit.

We were asked to solve a problem for a company. To solve this problem, we have to design a solution. To create this design, we have to identify the relevant stakeholders. These stakeholders will help us to gather information about the goals, the scope, about constraints, assumptions, requirements and risks. Especially when it comes to the requirements, we have to take availability, manageability, performance, recoverability and security considerations into account.

We can use different tools to collect information about the current environment.

At this point we know WHAT the company wants, and we know WHAT they are currently running.

Now we can start with the creation of a conceptual design, which has to be approved by the relevant stakeholders.

To prepare the logical design, we need to map the documented requirements to the different categories of the infrastructure.

Links

VMware Certified Advanced Professional — Data Center Virtualization Design 2019 Study Guide

Last year in September I passed the VCAP6-DCV Deployment exam. After a busy first half of 2019 it’s time to start preparing for the VMware Certified Advanced Professional — Data Center Virtualization Design 2019 exam.

There are many great study guides out there, but in most cases I need “my own study guide” to feel well prepared. I hope that publishing my notes helps me to stay focused and motivated.

Image by Pexels from Pixabay

In contrast to the Deploy exam, the Design exam is a multiple-choice (MC) exam: 135 minutes for 60 questions. Sounds easy, but it is said to be one of the hardest exams VMware offers.

The exam is split into three sections:

  • Section 1 – Create a vSphere 6.5 Conceptual Design
  • Section 2 – Create a vSphere 6.x Logical Design from an Existing Conceptual Design
  • Section 3 – Create a vSphere 6.x Physical Design from an Existing Logical Design

Each section contains several objectives.

  • Objective 2.1 – Map business requirements to a vSphere 6.x logical design
  • Objective 2.2 – Map service dependencies
  • Objective 2.3 – Build availability requirements into a vSphere 6.x logical design
  • Objective 2.4 – Build manageability requirements into a vSphere 6.x logical design
  • Objective 2.5 – Build performance requirements into a vSphere 6.x logical design
  • Objective 2.6 – Build recoverability requirements into a vSphere 6.x logical design
  • Objective 2.7 – Build security requirements into a vSphere 6.x logical design
  • Objective 3.1 – Transition from a logical design to a vSphere 6.x physical design
  • Objective 3.2 – Create a vSphere 6.x physical network design from an existing logical design
  • Objective 3.3 – Create a vSphere 6.x physical storage design from an existing logical design
  • Objective 3.4 – Determine appropriate compute resources for a vSphere 6.x physical design
  • Objective 3.5 – Determine virtual machine configuration for a vSphere 6.x physical design
  • Objective 3.6 – Determine data center management options for a vSphere 6.x physical design

I will try to cover each objective in a blog post and add a link here. Feel free to add comments, corrections and questions. :)

Out of space – first steps when a datastore runs out of space

This is a situation that should never happen, and I had to deal with it only a couple of times in more than 10 years of working with VMware vSphere/ ESXi. In most cases, the reason for this was the usage of thin-provisioned disks together with small datastores. Yes, that’s a bad design. Yes, this should never happen.

There is a nearly 100% chance that this setup will fail one day. Either because someone dumps a lot of data into the VMs, or because of VM snapshots. But such a setup WILL FAIL one day.

Yesterday was one of these days and five VMs stopped working on a small ESXi host at a site of one of my customers. A quick look into the vCenter confirmed my first assumption. The datastore was full. My second thought: Why are there so many VMs on that small ESXi host, and why are they thin-provisioned?

The vCenter showed me the following message on each VM:

There is no more space for virtual disk $VMNAME.vmdk. You might be able to continue this session by freeing disk space on the relevant volume, and clicking Retry. Click Cancel to terminate this session.

Okay, what to do? First things first:

  1. Is there any unallocated space left on the RAID group? If yes, expand the VMFS.
  2. Are there any VM snapshots left? If yes, remove them.
  3. Configure 100% memory reservation for the VMs. This removes the VM memory swap files and releases a decent amount of disk space.
  4. Remove ISO files from the datastore
  5. Remove VMs (if you have a backup and they are not necessary for the business)
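How much space step 3 releases can be estimated up front, because the VM swap file is sized as configured memory minus memory reservation. The following Python sketch does the arithmetic for a handful of VMs. The function names and the example values are made up for illustration, and it ignores other per-VM files.

```python
def swap_file_size_gb(configured_memory_gb, reservation_gb):
    """Size of the VM swap file: configured memory minus memory reservation."""
    if reservation_gb < 0 or reservation_gb > configured_memory_gb:
        raise ValueError("reservation must be between 0 and the configured memory")
    return configured_memory_gb - reservation_gb


def space_released_by_full_reservation(vms):
    """Sum of the swap files that disappear with a 100% reservation.

    vms is a list of (configured_memory_gb, current_reservation_gb) tuples.
    """
    return sum(swap_file_size_gb(memory, reserved) for memory, reserved in vms)


# Hypothetical example: three VMs with 8, 16 and 4 GB of configured memory.
EXAMPLE_VMS = [(8, 0), (16, 4), (4, 4)]
released_gb = space_released_by_full_reservation(EXAMPLE_VMS)
```

Keep in mind that a full reservation only works if the host has enough physical memory to back it.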

This should allow you to continue the operation of the VMs. To solve the problem permanently:

  1. Add disks to the server and expand the VMFS, or create a new datastore
  2. Add a NFS datastore
  3. Remove unnecessary VMs
  4. Set up working monitoring, set up alarms, do not overprovision datastores, or switch to eager-zeroed disks

Such an issue should not happen. It is not rude to say here: This is simply due to bad design and lack of operational processes.
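The checks behind such monitoring are simple. The following Python sketch flags a datastore that is low on free space or overprovisioned. It is not a vSphere API call: the capacity, used and provisioned values are assumed to come from whatever inventory tool is in use, and the 20% free-space threshold is an arbitrary example value.

```python
def datastore_report(capacity_gb, used_gb, provisioned_gb, min_free_percent=20.0):
    """Evaluate free space and overprovisioning for a single datastore.

    min_free_percent is an example threshold, not a vendor recommendation.
    """
    if capacity_gb <= 0:
        raise ValueError("capacity must be greater than zero")
    free_percent = 100 * (capacity_gb - used_gb) / capacity_gb
    return {
        "free_percent": free_percent,
        # Ratio above 1.0 means more space was promised to VMs than exists.
        "overcommit_ratio": provisioned_gb / capacity_gb,
        "low_space": free_percent < min_free_percent,
        "overprovisioned": provisioned_gb > capacity_gb,
    }


# Hypothetical 1 TB datastore with 900 GB used and 1.5 TB provisioned.
report = datastore_report(capacity_gb=1000, used_gb=900, provisioned_gb=1500)
```

A datastore that is both low on space and overprovisioned, like the example, is exactly the kind of setup described in this post.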

User vdcs does not have the expected uid 1006

Sorry for the long delay since my last blog post – busy times, but with lots of vSphere. :) Today, I did an upgrade of a standalone vCenter Server Appliance at one of my healthcare customers. The vCenter was on 6.0 U3 and I had to upgrade it to 6.7 U2. It was only a small deployment with three hosts, so nothing fancy. And as in many other vSphere upgrades, I came across this warning message:

Warning User vdcs does not have the expected uid 1006
Resolution Please refer to the corresponding KB article.

I saw this message multiple times, but in the past, there was no KB article about this, only a VMTN thread. And this thread mentioned that you can safely ignore this message if you don’t use a Content Library. Confirmation enough to proceed with the upgrade. :)

Meanwhile, there is a KB article:

Uploading content to the library fails with error: Content Library Service does not have write permission on this storage backing (52559)

This is a statement from the KB article:

Note: You can safely ignore this message if you are not using Content Library Service before the upgrade, or using it only for libraries not backed by NFS storage.

Currently, I don’t have customers with NFS-backed Content Libraries, but if you do, you might want to take a look at it. Especially if you have done an upgrade from 6.0 to 6.5 or 6.7 and you want to start using Content Libraries now.

Poor performance with Windows 10/ 2019 1809 on VMFS 6

TL;DR: This bug is still current and has not been fixed yet! Some users in the VMTN thread mentioned a hotpatch from VMware, which seems to have been pulled. A fix for this issue will be available with ESXi 6.5 U3 and 6.7 U3. The only workaround is to place VMs on VMFS 5 datastores, or to avoid snapshots if you have to use VMFS 6. I can confirm that Windows 1903 is also affected.

One of my customers told me that they have massive performance problems with a Horizon View deployment at one of their customers. We talked about this issue and they mentioned that it was related to Windows 10 1809 and VMFS 6. A short investigation showed that this issue is well known, and that VMware is working on it. In their case, another IT company had installed the Cisco HyperFlex solution and the engineer was unaware of this issue.

Image by Manfred Antranias Zimmer from Pixabay

What do we know so far? In October 2018 (!), shortly after the release of Windows 10 1809, a thread came up in the VMTN (windows 10 1809 slow). According to the posted test results, the issue occurs under the following conditions.

  • Windows 10 1809
  • VMware ESXi 6.5 or 6.7 (regardless of build level)
  • VM has at least one snapshot
  • VM is placed on a VMFS 6 datastore
  • Space reclamation is enabled or disabled
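To find out which VMs in an inventory match these conditions, the list can be turned into a simple predicate. The Python sketch below assumes that the inventory data is already available. The `VmInfo` structure and its field names are hypothetical and not taken from any VMware tool. Windows 1903 is included because it is confirmed as affected in the TL;DR above, and space reclamation is left out because the issue occurs regardless of that setting.

```python
from dataclasses import dataclass

AFFECTED_WINDOWS_VERSIONS = {"1809", "1903"}
AFFECTED_ESXI_VERSIONS = {"6.5", "6.7"}


@dataclass
class VmInfo:
    """Hypothetical inventory record for one VM."""
    name: str
    windows_version: str   # e.g. "1809"
    esxi_version: str      # major.minor of the host, e.g. "6.7"
    vmfs_version: int      # VMFS version of the datastore
    snapshot_count: int


def is_affected(vm):
    """True if the VM matches all conditions listed in the VMTN thread."""
    return (
        vm.windows_version in AFFECTED_WINDOWS_VERSIONS
        and vm.esxi_version in AFFECTED_ESXI_VERSIONS
        and vm.vmfs_version == 6
        and vm.snapshot_count >= 1
    )
```

A VM on VMFS 5, or a VM without snapshots, does not match, which mirrors the two workarounds mentioned above.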

The “official” statement of VMware support is:

The issue is identified to be due to some guest OS behavior change in this version of windows 10, 1809 w.r.t thin provisioned disks and snapshots, this has been confirmed as a bug and will be fixed in the following releases – 6.5 U3 and 6.7U3, which will be released within End of this year (2019).

https://communities.vmware.com/message/2848206#2848206

I don’t care if the root cause is VMFS 6 or Windows 10. But VMware and Microsoft need to get this fixed fast! Just to make this clear: You will face the same issues, regardless of whether you run Windows 10 in a VM, use Windows 10 with Horizon View, or Windows 10 with Citrix. When VMFS 6 and snapshots come into play, you will run into this performance issue.

I will update this blog post when I get some news.

Make your life easier – KeeAgent for KeePass

Using a password safe, or password management system, is not a best practice – it’s a common practice. I have been using KeePass for years, because it’s available for different platforms, it can be used offline, it is Open Source, and it is not bound to any cloud services. KeePass allows me to securely store usernames, passwords, recovery codes etc. for different services and websites, and together with features like autotype, KeePass offers a plus in security and convenience.

I use 2FA or MFA wherever I can. That’s the reason why I’m a big fan of SSH public key authentication. But SSH key handling is sometimes inconvenient. You simply don’t want to store your SSH private keys on a cloud drive, and you don’t want to store them on a USB stick, or distribute them over different devices. In the past, I stored my SSH private keys on a cloud drive in an encrypted container. When I needed a key, I decrypted the container and was able to use it. But this solution was inconvenient.

So what to do?

AbsolutVision/ pixabay.com/ Pixabay License

While searching for a solution I stumbled over KeeAgent, which is a plugin for KeePass. KeeAgent allows you to store SSH keys in a KeePass database. KeeAgent then acts as an SSH agent. I’m using this with PuTTY and MobaXterm and it works like a charm.

Setup KeeAgent

All you need is KeePass 2.x and the KeeAgent plugin. After installing the plugin (simply put the plgx file into C:\Program Files (x86)\KeePass Password Safe 2\Plugins), you can create a new entry in your KeePass database.

The password is the SSH private key passphrase. Then add the public and private key file to the newly created KeePass database entry.

The KeeAgent.settings entry will be added automatically. Jump to the “KeeAgent” tab.

If required, keys can be loaded automatically when the database is unlocked, or you can add them later using the menu “Extras > KeeAgent”. Not every database entry can be used with KeeAgent: you have to enable the first checkbox to allow KeeAgent to use a specific database entry.

I create a database entry for each key pair I want to use with KeeAgent. And I only add frequently used keys automatically to KeeAgent. I have tons of keys and 99% of them are only added if I need them.

With KeeAgent in place, I can start new SSH sessions and KeeAgent delivers the matching key. You can see this in this screenshot “…from agent”.

I really don’t want to miss KeePass and KeeAgent. It makes my life easier and more secure.

Vembu CloudDR – Disaster Recovery as a Cloud Service

When it comes to disaster recovery (DR), dedicated offsite infrastructure is a must. If you follow the 3-2-1 backup rule, then you should have at least three copies of your data, on two different media, and one copy should be offsite.
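The 3-2-1 rule is easy to express as a check. The following Python sketch takes a list of backup copies and tests the three conditions. The representation of a copy as a tuple of media type and offsite flag is an assumption made for this example.

```python
def satisfies_3_2_1(copies):
    """Check the 3-2-1 backup rule.

    copies is a list of (media_type, is_offsite) tuples, one per copy of the
    data, including the production copy.
    """
    media_types = {media for media, _offsite in copies}
    has_offsite_copy = any(offsite for _media, offsite in copies)
    return len(copies) >= 3 and len(media_types) >= 2 and has_offsite_copy


# Hypothetical setup: production data, a local disk backup, and a cloud copy.
EXAMPLE_COPIES = [("disk", False), ("disk", False), ("cloud", True)]
```

With the example setup the rule is satisfied. Dropping the cloud copy would break both the "three copies" and the "one offsite" condition.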

But an offsite copy of your data can be expensive… You have to set up storage and networking in a suitable colocation. And even if you have an offsite copy of your data, you must be able to recover the data. This could be fun in case of terabytes of data and an offsite copy on tape.

An offsite copy in a cloud is much more interesting. No need to provide hardware, software, licenses. Just provide internet connectivity, book a suitable plan, and you are ready to go.

Replication to Cloud using Vembu CloudDR

Vembu offers a cloud-based disaster recovery plan through its own cloud services, which are hosted in Amazon Web Services (AWS). This product is designed for businesses that can’t afford, or are not willing, to set up a dedicated offsite infrastructure for disaster recovery.

The data, which is backed up by the Vembu BDR server, is replicated to the Vembu Cloud. In case of any disaster, the backup data can be directly restored from the cloud at any time and from anywhere. The replication is managed and monitored using the CloudDR portal.

Before you can enable the offsite replication, you have to register your Vembu BDR server with your Vembu Portal account. You can either go to onlinebackup.vembu.com, or you can go to portal.vembu.com and sign up.

Vembu Technologies/ Vembu CloudDR/ Copyright by Vembu Technologies

After configuring schedule, retention and bandwidth usage, Vembu CloudDR is ready to go.

The end is near – time for recovery

CloudDR offers two types of recovery:

  • Image Based Recovery
  • Application Based Recovery

In case of an image based recovery, you can either download a VMDK or VHD(X) image, or you can do a file level recovery. In this case you can restore single files from inside of a chosen image.

You can even download a VHD(X) image of a VMware backup, which allows a kind of V2V or P2V restore.

In case of an application based recovery, you can recover single application items from

  • Microsoft Exchange
  • Microsoft SharePoint
  • Microsoft SQL Server, or
  • MySQL

Depending on the type of restore, you will get an encrypted and password protected ZIP file with documents, or even MDF/ LDF files. These files can then be used to restore the lost data.

Summary

Vembu CloudDR is a pretty interesting add-on for Vembu customers. It’s easy to set up, has an attractive price tag, and is therefore clearly aimed at SMB customers.

Feel free to request a demo or try Vembu CloudDR.

Vembu BDR Essentials – Now up to 10 CPU Sockets

It is pretty common that vendors offer their products in special editions for SMB customers. VMware offers VMware vSphere Essentials and Essentials Plus, Veeam offers Veeam Backup Essentials, and Vembu has Vembu BDR Essentials.

Now Vembu has extended their Vembu BDR Essentials package significantly to address the needs of mid-sized businesses.

Vembu Technologies/ Vembu BDR Essentials/ Copyright by Vembu Technologies

Affordable backup for SMB customers

Most SMB virtualization deployments consist of two or three hosts, which makes 4 or 6 used CPU sockets. Because of this, Vembu BDR Essentials supported up to 6 sockets or 50 VMs. Yes, 6 sockets OR 50 VMs. Vembu has now raised this limit to 10 sockets OR 100 VMs! This allows customers to use up to five 2-socket hosts, or 100 VMs with fewer than 10 sockets.

Feature Highlights

Vembu BDR Essentials supports all important features:

  • Agentless VMBackup to backup VMs
  • Continuous Data Protection with support for RPOs of less than 15 minutes
  • Quick VM Recovery to get failed VMs up and running in minutes
  • Vembu Universal Explorer to restore individual items from Microsoft applications like Exchange, SharePoint, SQL and Active Directory
  • Replication of VMs with Vembu OffsiteDR and Vembu CloudDR

Needless to say, Vembu BDR Essentials supports VMware vSphere and Microsoft Hyper-V. If necessary, customers can upgrade to the Standard or Enterprise edition.

Securing VMs – vTPM, VBS, KMS and why you should not simply add a vTPM

Yesterday, I got one of these mails from a customer that make you think “Ehm, no”.

Can you please enable the TPM on all VMs.

The customer

The short answer is “Ehm, no!”. But I’m a kind guy, so I added some explanation to my answer.

Let’s add some context around this topic. The Trusted Platform Module (TPM) is a cryptoprocessor that offers various functions. For example, BitLocker uses the TPM to protect encryption keys. But there is another pretty interesting Windows feature that requires a TPM: “Virtualization-based Security”, or VBS. In contrast to BitLocker, VBS might be a feature that you want to use inside a VM.

VBS uses virtualization features to create an isolated and secure region of memory that is separated from the normal operating system. VBS is required if you want to use Windows Defender Credential Guard, which protects secrets like NTLM password hashes or Kerberos ticket-granting tickets against pass-the-hash (PtH) or pass-the-ticket attacks. VBS is also required when you want to use Windows Defender Exploit Guard, or Windows Defender Application Control.

Credential Guard, Exploit Guard, and Application Control require a TPM 2.0 (and some other stuff, like UEFI, and some CPU extensions).

So, just add the vTPM module to a VM and you are ready to go? Ehm… no.

Prerequisites – or pitfalls

There are some prerequisites that must be met to use a vTPM:

  • the guest OS you use must be either Windows Server 2016, 2019 or Windows 10
  • the ESXi hosts must be at least ESXi 6.7, and
  • the virtual machine must use UEFI firmware

Okay, no big deal. But there is a fourth prerequisite that must be met:

  • your vSphere environment is configured for virtual machine encryption
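The four prerequisites can be written down as a small pre-flight check. The Python sketch below only encodes the list above. It does not query vCenter, and the function name, the parameter names, and the way the firmware type is spelled are assumptions made for this example.

```python
SUPPORTED_GUESTS = {"Windows Server 2016", "Windows Server 2019", "Windows 10"}


def missing_vtpm_prerequisites(guest_os, esxi_version, firmware, vm_encryption_configured):
    """Return the list of vTPM prerequisites that are not met (empty if all are met).

    esxi_version is a (major, minor) tuple such as (6, 7).
    """
    missing = []
    if guest_os not in SUPPORTED_GUESTS:
        missing.append("guest OS")
    if tuple(esxi_version) < (6, 7):
        missing.append("ESXi version")
    if firmware.lower() not in ("efi", "uefi"):
        missing.append("UEFI firmware")
    if not vm_encryption_configured:
        missing.append("VM encryption")
    return missing
```

For a Windows 10 VM with UEFI firmware on ESXi 6.7, the check returns only "VM encryption" as long as no KMS is configured, which is exactly the fourth prerequisite that makes things complicated.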

And now things might get complicated… or expensive… or both.

Why do you need VM encryption when you want to add a vTPM?

The TPM can be used to securely store encryption keys. So the vTPM must offer a similar feature. In case of the vTPM, the data is written to the “Non-Volatile Secure Storage” of the VM. This is the .nvram file in the VM directory. To protect this data, the .nvram file is encrypted using the vSphere VM Encryption feature. In addition to the .nvram file, parts of the VMX file, the swap file, the vmware.log, and some other files are also encrypted. But not the VMDKs, unless you decide to encrypt them.

Before you can start using VM encryption, you have to add a Key Management Server (KMS) to your vCenter. And you had better add a KMS cluster to your vCenter, because you don’t want the KMS to be a single point of failure. The vCenter Server requests keys from the KMS. The KMS generates and stores the keys, and passes them to third party systems, like the vCenter, using the Key Management Interoperability Protocol (KMIP) 1.1.

The KMS is not a part of the vCenter or of the PSC. It is a separate solution you have to buy. The KMS must support KMIP 1.1. Take a look into the Key Management Server (KMS) compatibility documentation offered by VMware for supported KMS products.

Make sure that you think about administrator permissions, role-based access control (RBAC), and disaster recovery. When you have to deal with security, you don’t want users to work with a general, high-privilege administrator account. And think about disaster recovery! You won’t be able to start encrypted VMs until you have re-established trust between your vCenter and your KMS (cluster). So be prepared, and do not implement a single KMS.

Summary

And this is why vTPM is nothing you simply enable on all VMs. Because it’s security. And security has to be done right.

Mike Foley has written two awesome blog posts about this topic. Make sure that you read them.

vSphere 6.7 – Virtual Trusted Platform Modules
Introducing support for Virtualization Based Security and Credential Guard in vSphere 6.7