VCAP6.5-DCV Design - Objective 2.3 Build availability requirements into a vSphere 6.x logical design

This blog post covers objective 2.3 (Build availability requirements into a vSphere 6.x logical design) of the VCAP6.5-DCV Design exam. It is based on the VMware Certified Advanced Professional 6.5 in Data Center Virtualization Design (3V0-624) Exam Preparation Guide (last update August 2017).

The necessary skills and abilities are documented in the exam prep guide for the older VCAP6-DCV Design exam (3V0-622). I think they also apply to the current version of the exam:

  • Evaluate which logical availability services can be used with a given vSphere solution
  • Differentiate infrastructure qualities related to availability
  • Describe the concept of redundancy and the risks associated with single points of failure
  • Explain class of nines methodology
  • Determine availability component of service level agreements (SLAs) and service level management processes
  • Determine potential availability solutions for a logical design based on customer requirements
  • Create an availability plan, including maintenance processes
  • Balance availability requirements with other infrastructure qualities
  • Analyze a vSphere design and determine possible single points of failure

Let’s start with…

Evaluate which logical availability services can be used with a given vSphere solution

VMware vSphere offers a broad range of features that allow you to create highly available solutions. At the infrastructure level, features like VMware HA, FT, or even multiple NICs on a distributed vSwitch increase availability. At the application level, other techniques help as well: for example, DRS anti-affinity rules can place the VMs of an application on different hosts.

Differentiate infrastructure qualities related to availability

The infrastructure qualities are:

  • Availability
  • Manageability
  • Performance
  • Recoverability
  • Security

Availability and Recoverability are tied together. René van den Bedem has written a very good blog post about how recoverability affects availability.

Describe the concept of redundancy and the risks associated with single points of failure

This topic is pretty clear and should be easy to explain. You should be able to identify what a single point of failure is and how to avoid it. Examples of single points of failure are:

  • only a single-port HBA in a server
  • only one network uplink from a Top-of-Rack switch to a Core-Switch
  • use of RAID 0
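
To make the effect of redundancy tangible, here is a minimal sketch of the usual textbook calculation. It assumes that components fail independently of each other, which is a simplification. Components that all have to work (a chain with single points of failure) multiply their availabilities, while redundant components only fail together. The function names and the 99% figure are hypothetical and only serve as illustration.

```python
from math import prod


def series_availability(availabilities):
    """Every component is a single point of failure: all of them must work."""
    return prod(availabilities)


def parallel_availability(availabilities):
    """Redundant components: the path fails only if all of them fail."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical figure for illustration: one uplink with 99% availability.
uplink = 0.99

single_uplink = series_availability([uplink])
redundant_uplinks = parallel_availability([uplink, uplink])

print(f"single uplink:         {single_uplink:.4%}")
print(f"two redundant uplinks: {redundant_uplinks:.4%}")
```

Under these assumptions, a second uplink moves the path from two nines to four nines, whereas putting two 99% components in a row lowers the availability of the chain to about 98%.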

Explain class of nines methodology

This is also easy:

  • Two Nines - 99% - 3.65 days downtime per year
  • Three Nines - 99.9% - 8.76 hours downtime per year
  • Four Nines - 99.99% - 52.6 minutes downtime per year
  • Five Nines - 99.999% - 5.26 minutes downtime per year
  • Six Nines - 99.9999% - 31.56 seconds downtime per year

Important note: “Downtime” means “unplanned downtime”, not planned downtime such as maintenance windows.
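
The values in the list can be derived with simple arithmetic: the allowed downtime is the unavailable fraction multiplied by the length of a year. The following is a small sketch of that calculation. The function name and the `days_per_year` parameter are my own choices; results differ slightly depending on whether you calculate with 365 or 365.25 days, which explains small rounding differences between published tables.

```python
def downtime_seconds_per_year(availability_percent, days_per_year=365):
    """Allowed downtime per year, in seconds, for a given availability."""
    seconds_per_year = days_per_year * 24 * 60 * 60
    return (1 - availability_percent / 100) * seconds_per_year


classes = [
    ("Two Nines", 99.0),
    ("Three Nines", 99.9),
    ("Four Nines", 99.99),
    ("Five Nines", 99.999),
    ("Six Nines", 99.9999),
]

for label, percent in classes:
    seconds = downtime_seconds_per_year(percent)
    print(f"{label}: {percent}% -> {seconds / 3600:.4f} hours "
          f"({seconds / 60:.2f} minutes, {seconds:.2f} seconds) per year")
```

Each additional nine reduces the allowed downtime by a factor of ten, which is worth keeping in mind when a customer asks for "just one more nine".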

Determine availability component of service level agreements (SLAs) and service level management processes

A Service Level Agreement (SLA) is a contract between two parties, usually a supplier and a customer. The SLA describes targets that should be met. One such target can be an availability level, expressed using the “class of nines methodology”. If this target is missed, the supplier often has to pay a penalty to the customer.
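
As a rough sketch of how an availability target can be checked in service level management, the following example converts measured unplanned downtime into an achieved availability and compares it with the target. The function names, the 30-day reporting period, and the downtime figures are hypothetical. Which outages count, and over which period, is defined by the actual contract.

```python
def achieved_availability(unplanned_downtime_minutes, period_days):
    """Achieved availability in percent over a reporting period."""
    period_minutes = period_days * 24 * 60
    return 100 * (1 - unplanned_downtime_minutes / period_minutes)


def meets_sla(unplanned_downtime_minutes, period_days, target_percent):
    """True if the achieved availability reaches the agreed target."""
    achieved = achieved_availability(unplanned_downtime_minutes, period_days)
    return achieved >= target_percent


# Hypothetical example: a 99.9% target measured over a 30-day period.
for downtime in (40, 50):
    achieved = achieved_availability(downtime, period_days=30)
    ok = meets_sla(downtime, period_days=30, target_percent=99.9)
    print(f"{downtime} min downtime -> {achieved:.3f}% -> target met: {ok}")
```

With a 30-day period, a 99.9% target leaves roughly 43 minutes of unplanned downtime, so the shorter outage in this example stays within the target and the longer one misses it.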

So it is pretty important to build a design that can fulfill the availability requirements. Depending on the requirements you may have to use VMware FT. If the availability requirements are lower, VMware HA may be sufficient. It is important that you can choose the best technique for the given SLA.

Determine potential availability solutions for a logical design based on customer requirements

Now it’s time to put things together. You know the different techniques that are offered by VMware vSphere, and you know the customer requirements. This allows you to determine the potential availability solutions for a logical design.

Create an availability plan, including maintenance processes

Again, I’d like to recommend the blog post by René van den Bedem. It’s all about RPO, RTO, MTD, and how much unplanned downtime costs (the result of a Business Impact Analysis).

Balance availability requirements with other infrastructure qualities

At some point you need to look at your design holistically and ensure that a decision you made does not impact other requirements or other decisions.

Analyze a vSphere design and determine possible single points of failure

This is pretty self-explanatory and can be done together with the preceding step.

Summary

Availability is the main theme of this objective. Do not lose sight of the customer’s requirements. Increasing availability is often associated with immense additional costs.

Read the mentioned blog post from René. I also really recommend this vBrownBag video with Rebecca Fitzhugh.