Tag Archives: hp

Checking the 3PAR Quorum Witness appliance

Two 3PAR StoreServs running in a Peer Persistence setup lost the connection to the Quorum Witness appliance. The appliance is an important part of a 3PAR Peer Persistence setup, because it acts as a tie-breaker in a split-brain scenario.

While analyzing this issue, I saw this message in the 3PAR Management Console:

3PAR Quorum Witness Status

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

In addition to that, the customer got e-mails that the 3PAR StoreServ arrays lost the connection to the Quorum Witness appliance. In my case, the CouchDB process died. A restart of the appliance brought it back online.

How to check the Quorum Witness appliance?

You can check the status of the appliance with a simple web request. The documentation shows a simple test based on curl. You can run this directly from the Bash shell of the appliance.

But you can also use the PowerShell cmdlet Invoke-WebRequest.

If you add /witness to the URL, you can test the access to the database, which is used for Peer Persistence.
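The same two requests can also be scripted. This is a minimal sketch in Python, assuming the appliance answers plain HTTP on port 8080 (adjust this if your setup differs). The host name qw.lab.local and the helper names are made up for illustration.

```python
import urllib.request

# Assumption: the Quorum Witness answers plain HTTP on this port.
QW_PORT = 8080


def witness_urls(host, port=QW_PORT):
    """Return the base URL and the /witness database URL of an appliance."""
    base = f"http://{host}:{port}"
    return base, f"{base}/witness"


def check_witness(host, port=QW_PORT, timeout=5):
    """Request both URLs and return the raw answers as text.

    Raises urllib.error.URLError if the appliance cannot be reached,
    which is the connection error described in the text.
    """
    answers = {}
    for url in witness_urls(host, port):
        with urllib.request.urlopen(url, timeout=timeout) as response:
            answers[url] = response.read().decode("utf-8")
    return answers
```

Calling check_witness("qw.lab.local") from an admin workstation returns one answer per URL, or raises an exception if the service behind the appliance is down.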

If you get a connection error, check if the beam process is running.

If not, reboot the appliance. This can be done without downtime. The appliance only comes into play if a failover occurs.

Routed Port vs. Switch Virtual Interface (SVI)

Many years ago, networks consisted of repeaters, bridges and routers. Switches are the successors of the bridges. A switch is nothing else than a multiport bridge, and a traditional switch doesn’t know how to pass traffic to a different broadcast domain (VLAN). Passing traffic between different broadcast domains is a job for a router. A router has an IP interface in each broadcast domain, and the IP interface is used by the clients in the broadcast domain as a gateway.

Switch Virtual Interface

A Switch Virtual Interface, or SVI, is exactly this: a virtual IP interface in a broadcast domain (or VLAN). It’s used by the connected clients in the broadcast domain to send traffic to other broadcast domains.

This is how an SVI is created on HPE Comware 7. It’s similar on switches from other vendors.

At least one port is assigned to this VLAN, and as soon as at least one port of this VLAN is online, the SVI is also reachable.

What happens if you connect two switches with a cable? The broadcast domain spans both switches. Layer 2 traffic is transmitted between the switches. And what would happen if you connect a second cable between the same two switches? As long as you are running Spanning Tree Protocol (STP), or another loop detection mechanism, nothing would happen. But one of the two connections would be blocked. No traffic would be able to pass over this connection. If you want to use multiple, active connections between switches, you have to use Link Aggregation Groups (LAG), or things like Multiple Spanning Tree Protocol (MSTP) and Per VLAN Spanning Tree (PVST).

Routers don’t know this. Multiple connections between the same two routers can’t form a loop. Loops and STP (and some other crappy layer 2 stuff) are legacies of the bridges, still alive in modern switches. Loops are a typical “bridge problem”.

Routed Ports

Some switches offer a way to change the operation mode of a switch port. After changing this operation mode, a switch port doesn’t act like a bridge port anymore. It acts like the port of a router that only handles layer 3 traffic.

This is again an HPE Comware 7 example. I know that Cisco and Alcatel Lucent Enterprise also offer routed ports.

This is a normal switch port. Please note the “port link-mode bridge”.

To “convert” a switch port into a routed port, simply change the link-mode of the port.

As you can see, you can now assign an IP address directly to the port.

Example

Let’s try to make this clear with an example. C1-1 and C1-2 are two HPE Comware based switches, configured as an IRF stack (virtual chassis). These two switches form the core switch C1. S1 and S2 are two access switches, also HPE Comware based. Each access switch has two uplinks: One uplink to C1-1 and another uplink to C1-2, the two chassis that form C1. The 40 GbE ports between C1-1 and C1-2 are used for IRF. Please ignore them.

The uplinks between the switches (all of them Gigabit Ethernet (GE) ports) are configured as routed ports.

Without routed ports, the uplinks must be configured as a LAG, or STP would block one of the two uplinks between the core switches and the access switch. But because routed ports are used, no loop is formed. Most layer 2 traffic (broadcasts, multicasts etc.) can’t pass the routed ports.

routed_links_1


The Link Layer Discovery Protocol (LLDP) traffic can pass the routed port. This is what the core switch (C1) “sees” over LLDP.

Each routed port has an IP address assigned. The same applies to the routed ports on the access switches. Each uplink pair (core to access) uses a /30 subnet.
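A /30 subnet offers exactly two usable addresses, one for each end of a routed link. The following sketch shows how such transfer networks could be carved out of a larger range with Python's ipaddress module. The range 10.0.0.0/28 and the function name are made up for illustration and are not taken from the lab setup.

```python
import ipaddress


def uplink_subnets(supernet, count):
    """Split a supernet into /30 transfer networks, one per routed uplink."""
    network = ipaddress.ip_network(supernet)
    subnets = list(network.subnets(new_prefix=30))[:count]
    # Each /30 has exactly two usable host addresses: one per link end.
    return [(str(s), [str(h) for h in s.hosts()]) for s in subnets]


# Four uplinks (two access switches with two uplinks each), hypothetical range.
for subnet, (core_ip, access_ip) in uplink_subnets("10.0.0.0/28", 4):
    print(subnet, "core:", core_ip, "access:", access_ip)
```

A /28 yields four /30 networks, which is just enough for the four uplinks in this example.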

As you can see, the interfaces working in bridge mode start counting at GE1/0/3.

The same applies to STP. The ports that were configured as routed ports are not listed in the output. STP is not active on these ports.

What are the implications?

The example shows redundant links between access and core switches. There are no loops, but there’s also no layer 2 connectivity. VLANs are only located on the access switches. There are no VLANs spanning multiple switches. What does this mean? How can a client on S1 reach a server on S2? The answer is simple: You have to route the traffic on the access switches. But that’s a topic for another blog post.

HPE 3PAR OS updates that fix VMware VAAI ATS Heartbeat issue

Customers that use HPE 3PAR StoreServs with 3PAR OS 3.2.1 or 3.2.2 and VMware ESXi 5.5 U2 or later might notice one or more of the following symptoms:

  • hosts lose connectivity to a VMFS5 datastore
  • hosts disconnect from the vCenter
  • VMs hang during I/O operations
  • you see messages like these in the vobd.log or vCenter Events tab

  • you see the following messages in the vmkernel.log

Interestingly, not only HPE is affected by this. Multiple vendors have the same issue. VMware described this issue in KB2113956. HPE has published a customer advisory about this.

Workaround

If you have trouble and you can’t update, you can use this workaround: Disable ATS heartbeat for VMFS5 datastores. VMFS3 datastores are not affected by this issue. To disable ATS heartbeat, you can use this PowerCLI one-liner:

Solution

But there is also a solution. Most vendors have published firmware updates for their products. HPE has released

  • 3PAR OS 3.2.2 MU3
  • 3PAR OS 3.2.2 EMU2 P33, and
  • 3PAR OS 3.2.1 EMU3 P45

All three releases of 3PAR OS include enhancements to improve ATS heartbeat. Because 3PAR OS 3.2.2 also has some nice enhancements for Adaptive Optimization, I recommend updating to 3PAR OS 3.2.2.

HPE StoreVirtual – Managers and Quorum

HPE StoreVirtual is a scale-out storage platform that is designed to meet the needs of virtualized environments. It’s based on LeftHand OS and because the magic is a piece of software, HPE StoreVirtual is available as HPE ProLiant/ BladeSystem-based hardware, or as Virtual Storage Appliance (VSA) for VMware ESXi, Microsoft Hyper-V and KVM. It comes with an all-inclusive enterprise feature set. This feature set provides

  • Storage clustering
  • Network RAID
  • Thin Provisioning (with support for space reclamation)
  • Snapshots
  • Asynchronous and synchronous replication across multiple sites
  • Automated software upgrades and self-healing storage
  • Adaptive Optimization (Tiering)

The license is always all-inclusive. There is no need to license individual features.

HPE StoreVirtual is not a new product. Hewlett-Packard acquired LeftHand Networks in 2008. The product has had several names since 2008 (HP LeftHand, HP P4000 and, for a couple of years now, StoreVirtual), but the core intelligence, LeftHand OS, has been continuously developed by HPE. There are rumours that HPE StoreOnce Recovery Manager Central will be available for StoreVirtual soon.

Management Groups & Clusters

A management group is a collection of multiple (at least one) StoreVirtual P4000 storage systems or StoreVirtual VSA. A management group represents the highest administrative domain. Administrative users, NTP and e-mail notification settings are configured on management group level. Clusters are created per management group. A management group can consist of multiple clusters. A cluster represents a pool of storage from which volumes are created. A volume spans all nodes of a cluster. Depending on the Network RAID level, multiple copies of data are distributed over the storage systems in a cluster. Capacity and IO are expanded by adding more storage systems to a cluster.

As in each cluster, there are aids to ensure the function of the cluster in case of node failures. This is where managers and quorums come into play.

Managers & Quorums

HPE StoreVirtual is a scale-out storage platform. Multiple storage systems form a cluster. As in each cluster, availability must be maintained if one or more cluster nodes fail. To maintain availability, a majority of managers must be running and be able to communicate with each other. This majority is called “a quorum”. This is nothing new. Windows Failover Clusters can also use a majority of nodes to gain a quorum. The same applies to OpenVMS clusters.

A manager is a service running on a storage system. This service is running on multiple storage systems within a cluster, and therefore in a management group. A manager has several functions:

  • Monitor the data replication and the health of the storage systems
  • Resynchronize data after a storage system failure
  • Manage and monitor communication between storage systems in the cluster
  • Coordinate configuration changes (one storage system is the coordinating manager)

This manager is called a “regular manager”. Regular managers are running on storage systems. The number of managers is counted per management group. You can have up to 5 managers per management group. Even if you have multiple storage systems and clusters per management group, you can’t have more than 5 managers running on storage systems. Sounds like a problem, but it’s not. If you have three 3-node clusters in a single management group, you can start managers on 5 of the 9 storage systems. Even if two of these five storage systems fail, the remaining three managers gain a quorum. But if the quorum is lost, all clusters in a management group will be unavailable.
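The majority rule can be written down in a few lines. This is a minimal sketch of the arithmetic only, not of how LeftHand OS implements it. The function names are made up for illustration.

```python
def votes_needed(total_managers):
    """Smallest number of managers that forms a majority."""
    return total_managers // 2 + 1


def has_quorum(total_managers, running_managers):
    """True if the running managers still form a majority."""
    return running_managers >= votes_needed(total_managers)


# Five managers tolerate the loss of two, but not of three.
print(has_quorum(5, 3), has_quorum(5, 2))
```

The same arithmetic shows why two managers alone are a poor choice: has_quorum(2, 1) is False, so a single failure already costs the quorum.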

I have two StoreVirtual VSA running in my lab. As you can see, the management group contains two regular managers and vsa1 is the coordinating manager.

storevirtual_manager_1


There are also specialized managers. There are three types of specialized managers:

  • Failover Manager (FOM)
  • Quorum Witness (NFS)
  • Virtual Manager

A FOM is a special version of LeftHand OS and its primary function is to act as a tie breaker in split-brain scenarios. It’s added to a management group. It is mainly used if an even number of storage systems is used in a cluster, or in case of multi-site deployments.

The Quorum Witness was added with LeftHand OS 12.5. The Quorum Witness can only be used in 2-node cluster configurations. It’s added to the management group and it uses a file on an NFS share to provide high availability. Like the FOM, the Quorum Witness is used as the tie breaker in the event of a failure.

The Virtual Manager is the third specialized manager. It can be added to a management group, but it’s not active until it is needed to regain quorum. It can be used to regain quorum and maintain access to data in a disaster recovery situation. But you have to start it manually. And you can’t add it if the quorum is lost!

As you can see in this screenshot, I use the Quorum Witness in my tiny 2-node cluster.

storevirtual_manager_2


Regardless of the number of storage systems in a management group, you should use an odd number of managers. An odd number of managers ensures that a majority is easily maintained. In case of an even number of managers, you should add a FOM. I don’t recommend adding a Virtual Manager.

# of storage systems    # of managers
1                       1 regular manager
2                       2 regular managers + 1 specialized manager
3                       3 regular managers, or 2 + 1 FOM or Virtual Manager
4                       3 regular managers, or 4 + 1 FOM or Virtual Manager
> 5                     5 regular managers, or 4 + 1 FOM or Virtual Manager

In case of a multi-site deployment, I really recommend placing a FOM at a third site. I know that this isn’t always possible. If you can’t deploy it to a third site, place it at the “primary site”. A multi-site deployment is characterized by the fact that the storage systems of a cluster are located in different locations. But it’s still a single cluster! This might lead to a situation where a site failure causes the quorum to be lost. Think about a 4-node cluster with two nodes per site. In this case, the remaining two nodes wouldn’t gain quorum (split-brain situation). A FOM at a third site would help to gain quorum in case of a site failure. If you have multiple clusters in a management group, balance the managers across the clusters. I recommend adding a FOM. If you have clusters at multiple sites (a primary and a DR site with remote copy), ensure that the majority of managers are at the primary site.
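The 4-node example can be played through with a small model. This is a sketch under simplified assumptions: every manager has one vote, and a site failure takes down all managers at that site. The site names and the function name are made up for illustration.

```python
def survives_site_failure(managers_per_site, failed_site):
    """Check whether quorum is kept when all managers at one site are lost."""
    total = sum(managers_per_site.values())
    remaining = total - managers_per_site[failed_site]
    # A quorum needs a strict majority of all managers in the group.
    return remaining > total / 2


two_sites = {"site_a": 2, "site_b": 2}
with_fom = {"site_a": 2, "site_b": 2, "site_c": 1}  # FOM at a third site

print(survives_site_failure(two_sites, "site_a"))
print(survives_site_failure(with_fom, "site_a"))
```

Without the FOM, two of four managers remain, which is not a majority. With a FOM at a third site, three of five managers remain and the cluster stays online.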

Final words

It is important to understand how managers, quorum, management groups and clusters are linked. Network RAID protects the data by storing multiple copies of data across storage systems in a cluster. Depending on the chosen Network RAID level, you can lose disks or even multiple storage systems. But never forget to have a sufficient number of managers (regular and specialized). If the quorum can’t be maintained, the access to the data will be unavailable. It’s not sufficient to focus on data protection. The availability of, or more specifically, the access to the data is at least as important. If you follow the guidelines, you will get a rock-solid, high performance scale-out storage.

I recommend listening to Calvin Zito’s podcast (7 Years of 100% uptime with StoreVirtual VSA) and reading Bart Heungens’ blog post about his experience with HPE StoreVirtual VSA (100% uptime for 7 years with StoreVirtual VSA? Check!).

HPE StoreVirtual REST API

Representational State Transfer (REST) APIs are all the rage. REST was defined by Roy Thomas Fielding in his PhD dissertation “Architectural Styles and the Design of Network-based Software Architectures”. The architectural style of REST describes six constraints:

  • Uniform interface
  • Stateless
  • Cacheable
  • Client – Server communication
  • Layered system
  • Code on demand

RESTful APIs typically use HTTP and HTTP verbs (GET, POST, PUT, DELETE, etc.) to send data to, or retrieve data from remote systems. To do so, REST APIs use Uniform Resource Identifiers (URIs) to interact with remote systems. Thus, a client can interact with a remote system over a REST API using standard HTTP URIs and HTTP verbs. For the data transfer, common internet media types, like JSON or XML are used. It’s important to understand that REST is not a standard per se. But most implementations make use of standards such as HTTP, URI, JSON or XML.

Because of the uniform interface, you can choose from many different clients. I will use PowerShell and the Invoke-RestMethod cmdlet in my examples.

HPE StoreVirtual REST API

With the release of LeftHand OS 11.5 (the latest release is 12.6), HPE added a REST API for management and storage provisioning. Due to a re-engineered management stack, a task runs significantly faster over the REST API than the same task on the CLI or in the Centralized Management Console (CMC). It’s perfect for automation and scripting. It allows customers to achieve a higher level of automation and operational simplicity. The StoreVirtual REST API uses JavaScript Object Notation (JSON) for data transfer between the client and the StoreVirtual management group. With the REST API, you can

  • Read, create, and modify volumes
  • Create and delete snapshots
  • Create, modify, and delete servers
  • Grant and revoke access of servers to volumes

I use two StoreVirtual VSA (LeftHand OS 12.6) in my lab. Everything I show in this blog post is based on LeftHand OS 12.6.

The REST API in LeftHand OS 12.6 uses:

  • HTTPS 1.1
  • media types application/JSON
  • Internet media types application/schema+JSON
  • UTF-8 character encoding

RESTful APIs typically use HTTP and HTTP verbs (GET, POST, PUT, DELETE, etc.). In case of the StoreVirtual REST API:

  • GET is used to retrieve an object. No body is necessary.
  • PUT is used to update an object. The information to update the object is sent within the body.
  • POST is used to create an object, or to invoke an action or event. The necessary information is sent within the body.
  • DELETE is used to delete an object.

The entry point for all REST API calls is /lhos, starting from a node.

Subsequent resources are relative to this base URI. Resources are:

Resource path                        Description
/lhos/managementGroup                Management group entity
/lhos/clusters                       Cluster collection
/lhos/clusters/<id>                  Cluster entity
/lhos/credentials                    Credentials collection
/lhos/credentials/<session token>    Credentials entity
/lhos/servers                        Server collection
/lhos/servers/<id>                   Server entity
/lhos/snapshots                      Snapshot collection
/lhos/snapshots/<id>                 Snapshot entity
/lhos/volumes                        Volume collection
/lhos/volumes/<id>                   Volume entity

The object model of the StoreVirtual REST API uses

  • Collections, and
  • Entities

to address resources. An entity is used to address individual resources, whereas a collection is a group of individual resources. Resources can be addressed by using a URI.
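The difference between a collection and an entity is easy to see when the URIs are built in code. The following Python sketch assumes that the API listens on HTTPS port 8081. The port, the node name vsa1.lab.local and the helper name are assumptions for illustration. The single managementGroup entity is left out for brevity.

```python
from urllib.parse import quote

# Assumption: the REST API listens on this HTTPS port.
API_PORT = 8081

COLLECTIONS = {"clusters", "credentials", "servers", "snapshots", "volumes"}


def resource_uri(node, collection=None, entity_id=None, port=API_PORT):
    """Build the URI of a collection or, if entity_id is given, of an entity."""
    base = f"https://{node}:{port}/lhos"
    if collection is None:
        return base
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection}")
    uri = f"{base}/{collection}"
    if entity_id is not None:
        # An entity is addressed by appending its id to the collection URI.
        uri += "/" + quote(str(entity_id), safe="")
    return uri


print(resource_uri("vsa1.lab.local", "volumes"))
print(resource_uri("vsa1.lab.local", "volumes", 42))
```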

Exploring the API

First of all, we need to authenticate. Without a valid authentication token, no REST API queries can be made. To create a credential entity, we have to use the POST method.

$cred is a hash table which includes the username and the password. This hash table is converted to the JSON format with the ConvertTo-Json cmdlet. The JSON data will be used as body for our query. The result is an authentication token.

This authentication token must be used for all subsequent API queries. This query retrieves a collection of all valid sessions.

The GET method is used, and the authentication token is sent with the header of the request.
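For readers who prefer Python, the two requests could be built like this. It is only a sketch: the body keys "user" and "password", the use of the Authorization header for the token, and the node name and port in BASE_URI are assumptions that should be checked against the REST API Reference Guide. The requests are only constructed here, not sent.

```python
import json
import urllib.request

BASE_URI = "https://vsa1.lab.local:8081/lhos"  # hypothetical node and port


def login_request(user, password, base_uri=BASE_URI):
    """Build the POST request that creates a credential entity."""
    # Assumption: the body uses the keys "user" and "password".
    body = json.dumps({"user": user, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_uri}/credentials",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def authorized_get(uri, token):
    """Build a GET request that carries the authentication token."""
    # Assumption: the token is sent in the Authorization header.
    return urllib.request.Request(
        uri, method="GET", headers={"Authorization": token}
    )
```

To actually send a request, pass it to urllib.request.urlopen and decode the JSON answer with json.loads.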

To retrieve an individual credential entity, the URI of the entity must be used.

The result of this query is the individual credential entity.

It’s important to know that if a session has not been used for 15 minutes, it is automatically removed. The same applies to constantly active sessions after 24 hours. After 24 hours, the credential entity will be automatically removed.
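A long-running script can track these two limits itself and log in again before a query fails. This is a small sketch of the two rules as described above. The function and constant names are made up for illustration.

```python
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(minutes=15)  # unused sessions are removed
MAX_LIFETIME = timedelta(hours=24)  # even active sessions are removed


def session_expired(created, last_used, now):
    """True if the credential entity should be assumed to be gone."""
    idle_too_long = now - last_used >= IDLE_LIMIT
    too_old = now - created >= MAX_LIFETIME
    return idle_too_long or too_old


created = datetime(2016, 1, 1, 8, 0)
print(session_expired(created, datetime(2016, 1, 1, 8, 10), datetime(2016, 1, 1, 8, 20)))
```

A client would store the time of the login and of the last query, and request a new token whenever session_expired returns True.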

Let’s try to create a volume. The information about this new volume has to be sent within the body of our request. We again use the ConvertTo-Json cmdlet to convert a hash table with the necessary information to the JSON format.

The size must be specified in bytes. As a result, Invoke-RestMethod will output this:
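Because the size has to be given in bytes, it is worth doing the conversion in code instead of typing long numbers. A minimal Python sketch: the field names "name", "description" and "size" are assumptions for illustration and may not match every required field of the API.

```python
import json

GIB = 1024 ** 3  # bytes per gibibyte


def volume_body(name, size_gib, description=""):
    """Return the JSON body for a new volume; the size is converted to bytes."""
    # Assumption: the API expects the fields "name", "description" and "size".
    return json.dumps(
        {"name": name, "description": description, "size": size_gib * GIB}
    )


print(volume_body("api-vol", 1))
```

A volume of 1 GiB becomes a size of 1073741824 bytes in the request body.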

Using the CMC, we can confirm that the volume was successfully created.

storevirtual_rest_api_vol_1


Since we have a volume, we can create a snapshot. To create a snapshot, we need to invoke an action on the volume entity. We have to use the POST method and the URI of our newly created volume.

In case of a successful query, Invoke-RestMethod will give us this output.

Again, we can use the CMC to confirm the success of our operation.

storevirtual_rest_api_vol_2


To delete the snapshot, the DELETE method and the URI of the snapshot entity must be used.

To confirm the successful deletion of the snapshot, the GET method can be used. The GET method will retrieve a collection of all snapshot entities.

The result will show no members inside of the snapshot collection.

At the end of the day, we remove our credential entity, because it’s no longer used. To delete the credential entity, we use the DELETE method with the URI of our credential entity.

The next query should fail, because the credential entity is no longer valid.

HTTPS workaround

The StoreVirtual API is only accessible over HTTPS. By default, the StoreVirtual nodes use an untrusted HTTPS certificate. This will cause Invoke-RestMethod to fail.

After a little research, I found a workaround. This workaround uses the System.Security.Cryptography.X509Certificates namespace. You can use this snippet to build a function or add it to a try-catch block.
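A Python client runs into the same certificate problem. For comparison, this is how certificate checks can be switched off with the standard ssl module. It is a sketch for lab use only, and the function name is made up for illustration.

```python
import ssl


def lab_ssl_context():
    """Return an SSL context that accepts untrusted certificates.

    Lab use only: this disables the protection against
    man-in-the-middle attacks.
    """
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be relaxed.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context
```

The context is then passed to the request, for example urllib.request.urlopen(request, context=lab_ssl_context()). In production, installing a trusted certificate on the nodes is the better choice.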

Final words

The StoreVirtual REST API is really handy. It can be used to perform all important tasks. It’s perfect for automation and it’s faster than the CLI. I’ve used PowerShell in my examples, but I’ve also successfully tested it with Python. Make sure to take a look into the HPE StoreVirtual REST API Reference Guide.

HPE Hyper Converged 380 – A look under the hood

In March 2016, HPE CEO Meg Whitman announced a ProLiant-based HCI solution that should be easier to use and cheaper than Nutanix.

This isn’t HPE’s first dance on this floor. In August 2015, HP launched the Hyper Converged 250 System (HC250), which is based on the Apollo server platform. The HW design of the HC250 comes close to a Nutanix Block, because the Apollo platform supports up to four nodes in 2U. Let me say this clearly: The Hyper Converged 380 (HC380) is not a replacement for the HC250! And before the HC250, HPE offered the Converged System 200-HC StoreVirtual and 200-HC EVO:RAIL (different models).

The HC380 is based on the ProLiant DL380 Gen9 platform. The DL380 Gen9 is one of the best-selling x86 servers on the market, if not the best-selling one. Instead of developing everything from scratch, HPE built their new HC380 from different already available HPE products. With one exception: HPE OneView User Experience (UX). It was developed from scratch and consolidates all management and monitoring tasks into a single console. The use of already available components was the reason for the short time-to-market (TTM) of the HC380.

Currently, the HC380 can only run VMware vSphere (HPE CloudSystem uses VMware vSphere). Support for Microsoft Hyper-V and Citrix XenServer will be added later. If you wish to run Microsoft Hyper-V, check the HC250 or wait until it’s supported with the HC380.

What flavor would you like?

The HC380 is available in three editions (use cases):

  • HC380 (Virtualization)
  • HC380 (HPE CloudSystem)
  • HC380 (VDI)

All three use cases are orderable using a single SKU and include two DL380 Gen9 nodes (2U). You can add up to 14 expansion nodes, so that you can have up to 16 dual-socket DL380 Gen9.

Each node comes with two Intel Xeon E5 CPUs. The exact CPU model has to be selected before ordering. The same applies to the memory (128 GB or 256 GB per node, up to 1.5 TB) and disk groups (up to three disk groups, each with 4.5 to 8 TB usable capacity per block, 8 drives either SSD/ HDD or all HDD with a maximum of 25 TB usable per node). The memory and disk group configuration depends on the specific use case (virtualization, CloudSystem, VDI). The same applies to the number of network ports (something between 8x 1 GbE and 6x 10 GbE plus 4x 1 GbE). For VDI, customers can add NVIDIA GRID K1, GRID K2 or Tesla M60 cards.

VMware vSphere 6 Enterprise or Enterprise Plus are pre-installed and licenses can be bought from HPE. Interesting note from the QuickSpecs:

NOTE: HPE Hyper Converged 380 for VMware vSphere requires valid VMware vSphere Enterprise or higher, and vCenter licenses. VMware licenses can only be removed from the order if it is confirmed that the end-customer has a valid licenses in place (Enterprise License Agreement (ELA), vCloud Air Partner or unused Enterprise Purchasing Program tokens).

Hewlett Packard Enterprise supports VMware vSphere Enterprise, vSphere Enterprise Plus and Horizon on the HPE Hyper Converged 380.

No support for vSphere Standard or Essentials (Plus)! Let’s see how HPE will react to the fact that VMware will phase out vSphere Enterprise licenses.

The server includes 3y/ 3y/ 3y onsite support with next business day response. Nevertheless, at least 3-year HPE Hyper Converged 380 solution support is required according to the latest QuickSpecs.

What’s under the hood?

As I already mentioned, the HC380 was built from well known HPE products. Only HPE OneView User Experience (UX) was developed from scratch. OneView User Experience (UX) consolidates the following tasks into a single console (source QuickSpecs):

  • Virtual machine (VM) vending (create, edit, delete)
  • Hardware/driver and appliance UI frictionless updates
  • Advanced capacity and performance analytics (optional)
  • Backup and restore of appliance configuration details
  • Role-based access
  • Integration with existing LDAP or Active Directory
  • Physical and virtual hardware monitoring

Pretty cool fact: HPE OneView User Experience (UX) will be available for the HC250 later this year. Part of a 2-node cluster are not only the two DL380 Gen9 servers, but also three VMs:

  • HC380 Management VM
  • HC380 OneView VM
  • HC380 Management UI VM

The Management VM is used for VMware vCenter (local install) and HPE OneView for vCenter. You can use a remote vCenter (or a vCenter Server Appliance), but you have to make sure that the remote vCenter has HPE OneView for vCenter integrated. The OneView VM runs HPE OneView for HW/ SW management. The Management UI VM runs HPE OneView User Experience.

The shared storage is provided by HPE StoreVirtual VSA. A VSA is running on each node. As you might know, StoreVirtual VSA comes with an all-inclusive license. No need to buy additional licenses. You can have it all: Snapshots, Remote Copy, Clustering, Thin Provisioning, Tiering etc. The StoreVirtual VSA delivers sustainable performance, a good VMware vSphere integration and added value, for example support for Veeam Storage Snapshots.

When dealing with a 2-node cluster, the 25 TB usable capacity per node means in fact 25 TB usable for the whole 2-node cluster. This is because of the Network RAID 1 between the two StoreVirtual VSA. The data is mirrored between the VSAs. When adding more nodes, the data is striped across the nodes in the cluster (Network RAID 10+2).

Also important in case of the 2-node cluster: The quorum. At least two StoreVirtual VSA build a cluster. As in every cluster, you need some kind of quorum. StoreVirtual 12.5 added support for an NFSv3 based quorum witness. This is in fact an NFS file share, which has to be available for both nodes. This is only supported in 2-node clusters and I highly recommend using it. I have a customer that uses a Raspberry Pi for this…
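The capacity math behind the 2-node case is simple. This is a simplified sketch that only accounts for the number of data copies and ignores any other overhead. The function name and the copies parameter are made up for illustration.

```python
def usable_capacity_tb(nodes, per_node_tb, copies=2):
    """Usable cluster capacity when every block is stored `copies` times."""
    return nodes * per_node_tb / copies


# Two nodes with 25 TB each, mirrored between the two VSAs.
print(usable_capacity_tb(2, 25))
```

Two nodes with 25 TB each and two copies of every block leave 25 TB for the whole cluster, which matches the statement above.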

Start the engine

You have to meet some requirements before you can start.

  • 1 GbE connections for each node’s iLO and 1 GbE ports
  • 1 GbE or 10 GbE connections for each node’s FlexLOM ports
  • Windows-based computer directly connected to a node (MacOS X or Linux should also work)
  • VMware vSphere Enterprise or Enterprise Plus licenses
  • enough IP addresses and VLANs (depending on the use case)

For general purpose server virtualization, you need at least three subnets and three VLANs:

  • Management
  • vMotion
  • Storage (iSCSI)

Although you have the choice between a flat (untagged) and a VLAN-tagged network design, I would always recommend a VLAN-tagged approach. It’s highly recommended to use multiple VLANs to keep the traffic separated. The installation guide includes worksheets and examples to help you plan the deployment. For a 2-node cluster you need at least:

  • 5 IP addresses for the management network
  • 2 IP addresses for the vMotion network
  • 8 IP addresses for the iSCSI storage network

You should leave space for expansion nodes. Proper planning saves you trouble later.

HP OneView InstantOn is used for the automated deployment. It guides you through the necessary configuration steps. HPE says that the deployment requires less than 60 minutes and all you need to enter are

  • IP addresses
  • credentials
  • VMware licenses

After the deployment, you have to install the StoreVirtual VSA licenses. Then you can create datastores and, finally, VMs.

hpehc380_ux

HPE/ hpe.com

Summary

Hyper-converged has nothing to do with the form factor. Despite the fact that a 2-node cluster comes in 4U, the HC380 has everything you would expect from an HCIA. The customers will decide whether HPE has kept its promise. The argument for the HC380 shouldn’t be the lower price compared to Nutanix or other HCI players. In particular, HPE should not repeat the mistake of the HC200 EVO:RAIL: too buggy and too expensive. The HC380 combines known and mature products (ProLiant DL380 Gen9, StoreVirtual VSA, OneView). It’s now up to HPE.

I have several small and mid-sized customers that are running two- to six-node VMware vSphere environments. The HC380 for VDI can also be very interesting.

End of support for HPE Data Protector 7.0x & 8.0x

Today I got an email from HPE informing me of the imminent end of support for HPE Data Protector 7.0x and 8.0x. As of June 30, 2016, HPE will offer no new updates or patches for Data Protector 7.0x and 8.0x. This means that

  • Telephone and email support
  • new security updates, and
  • new product updates

will be phased out. The self-help support will be continued until June 30, 2018. Self-help includes access to the knowledge base, current patches and access to known problems.

Data Protector 8.1x will be under support until June 30, 2017. The self-help support for Data Protector 8.1x will be continued until June 30, 2019.

Please note that you need new license keys if you want to update Data Protector 7.0x or 8.0x to Data Protector 9. To obtain new license keys, you need an active support contract. If you have valid Data Protector 8.1 license keys, you don’t need new license keys.

Don’t hesitate to leave a comment if you need further information.

HPE Data Protector 9.05: SAN backups falling back to NBDSSL

Last year in December, I updated the first customer from HPE Data Protector 9.04 to 9.05. Immediately after the first tests, I noticed that backups were made using the NBDSSL transport. I expected that the SAN transport would be used, because the prerequisites were met and it had worked until the update. I opened a case with HPE support and was advised to install the hotfix QCIM2A65619. With this hotfix, several files were replaced:

x8664\A.09.00\VEPA\DP_HOME_DIR\bin\components\DpSessionLogger.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\ViAPI.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\vCloudAPI.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\DPComServer.exe
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\components\vepalib_vmware.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\vepa_util.exe
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\vepa_bar.exe
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\components\vepalib_vcd.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\components\DPHostingEnvironmentComponent.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\components\CDpDataMoverComponent.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\components\vepalib_hyperv.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\bin\components\DpBackendService.dll
x8664\A.09.00\VEPA\DP_HOME_DIR\lib\vddk

The hotfix solved the issue. And to be honest: I didn’t care why it worked after applying the hotfix. I had the same issue at multiple customers and applying the hotfix solved the issue in each case.

Today, I was reading through the HPE Data Protector 9.06 Integration Guide and the HPE Data Protector 9.0x Virtualization Support Matrix and I stumbled over this table:

Data Protector versions   VMware VDDK component   Supported backup / mount proxy operating systems
9.00, 9.01                VDDK 5.5.0              Windows Server 2003 R2 (x64)
                                                  Windows Server 2008, 2008 R2 (x64)
                                                  Windows Server 2012 (x64)
                                                  RHEL 5.9 (x64)
                                                  RHEL 6.2, 6.3 (x64)
                                                  SLES 10.4 (x64)
                                                  SLES 11 (x64)
9.02, 9.03                VDDK 5.5.3              Windows Server 2003 R2 (x64)
                                                  Windows Server 2008, 2008 R2 (x64)
                                                  Windows Server 2012 (x64)
                                                  RHEL 5.9 (x64)
                                                  RHEL 6.2, 6.3, 6.4 (x64)
                                                  SLES 10.4 (x64)
                                                  SLES 11 (x64)
9.04                      VDDK 6.0                Windows Server 2008 R2 (x64)
                                                  Windows Server 2012, 2012 R2 (x64)
                                                  RHEL 6.6, 7.0 (x64)
                                                  SLES 11, 12 (x64)
9.05                      VDDK 6.0 U1             Windows Server 2008 R2 (x64)
                                                  Windows Server 2012, 2012 R2 (x64)
                                                  RHEL 6.6, 7.0 (x64)
                                                  SLES 11, 12 (x64)
9.06                      VDDK 6.0 U2             Windows Server 2008 R2 (x64)
                                                  Windows Server 2012, 2012 R2 (x64)
                                                  RHEL 6.6, 7.0 (x64)
                                                  SLES 11, 12 (x64)

There was a footnote for VDDK 6.0 U1.

The VM backups does not use SAN transport mode on vSphere 5.1, 5.5 (and its updates) environment and falls back to NBDSSL/NBD. This is because of VDDK 6.0 U1 issue. For more information, see VMware Knowledge Base.

Oops… that’s my issue! The footnote included a link to VMware KB2135621 (Virtual Disk Development Kit 6.0 U1 Backup and Restore commands fail using SAN transport mode on ESXi 5.5.x hosts on both Windows and Linux proxies). Described symptoms:

  • Virtual Disk Development Kit 6.0 Update 1 backup and restore commands fail using SAN transport mode on ESXi 5.5.x hosts.
  • This issue occurs on both Windows and Linux proxies.

Yep, that’s my issue. The customers that were observing this issue were running vSphere 5.5, not 6.0. With this knowledge, I checked the version of the vixDiskLib.dll on one of the patched Data Protector hosts. And there it was:

vixDiskLib

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

The vixDiskLib.dll had the build version 6.0.0 build-2498720, which is the build version of the Virtual Disk Development Kit 6.0. So it seems that the Data Protector hotfix QCIM2A65619 downgrades the VDDK that is used by Data Protector.

KB2135621 states that this issue is resolved in VMware vCenter Server 6.0 Update 2. This also implies that it is fixed in VDDK 6.0 U2 and therefore in Data Protector 9.06.
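To make the relationship between Data Protector version, bundled VDDK and the SAN fallback explicit, here is a small Python sketch. The mapping is copied from the support matrix table; the function name and the way the vSphere version is passed in are my own assumptions for illustration, and the check only describes an unpatched host (the hotfix swaps the VDDK, as seen above).

```python
# VDDK version bundled with each Data Protector 9.0x release,
# taken from the Virtualization Support Matrix table.
VDDK_BY_DP_VERSION = {
    "9.00": "5.5.0", "9.01": "5.5.0",
    "9.02": "5.5.3", "9.03": "5.5.3",
    "9.04": "6.0",
    "9.05": "6.0 U1",
    "9.06": "6.0 U2",
}


def san_fallback_expected(dp_version: str, vsphere_version: str) -> bool:
    """Return True if the footnote for VDDK 6.0 U1 applies.

    Hypothetical helper: the footnote names vSphere 5.1 and 5.5
    (and their updates) together with VDDK 6.0 U1.
    """
    vddk = VDDK_BY_DP_VERSION[dp_version]
    return vddk == "6.0 U1" and vsphere_version.startswith(("5.1", "5.5"))


if __name__ == "__main__":
    for dp, vsphere in [("9.04", "5.5"), ("9.05", "5.5"), ("9.05", "6.0"), ("9.06", "5.5")]:
        print(dp, vsphere, san_fallback_expected(dp, vsphere))
```

Only the combination of Data Protector 9.05 and vSphere 5.1/5.5 is flagged, which matches what I saw at the affected customers.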

I’m sorry Data Protector. It was not your fault!

HPE Data Protector VE Integration/ VMware best practice

The Virtual Environment Integration (VE Integration) provides protection of VMs in virtual server environments. It is used to integrate HPE Data Protector with various virtualization environments, currently VMware vSphere and Microsoft Hyper-V. A script solution is available for Citrix XenServer. I will focus on VMware vSphere.

What is possible?

I took this table from the “HPE Data Protector 9.00 Integration Guide for Virtualization”.

Feature | VE Integration
Online backup |
Crash-consistent backup |
Application-consistent backup |
Granularity | vmdk, vmx
Full/ Incremental/ Differential | ✓/ ✓/ ✓
Support for changed block tracking (CBT) |
Where does the Data Protector component need to be installed? | backup host
Extra licenses needed | 1x On-Line Extension per ESXi host

As you can see, Data Protector offers all you need to create a crash-consistent backup of your VMs. HPE Data Protector relies on the VMware vSphere Storage APIs – Data Protection (formerly known as VMware vStorage APIs for Data Protection or VADP). Data Protector has to use the same API as Veeam, CommVault Simpana or any other product that can be used to back up VMs in a VMware vSphere environment. Therefore, most software products offer the same features.

How does it work?

HPE Data Protector uses the vStorage Image backup method to create a crash-consistent backup of your VMs. With this method, a backup host is used to create a backup of VMs hosted on a single or multiple ESXi hosts. The backup host can be a dedicated physical host, a virtual machine, or the Cell Manager (CM) itself (physical or virtual). All you need to make sure is that the Data Protector Virtual Environment Integration component (VEAgent) is installed. During a vStorage Image backup, the VEAgent

  1. establishes a connection between the backup host and the ESXi or vCenter server (depending on whether it’s a standalone host or a vCenter environment)
  2. locks the VM, so that it can’t be migrated off the host by VMware vMotion
  3. requests a snapshot of the VM
  4. reads the VM data across LAN or SAN
  5. initializes the Media Agent (MA) and controls the transfer of the data to the backup device

After finishing the backup of the VM, the snapshot is released and the VM is unlocked. I took this picture from the “HPE Data Protector 9.00 Integration Guide for Virtualization” to illustrate the data flow and what components interact with each other.

hpe_dp_vepa

HPE/ hpe.com

If Data Protector requests the creation of a snapshot, the snapshot is always named “_DP_VEPA_SNAP_”. I often use this simple PowerCLI one-liner to search for orphaned VEAgent snapshots:
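The check itself is just a name match. As a rough sketch in Python rather than PowerCLI, assuming the snapshot inventory has already been exported as (VM name, snapshot name) pairs (a hypothetical input format, as are the VM names), it could look like this:

```python
# Snapshot name that Data Protector always uses, as described in the text.
DP_SNAPSHOT_NAME = "_DP_VEPA_SNAP_"


def find_veagent_snapshots(snapshots):
    """Return the sorted names of VMs that carry a Data Protector snapshot.

    `snapshots` is an iterable of (vm_name, snapshot_name) pairs,
    for example exported from vCenter beforehand (assumed format).
    """
    return sorted({vm for vm, snap in snapshots if snap == DP_SNAPSHOT_NAME})


if __name__ == "__main__":
    inventory = [
        ("vm-app01", "_DP_VEPA_SNAP_"),
        ("vm-db01", "before patching"),
        ("vm-web01", "_DP_VEPA_SNAP_"),
    ]
    print(find_veagent_snapshots(inventory))  # ['vm-app01', 'vm-web01']
```

A hit is only a candidate. Without CBT, Data Protector leaves snapshots in place on purpose, so check the snapshot handling mode before deleting anything.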

To be honest: Orphaned snapshots only occur if a VEAgent backup fails before Data Protector can delete the snapshot. So an orphaned snapshot indicates some kind of failure during the backup. The number of snapshots that remain in the snapshot chain after a backup depends on three factors:

  • Whether CBT is used or not
  • Selected snapshot handling mode
  • Backup type specified

The snapshots that remain in a snapshot chain play an important role for incremental and differential VM backups. Data Protector can detect changes on

  • file level, or at
  • block level

Without CBT, Data Protector uses snapshots to identify changes on file level. With CBT, Data Protector identifies changes on block level. With CBT, the number of snapshots remaining after a backup is always 0. Without CBT, Data Protector keeps up to 2 snapshots (mixed snapshot handling). You must not delete these snapshots. Otherwise a full backup of a VM is necessary to create a new, valid backup chain.

Even if CBT is enabled, Data Protector requests the creation of a snapshot to get a consistent state of the VM. Because of this, a VM backup requires sufficient free disk space on the datastore where the VMDKs of the VM reside. The longer a backup takes, and the more changes are made, the bigger the snapshot gets. This is where the free space required option comes into play. You can specify the amount of free disk space that must be available at the start of the backup, e.g. 10% or 20%. The required free space is calculated based on the size of the VMDKs of a VM just before the snapshot is created. Data Protector checks all datastores where the virtual machine disks reside. If a VM has a 100 GB VMDK and you set the free space required option to 10%, at least 10 GB of free disk space is required in each datastore where the VM has VMDKs located. The check is per VM!
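The arithmetic behind the free space required option can be sketched in a few lines of Python. This is only my reading of the rule above, not Data Protector code: I assume the percentage is applied to the total VMDK size of the VM and that the resulting amount must be free on every datastore that holds one of its VMDKs. The function names and datastore names are made up.

```python
def required_free_gb(vmdk_sizes_gb, percent):
    """Free space (in GB) required for one VM, given its VMDK sizes."""
    return sum(vmdk_sizes_gb) * percent / 100


def datastores_short_on_space(vmdk_sizes_gb, percent, free_gb_by_datastore):
    """Return the datastores (sorted by name) that fail the check.

    `free_gb_by_datastore` maps each datastore that holds a VMDK of
    the VM to its currently free space in GB (assumed input format).
    """
    needed = required_free_gb(vmdk_sizes_gb, percent)
    return sorted(ds for ds, free in free_gb_by_datastore.items() if free < needed)


if __name__ == "__main__":
    # The example from the text: one 100 GB VMDK, option set to 10%.
    print(required_free_gb([100], 10))  # 10.0
    print(datastores_short_on_space([100], 10, {"ds-gold": 25, "ds-silver": 8}))
```

With these numbers, ds-silver has only 8 GB free and would block the backup of this VM, while ds-gold passes.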

By default, VMs are backed up in parallel. This greatly improves the overall backup performance, but in rare cases it can lead to problems. You can disable parallel backups by setting the OB2_VEAGENT_THREADED_BACKUP variable in the omnirc on the VEAgent backup host.

By default, a maximum of 10 concurrent threads are executed when backing up VMs using the VEAgent integration. This is good for the backup performance, but it also places load on the infrastructure. You can change this by adding the OB2_VEAGENT_VCENTER_CONNECTION_LIMIT variable to the omnirc on the VEAgent backup host.

I had several cases where VEAgent backups failed because the VEAgent (vepa_bar.exe) or the Backup Media Agent (bma.exe) crashed with a memory dump during the backup, or during the initial environment discovery. In all cases, the VEAgent, the MA and the CM were located on a single physical host. According to Data Protector support, this setup is strongly discouraged. A possible solution is to deploy a Windows Server VM and push the VEAgent onto it. You can use this VM as VEAgent backup host, and the physical host acts only as MA and CM.

With the OB2_VEAGENT_BACKUP_DISK_BUFFER_SIZE option, you can modify the buffer size used during the backup. The SAN and the HotAdd transport modes support disk buffer sizes from 1 MB to 256 MB. By default, they use 8 MB disk buffers. The NBD and NBDSSL transports always use 1 MB. Using bigger disk buffer sizes can improve the backup performance, but it also increases the memory consumption.
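The buffer size rules are easy to mix up, so here is a small Python model of them. It only encodes the limits stated above. The function name is mine, and rejecting an out-of-range value with an exception is an assumption for the sketch, not documented Data Protector behavior.

```python
def effective_buffer_mb(transport, requested_mb=None):
    """Return the disk buffer size (MB) that applies to a transport mode.

    SAN and HotAdd: 1 to 256 MB, default 8 MB.
    NBD and NBDSSL: always 1 MB, whatever is requested.
    """
    mode = transport.upper()
    if mode in ("NBD", "NBDSSL"):
        return 1
    if mode in ("SAN", "HOTADD"):
        if requested_mb is None:
            return 8
        if not 1 <= requested_mb <= 256:
            # Assumption: treat values outside the supported range as an error.
            raise ValueError("buffer size must be between 1 and 256 MB")
        return requested_mb
    raise ValueError(f"unknown transport mode: {transport}")


if __name__ == "__main__":
    print(effective_buffer_mb("SAN"))         # 8
    print(effective_buffer_mb("HotAdd", 64))  # 64
    print(effective_buffer_mb("NBDSSL", 64))  # 1
```

The last call shows the practical point: tuning the variable has no effect as long as the backup falls back to NBD or NBDSSL.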

On Windows VMs, it is possible to use the Volume Shadow Copy Service (VSS) to quiesce the state of the applications running within a virtual machine before a snapshot is created. A ZIP archive is created that contains all the BCD and writer manifests. Please note that quiescing can slow down backup sessions considerably.

TL;DR

During my last projects, I collected a number of common or best practices. I provide this “AS IS” with no warranties! Thanks to the HPE Data Protector support team for helping me during several support cases. Special thanks to Dimitar, Jose, Zhulien and Stephen!

Use multiple, smaller jobs instead of a few, bigger jobs

You should use jobs with a maximum of 30 VMs. Try to keep the jobs roughly equal in size, but don’t add more than 30 VMs to a single job. If a job fails, you have to restart the job for 30 VMs, not for 200 or more VMs. With more jobs, you can also execute jobs in parallel.
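If you have a list of VMs with their sizes, the split can be planned with a few lines of Python. This is only a sketch of one possible approach (a simple greedy distribution), not something Data Protector does for you. The function name and the input format, a dict of VM name to size in GB, are assumptions.

```python
import math


def plan_jobs(vm_sizes_gb, max_vms_per_job=30):
    """Distribute VMs over as few jobs as possible, max 30 VMs per job.

    Greedy approach: biggest VM first, always into the job with the
    smallest total size that still has room.
    """
    if not vm_sizes_gb:
        return []
    job_count = math.ceil(len(vm_sizes_gb) / max_vms_per_job)
    jobs = [[] for _ in range(job_count)]
    totals = [0] * job_count
    for name, size in sorted(vm_sizes_gb.items(), key=lambda kv: kv[1], reverse=True):
        # Only jobs below the VM limit are candidates.
        candidates = [i for i in range(job_count) if len(jobs[i]) < max_vms_per_job]
        target = min(candidates, key=lambda i: totals[i])
        jobs[target].append(name)
        totals[target] += size
    return jobs


if __name__ == "__main__":
    # Hypothetical inventory: 65 VMs with sizes between 40 and 360 GB.
    vms = {f"vm{i:03d}": 40 + (i % 9) * 40 for i in range(65)}
    for number, job in enumerate(plan_jobs(vms), start=1):
        print(f"Job {number}: {len(job)} VMs, {sum(vms[v] for v in job)} GB")
```

For 65 VMs this yields three jobs, none with more than 30 VMs, and with roughly the same amount of data in each.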

Use different hosts as Cell Manager, Media Agent and VEAgent

You shouldn’t combine CM, MA and VEAgent on a single physical or virtual server. Try to separate at least the VEAgent backup host. You can use a VM for this.

If you have to pack all services on a single server, reduce the load

Use OB2_VEAGENT_THREADED_BACKUP or OB2_VEAGENT_VCENTER_CONNECTION_LIMIT, and/or reduce the number of running MAs.

Always try to utilize CBT

Whenever possible, use CBT instead of single or mixed snapshot handling.

Use SAN Transport

Whenever possible, use SAN transport. If you can’t utilize SAN transport, try to use a virtual VEAgent backup host. In this case, Data Protector will use HotAdd transport mode.

In case of StoreOnce: Single Object per Store Media

If you use a StoreOnce appliance (or a StoreOnce Software store), make sure that you have enabled “Single Object per Store Media”. I wrote a blog post about it: HPE Data Protector & StoreOnce Catalyst: Single Object per Store Media

Data Protector: Exchange backup fails because of database lock

Today I had a customer call where an Exchange 2010 backup repeatedly failed. HPE Data Protector was unable to create a differential or incremental backup. For each database, the following error was logged:

Interestingly, there was no other backup session running. But the night before, the backup jobs failed because of a network failure.

The solution is easy. This error is caused by stale lock information in the Data Protector database. To remove it, open an administrative CMD on the Data Protector Cell Manager and run this omnidbutil command:

This command will free up the locked resources in the Data Protector database. Then, run the job again.