Tag Archives: san

HPE 3PAR OS updates that fix VMware VAAI ATS Heartbeat issue

Customers that use HPE 3PAR StoreServs with 3PAR OS 3.2.1 or 3.2.2 and VMware ESXi 5.5 U2 or later, might notice one or more of the following symptoms:

  • hosts lose connectivity to a VMFS5 datastore
  • hosts disconnect from the vCenter
  • VMs hang during I/O operations
  • you see the messages like these in the vobd.log or vCenter Events tab

  • you see the following messages in the vmkernel.log

Interestingly, not only HPE is affected by this. Multiple vendors have the same issue. VMware described this issue in KB2113956. HPE has published a customer advisory about this.

Workaround

If you have trouble and you can update, you can use this workaround. Disable ATS heartbeat for VMFS5 datastores. VMFS3 datastores are not affected by this issue. To disable ATS heartbeat, you can use this PowerCLI one-liner:

Solution

But there is also a solution. Most vendors have published firwmare updates for their products. HPE has released

  • 3PAR OS 3.2.2 MU3
  • 3PAR OS 3.2.2 EMU2 P33, and
  • 3PAR OS 3.2.1 EMU3 P45

All three releases of 3PAR OS include enhancements to improve ATS heartbeat. Because 3PAR OS 3.2.2 has also some nice enhancements for Adaptive Optimization, I recommend to update to 3PAR OS 3.2.2.

HPE StoreVirtual – Managers and Quorum

HPE StoreVirtual is a scale-out storage platform, that is designed to meet the needs of virtualized environments. It’s based on LeftHand OS and because the magic is a piece of software, HPE StoreVirtual is available as HPE ProLiant/ BladeSystem-based hardware, or as Virtual Storage Appliance (VSA) for VMware ESXi, Microsoft Hyper-V and KVM. It comes with an all-inclusive enterprise feature set. This feature set provides

  • Storage clustering
  • Network RAID
  • Thin Provisioning (with support for space reclamation)
  • Snapshots
  • Asynchronous and synchronous replication across multiple sites
  • Automated software upgrades and self-healing storage
  • Adaptive Optimization (Tiering)

The license is alway all-inclusive. There is no need to license individual features.

HPE StoreVirtual is not a new product. Hewlett-Packard has acquired LeftHand Networks in 2008. The product had several names since 2008 (HP LeftHand, HP P4000 and since a couple of years it’s StoreVirtual), but the core intelligence, LeftHand OS, was constantly developed by HPE. There are rumours that HPE StoreOnce Recovery Manager Central will be available for StoreVirtual soon.

Management Groups & Clusters

A management group is a collection of multiple (at least one) StoreVirtual P4000 storage systems or StoreVirtual VSA. A management group represents the highest administrative domain. Administrative users, NTP and e-mail notification settings are configured on management group level. Clusters are created per management group. A management group can consist of multiple clusters. A cluster represents a pool of storage from which volumes are created. A volume spans all nodes of a cluster. Depending on the Network RAID level, multiple copies of data are distributed over the storage systems in a cluster. Capacity and IO are expanded by adding more storage systems to a cluster.

As in each cluster, there are aids to ensure the function of the cluster in case of node failes. This is where managers and quorums comes into play.

Managers & Quorums

HPE StoreVirtual is a scale-out storage platform. Multiple storage systems form a cluster. As in each cluster, availability must be maintained if one or more cluster nodes fail. To maintain availability, a majority of managers must be running and be able to communicate with each other. This majority is called “a quorum”. This is nothing new. Windows Failover Clusters can also use a majority of nodes to gain a quorum. The same applies to OpenVMS clusters.

A manager is a service running on a storage system. This service is running on multiple storage systems within a cluster, and therefore in a management group. A manager has several functions:

  • Monitor the data replication and the health of the storage systems
  • Resynchronize data after a storage system failure
  • Manage and monitor communication between storage systems in the cluster
  • Coordinate configuration changes (one storage system is the coordinating manager)

This manager is called a “regular manager”. Regular managers are running on storage systems. The number of managers are counted per management group. You can have up to 5 managers per management group. Even if you have multiple storage systems and clusters per management group, you can’t have more than 5 managers running on storage systems. Sounds like a problem, but it’s not. If you have three 3-node clusters in a single management group, you can start managers on 5 of the 6 storage systems. Even if two storage systems fail, the remaining three managers gain a quorum. But if the quorum is lost, all clusters in a management group will be unavailable.

I have two StoreVirtual VSA running in my lab. As you can see, the management group contains two regular managers and vsa1 is the coordinating manager.

storevirtual_manager_1

There are also specialized manager. There are three types of specialized managers:

  • Failover Manager (FOM)
  • Quorum Witness (NFS)
  • Virtual Manager

A FOM is a special version of LeftHand OS and its primary function is to act as a tie breaker in split-brain scenarios. it’s added to a management group. It is mainly used if an even number of storage systems is used in a cluster, or in case of multi-site deployments.

The Quorum Witness was added with LeftHand OS 12.5. The Quorum Witness can only be used in 2-node cluster configurations. It’s added to the management group and it uses a file on a NFS share to provide high availability. Like the FOM, the Quorum Witness is used as the tie breaker in the event of a failure.

The Virtual Manager is the third specialized managers. It can be added to a management group, but its not active until it is needed to regain quorum. It can be used to regain quorum and maintain access to data in a disaster recovery situation. But you have to start it manually. And you can’t add it, if the quorum is lost!

As you can see in this screenshot, I use the Quorum Witness in my tiny 2-node cluster.

storevirtual_manager_2

Regardless of the number of storage systems in a management group, you should use an odd number of managers. An odd number of managers ensures, that a majority is easily maintained. In case of a even number of manager, you should add a FOM. I don’t recommend to add a Virtual Manager.

# of storage systems# of Manager
11 regular manager
22 regular manager + 1 specialized manager
33 regular manager or 2 + 1 FOM or Virtual Manager
43 regular manager or 4 + 1 FOM or Virtual Manager
> 55 regular manager or 4 + 1 FOM or Virtual Manager

In case of a multi-site deployment, I really recommend to place a FOM at a third site. I know that this isn’t always possible. If you can’t deploy it to a third site, place it at the “primary site”. A multi-site deployment is characterized by the fact, that the storage systems of a cluster are located in different locations. But it’s still a single cluster! This might lead to the situation, where a site failure causes the quorum gets lost. Think about a 4-node cluster with two nodes per site. In this case, the remaining two nodes wouldn’t gain quorum (split-brain situation). In this case, a FOM at a third site would help to gain quorum in case of a site failure. If you have multiple clusters in a management group, balance the managers across the clusters. I recommend to add a FOM. If you have a clusters at multiple sites, (primary and a DR site with remote copy), ensure that the majority of managers are at the primary site.

Final words

It is important to understand how managers, quorum, management groups and clusters are linked. Network RAID protects the data by storing multiple copies of data across storage systems in a cluster. Depending on the chosen Network RAID level, you can lose disks or even multiple storage systems. But never forget to have a sufficient number of managers (regular and specialized). If the quorum can’t be maintained, the access to the data will be unavailable. It’s not sufficient to focus on data protection. The availability of, or more specifically, the access to the data is at least as important. If you follow the guidelines, you will get a rock-solid, high performance scale-out storage.

I recommend to listen to Calvin Zitos podcast (7 Years of 100% uptime with StoreVirtual VSA) and to read Bart Heungens blog post about his experience with HPE StoreVirtual VSA (100% uptime for 7 years with StoreVirtual VSA? Check!).

HPE StoreVirtual REST API

Representational State Transfer (REST) APIs are all the rage. REST was defined by Roy Thomas Fielding in his PhD dissertation “Architectural Styles and the Design of Network-based Software Architectures“. The architectural style of REST describes six constraints:

  • Uniform interface
  • Stateless
  • Cacheable
  • Client – Server communication
  • Layered system
  • Code on demand

RESTful APIs typically use HTTP and HTTP verbs (GET, POST, PUT, DELETE, etc.) to send data to, or retrieve data from remote systems. To do so, REST APIs use Uniform Resource Identifiers (URIs) to interact with remote systems. Thus, a client can interact with a remote system over a REST API using standard HTTP URIs and HTTP verbs. For the data transfer, common internet media types, like JSON or XML are used. It’s important to understand that REST is not a standard per se. But most implementations make use of standards such as HTTP, URI, JSON or XML.

Because of the uniform interface, you have different choices in view of a client. I will use PowerShell and the Invoke-RestMethod cmdlet in my examples.

HPE StoreVirtual REST API

With the release of LeftHand OS 11.5 (the latest release is 12.6), HPE added a REST API for management and storage provisioning. Due to a re-engineered management stack, the REST API is significantly faster than the same task processed on the CLI or using the  Centralized Management Console (CMC). It’s perfect for automation and scripting. It allows customers to achieve a higher level of automation and operational simplicity. The StoreVirtual REST API is using JavaScript Object Notation (JSON) for data transfer between client and the StoreVirtual management group. With the REST API, you can

  • Read, create, and modify volumes
  • Create and delete snapshots
  • Create, modify, and delete servers
  • Grant and revoke access of servers to volumes

I use two StoreVirtal VSA (LeftHand OS 12.6) in my lab. Everything I show in this blog post is based on LeftHand OS 12.6.

The REST API in LeftHand OS 12.6 uses:

  • HTTPS 1.1
  • media types application/JSON
  • Internet media types application/schema+JSON
  • UTF-8 character encoding

RESTful APIs typically use HTTP and HTTP verbs (GET, POST, PUT, DELETE, etc.). I case of the StoreVirtual REST API:

  • GET is used to retrieve an object. No body is necessary.
  • PUT is used to update an object. The information to update the object is sent within the body.
  • POST is used to create of an object, or to invoke an action or event. The necessary information are sent within the body.
  • DELETE is used to delete an object.

Entry point for all REST API calls is /lhos, starting from a node, eg.

Subsequent resources are relative to this base URI. Resources are:

Resource pathDescription
/lhos/managementGroupManagement group entity
/lhos/clustersCluster collection
/lhos/cluster/<id>Cluster entity
/lhos/credentialsCredentials collection
/lhos/credentials/<session token>Credentials entity
/lhos/serversServer collection
/lhos/servers/<id>Server entity
/lhos/snapshotsSnapshot collection
/lhos/snapshots/<id>Snapshot entity
/lhos/volumesVolume collection
/lhos/volumes/<id> Volume entity

The object model of the StoreVirtual REST API uses

  • Collections, and
  • Entities

to address resources. An entity is used to address individual resources, whereas a collection is a group of individual resources. Resources can be addressed by using a URI.

Exploring the API

First of all, we need to authenticate us. Without a valid authentication token, no REST API queries can be made. To create a credential entity, we have to use the POST method.

$cred is a hash table which includes the username and the password. This hash table is converted to the JSON format with the ConvertTo-Json cmdlet. The JSON data will be used as body for our query. The result is an authentication token.

This authentication token must be used for all subsequent API queries. This query retrieves a collection of all valid sessions.

The GET method is used, and the authentication token is sent with the header of the request.

To retrieve an individual credential entity, the URI of the entity must be used.

The result of this query is the individual credential entity

It’s important to know, that if a session has not been used for 15 minutes, it is automatically removed. The same applies to constantly active sessions after 24 hours. After 24 hours, the credential entity will be automatically removed.

Let’s try to create a volume. The information about this new volume has to be sent within the body of our request. We use again the ConvertTo-Json cmdlet to convert a hash table with the necessary information to the JSON format.

The size must be specified in bytes. As a result, Invoke-RestMethod will output this:

Using the CMC, we can confirm that the volume was successfully created.

storevirtual_rest_api_vol_1

Since we have a volume, we can create a snapshot. To create a snapshot, we need to invoke an action on the volume entity. We have to use the POST method and the URI of our newly created volume.

In case of a successful query, Invoke-RestMethod will give us this output.

Again, we can use the CMC to confirm the success of our operation.

storevirtual_rest_api_vol_2

To delete the snapshot, the DELETE method and the URI of the snapshot entity must be used.

To confirm the successful deletion of the snapshot, the GET method can be used. The GET method will retrieve a collection of all snapshot entities.

The result will show no members inside of the snapshot collection.

At the end of the day, we remove our credential entity, because it’s not longer used. To delete the credential entity, we use the DELETE method with the URI of our credential entity.

The next query should fail, because the credential entity is no longer valid.

HTTPS workaround

The StoreVirtual API is only accessable over HTTPS. By default, the StoreVirtual nodes use an untrusted HTTPS certifificate. This will cause Invoke-RestMethod to fail.

After a little research, I found a workaround. This workaround uses the System.Security.Cryptography.X509Certificates namespace. You can use this snippet to build a function or add it to a try-catch block.

Final words

The StoreVirtual REST API is really handy. It can be used to perform all important tasks. It’s perfect for automation and it’s faster than the CLI. I’ve used PowerShell in my examples, but I’ve successfully tested it with Python. Make sure to take a look in to the HPE StoreVirtual REST API Reference Guide.