
VMware vSphere Metro Storage Cluster with HP 3PAR Peer Persistence – Part I

This posting is ~5 years old. You should keep this in mind. IT is a fast-moving business, so this information might be outdated.

The title of this blog post mentions two terms that need to be explained. First, a VMware vSphere Metro Storage Cluster (VMware vMSC) is a configuration of a VMware vSphere cluster that is based on a stretched storage cluster. Second, HP 3PAR Peer Persistence adds functionality to HP 3PAR Remote Copy and HP 3PAR OS so that two 3PAR storage systems can act as a nearly continuously available storage system. HP 3PAR Peer Persistence allows you to create a VMware vMSC configuration and to achieve a new level of availability and reliability.

VMware vSphere Metro Storage Cluster

In a vMSC, server and storage are geographically distributed over short or medium-long distances. vMSC goes far beyond the well-known synchronous mirror between two storage systems. Virtualization hosts and storage belong to the same cluster, but they are geographically dispersed: They are stretched between two sites. This setup allows you to move virtual machines from one site to another (vMotion and Storage vMotion) without downtime (downtime avoidance). With a stretched cluster, technologies such as VMware HA can help to minimize the time of a service outage in case of a disaster (disaster avoidance).

The requirements for a vMSC are:

  • Storage connectivity using Fibre Channel, Fibre Channel over Ethernet (FCoE), NFS or iSCSI
  • max. 10 ms round-trip time (RTT) for the ESXi management networks (10 ms of vMotion latency is only supported with vSphere Enterprise Plus – Metro vMotion; otherwise the limit is 5 ms)
  • max. 5 ms round-trip time (RTT) for the synchronous storage replication links
  • at least 250 Mbps per concurrent vMotion on the vMotion network

The complexity of the storage requirements does not come from the maximum round-trip time – it comes from the requirement that a datastore must be accessible from both sites. This means that a host in site A must be able to access (read & write) a datastore on a storage system in site B and vice versa. vMSC knows two different methods of host access configuration:

  • Uniform host access configuration
  • Non-Uniform host access configuration

With a uniform host access configuration, the storage systems on both sites can be accessed by all hosts. LUNs from both storage systems are zoned to all hosts, and the Fibre Channel fabrics are stretched across the inter-site links. The following figure was taken from the “Implementing vSphere Metro Storage Cluster using HP 3PAR Peer Persistence” technical whitepaper and shows a typical uniform host access configuration.

[Figure: Uniform host access configuration. Source: HPE/ hpe.com]

The second possible configuration is the non-uniform host access configuration, in which the hosts only access the site-local storage system. The Fibre Channel fabrics are not stretched across the inter-site links. The following figure was taken from the “Implementing vSphere Metro Storage Cluster using HP 3PAR Peer Persistence” technical whitepaper and shows a typical non-uniform host access configuration. If a storage system fails, the ESXi hosts in that datacenter will lose connectivity and the virtual machines will fail. VMware HA will take care that the VMs are restarted in the other datacenter.

[Figure: Non-uniform host access configuration. Source: HPE/ hpe.com]

Another possible non-uniform setup uses stretched Fibre Channel fabrics and some kind of virtual LUN. A LUN is mirrored between two storage systems and can be accessed from both sites. The storage systems take care of the consistency of the data. This figure was taken from the “VMware vSphere Metro Storage Cluster Case Study” technical whitepaper.

[Figure: Non-uniform host access configuration with stretched fabric. Source: VMware/ vmware.com]

The uniform host access configuration is currently used most frequently.

Regardless of the implementation, it’s useful to think about data locality. Let’s assume that a host in datacenter A is running a VM that is housed in a datastore on a storage system in datacenter B. As long as you’re using a stretched fabric between the sites, this is a possible scenario. What happens to the storage I/O of this VM? Right, it will travel across the inter-site links from datacenter A to datacenter B. To avoid this, you can use DRS groups and rules.

Examples of uniform and non-uniform host access configurations are described in these VMware KB articles:

Uniform host access configuration:

  • vSphere 5.x support with NetApp MetroCluster (2031038)
  • Implementing vSphere Metro Storage Cluster (vMSC) using EMC VPLEX (2007545)
  • Implementing vSphere Metro Storage Cluster using HP 3PAR StoreServ Peer Persistence (2055904)
  • Implementing vSphere Metro Storage Cluster using HP LeftHand Multi-Site (2020097)
  • Implementing vSphere Metro Storage Cluster using Hitachi Storage Cluster for VMware vSphere (2073278)
  • Implementing vSphere Metro Storage Cluster using IBM System Storage SAN Volume Controller (2032346)

Non-uniform host access configuration:

  • Implementing vSphere Metro Storage Cluster (vMSC) using EMC VPLEX (2007545)

HP 3PAR Remote Copy, Peer Persistence & the Quorum Witness

HP 3PAR Peer Persistence uses synchronous Remote Copy and Asymmetric Logical Unit Access (ALUA) to realize a metro cluster configuration that allows host access from both sites. 3PAR Virtual Volumes (VV) are synchronously mirrored between two 3PAR StoreServs in a Remote Copy 1-to-1 relationship. The relationship may be uni- or bidirectional, which allows the StoreServs to act mutually as failover systems for each other. To create a vMSC configuration with HP 3PAR StoreServ storage systems, some requirements have to be fulfilled:

  • Firmware on both StoreServ storage systems must be 3.1.2 MU2 or newer (I recommend 3.1.3)
  • a remote copy 1-to-1 synchronous relationship
  • 2.6 ms or less round-trip time (RTT)
  • Quorum Witness VM must run at a 3rd site and must be reachable from each 3PAR StoreServ
  • same WWN and LUN ID for each source and target virtual volume
  • VMware ESXi 5.0, 5.1 or 5.5
  • Hosts must be created with host persona 11 (a sketch is shown after this list)
  • Hosts must be zoned to both 3PAR StoreServ storage systems (this requires a stretched Fibre Channel fabric between the sites)
  • iSCSI or FCoE for host connectivity is supported with 3PAR OS 3.2.1. Versions below 3PAR OS 3.2.1 only support FC for host connectivity with Peer Persistence
  • Both 3PAR StoreServ storage systems must be licensed for Remote Copy and Peer Persistence (I recommend to license the Replication Suite)
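
As a small illustration of the host persona requirement, this is how the host objects might be created on both StoreServs with the 3PAR CLI. This is only a sketch: the host name and WWNs are placeholders, and you should verify the exact syntax against the CLI reference of your 3PAR OS version.

```
# Create the ESXi host object with host persona 11 (VMware, ALUA);
# host name and WWNs are placeholders
createhost -persona 11 esx01 10000000C9AAAA01 10000000C9AAAA02

# Verify the host and its persona
showhost
```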

A VV can be a source or a target volume. Source VVs belong to a primary remote copy group, target VVs belong to a secondary remote copy group. VVs are grouped into remote copy groups to ensure I/O consistency, so all VVs that require write-order consistency should belong to the same remote copy group. Even VVs that don’t need write-order consistency should belong to a remote copy group, just to simplify administration tasks. A typical uniform vMSC configuration with 3PAR StoreServs will have remote copy groups replicating in both directions, so both StoreServs act as source and target in a bidirectional synchronous remote copy relationship.

It’s important to understand that the source and target volumes share the same WWN and are presented using the same LUN ID. The ESXi hosts must use host persona 11. During the creation of the remote copy groups, the target volumes can be created automatically, which ensures that the source and target volumes use the same WWN. When the volumes from the source and target StoreServ are presented, the paths to the target StoreServ are marked as standby. In case of a failover these paths become active and the I/O continues (a quick way to check the path states from an ESXi host is shown after the table below).

The Quorum Witness is a RHEL-based appliance that communicates with both StoreServs and triggers the failover in specific scenarios. The following table was taken from the “Implementing vSphere Metro Storage Cluster using HP 3PAR Peer Persistence” technical whitepaper. As you can see, the automatic failover is only triggered in one specific scenario.

Scenario (Replication stopped / Automatic failover / Host I/O impacted):

  • Array to Array remote copy links failure: Y / N / N
  • Single site to Quorum Witness network failure: N / N / N
  • Single site to Quorum Witness network and Array to Array remote copy link failure: Y / Y / N
  • Both sites to Quorum Witness network failure: N / N / N
  • Both sites to Quorum Witness network and Array to Array remote copy link failure: Y / N / Y
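
To see the path behaviour described above from the ESXi side, you can query the NMP with esxcli. This is a sketch; the NAA device ID is a placeholder for one of your 3PAR volumes.

```
# Show SATP, PSP and the working paths of a 3PAR LUN (placeholder device ID)
esxcli storage nmp device list -d naa.60002ac0000000000000000000001234

# List all paths of this LUN; in a healthy Peer Persistence setup the paths
# to the target StoreServ are reported as standby
esxcli storage nmp path list -d naa.60002ac0000000000000000000001234
```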

Summary

VMware vSphere Metro Storage Cluster (vMSC) is a special configuration of a stretched compute and storage cluster. A vMSC is usually implemented to avoid downtime. A vMSC configuration makes it possible to move virtual machines, and thus workloads, between sites. Beyond this, vMSC can avoid downtime caused by a failed storage system. Using HP 3PAR Remote Copy, 3PAR Peer Persistence and the Quorum Witness, two HP 3PAR StoreServ storage systems can form a uniform vMSC configuration. This allows movement of VMs and workloads between sites, as well as a transparent failover between the storage systems in case one of the StoreServs fails.

Part II of this small series will cover the configuration of Remote Copy and Peer Persistence.

New HP 3PAR StoreServ AFA, VMware VVols and some thoughts

This posting is ~5 years old. You should keep this in mind. IT is a fast-moving business, so this information might be outdated.

At HP Discover in June 2013 (I originally wrote 2014, sorry for that typo), HP announced the HP 3PAR StoreServ 7450 All-Flash Array. To optimize the StoreServ platform for all-flash workloads, HP made some changes to the hardware of the nodes: the 7450 uses 8-core Intel Xeon CPUs instead of 6-core 1.8 GHz CPUs, and the cache was doubled from 64 GB to 128 GB. HP also made some changes to 3PAR OS: additional cache flush queues were added to separate the flushing of cache for rotating rust and SSD devices, write I/O optimizations were made, and the ability to perform fragmented writes was added. Instead of writing 16 KB blocks, 3PAR OS is now able to write only 4 KB of a 16 KB block. These software-based changes may also be used on the 7200 and 7400. This leads to the new…

HP 3PAR StoreServ 7200 All-Flash Array

HP has now announced the next StoreServ All-Flash Array: the HP 3PAR StoreServ 7200 All-Flash Array, which is nothing else than a 7200 with 8x 480 GB cMLC drives. The 8 drives result in a raw capacity of ~3.5 TB (at least 8 drives are necessary to create a CPG). The HP 3PAR StoreServ 7200 All-Flash Array is available for US-$ 35,000 (currently ~ €26,400). An interesting price if you consider that a StoreServ 7200 with 8x 480 GB cMLC drives and no additional support or software has a list price of ~ €60,000 or ~ US-$ 80,000. On the other hand, the 7200 hardware wasn’t optimized for all-flash workloads, so the cache and CPUs are the same.

Some thoughts

HP states that you can achieve 7 TB usable space with only 3.5 TB raw space. First thought: WTF?! Second thought: Oh, there’s an asterisk behind the statement.

Usable capacity calculations based on 25% overhead and 4:1 compaction ratio.

My thoughts about that: First, it doesn’t match the “3.5 TB raw == 7 TB usable” quote. Later in the text HP writes:

…you can scale the solution to 690 TB usable and 230 TB raw with our Thin Deduplication software.

A short calculation: (230 x 0.75) x 4 = 690. That fits! It seems that HP is more conservative with respect to the usable capacity of the 7200 AFA if you take the “3.5 TB raw == 7 TB usable” quote into account (~2.5:1). Second, Thin Deduplication on the 7200? Currently HP only mentions it in connection with the 7450 (Source 1, Source 2). You may know that the Gen4 ASICs are used for Thin Deduplication. The 7200 and 7400 also use the Gen4 ASICs, so there is no technical reason why Thin Deduplication shouldn’t work on the 7200 and 7400. I assume that HP will announce Thin Deduplication for the 7200 and 7400 later. However, so far it has been mentioned only in connection with the StoreServ AFAs. I also think that the HP 3PAR StoreServ 7200 All-Flash Array is an attack on EMC XtremIO and Pure Storage. I will not comment on the statement that the new 7200 AFA is 50% cheaper than EMC XtremIO or Pure Storage:

Based on comparison of US list prices for the HP 3PAR StoreServ 7200 All-Flash Starter Kit and EMC XtremIO with 5TB of raw capacity and Pure Storage FA-405 entry-level configuration with 2.75TB raw capacity.

Finally, I’m glad that HP has announced the 7200 AFA, especially at that price. HP 3PAR StoreServ is an awesome storage platform and I’m sure it doesn’t have to hide behind the competition.

VMware VVols

HP has also announced that HP 3PAR StoreServ is ready for VMware’s new storage architecture, Virtual Volumes (VVols), which is currently being tested in the VMware vSphere beta. VMware VVols will revolutionize the way storage is handled in VMware vSphere by offering VM-level storage control, snapshots and quality of service. Support for VMware VVols will be available with the next release of HP 3PAR OS.

This video was released in 2012 by Calvin Zito and shows you a demo of VMware VVols with 3PAR StoreServ storage.

It is good to see how VMware and HP work together to get this great new technology ready for production.

Conflicting information: Setting iops option for VMW_PSP_RR for HP 3PAR StoreServ on ESXi

This posting is ~5 years old. You should keep this in mind. IT is a fast-moving business, so this information might be outdated.

Yesterday I received a tweet asking whether the iops option for VMW_PSP_RR should be set to 100 or 1 for HP 3PAR StoreServ storage.

Later Craig Kilborn joined the conversation and I decided to clarify this 100 or 1 IOPS myth the next morning.

In order to give you some context: I wrote a blog post about adding a custom SATP claimrule for HP 3PAR StoreServ storage on ESXi. In that blog post I pointed out that the claimrule is usually used to change the default behaviour for switching the path used for active IO. For VMW_PSP_RR the default is 1000 IOPS, which means that after 1000 IOPS for a specific device, the path for the active IO to this device is changed to the next active, optimized IO path. I recommend reading this blog post from Duncan Epping for more information.

I checked two documents which are important and recommended reading if you want to implement HP 3PAR StoreServ with VMware ESXi: the implementation guide and the best practices whitepaper (both are linked as references at the end of this post).

The implementation guide provides you with the information necessary for the implementation. The best practice whitepaper lives up to its name: it is a best practice whitepaper. So the first document shows you how to do the right things, the second one shows you how to do things right.

Let’s have a look into the implementation guide on page 46. There are several commands to add a custom claimrule, depending on the selected host persona.

[Figure: SATP claimrule commands with IOPS=100, from the HP 3PAR VMware ESX Implementation Guide. Source: HPE/ www.hpe.com]

On page 43 HP writes:

As part of PSP Round-Robin configuration, the value of IOPS can be specified. IOPS is the number of IO operations scheduled for each path during path changes within the Round-Robin path selection scheme. The default IOPS value is 1000. HP recommends IOPS=100 as an initial value and starting point for further optimization of IO throughput with PSP Round-Robin. With the exception of ESX/ESXi 4.0 versions, it is preferable to set the IOPS value within a SATP custom rule.

Okay. Let’s take a look into the best practice guide. On page 8 HP describes how to add a custom claimrule, this time with IOPS=1.

[Figure: SATP claimrule command with IOPS=1, from the HP 3PAR StoreServ Storage and VMware vSphere 5 best practices guide. Source: HPE/ www.hpe.com]

On the bottom of page 7 HP writes:

Managing a Round Robin I/O path policy scheme through the vSphere Web Client on a per datastore basis does not allow setting important Round Robin policy details that can be specified when using the command line on the ESXi host. To achieve better load balancing across paths, the –iops option may be issued on the command line to specify that the path should be switched after performing the specified number of I/Os on the current path. By default, the –iops option is set to 1000. The recommended setting for HP 3PAR Storage is 1, and this setting may be changed as needed to suit the demands of various workloads.

In the March 2013 version of the HP 3PAR StoreServ Storage and VMware vSphere 5 best practices guide, this statement isn’t included! I assume this is related to the release of 3PAR OS 3.1.2, which was announced in December 2012. With 3PAR OS 3.1.2 host persona 11 was released, and it was recommended to use it with VMware ESXi. With host persona 11 the hosts use VMW_SATP_ALUA instead of VMW_SATP_DEFAULT_AA.

Depending on the document and its age, different IOPS values are stated. IOPS=100 is stated in the implementation guide and should be used as a starting value for further optimization. The best practice guide clearly recommends IOPS=1. I recommend running some tests with your customer’s workload and then deciding whether to use 100 or 1 IOPS.
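
If you just want to check or change the value for a single device that is already claimed by VMW_PSP_RR, you can do this per device with esxcli. This is a sketch; the device ID is a placeholder, and a per-device setting does not replace a custom SATP rule, because new devices will still get the defaults.

```
# Show the current Round Robin settings of a device (placeholder ID)
esxcli storage nmp psp roundrobin deviceconfig get -d naa.60002ac0000000000000000000001234

# Switch the active path after every single IO for this device
esxcli storage nmp psp roundrobin deviceconfig set -d naa.60002ac0000000000000000000001234 -t iops -I 1
```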

As a reference:

HP 3PAR VMware ESX Implementation Guide
HP 3PAR StoreServ Storage and VMware vSphere 5 best practices (April 2014)
HP 3PAR StoreServ Storage and VMware vSphere 5 best practices (March 2013)

Add custom SATP claimrule for HP 3PAR to VMware ESXi

This posting is ~5 years old. You should keep this in mind. IT is a fast-moving business, so this information might be outdated.

One of the tasks that I finish before I present the first Virtual Volumes (VV) to hosts is to discuss the need for a custom SATP claimrule with the customer. The usual requirement for a custom claimrule is that the active, optimized path should be switched after each IO and not after 1000 IOs. Duncan Epping wrote a nice blog post about this some years ago; I recommend reading it.

Some basics

The Storage Array Type Plug-in (SATP) is responsible for array-specific operations, like health monitoring of physical paths, reporting of path state changes and path failover. Each SATP is linked to a Path Selection Policy (PSP), which controls the selection of active paths for IO. VMware ESXi provides a couple of SATPs, for example VMW_SATP_DEFAULT_AA and VMW_SATP_ALUA.

Claimrules are used to select a valid SATP for a storage system. When a SATP for a given device is searched for, the Native Multipathing Plugin (NMP) searches the driver-based rules first. If there is no match, the vendor/model rules are searched next, and finally the transport-based rules. If there is still no match, a valid default SATP rule is selected. Because a custom SATP claimrule is more specific for a given device, it is selected instead of the default SATP claimrule. The selected SATP determines the PSP.

Adding a custom SATP claimrule

I will show you how to add a custom claimrule, in this case for an HP 3PAR storage system. Let’s start with a look at the default SATP claimrules.
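
You can list the default claimrules on an ESXi host with esxcli. A minimal sketch (the -s filter only narrows the output to the ALUA SATP):

```
# List the default SATP claimrules for VMW_SATP_ALUA
esxcli storage nmp satp rule list -s VMW_SATP_ALUA
```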

Usually the default VMW_SATP_ALUA claimrule, which matches on the tpgs_on claim option, causes VMW_SATP_ALUA to be used for an HP 3PAR storage system if the ESXi hosts use host persona 11 (VMware). For hosts with host persona 6, VMW_SATP_DEFAULT_AA is used. Please note that host persona 6 isn’t supported for ESXi hosts with 3PAR OS 3.1.3! Since 3PAR OS 3.1.2 you should use host persona 11 instead of 6.

With esxcli you can add a new custom rule:
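
A sketch of such a rule is shown below; it switches the path after every IO (iops=1) and uses the vendor and model strings 3PARdata and VV, which 3PAR arrays report in the SCSI inquiry data.

```
# Custom SATP claimrule for HP 3PAR: VMW_SATP_ALUA, Round Robin, path switch after 1 IO
esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -P "VMW_PSP_RR" -O "iops=1" \
  -c "tpgs_on" -V "3PARdata" -M "VV" -e "HP 3PAR custom SATP rule"
```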

The switch -s adds a rule for VMW_SATP_ALUA, -P sets VMW_PSP_RR as the default PSP for this rule, -O sets the PSP option so that the active path is switched after 1 IO (the default is 1000), -c enables ALUA target port group support (tpgs_on), -V is a valid string from the vendor part of the SCSI inquiry string, -M is a valid string from the model part of the SCSI inquiry string and -e adds a description.

If you have to add this rule to multiple hosts, you may want to use these lines of PowerCLI code (make sure that you adjust the $esxCluster variable):

You should add this rule before you present the first VVs to the hosts.

And that’s it. Now you can present VVs to your ESXi hosts and be sure that this rule is claimed. If you add the rule after you have presented VVs to the ESXi hosts, you have to reboot the hosts.

Some thoughts about HP 3PAR Adaptive Optimization

This posting is ~5 years old. You should keep this in mind. IT is a fast-moving business, so this information might be outdated.

HP 3PAR Adaptive Optimization (AO) enables autonomic storage tiering on HP 3PAR storage arrays. With this feature the HP 3PAR storage system analyzes the IO and then migrates regions of 128 MB between different storage tiers. Frequently accessed regions of volumes are moved to higher tiers, less frequently accessed regions are shifted to lower tiers. I often talk with customers about AO and I know that this feature is sometimes misunderstood and misconfigured. This blog post is a summary of the topics that are, in my opinion, important.

Basics about CPGs, LDs and VVs

A physical disk is divided into 1 GB portions, so-called chunklets. A Common Provisioning Group (CPG) creates a pool of logical disks (LD) and therefore a pool of storage that can be used to create virtual volumes (VV). A CPG defines properties like the device type (SAS/ FC, NL, SSD), disk RPM, RAID level, availability level etc. These properties are used to create LDs. An LD is a collection of chunklets arranged in RAID sets. The size of an LD is determined by the number of data chunklets in the RAID set: if a CPG uses RAID 5 (3+1), an LD has about 3 GB.

When a VV is created, LDs are created in the size of the growth increment (which is usually 32 GB for SAS/ FC and NL, and 8 GB for SSD). So with RAID 5 (3+1), ~11 LDs will be created for a 32 GB growth increment. A VV allocates space in 128 MB regions (user and snapshot space) or 32 MB regions (admin space). Each region is on another LD, so a VV is striped across LDs and therefore across physical disks. I hope this drawing makes it easier to understand.

[Figure: Chunklets, CPGs, LDs and virtual volumes. Source: Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0]

If the space of a CPG is nearly fully allocated, space in the size of the growth increment is allocated – more LDs are created. Thin Reclamation can reclaim space from VVs in 16 KB increments, but free VV space is only returned to a CPG in 128 MB increments. A defragmentation process goes over the LDs and consolidates smaller pages into bigger contiguous regions. Over time the LDs can become less efficient in space usage. Through a process called “compacting”, mapped regions of VVs can be consolidated onto fewer, more utilized LDs. This may free disk space and increases the efficiency of space usage. VVs can allocate space from free space on LDs, or, if no or not enough contiguous free space is available, new LDs are created. Different VVs can share the same LD.
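
If you want to follow this mapping on a live system, the 3PAR CLI offers a couple of show commands. The CPG and VV names below are placeholders, and the available options vary slightly between 3PAR OS versions, so treat this as a sketch.

```
# Show a CPG with its logical disks and space usage (CPG name is a placeholder)
showcpg CPG_FC_r5

# List the logical disks created from the CPGs
showld

# Show a virtual volume and its allocated user, snapshot and admin space (VV name is a placeholder)
showvv vv-esx-01

# Show chunklet usage per physical disk
showpd -c
```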

The relationship between Adaptive Optimization and CPGs

An Adaptive Optimization (AO) configuration consists, in simple terms, of CPGs, a mode configuration and, optionally, a schedule. An AO config must have at least two tiers configured and can have up to three tiers (tier 0, tier 1 and tier 2). Usually you configure a CPG with SSDs for tier 0, with SAS/ FC disks for tier 1 and with SAS-NL disks for tier 2. But there’s nothing wrong with configuring a SAS/ FC CPG with RAID 1 for tier 1 and a SAS/ FC CPG with RAID 5 for tier 2. Many combinations are possible. It’s important to understand that tier 1 should meet the performance requirements of your applications. It’s not a good idea to use a “slow” tier 1 and let AO move all data to tier 0 because your workload heats up chunklets. So every time you create a VV, it should be associated with your tier 1 CPG.

Mode configuration

The tiering analysis algorithm considers three different things:

  • available space in tiers
  • average latency
  • average tier access rate densities

If the allocated space in a tier (a CPG) exceeds the tier size (or the CPG warning limit), AO will try to move data to other tiers. Busy regions will be moved to faster tiers, more idle regions will be moved to lower tiers. If your tier 0 exceeds its limit but there’s space left in tier 1, AO will try to move more idle regions from tier 0 to tier 1. If all tiers exceed their limits, AO will do nothing.

If a higher tier gets too busy, the latency for this tier can become higher than for lower tiers. To prevent this, a region will not be moved to a faster tier if the latency of the destination tier is higher than that of the current tier. An exception is made if the IOPS load on the destination tier is lower than an internal threshold; in that case the region will be moved to the faster tier.

The last point is the hardest and most complex. The average tier access rate density is considered if the system is not limited by tier latencies or tier space. It describes how busy the regions in a tier are on average and is measured in IOPS per gigabyte per minute. The results are compared to individual regions. Depending on the result of this comparison, a region is moved to a lower tier (if it is less busy than other regions) or a higher tier (if it is busier than other regions).

The mode configuration parameter has three different options:

  • Performance
  • Cost
  • Balanced

If it’s set to “Performance”, more data is moved to faster tiers. In contrast, the “Cost” mode moves more data to lower tiers. The “Balanced” mode balances between performance and cost and should be the default setting.

Tier configuration

You need to configure at least two tiers. Best practice is to configure three tiers. The fastest CPG should be configured as tier 0, the slowest CPG should be configured as tier 2.

Configuration: 2-Tier SSD – SAS/ FC
  • Tier 0 (SSD): at least 5% of the capacity or the minimum disk requirement for SSD (8 disks)
  • Tier 1 (SAS/ FC): 95% of the capacity
  • Tier 2: none

Configuration: 2-Tier SAS/ FC – NL
  • Tier 0: none
  • Tier 1 (SAS/ FC): min. 60% of the capacity, 100% of the IOPS
  • Tier 2 (NL): max. 40% of the capacity, 0% of the IOPS

Configuration: 3-Tier SSD – SAS/ FC – NL
  • Tier 0 (SSD): at least 5% of the capacity or the minimum disk requirement for SSD (8 disks)
  • Tier 1 (SAS/ FC): min. 55% of the capacity
  • Tier 2 (NL): max. 40% of the capacity

Source: HP 3PAR StoreServ Storage best practices guide, Table 2. Recommended Adaptive Optimization configurations

To ensure that only AO moves data to other tiers, you should use the tier 1 CPG for provisioning VVs. No VV should be associated directly with the tier 0 or tier 2 CPG. You should also ensure that all CPGs used in an AO config have the same availability level (cage, magazine or port). If tier 0 and tier 1 have cage availability and tier 2 only magazine availability, the VVs will effectively have only magazine availability.
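
As a sketch of the provisioning recommendation, a thin provisioned VV would be created against the tier 1 CPG only. The CPG and VV names are placeholders; check the createvv syntax for your 3PAR OS release.

```
# Create a 2 TB thin provisioned VV in the tier 1 CPG;
# snapshot space is also placed in the tier 1 CPG
createvv -tpvv -snp_cpg CPG_FC_r5_T1 CPG_FC_r5_T1 vv-esx-01 2T
```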

Schedule

You can configure a schedule or run AO immediately. If you have multiple AO configs, schedule them all for the same start time. They will run sequentially, but the calculation of which regions have to be moved is done at the same time. If you check the schedules on the CLI, you will notice another interesting fact:
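
On the 3PAR CLI the scheduled tasks can be listed with showsched; the AO runs show up as scheduled startao tasks (a sketch, output omitted):

```
# List all scheduled tasks; each AO config appears as a scheduled startao task,
# and each of these task lines contains a -compact option
showsched
```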

Do you notice the “-compact” in the command line of each AO schedule? If you use AO, you don’t have to schedule “compactcpg” to compact the CPGs that belong to an AO config. This is done as part of AO. Compacting moves regions from less efficient LDs to fewer, more utilized LDs. You don’t have to run AO every hour; it’s sufficient to run it once a day. Run it during periods of low IO. You can exclude the weekend if your company or customer isn’t working on weekends.

Other things to consider

If you use AO, you should avoid using other automated techniques that move data between different storage tiers. Yes, if you think of VMware Storage DRS (SDRS), that would be such a technique. But only if you use it in fully automated mode. You can use it in manual mode and apply recommendations if necessary.

Final Words

I don’t claim that these are the best practices, but with these topics in mind it should be easy for you to discuss your customer’s requirements and the impact of different AO settings with your customer. If you take a look into the HP 3PAR StoreServ Storage best practices guide, you will recognize some of the above-mentioned practices. But always keep in mind: even a best practice can miss the customer’s requirements. So don’t just apply “best practices” without reflecting on their impact on the customer’s requirements.