Category Archives: Storage

Wartungsfenster Podcast

This posting is ~1 year old. You should keep this in mind. IT is a fast-moving business. This information might be outdated.

For once, a blog post in German. The reason: since January 2022, Claudia Kühn and I have been running a joint podcast around the topics of datacenter, cloud and IT. A relaxed fireside round in which we chat casually about our job and everything related to it.

The podcast is released every two weeks on the usual channels, or you can stop by the podcast's homepage. Feel free to leave a comment or some feedback, and give us a rating on iTunes.

Use Windows MPIO for DataCore backend storage connections

This posting is ~8 years old. You should keep this in mind. IT is a fast-moving business. This information might be outdated.

When you install DataCore SANsymphony-V (SSV), you will be asked during the setup to allow the installation of some special drivers. DataCore SANsymphony-V needs these drivers to act as a storage target for hosts and other storage servers. Usually you have three different port roles in a DataCore SSV setup:

  • Frontend Ports (FE)
  • Mirror Ports (MR)
  • Backend Ports (BE)

Frontend (FE) ports act in target-only mode. These ports will be disabled if you stop a DataCore storage server. Mirror (MR) ports (can) act as target AND initiator. You can set (if you like) a mirror port to a specific mode (target or initiator), but I wouldn’t recommend this. Theoretically you can set one MR port to act as initiator, and a second to target-only mode. If the port is set to target-only, the port is also stopped when the DataCore storage server is stopped. A backend (BE) port acts as initiator for backend storage. Usually the FE ports act as target-only, the MR ports as target/ initiator and the BE ports as initiator-only. If you use local storage (or SAS-connected storage), there will be no BE ports.

When it comes to ALUA-capable backend storage, DataCore SSV is a bit old-school. But stop! What is ALUA? Asymmetric Logical Unit Access (ALUA), sometimes also known as Target Port Groups Support (TPGS), is a set of commands that allows defining prioritized paths for SCSI devices. With ALUA, a dual-controller storage system is capable of telling the server which of the controllers is the “owning” controller for a specific volume. All paths are active, but only a subset of paths to a volume is active and optimized. This is often referred to as “Active/ Optimized” and “Active/ Non-Optimized”. Using the non-optimized path is not a problem when it comes to write IO (due to the fact that the IO must travel the mirror link between the controllers to the other controller’s cache), but when it comes to read IO, using a non-optimized path is a mess. And at this point, DataCore SSV is a bit old-school. Even if the backend storage is ALUA-capable (all new storage systems should be ALUA-capable), DataCore SSV doesn’t care about the optimized and non-optimized paths. It simply chooses one path and uses it. And this can cause performance problems. There are two solutions for this problem:

  1. You check the chosen backend paths and manually select one (!) active and optimized path.
  2. You replace the DataCore drivers for the backend ports and use Microsoft MPIO to handle the backend paths.

Solution 1 is okay if you only have a few backend disks. DataCore also uses the Microsoft Windows MPIO framework, so it’s available by default if you install DataCore SANsymphony-V. DataCore allows the usage of 3rd party MPIO software (like EMC PowerPath), but they will never support it. If you have trouble with your backend connection, you will be on your own.

Using a Third Party Failover product

Using a Third Party Failover product (such as EMC’s PowerPath or the many MPIO variants from Storage Vendor) can be used directly on the DataCore Server. In the case where Storage Arrays are attached by Fibre Channel connections, do not use the DataCore Fibre Channel back-end driver when using any Third Party Failover product – use the Third Party’s preferred Fibre Channel Driver instead.

This statement was taken from FAQ 1302 (Storage Hardware Guideline for use with DataCore Servers). Some things won’t work when the native HBA drivers are used instead of the DataCore backend drivers: You won’t be able to see the backend paths or monitor the performance using the DataCore SSV GUI. Backend storage that is handled by native drivers and 3rd party MPIO products will be treated as “local” storage. DataCore describes this also in FAQ 1302:

A Storage Array that is connected to a DataCore Server but without using the DataCore Fibre Channel back-end driver can still be used. This includes SAS or SATA-attached, all types of SSD, iSCSI connections and any Fibre Channel connection using the Vendor’s own driver.

Any storage that is connected in this way will appear to the SANsymphony-V software as if it were an ‘Internal’ or ‘direct-attached’ Storage Array and some SANsymphony-V functionality will be unavailable to the user such not being able to make use of SANsymphony-V’s performance tools to get some of the available performance counters related to Storage attached to DataCore Servers. Also some potentially useful logging information regarding connections between the DataCore Server and the Storage Array is lost (i.e. not being able to monitor any SCSI connection alerts or errors directly) and may hinder some kinds of troubleshooting, should any be needed.

You should replace the backend drivers AFTER the installation of SSV, but BEFORE you map storage to the newly installed storage server. To identify the correct FC-HBA or NIC, take a look at the PCI bus number. You can find this information on the info tab of the server port details in the DataCore SSV GUI. Then check the FC-HBA/ NIC in the Windows device manager for the same PCI bus number.

devicemgr_datacore_hba_driver_1

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

Switch to the “Driver” tab and replace the DataCore driver with the native HBA driver.

devicemgr_datacore_hba_driver_install_1

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

devicemgr_datacore_hba_driver_install_2

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

devicemgr_datacore_hba_driver_install_3

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

After replacing the drivers you will notice that the backend ports are marked as missing in the DataCore SSV GUI. Simply remove them. You don’t need them anymore. Now you can map storage to the storage server. Make sure that you scan for new devices in the MPIO console or that you run

mpclaim -r -i -a ""

to claim new devices. You can check the path status using the device manager (check the properties of the disk device, then switch to the MPIO tab) or you can use this command

mpclaim.exe -s -d
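If you prefer PowerShell over mpclaim, the MPIO module that ships with Windows Server 2012 R2 offers equivalent cmdlets. The following is only a sketch, assuming the MPIO feature is installed; the vendor and product IDs are placeholders that you have to replace with the values of your backend storage (mpclaim -e should list them).

# List the hardware IDs that the Microsoft DSM already claims
Get-MSDSMSupportedHW

# Claim the backend storage for the Microsoft DSM (placeholder vendor/ product IDs)
New-MSDSMSupportedHW -VendorId "NEXSAN" -ProductId "E-Series"

# Use round robin as the default load balancing policy for newly claimed devices
Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy RR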

Sometimes the storage vendor offers a monitoring tool (this one is from Nexsan) that provides information about path state, MPIO policy and statistics.

nexsan_msio_setup_2

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

I can’t see many reasons to use the DataCore drivers and the native backend path handling from SSV. But you should be clear about the limitations when it comes to support!

First experience: Nexsan E-Series

This posting is ~8 years old. You should keep this in mind. IT is a fast-moving business. This information might be outdated.

One of my longtime DataCore customers has started a project to replace their current DataCore storage servers and backend storage with new hardware. In contrast to the current setup, the newly installed backend storage is FC-attached. The customer has selected Nexsan E-Series E32V, E32XV and E48V storage systems in combination with DataCore SANsymphony-V10.

Who is Nexsan?

The question should be: Who is Imation? Nexsan was founded in 1999 in Derby, England, but was acquired by Imation in December 2012. Since then, Nexsan has been one of Imation’s brands and offers, as a storage-only company, three different product lines: Assureon Secure Storage, E-Series High Density Storage and NST Hybrid Storage.

Assureon Secure Storage is aimed at customers that need to implement storage optimization, regulatory and corporate compliance, and/ or long-term archiving of unstructured data.

NST Hybrid Storage offers unified storage and access to it using standard NAS and SAN protocols (CIFS, NFS, FTP, FC, iSCSI). It also offers high scalability and Imation’s FASTier caching technology to provide performance for mixed application workloads.

E-Series High Density Storage is aimed at customers that need a lot of capacity in a small footprint at low cost. Typically, customers use the E-Series for backup-to-disk or as high-capacity storage. The E-Series offers four different models:

Model   Disks      Form factor
E18V    18 disks   2U
E32V    32 disks   2U
E48V    48 disks   4U
E60V    60 disks   4U

The E18XV, E32XV, E48XV and E60XV are the corresponding expansion enclosures. You can add up to 2 enclosures to a so-called head unit. All models support 3.5″ disks (or 2.5″ in a 3.5″ cage), except the E32(X)V, which only supports 2.5″ disks. You can mix different disk technologies (NL, SAS, SSD) in a single system. All controllers offer 1 GbE iSCSI ports (up to 8 ports per controller pair). There is also support for 6G SAS, 8 Gb FC, 16 Gb FC and 10 GbE iSCSI (up to 4 ports per controller pair). All E-Series support array-based snapshots and replication. The E60V with two enclosures can hold up to 1.44 PB (!) in 12U.

Why choose Nexsan E-Series storage as a backend for DataCore SANsymphony-V?

My first two thoughts were “Where are the cool features, like thin-provisioning, tiering etc.?” and “Oh cool, 32 disks in only 2U/ 48 disks in 4U”. And I think this ultimately describes the benefits of the Nexsan E-Series: No unnecessary frills and high capacity. To be honest: The main reason for using dumb SAS JBODs in the past was that they don’t offer unnecessary frills. I don’t need snapshots, replication or thin-provisioning on the array level. These features are offered by DataCore SANsymphony-V. In this case, the customer has chosen to switch from a SAS-attached to an FC-attached storage backend. And in this case, a high-performance, high-density storage system without unnecessary frills is the best thing that can happen.

Disclaimer: I’ve never worked with Nexsan before, and they do not pay me for this blog post.

Delivery and first impression

The delivery came directly from Nexsan in Derby. Six pallets, one pallet for each system (2x E32V, 2x E32XV and 2x E48V). The enclosures and disks were packed separately, but in the same package. The packaging was adequate and it was all neatly packed and well padded. Each system was intensively tested by Nexsan before shipment.

The first step was to mount the rail kits. The rail kits are very solidly built and partially milled from aluminum. Mounting the rail kits was easy using the included template. You have to use the enclosed screws; unfortunately there is no screwdriver-less mounting. The rails were anything but smooth, and it was a precision job to mount the enclosures. But to be honest: How often do you rack the enclosures? One time.

Installation

Each controller has a management port. The first controller has the IP address 10.11.12.13/8, the second controller 10.11.12.14/8. Simply hook up your laptop and connect to one of these IPs. You don’t need login credentials at this point. But you should assign a password to the ADMIN account (case-sensitive!) after finishing the installation.

nexsan_gui_1

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

If you have multiple Nexsan storage arrays in your network, please make sure that you connect only one system at a time. All Nexsan E-Series storage arrays use the same default IP addresses.

The hardware

The Nexsan E-Series is a classic dual-controller storage system. This is a picture of an E32V. As you can see, 16 hard drives are mounted in one drawer. Nexsan’s so-called ActiveDrawer Technology allows you to service disks and both fan units (one at the front and one at the end of the drawer) online. All important components are hot-swappable.

nexsan_e32_pod

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

The expansion unit E32XV uses the same technology as the E32V. But instead of two controllers, the expansion enclosure has two I/O modules. This is the back of the E32V with the SAS-connected E32XV.

nexsan_e32_controller

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

This is an E48V with one drawer pulled out. It also uses the ActiveDrawer Technology. The controllers are nearly the same, but they have 8 GB cache instead of 4 GB.

nexsan_e48_pod

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

The GUI

The GUI is really puristic and fast. Pure HTML with a bit of JavaScript. You can enable HTTPS if you like. I haven’t noticed any problems with different browsers. The home page gives you a brief overview of the hardware status.

nexsan_webui_1

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

The storage arrays come with pre-built RAID arrays and volumes. If you want another setup, simply use the Quick Start feature: enter the number of RAID sets, spares and volumes and the wizard does the rest.

nexsan_webui_3

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

One very cool feature is the multi view. Multi view allows you to get a quick overview of multiple E-Series storage arrays.

nexsan_webui_2

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

The GUI is very talkative: lots of useful information, very clear, and there is a short help text on every page. This is the system info page:

nexsan_webui_4

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

The same applies to the RAID info page. Nice: disk and host stats! Very handy.

nexsan_webui_5

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

For our DevOps friends: Each controller offers a config dump page which is plain text and can be parsed by a monitoring system or script. The E-Series offers all necessary monitoring features, like e-mail notification, SNMP, Syslog etc. If technical support is needed, the GUI offers a “Technical Support” page which can be used to open a support ticket right from the GUI.
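Just as a sketch: the config dump can be pulled and parsed with a few lines of PowerShell. The URL, and the assumption that the dump consists of simple key/ value lines, are placeholders, so check the controller documentation for the real path and format.

# Hypothetical example: URL and key/ value layout are assumptions
$uri = "http://10.11.12.13/configdump.txt"
$raw = (Invoke-WebRequest -Uri $uri -UseBasicParsing).Content

# Collect "key: value" lines into a hashtable for further processing
$config = @{}
foreach ($line in $raw -split "`n") {
    if ($line -match '^\s*([^:]+):\s*(.+)$') {
        $config[$Matches[1].Trim()] = $Matches[2].Trim()
    }
}
$config.Keys | Select-Object -First 10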

nexsan_webui_6

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

nexsan_webui_7

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

Final words

I can’t say anything bad about the Nexsan E-Series. Sure, this is not an HP 3PAR, but should it be? It’s a solid block storage system and it’s a perfect fit for DataCore SANsymphony-V. The look and feel of the hardware is quite good. Good quality, designed in the UK. Make sure that you take a look at Nexsan if you’re searching for a solid DataCore backend storage. Patrick Schulz also has some experience with Nexsan and DataCore. Check out his blog posts about DataCore SANsymphony-V and Nexsan!

Chicken-and-egg problem: 3PAR VSP 4.3 MU1 & 3PAR OS 3.2.1 MU3

This posting is ~8 years old. You should keep this in mind. IT is a fast-moving business. This information might be outdated.

Since Monday I have been helping a customer put two HP 3PAR StoreServ 7200c systems into operation. Both StoreServs came factory-installed with 3PAR OS 3.2.1 MU3, which has been available since July 2015. Usually, the first thing you do is to deploy the 3PAR Service Processor (SP). These days this is (in most cases) a Virtual Service Processor (VSP). The SP is used to initialize the storage system. Later, the SP reports to HP and is used for maintenance tasks like shutting down the StoreServ or installing updates and patches. There are only a few cases in which you start the Out-of-the-Box (OOTB) procedure of the StoreServ without having a VSP. I deployed two VSPs (one for each StoreServ), started the Service Processor Setup Wizard, entered the StoreServ serial number and got this message:

3par_vsp_error

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

“No uninitialized storage system with the specified serial number could be found”. I double-checked the network setup, VLANs, switch ports etc. The error occurred with BOTH VSPs and BOTH StoreServs. I started the OOTB on both StoreServs using the serial console. My plan was to import the StoreServs into the VSPs later. To realize this, I tried to set up the VSP using the console interface. I logged in as root (no password) and tried the third option: Setup SP with original SP ID.

3par_vsp_error_console

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

Not the worst idea, but unsuccessful. I entered the SP ID, SP networking details, a lot of other stuff, the serial number of the StoreServ, the IP address and credentials, and finally got this message:

StoreServ HP 3PAR OS version validation failed - unable to retrieve StoreServ's HP 3PAR OS version.

Hmm… I knew that P003 was mandatory for the VSP 4.3 MU1 and 3PAR OS 3.2.1 MU3. But could the missing patch cause this behaviour? I called HP and explained my guess. After a short remote session this morning, the support case was escalated to the 2nd level. While waiting for the 2nd level support, I was thinking about a solution. I knew that earlier releases of the VSP don’t check the serial number of the StoreServ or the version of the 3PAR OS. So I grabbed a copy of the VSP 4.1 MU2 with P009 and deployed the VSP. This time, I was able to finish the “Moment of Birth” (MOB). This release also asked for the serial number, the IP address and login credentials, but it didn’t check the version of the 3PAR OS (or it doesn’t care if it’s unknown). At this point I had a functional SP running software release 4.1 MU2. I upgraded the SP to 4.3 MU1 with the physical SP ISO image and installed P003 afterwards. Now I was able to import the StoreServ 7200c with 3PAR OS 3.2.1 MU3.

I don’t know how HP covers this during the installation service. AFAIK there is no VSP 4.3 MU1 with P003 available and I guess HP ships all new StoreServs with 3PAR OS 3.2.1 MU3. If you upgrade from an earlier 3PAR OS release, please make sure that you install P003 before you update the 3PAR OS. The StoreServ Refresh matrix clearly says that P003 is mandatory. The release notes for the HP 3PAR Service Processor (SP) Software SP-4.3.0 MU1 P003 also indicate this:

SP-4.3.0.GA-24 P003 is a mandatory patch for SP-4.3.0.GA-24 and 3.2.1.MU3.

I’m excited to hear from the HP 2nd level support. I will update this blog post if I have more information.

EDIT

Together with the StoreServ 8000 series, HP released a new version of the 3PAR Service Processor. The new version 4.4 is necessary for the new StoreServ models, but it also supports 3PAR OS < 3.2.2 (which is the GA release for the new StoreServ models). So if you get a new StoreServ 7000 with 3PAR OS 3.2.1 MU3, simply deploy SP version 4.4.

DataCore mirrored virtual disks full recovery fails repeatedly

This posting is ~8 years old. You should keep this in mind. IT is a fast-moving business. This information might be outdated.

Last Sunday a customer suffered a power outage for a few hours. Unfortunately the DataCore Storage Server in the affected datacenter wasn’t shut down and therefore crashed. After the power was back, the Storage Server was started and the recoveries for the mirrored virtual disks started. Hours later, three mirrored virtual disks were still running full recoveries and the recovery for each of them failed repeatedly.

virtual_disk_error_ds10_mirror

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

The recovery ran until a specific point, failed and started again. When the recovery failed, several events were logged on the Storage Server in the other datacenter (the Storage Server that wasn’t affected by the power outage):

Source: DcsPool, Event ID: 29

The DataCore Pool driver has detected that pool disk 33 failed I/O with status C0000185.

Source: disk, Event ID: 7

The device, \Device\Harddisk33\DR33, has a bad block.

Source: Cissesrv, Event ID: 24606

Logical drive 2 of array controller P812 located in server slot 4 returned a fatal error during a read/write request from/to the volume.

Logical block address 391374848, block count 1024 and command 32 were taken from the failed logical I/O request.

Array controller P812 located in server slot 4 is also reporting that the last physical drive to report a fatal error condition (associated with this logical request), is located in bay 18 of box 1 connected to port 1E

The DataCore support quickly confirmed what we already knew: We had trouble with the backend storage on the DataCore Storage Server that was serving the full recoveries for the recovering Storage Server. The full recoveries ran until the point at which a non-readable block was hit. Clearly a problem with the backend storage.

Summary

To summarize this very painful situation:

  • VMFS datastores with productive VMs on DataCore mirrored virtual disks without redundancy
  • Trouble with the backend storage on the DataCore Storage Server that was serving the mirrored virtual disks without redundancy

Next steps

The customer and I decided to evacuate the VMs from the three affected datastores (each mirrored virtual disk represents a VMFS datastore). To avoid more trouble, we decided to split the unhealthy mirrors, so we had three single virtual disks. After the shutdown of the VMs on the affected datastores, we started a single Storage vMotion at a time to move the VMs to other datastores. This worked until the Storage vMotion hit the non-readable blocks. The Storage vMotions failed and the single virtual disks also went into the status “Failed”. After that, we mounted the single virtual disks from the other DataCore Storage Server (the one that was affected by the power outage and that was running the full recoveries). We expected that the VMFS on the single virtual disks was broken, but to our surprise we were able to mount the datastores. We moved the VMs from these datastores to other datastores. This process was flawless. Just to make this clear: We were able to mount the VMFS on virtual disks that were in the status “Full Recovery pending”. I was quite sure that there was garbage on the disks, especially if you consider that there was a full recovery running that never finished.

The only way to remove the logical block errors is to rebuild the logical drive on the RAID controller. This means:

  • Pray for good luck
  • Break all mirrored virtual disks
  • Remove the resulting single virtual disks
  • Remove the disks from the DataCore disk pool
  • Remove the DataCore disk pool
  • Remove the logical drives on the RAID controller
  • Remove the arrays on the RAID controller
  • Replace the faulty physical disks
  • Rebuild the arrays
  • Rebuild the logical drives
  • Create a new DataCore disk pool
  • Add disks to the DataCore disk pool
  • Add mirrors to the single virtual disks
  • Wait until the full recoveries have finished
  • Treat yourself to a beer

Final words

This was very, very painful and, unfortunately, not the first time I had to do this for this customer. The customer is in close contact with the vendor of the backend storage to identify the root cause.

Is Nutanix the perfect fit for SMBs?

This posting is ~8 years old. You should keep this in mind. IT is a fast-moving business. This information might be outdated.

There’s a world below clouds and enterprise environments with thousands of VMs and hundreds or thousands of hosts. A world that consists of at most three hosts. I’m working with quite a few customers that are using VMware vSphere Essentials Plus. Those environments typically consist of two or three hosts and something between 10 and 100 VMs. Just to mention it: I don’t have any VMware vSphere Essentials customers. I can’t see any benefit in buying this license. Most of these environments are designed for a lifetime of three to four years. After that time, I come again and replace it with new gear. I can’t remember any customer that upgraded their VMware vSphere Essentials Plus. Even if the demands on the IT infrastructure increase, the license stays the same. The hosts and storage get bigger, but the requirements stay the same: HA, vMotion, sometimes vSphere Replication, often (vSphere API for) Data Protection. Maybe this is a German thing and customers outside of Germany are growing faster and invest more in their IT.

Hyperconverged, scale-out IT infrastructure for SMBs?

Think enterprise and break it down to smaller customers. That is easily said, but we have seen so many technologies come down from the enterprise to the SMBs over the last years. Think about SAN: 15 years ago, no SMB even thought about it. Today it’s standard.

I’ve taken this statement from the Nutanix website.

Nutanix simplifies datacenter infrastructure by integrating server and storage resources into a turnkey appliance that is deployed in just 30 to 60 minutes, and runs any application at any scale.

When working with SMBs, most of them have to deal with a tight budget. This means that they use the maximum principle to get the most hardware, software and service for their money. Customers do not like long implementation phases. Long implementation phases mean that lots of money can’t be invested in hardware or software. Every single Euro/ Dollar/ $CURRENCY spent on services can’t be invested in hardware and software.

Another important requirement for most SMBs is simple operation. I know a lot of customers with only one, two or three people doing all that stuff around helpdesk, servers, networking etc. IT infrastructure, or IT in general, isn’t the main focus for many of them. It should just work. Every day. Until it’s replaced. This applies not only to server virtualization, it applies to IT in general. This often requires lean and simple designs, designs that follow the principle of error prevention. Because of this, it’s a good practice to reduce the components used in a design and automate where it’s useful and valuable. And if a solution is robust, then this can only be an advantage.

Why Nutanix?

In my opinion, simplicity is the key to success. If you see Nutanix for the first time, you will be surprised how easy it is to manage. Deployment, operation, updates. It’s slick, it’s simple, it’s lightweight. Everything the customer needs is combined in 2U. The same applies to the support. I’ve followed the discussion on Twitter between Nutanix and VMware on who may/ can/ is allowed to provide support for VMware. It was started by a blog post of Chuck Hollis (10 Reasons why VMware is leading the hyperconverged industry). To make it short: I don’t share his opinion. In my opinion, Nutanix’s focus on customer experience is the key.

Simplicity and the ability to change

I don’t think that pre-configured systems like Fujitsu Cluster-in-a-box, VCE vBlocks or HP ConvergedSystems are the answer to simplified IT infrastructure for SMBs. They are not hyperconverged. They are pre-configured. That’s an important difference. Pre-configured doesn’t mean that it’s easy to manage or fast and easy to implement. SMBs want hyperconverged platforms to simplify their IT infrastructure. Okay, so why not buy any other hyperconverged platform offered on the market, like SimpliVity OmniCube, HP ConvergedSystems HC or VMware EVO:RAIL? Because these offerings are focused on VMware. The question was: Why Nutanix? Because you can run KVM, Microsoft Hyper-V and VMware ESXi on it. That’s a unique selling point (USP). You can offer the customer a hyperconverged platform that allows him to change to another hypervisor later. I think we all agree that VMware is the market leader. But Microsoft is catching up. All features of the Essentials Plus kit can be delivered with Microsoft Hyper-V (and much more if you add SCVMM). Remember: I talk about the typical Essentials Plus customer. VMware vSphere Essentials Plus includes everything a customer needs: Failover, live migration, data protection, and if needed, replication. In my experience, DRS, Host Profiles and vSphere Distributed Switches are nice, but SMBs can’t take advantage of them (exceptions are not excluded…). Add Microsoft’s SCVMM and the gap between VMware vSphere and Microsoft Hyper-V is even smaller. The licensing of Microsoft Windows Server makes it interesting for customers to take a look at Microsoft Hyper-V, especially if you take the licensing costs into account. Sure, it’s not all about CAPEX (capital expenditure), OPEX (operational expenditure) is also important. Don’t get me wrong, I love VMware. But it’s important to be prepared. If the customer decides to change to Microsoft Hyper-V, you should be able to deliver it.

How can it look like?

Depending on the computing and storage needs, take a closer look at the Nutanix NX-1000 or NX-3000 series. I think an NX-1350 or NX-3350/ 3360 block is a good catch. Add a VMware vSphere Essentials Plus kit (or some Microsoft Windows Server 2012 R2 licenses… maybe also System Center 2012), Veeam Backup Essentials, something to store the backups on, like an HP StoreOnce 2700, and your favorite switches for 10 GbE networking connectivity (for example two HP 2920 switches in a stack with 10 GbE modules). A complete datacenter in 5U. This is only an example, but I think this should fit most SMB customers (depending on how you define SMB…).

Famous last words

Is Nutanix the perfect fit for SMBs? Yes! Easy to implement, easy to manage and robust. Nutanix stands out with its platform independence. This allows customers to have a choice with regard to the hypervisor used. Investment protection is a valuable asset if you constantly have to fight for budgets.

Tiering? Caching? Why it’s important to differentiate between them.

This posting is ~8 years old. You should keep this in mind. IT is a fast-moving business. This information might be outdated.

Some days ago I talked to a colleague from our sales team and we discussed different solutions for a customer. I will spare you the details, but we came across PernixData FVP, HP 3PAR Adaptive Optimization, HP 3PAR Adaptive Flash Cache and DataCore SANsymphony-V. And then the question of all questions came up: “What is the difference?”.

Simplify, then add Lightness

Let’s talk about tiering. To make it simple: Tiering moves a block from one tier to another, depending on how often the block is accessed in a specific time. A tier is a class of storage with specific characteristics, for example ultra-fast flash, enterprise-grade SAS drives or even nearline drives. Characteristics can be the drive type, the used RAID level or a combination of characteristics. A 3-tier storage design can consist of only one drive type, but organized in different RAID levels. Tier 1 can be RAID 1 and tier 3 can be RAID 6, but all tiers use enterprise-grade 15k SAS drives. But you can also mix drive types and RAID levels, for example tier 1 with flash, tier 2 with 15k SAS in a RAID 5 and tier 3 with SAS-NL and RAID 6. Each time a block is accessed, the block “heats up”. If it’s hot enough, it is moved one tier up. If it’s less often accessed, the block “cools down” and at a specific point, the block is moved a tier down. If a tier is full, colder blocks have to be moved down and hotter blocks have to be moved up. It’s a bit simplified, but products like DataCore SANsymphony-V with Auto-Tiering or HP 3PAR Adaptive Optimization work this way.
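To illustrate the mechanism, here is a purely conceptual PowerShell sketch, not DataCore or 3PAR code: the thresholds, tier numbers and the cooling factor are made up. Every access increases a block’s heat, and a periodic job moves hot blocks up and cold blocks down.

$heat = @{}   # block ID -> access counter ("heat")
$tier = @{}   # block ID -> current tier (0 = fastest, 2 = slowest)

function Register-BlockAccess([int]$BlockId) {
    $heat[$BlockId] = $heat[$BlockId] + 1                           # block heats up with every access
    if (-not $tier.ContainsKey($BlockId)) { $tier[$BlockId] = 2 }   # new blocks start on the lowest tier
}

function Invoke-TieringCycle {
    foreach ($id in @($heat.Keys)) {
        if     ($heat[$id] -gt 100 -and $tier[$id] -gt 0) { $tier[$id]-- }   # hot block: move one tier up
        elseif ($heat[$id] -lt 10  -and $tier[$id] -lt 2) { $tier[$id]++ }   # cold block: move one tier down
        $heat[$id] = [math]::Floor($heat[$id] / 2)                           # let the heat map cool down over time
    }
}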

Let’s talk about caching. With caching, a block is only copied to a faster region, which can be flash or even DRAM. The original block isn’t moved, only a copy of the accessed block is placed on a faster medium. If this block is accessed, the data is served from the faster medium. This also works for write I/O. If a block is written, the data is written to the faster medium and will be moved to the underlying, slower medium later. You can’t store block copies indefinitely, so less frequently accessed blocks have to be removed from the cache if they are not accessed, or if the cache fills up. Examples of caching solutions are PernixData FVP, HP 3PAR Adaptive Flash Cache or NetApp Flash Pool (and also Flash Cache). I deliberately left storage controller cache out of this list. All of the listed caching technologies (except NetApp Flash Cache) can do write-back caching. I wouldn’t recommend read-cache-only solutions like VMware vSphere Flash Read Cache, except in two situations: Your workload is focused on read I/O, and/ or you already own a vSphere Enterprise Plus license and you do not want to spend extra money.
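Again just a conceptual sketch to show the difference: the block stays where it is, the cache only holds a copy and evicts the least recently used copy when it is full. Read-BlockFromDisk is a placeholder for the slow backend read.

$cacheSize = 1000
$cache = [ordered]@{}   # block ID -> cached copy, insertion order doubles as LRU order

function Read-Block([string]$BlockId) {
    if ($cache.Contains($BlockId)) {
        $data = $cache[$BlockId]              # cache hit: serve the copy from the fast medium
        $cache.Remove($BlockId)               # remove and re-insert to mark the block as recently used
    } else {
        $data = Read-BlockFromDisk $BlockId   # cache miss: read the block from the slow medium (placeholder)
    }
    if ($cache.Count -ge $cacheSize) {
        $cache.RemoveAt(0)                    # cache full: evict the least recently used copy
    }
    $cache[$BlockId] = $data
    return $data
}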

Tiering or caching? What to choose?

Well… it depends. What is the main goal when using these techniques? Accelerating workloads and making the best use of scarce and expensive storage (commonly flash storage).

Regardless of the workload, tiering will need some time to let the frequently accessed blocks heat up. Some vendors may anticipate this partially by writing data always to the fastest tier. But I don’t think that this is what I would call efficient. One benefit of tiering is that you can have more than two tiers. You can have a small flash tier, a bigger SAS tier and a really big SAS-NL tier. Usually you will see a 10% flash / 40% SAS / 50% SAS-NL distribution. But as I also mentioned: You don’t have to use flash in a tiered storage design. That’s a plus. On the downside, tiering can make mirrored storage designs complex. Heat maps aren’t mirrored between storage systems. If you fail over your primary storage, all blocks need to heat up again. I know that vendors are working on that. HP 3PAR and DataCore SANsymphony-V currently have a “performance problem” after a failover. It’s only fair to mention it. Here are two examples of products I know well and that both offer tiering: In an HP 3PAR Adaptive Optimization configuration, data is always written to the tier from which the virtual volume was provisioned. This explains the best practice to provision new virtual volumes from the middle tier (Tier 1 CPG). DataCore SANsymphony-V uses the performance class in the storage profile of a virtual disk to determine where data should be written. Depending on the performance class, data is written to the highest available tier (tier affinity is taken into account). Don’t get confused by the tier numbering: Some vendors use tier 0 as the highest tier, others may start counting at tier 1.

Caching is more “spontaneous”. New blocks are written to the cache (usually flash storage, but it can also be DRAM). If a block is read from disk, it’s placed in the cache. Depending on the cache size, you can hold a lot of data. You can lose the cache, but you can’t lose the data in this case. The cache only holds block copies (okay, okay, written blocks shouldn’t be acknowledged until they are in a second cache/ host/ $WHATEVER). If the cache is gone, it’s relatively quickly filled up again. You usually can’t have more than two “tiers”. You can have flash and you can have rotating rust. Exception: PernixData FVP can also use host memory. I would call this an additional half tier. ;) Nutanix uses a tiered storage design in their hyper-converged platform: Flash storage is used as read/ write cache, cost-effective SATA drives are used to store the data. Caching is great if you have unpredictable workloads. Another interesting point: You can cache at different places in the stack. Take a look at PernixData FVP and HP 3PAR Adaptive Flash Cache. PernixData FVP sits next to the hypervisor kernel. HP 3PAR AFC works at the storage controller level. FVP is awesome for accelerating VM workloads, but what if I have physical database servers? At this point, HP 3PAR AFC can play to its advantages. Because you usually have only two “tiers”, you will need more flash storage compared to a tiered storage design. Especially if you mix flash and SAS-NL/ SATA.

Final words

Is there a rule when to use caching and when to use tiering? I don’t think so. You may use the workload as an indicator. If it’s more predictable, you should take a closer look at a tiered storage design, in particular if the customer wants to separate data of different classes. If you mostly have to deal with unpredictable workloads, take a closer look at caching. There is no law that prevents combining caching and tiering. In the end, the customer requirements are the key. Do the math. Sometimes caching can outperform tiering from the cost perspective, especially if you mix flash and SAS-NL/ SATA in the right proportion.

What to consider when implementing HP 3PAR with iSCSI in VMware environments

This posting is ~8 years old. You should keep this in mind. IT is a fast-moving business. This information might be outdated.

Some days ago a colleague and I implemented a small 3-node VMware vSphere Essentials Plus cluster with an HP 3PAR StoreServ 7200c. Costs are always a sore point in SMB environments, so it should not surprise you that we used iSCSI in this design. I had some doubts about using iSCSI with an HP 3PAR StoreServ, mostly because of the performance and complexity. IMHO iSCSI is more complex to implement than Fibre Channel (FC). But in this case I had to deal with it.

iSCSI options for HP 3PAR StoreServ 7000

If you decide to use iSCSI with an HP 3PAR StoreServ, you have only one option: adding a 2-port 10GbE iSCSI/ FCoE adapter to each node. There is no other iSCSI option. The available 2-port 10GbE Ethernet adapter and 4-port 1GbE Ethernet adapter can’t be used for iSCSI connectivity. These adapters can only be used with the HP 3PAR File Persona Software Suite.

The 2-port 10GbE iSCSI/ FCoE adapter is a converged network adapter (CNA) and supports iSCSI or Fibre Channel over Ethernet (FCoE). The adapter can only be used for host connectivity and you have to select iSCSI or FCoE. You can’t use the CNA for remote copy. You have to add a CNA to each node of a node pair. You can have up to four 10 GbE ports in a 3PAR 7200 series, or up to eight 10 GbE ports in a 3PAR 7400 series.

Network connectivity

10 GbE means 10 GbE, there is no way to connect the CNA to 1 GbE transceivers. The 2-port 10GbE iSCSI/ FCoE adapter includes two 10 GbE SR SFP+ transceivers. With 3PAR OS 3.1.3 and later, you can use Direct Attach Copper (DAC) cables for network connectivity, but not for FCoE. Make sure that you use the correct cables for your switch! HP currently offers the following cables in different lengths:

  • X242 for HP ProVision switches
  • X240 for HP Comware switches
  • HP B-series SFP+ to SFP+ Active Copper for Brocade switches, or
  • HP C-series SFP+ to SFP+ Active Copper for Cisco switches

If you use any other switch vendor, I strongly recommend using the included 10 GbE SR SFP+ transceivers and 10 GbE SR SFP+ transceivers on the switch side. In this case you have to use fibre cables to connect the 3PAR to the network. In any other case I recommend using DAC cables for network connectivity.

It’s a common practice to run iSCSI traffic in its own VLAN. Theoretically a single iSCSI VLAN is sufficient. I recommend using two iSCSI VLANs in the case of a 3PAR, one for each iSCSI subnet. Why two subnets? The answer is easy: Persistent Ports. Persistent Ports allows a host port to assume the identity (port WWN for Fibre Channel or IP address for iSCSI ports) of a failed port while retaining its own identity. This minimizes I/O disruption during failures or upgrades. Persistent Ports uses the NPIV feature for Fibre Channel/ FCoE and IP address failover for iSCSI. With the release of 3PAR OS 3.1.3, Persistent Ports became available for iSCSI as well. A hard requirement of Persistent Ports is that the same host ports of the nodes of a node pair must be connected to the same IP network on the fabric. An example clarifies this:

Host port (N:S:P)   VLAN ID   IP subnet
0:2:1               11        192.168.173.0/27
0:2:2               12        192.168.173.32/27
1:2:1               11        192.168.173.0/27
1:2:2               12        192.168.173.32/27

The use of jumbo frames with iSCSI is an often-discussed topic. It’s often argued that complexity and performance gain would be disproportionate. I’m a bit biased. I think that the use of jumbo frames is a must when using iSCSI. I always configure jumbo frames for vMotion, so the cost of configuring jumbo frames in an iSCSI environment is low for me. Don’t forget to configure jumbo frames on all devices in the path: VMkernel ports, vSwitches, physical switches and the 3PAR CNAs.
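A quick sketch of the vSphere part with PowerCLI; the vCenter, host, vSwitch and VMkernel port names are placeholders, and the physical switches and the 3PAR CNAs still have to be configured separately.

Connect-VIServer -Server vcenter.lab.local   # placeholder vCenter
$vmhost = Get-VMHost -Name esx01.lab.local   # placeholder host

# Set an MTU of 9000 on the standard vSwitch used for iSCSI
Get-VirtualSwitch -VMHost $vmhost -Name vSwitch1 | Set-VirtualSwitch -Mtu 9000 -Confirm:$false

# Set an MTU of 9000 on the iSCSI VMkernel port
Get-VMHostNetworkAdapter -VMHost $vmhost -VMKernel | Where-Object { $_.Name -eq "vmk2" } | Set-VMHostNetworkAdapter -Mtu 9000 -Confirm:$false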

Always use at least two physical switches for iSCSI connectivity. This concept is comparable to a Fibre Channel dual-fabric SAN. I like the concept of switch aggregation (the wording may vary between vendors). I often work with HP Networking and I like the HP 2920 or 5820 Switch Series. These switches can form stacks in which multiple physical switches act as a single virtual device. These stacks provide redundancy and operational simplicity. In combination with two VLANs you can build a powerful, redundant and resilient iSCSI SAN.

Host port configuration

The CNA ports can only be used for host connectivity, therefore there is no way to use them for disk or remote copy connectivity. Before you can use the port for host connectivity, you have to select iSCSI or FCoE as storage protocol.

3par_iscsi_cna_config_01

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

Host and Virtual Volume sets

You can organize hosts and volumes in host sets and volume sets. I recommend creating a host set for all ESXi hosts in a vSphere cluster. I also recommend creating a volume set to group all volumes that should be presented to a host or host set. When exporting Virtual Volumes (VV), you can export a volume set to a host set. If you add a host to the host set, the host will see all volumes in the volume set. If you add a volume to a volume set, the hosts in the host set will all see the newly added volume. This simplifies host and volume management and reduces the possibility of human error.

3par_iscsi_host_set_01

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

3par_iscsi_vv_set_01

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

Custom SATP rule for ESXi 5.x and ESXi 6.0

3PAR OS 3.1.2 introduced the new Host Persona 11 for VMware, which enables asymmetric logical unit access (ALUA). Besides Host Persona 11, Host Persona 6 for VMware is also available, but it doesn’t support ALUA. 3PAR OS 3.1.3 is the last release that includes support for Host Persona 6 AND 11 for VMware. All later releases only include Host Persona 11. I strongly recommend using Host Persona 11 for VMware. You should also add a custom SATP rule. This rule can be added by using ESXCLI.

# esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -P "VMW_PSP_RR" -O iops=1 -c "tpgs_on" -V "3PARdata" -M "VV" -e "HP 3PAR Custom iSCSI/FC/FCoE ALUA Rule"

This custom rule sets VMW_PSP_RR as the default PSP and evenly distributes the I/Os over all active paths by switching to the next active path after each I/O.
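If you want to add the rule to all hosts of a cluster without opening an SSH session on each of them, the esxcli namespaces are also reachable through PowerCLI. Treat this as a sketch: the cluster name is a placeholder and the argument key names are assumptions derived from the esxcli option names — CreateArgs() shows you the exact keys your PowerCLI version expects.

foreach ($vmhost in Get-Cluster -Name "Production" | Get-VMHost) {   # placeholder cluster name
    $esxcli = Get-EsxCli -VMHost $vmhost -V2
    $arguments = $esxcli.storage.nmp.satp.rule.add.CreateArgs()      # returns a hashtable with the expected keys
    $arguments.satp        = "VMW_SATP_ALUA"
    $arguments.psp         = "VMW_PSP_RR"
    $arguments.pspoption   = "iops=1"
    $arguments.claimoption = "tpgs_on"
    $arguments.vendor      = "3PARdata"
    $arguments.model       = "VV"
    $arguments.description = "HP 3PAR Custom iSCSI/FC/FCoE ALUA Rule"
    $esxcli.storage.nmp.satp.rule.add.Invoke($arguments)
}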

iSCSI discovery

Before you can use an exported volume, the host needs to discover the volume from the target. You have to configure the iSCSI discovery in the settings of the software iSCSI initiator. Typically you will use the dynamic discovery process. In this case, the initiator uses a SendTargets request to get a list of available targets. After adding the IP addresses of the 3PAR CNAs to the dynamic discovery list, the static discovery list is filled automatically. In the case of multiple subnets, the dynamic discovery process carries some caveats. Chris Wahl has highlighted this problem in his blog post “Exercise Caution Using Dynamic Discovery for Multi-Homed iSCSI Targets”. My colleague Claudia and I stumbled over this behaviour in our last 3PAR project. Removing the IP addresses from the dynamic discovery will result in the loss of the static discovery entries. After a reboot, the entries in the static discovery list will be gone and therefore no volumes will be discovered. I added a comment to Chris’ blog post and he was able to confirm this behaviour. The solution is to use the dynamic discovery to get a list of targets, and then add the targets manually to the static discovery list.
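The manual part can also be scripted. A hedged PowerCLI sketch: the host name, the target IQN and the portal addresses are placeholders, take the real values from the static discovery list that the dynamic discovery filled initially.

$vmhost = Get-VMHost -Name esx01.lab.local   # placeholder host
$hba = Get-VMHostHba -VMHost $vmhost -Type IScsi | Where-Object { $_.Model -match "Software" }

$targetIqn = "iqn.2000-05.com.3pardata:xxxxxxxxxxxxxxxx"   # placeholder 3PAR target IQN

# One static target entry per iSCSI portal/ subnet (placeholder addresses)
foreach ($portal in "192.168.173.1", "192.168.173.33") {
    New-IScsiHbaTarget -IScsiHba $hba -Address $portal -Type Static -IScsiName $targetIqn
}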

Final words

HP 3PAR with iSCSI is an equivalent solution to HP 3PAR with Fibre Channel/ FCoE. Especially in SMB environments, iSCSI is a good choice to bring 3PAR goodness to the customer at a reasonable price.

Update OS or reinstall DataCore SANsymphony-V Storage Server

This posting is ~8 years old. You should keep this in mind. IT is a fast-moving business. This information might be outdated.

Sometimes you have to update the OS of your DataCore Storage Server, or the server has crashed and you have to reinstall it. In both cases, a configuration backup is the starting point. The procedure remains the same, regardless of whether it’s an update or a reinstall after a server crash:

  • Install Windows Server OS
  • Copy configuration backup file to C:\Program Files\DataCore\SANsymphony\Recovery
  • Install DataCore SANsymphony-V

Take a backup

You can take the configuration backup in different ways:

  • Using the DataCore SANsymphony-V Management Console
  • Using the SANsymphony-V Cmdlets for Windows PowerShell

Regardless of how you take the backup, be sure that you have a valid backup! I recommend taking backups in a regular and automated fashion, e.g. with a PowerShell script. I have written such a script in the past: Backup DataCore SANsymphony-V config using PowerShell

You can take the backup with the DataCore SANsymphony-V Management Console by right-clicking the server group and then selecting “Backup Configuration” from the context menu.

ssv_backup_configuration

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

When you use PowerShell, you have to execute three different cmdlets:

  • Connect-DcsServer to make a connection to the server group
  • Set-DcsBackupFolder to set the location of the backup folder
  • Backup-DcsConfiguration to backup the configuration and store it in the backup folder

The cmdlets take the configuration backup for all servers in the server group! Make sure that you copy the configuration backups to a safe location and that you copy a valid backup of each server in the server group. One thing is important: If you take the backups online (DataCore Storage Server is running!), then you will see full recoveries in case of a restore. If you plan to reinstall a DataCore Storage Server, stop the DataCore Storage Server and then take the configuration backup. In case of a clean shutdown, only log recoveries will be necessary.
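A minimal sketch of an automated backup with the cmdlets mentioned above; the module path, credentials and backup folder are placeholders, and parameter names may differ between SANsymphony-V builds (check the DataCore online help or my script linked above).

# Load the DataCore cmdlets (path is an assumption, adjust it to your installation)
Import-Module "C:\Program Files\DataCore\SANsymphony\DataCore.Executive.Cmdlets.dll"

# Connect to the server group, set the backup folder and take the backup
Connect-DcsServer -Server "SSV1" -UserName "DcsAdmin" -Password "********"   # placeholder credentials
Set-DcsBackupFolder -Folder "D:\Backup\DataCore"                             # placeholder backup location
Backup-DcsConfiguration
Disconnect-DcsServer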

Restore the backup

Disconnect the backend and mirror ports before you reinstall the OS! Install the Windows OS according to the DataCore guidelines (meet the prerequisites, read the known errors with 3rd party components PDF, check name resolution etc.). Make sure that you install the same build of DataCore SANsymphony-V that was used prior to the reinstallation. Don’t install newer or older builds! Install exactly the build that was used when the configuration backup was taken. Create the folder structure “C:\Program Files\DataCore\SANsymphony\Recovery” and copy the ZIP file into it. Start the DataCore SANsymphony-V installation. You will be prompted during the installation that a saved configuration was found.

ssv_restore_config

Patrick Terlisten/ www.vcloudnine.de/ Creative Commons CC0

If not, log a call with DataCore or follow the instructions in the DataCore online help. You also need the DcsAdmin password during the installation! I hope you have written it down somewhere. ;) After finishing the installation, shut down the server and reconnect the backend and mirror ports. Power on the server and open the SANsymphony-V Management Console. If everything’s fine, start the DataCore Storage Server and watch the mirror recoveries. Take a configuration backup and support bundles. Proceed with the next server in the server group.

Final words

The process is quite simple. If you’re unsure about the correct steps, log a call with DataCore support or take a look at the DataCore online help. Don’t try in-place upgrades to update the OS. I also don’t recommend taking images of running storage servers. Just reinstall the OS and use configuration backups to restore the configuration.

Shady upgrade path for NetApp ONTAP 7-Mode to cDOT

This posting is ~8 years old. You should keep this in mind. IT is a fast-moving business. This information might be outdated.

NetApp has offered Data ONTAP for some time in two flavours:

  • 7-Mode
  • Clustered Data ONTAP (cDOT)

With cDOT, NetApp has rewritten ONTAP nearly from scratch. The aim was to create a storage OS that leverages scale-out architecture and storage virtualization techniques, as well as providing non-disruptive operations. NetApp needed some release cycles to get cDOT to the point where it provides all the features that customers know from 7-Mode. With Data ONTAP 8.3, NetApp has reached this point. Even MetroCluster is now supported. That’s a huge improvement and I’m glad that NetApp has made it. But NetApp wasted no time in cutting off old habits: With ONTAP 8.3, 7-Mode is no longer offered. Okay, no big deal. Customers can migrate from 7-Mode to cDOT. Yes, indeed. But it’s not as easy as you may think.

First of all: You can’t update to cDOT in-place. You have to wipe the nodes and re-install Data ONTAP. That makes it nearly impossible to migrate a running Filer without downtime and/ or buying or loaning additional hardware. Most customers migrate to cDOT at the same time as they refresh the hardware. The data can be migrated in different ways. NetApp offers the 7-Mode Transition Tool (7MTT). 7MTT leverages SnapMirror to get the data from the 7-Mode to the cDOT Filer. But you can also use plain SnapMirror without 7MTT to migrate the data. The switchover from the old to the new volume is an offline process. The accessing servers have to be disconnected, and they must be connected to the new cDOT Filer and volume. 7MTT can only migrate NAS data! If you wish to migrate SAN data (LUNs), you have to use NetApp’s DTA2800 appliance or something like VMware Storage vMotion. Other migration techniques, like Storage vMotion, robocopy etc., can also be used.

I know that cDOT is nearly completely rewritten, but such migration paths are a PITA. Especially if customers have just bought new equipment with ONTAP 8.1 or 8.2 and now wish to migrate to 8.3.

Another pain point is NetApp’s MetroCluster. With NetApp MetroCluster, customers can deploy active/ active clusters between two sites up to 200 km apart. NetApp MetroCluster leverages SyncMirror to duplicate RAID groups to different disks. NetApp MetroCluster is certified for vSphere Metro Storage Cluster (vMSC). One can say that MetroCluster is a bestseller. I know many customers that use MetroCluster with only two nodes. That’s where a 2-node HA pair is cut in the middle and spread across two locations. Let’s assume that a customer is running a stretched MetroCluster with two nodes and Data ONTAP 8.2. The customer wants to migrate to ONTAP 8.3. This means that he has to migrate to cDOT. No problem, because with ONTAP 8.3, cDOT offers support for NetApp MetroCluster.

  1. You can’t update to cDOT in-place. So either wipe the nodes or get (temporary) additional hardware.
  2. NetApp MetroCluster with cDOT requires a 2-node cluster at each of the two sites (four nodes in total)

Especially when you look at the second statement, you will quickly realize that all customers that are running a 2-node MetroCluster have to purchase additional nodes and disks. Otherwise they can’t use MetroCluster with cDOT. This leaves only one migration path: Use ONTAP 8.2 with 7-Mode and wait until the hardware needs to be refreshed.

This is really bad… This is a shady upgrade path.

EDIT

NetApp is working hard to make the migration path better.

  • 7MTT is capable of migrating LUNs from 7DOT to cDOT in the newest version
  • At NetApp Insight 2014 there was an announcement of a 2-node cDOT MetroCluster, which will be released soon.

Thank you Sascha for this update.