Background
The physical capacity of storage arrays continues to grow at an enormous rate, year on year. Using EMC as a benchmark, we can see how the capacity of a single array has grown over time:
  • Symmetrix 3430 – 96 drives, 0.84TB
  • Symmetrix 5500 – 128 drives, 1.1TB
  • Symmetrix 8830 – 384 drives, 69.5TB
  • DMX3000 – 576 drives, 76.5TB
  • DMX-4 – 1920 drives, 1054TB

Note: these figures are indicative only!

DMX-3 and DMX-4 introduced arrays which scale to petabytes (1000TB) of available raw capacity. At some point, these petabyte arrays will need to be replaced, and replacing them will present a unique challenge to today’s storage managers. Here’s why.

Doing The Maths

In my experience, storage migrations from array to array can be complex and time-consuming. Issues include:

  • Identifying all hosts for migration
  • Identifying all owners for storage
  • Negotiating migration windows
  • Gap analysis on driver, firmware, O/S, patch levels
  • Change Control
  • Migration Planning
  • Migration Execution
  • Cleanup

With all of the above work to do, it’s not surprising that, realistically, around 10 servers per week is a good estimate of the capability of a single FTE (Full Time Equivalent, e.g. a storage guy). Some organisations may find this figure can be pushed higher, but I’m talking about one person, day in, day out, performing this work, so I’ll stick with my 10/week figure.

Assume an array has 250 hosts, each with an average of 500GB of storage. This equates to about 125TB of data and, at 10 hosts per week, 25 weeks (almost 6 months) of effort for our single FTE! In addition, the weekly migration schedule requires moving on average 5TB of data. If the target array differs from the source (e.g. a new vendor, different LUN size) then the migration task becomes even more time-consuming and complex to execute.
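The arithmetic above can be checked with a short Python sketch. The function name and parameters are my own for illustration, and it assumes decimal units (1TB = 1000GB) and a constant rate of 10 hosts per week; treat it as a rough planning aid rather than a sizing tool.

```python
import math


def migration_estimate(hosts, avg_gb_per_host, hosts_per_week):
    """Return (total_tb, weeks, tb_per_week) for one FTE migrating an array.

    Hypothetical helper: assumes decimal units (1TB = 1000GB) and a
    constant migration rate, as in the worked example in the text.
    """
    total_tb = hosts * avg_gb_per_host / 1000       # total data on the array
    weeks = math.ceil(hosts / hosts_per_week)       # whole weeks of effort
    tb_per_week = total_tb / weeks                  # average weekly data moved
    return total_tb, weeks, tb_per_week


if __name__ == "__main__":
    total_tb, weeks, tb_per_week = migration_estimate(250, 500, 10)
    print(f"{total_tb:.0f}TB to move, {weeks} weeks, {tb_per_week:.0f}TB per week")
```

Changing the host count or the weekly rate shows how quickly the timeline stretches as arrays get bigger.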


Look at the following diagram. It shows the lifecycle of physical storage in an array over time. Initially the array is deployed and storage is configured. Over the lifetime of the array, more storage is added and presented to hosts until the array reaches either its maximum physical capacity or an acceptable capacity threshold. Capacity remains at this level until migrations to another array begin. Up to that point, storage is added and paid for as required; once migrations start, however, there is no refund from the vendor for the resources that are freed up (those represented in green). They have been purchased but remain unused until the entire array is decommissioned. If the decommissioning process is lengthy then the amount of unused resources becomes high, especially on petabyte arrays. Imagine a typical 4-year lifecycle; up to 1 year could be spent moving hosts to new arrays, at significant cost in terms of manpower and impact to the business.
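To put a rough number on that unused capacity, here is a minimal Python sketch. It assumes data leaves the array at a constant weekly rate, which is a simplification of my own rather than something measured, and the function names are hypothetical. It reports idle capacity in "TB-weeks", i.e. terabytes paid for but sitting empty, summed over each week of the migration.

```python
def idle_capacity_profile(total_tb, weeks):
    """Idle (purchased but unused) TB at the end of each migration week.

    Assumption: hosts move off the array at a constant rate, so the
    freed capacity grows linearly until the array is empty.
    """
    rate = total_tb / weeks
    return [rate * week for week in range(1, weeks + 1)]


def idle_tb_weeks(total_tb, weeks):
    """Total idle capacity over the migration, in TB-weeks."""
    return sum(idle_capacity_profile(total_tb, weeks))


if __name__ == "__main__":
    # The 125TB array migrated over 25 weeks from the earlier example.
    total = idle_tb_weeks(125, 25)
    print(f"{total:.0f} TB-weeks idle, average {total / 25:.0f}TB unused")
```

For the 125TB example this works out to an average of 65TB sitting idle across the 25 weeks; on a petabyte array with a year-long migration the same shape applies at a much larger scale.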

Solutions

So how should we adapt migration processes to handle the issue of migrating these monster arrays?

  • Establish Standards.  This is an age-old issue but one that comes up time and time again.  Get your standards right.  These include consistent LUN sizes, naming standards and support matrix (compatibility) standards.
  • Consider Virtualisation.  Products including SVC (IBM), USP (HDS), Invista (EMC) and iNSP (Incipient) all allow the storage layer to be virtualised.  This can assist in the migration process.
  • Keep Accurate Records.  This may seem a bit obvious, but it is amazing how many sites don’t know how to contact the owners of some of the servers connected to their storage.
  • Talk to Your Customers.  Migrations inevitably result in server changes and potentially an outage.  Knowing your customers and keeping them in the loop regarding change planning saves a significant amount of hassle.

Technology replacement is now part of standard operational work.  Replacing hardware is not all about technology; procedures and common sense will form an increasingly important part of the process.

5 Responses to Enterprise Computing: Migrating Petabyte Arrays

  1. Chris "Saundo" Saunderson says:

    I’ve got another for you:

    1) Do currency!

    Don’t allow your volume managers, filesystems, operating systems and HBA/drivers to get too far out of date: support matrices are your friend when deciding what to replace the PB array with, and when.

    Working in a site that has 5+PB of disk, the biggest problem we run into is the lack of currency/patching of OSes and volume managers. Sneaky stuff like the number of PVLinks in a disk group can be worked around, but getting into the quagmire of upgrading volume managers kills any timeline, especially when conflicting support matrices end up forcing OS upgrades along with the VM or HBA upgrades!

  2. Pete Steege says:

    Chris,
    If your scenario above were in a 100% virtualized server environment, how difficult would the migration be? What would it look like?

  3. tushar says:

    Chris,
    Vendors like QLogic have come up with an appliance which sits right there in the storage path and performs online migration to heterogeneous arrays, so there is no downtime, heterogeneity is addressed and the performance impact is negligible.

    • Chris Evans says:

      Tushar

      Have these kinds of appliances ever taken off? Firstly, they have to be inserted into the data path, which can be a time-consuming process, and what about data integrity? One solution I thought was most elegant was Softek’s (now IBM) TDMF, but that relied on a host feature to make migrations work; of course on the mainframe there are only a handful of systems (or LPARs). Migration from large arrays could involve hundreds of hosts.

      Chris

  4. Chris Evans says:

    Pete, great question – I have a number of answers to that – I’ll put them together in a separate post.
