I had a tweet this morning from Leo Tudisca following on from the latest Infosmack Deep Dive podcast discussing storage and virtualisation. Leo’s question was regarding sub-LUN virtual machine replication and how it can be achieved. This requirement is something I’ve been mulling over for some time and can’t remember if I’ve committed my thoughts to words. If I have already, then I apologise for repeating myself. If not, then here’s my idea.
First of all, we need to take a step back and look at how LUN updates are serialised. I/O integrity to a single LUN is achieved using the SCSI RESERVE command. This allows an initiator (i.e. a host) to lock a LUN against updates from other initiators. This feature is especially necessary in clustered environments, where a single LUN needs to be owned by one cluster member while it is being updated. Holding a reserve on a LUN prevents conflicting updates from other hosts (although, strictly speaking, updates can’t occur concurrently anyway). The “Reserve -> Update -> Release” process works well in environments where one or two hosts perform updates, but in the virtual world, where many hypervisors may want to update large shared LUNs, the reserve process becomes a bottleneck.
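To make the bluntness of a whole-LUN reserve concrete, here is a minimal Python sketch. It is a toy model, not a SCSI implementation: the `Lun` class, the `ReservationConflict` exception and the host names are hypothetical, and a real target would report the blocked write as a RESERVATION CONFLICT status rather than a Python exception.

```python
import threading

class ReservationConflict(Exception):
    """Raised when an initiator writes to a LUN reserved by another initiator."""

class Lun:
    """Toy model of a LUN protected by a whole-device, SCSI-style reservation."""

    def __init__(self, blocks: int) -> None:
        self.data = [b""] * blocks
        self.holder = None              # initiator currently holding the reserve
        self._guard = threading.Lock()  # protects holder and data

    def reserve(self, initiator: str) -> bool:
        with self._guard:
            if self.holder in (None, initiator):
                self.holder = initiator
                return True
            return False                # another host already holds the reserve

    def release(self, initiator: str) -> None:
        with self._guard:
            if self.holder == initiator:
                self.holder = None

    def write(self, initiator: str, lba: int, payload: bytes) -> None:
        with self._guard:
            # The reserve covers the whole LUN, whichever block is targeted.
            if self.holder not in (None, initiator):
                raise ReservationConflict(f"{initiator} blocked by {self.holder}")
            self.data[lba] = payload

# Usage: Reserve -> Update -> Release, with a second host locked out.
lun = Lun(blocks=1024)
lun.reserve("esx01")
lun.write("esx01", 10, b"vm-a")         # esx01 owns the entire LUN
try:
    lun.write("esx02", 900, b"vm-b")    # blocked, even though block 900 is unrelated
except ReservationConflict as exc:
    print(exc)
lun.release("esx01")
```

The point of the example is the second write: esx02 is turned away even though it touches a completely different part of the LUN, which is exactly the contention that grows as LUNs get larger.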
SCSI RESERVE is essentially the same as the mainframe RESERVE macro, which is used to lock a volume to one specific address space and/or LPAR. Unfortunately, it’s rather a blunt tool for managing data integrity: as volumes or LUNs increase in size, the level of contention also increases, with a direct impact on performance and throughput.
The mainframe solution was to use enqueues, essentially a “gentleman’s agreement” between LPARs to share access information at a more granular level. Enqueues work on individual files and have a scope: they can be held locally or globally. This means a file can be reserved within a single LPAR or across all LPARs sharing enqueue information. LPARs that implement enqueues then don’t need to use RESERVE, and so eliminate the performance overhead of constant volume reservation.
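The difference in granularity can be sketched as a lock table keyed on file names rather than whole volumes. This is a hypothetical Python model, not the z/OS ENQ macro interface: `EnqueueTable`, `Scope` and the dataset names are illustrative, and the LOCAL/GLOBAL scopes simply mirror the local/global distinction described above.

```python
from enum import Enum

class Scope(Enum):
    LOCAL = "local"    # lock visible only within one LPAR
    GLOBAL = "global"  # lock shared by every LPAR in the agreement

class EnqueueTable:
    """Hypothetical enqueue-style lock table keyed on individual files."""

    def __init__(self) -> None:
        self._locks = {}  # lock key -> (lpar, task) currently holding it

    @staticmethod
    def _key(lpar: str, resource: str, scope: Scope) -> tuple:
        # Local locks are namespaced per LPAR; global locks share one namespace.
        if scope is Scope.LOCAL:
            return (scope, lpar, resource)
        return (scope, resource)

    def enq(self, lpar: str, task: str, resource: str, scope: Scope) -> bool:
        """Return True if the caller now holds the lock, False if someone else does."""
        owner = self._locks.setdefault(self._key(lpar, resource, scope), (lpar, task))
        return owner == (lpar, task)

    def deq(self, lpar: str, task: str, resource: str, scope: Scope) -> None:
        """Release the lock, but only if the caller actually holds it."""
        key = self._key(lpar, resource, scope)
        if self._locks.get(key) == (lpar, task):
            del self._locks[key]

# Usage: LPARs only contend when they want the same file.
table = EnqueueTable()
a = table.enq("LPAR1", "JOB1", "PAYROLL.MASTER", Scope.GLOBAL)  # granted
b = table.enq("LPAR2", "JOB2", "SALES.HISTORY", Scope.GLOBAL)   # granted: a different file
c = table.enq("LPAR2", "JOB3", "PAYROLL.MASTER", Scope.GLOBAL)  # refused while JOB1 holds it
table.deq("LPAR1", "JOB1", "PAYROLL.MASTER", Scope.GLOBAL)
d = table.enq("LPAR2", "JOB3", "PAYROLL.MASTER", Scope.GLOBAL)  # granted after release
```

Compared with the whole-LUN reserve, two jobs working on different files never block each other, which is where the performance gain comes from.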
COMPARE and WRITE
VAAI within VMware uses the new COMPARE and WRITE command to perform a task similar to enqueues. It allows one hypervisor to update data only after validating that the current contents are what it expected. The compare and the write have to happen as one indivisible (atomic) operation to guarantee consistency. The hypervisors confer with each other to maintain their “gentleman’s agreement” on who currently owns a virtual guest.
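The following Python sketch models the compare-and-write idea as a hypervisor might use it to claim a guest’s lock block, in the spirit of VAAI’s Atomic Test & Set (ATS) primitive. It assumes a single shared LUN held in memory; `SharedLun`, `claim_guest` and the block layout are hypothetical, and a `threading.Lock` stands in for the atomicity the array provides. On real hardware a failed comparison comes back as a MISCOMPARE check condition rather than a Python exception.

```python
import threading

class MiscompareError(Exception):
    """Stands in for the MISCOMPARE status a real target would return."""

class SharedLun:
    """Toy shared LUN offering an atomic compare-and-write on single blocks."""

    def __init__(self, blocks: int) -> None:
        self.data = [b"free"] * blocks
        self._atomic = threading.Lock()  # makes the compare and the write indivisible

    def compare_and_write(self, lba: int, expected: bytes, new: bytes) -> None:
        with self._atomic:
            current = self.data[lba]
            if current != expected:
                raise MiscompareError(f"block {lba} holds {current!r}")
            self.data[lba] = new

def claim_guest(lun: SharedLun, lock_lba: int, host: str) -> bool:
    """Hypothetical helper: try to mark a guest's lock block as owned by host."""
    try:
        lun.compare_and_write(lock_lba, b"free", host.encode())
        return True
    except MiscompareError:
        return False  # another hypervisor got there first

# Usage: four hypervisors race to claim the guest whose lock lives in block 3.
lun = SharedLun(blocks=8)
winners = []

def contend(host: str) -> None:
    if claim_guest(lun, 3, host):
        winners.append(host)

threads = [threading.Thread(target=contend, args=(f"esx0{i}",)) for i in range(1, 5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the threads are scheduled, exactly one host wins the claim; the others see a miscompare and back off, which is the “gentleman’s agreement” in miniature. Only the lock block is contended, not the whole LUN.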
Virtual Machine Replication
Where does that leave us in terms of replicating virtual machines? Well, today replication is performed by the array, which applies updates to the remote copy of a LUN in the same order they were applied to the primary LUN. What if an array manufacturer allowed both copies of a LUN to be read/write? In this scenario, hypervisors in two locations could update a single LUN over distance. To maintain integrity, the COMPARE and WRITE command could be extended to include the write to the remote LUN as part of the same atomic operation. The hypervisors would then still retain their “gentleman’s agreement” on what’s being updated at any point in time.
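The proposed extension might look something like this in outline. It is purely a sketch of the idea above, not any vendor’s feature: `LunCopy` and `replicated_compare_and_write` are hypothetical, and the remote copy’s lock is modelled as a local `threading.Lock`, whereas a real implementation would need a round trip across the replication link to acquire it. The update either lands on both copies or on neither.

```python
import threading

class MiscompareError(Exception):
    """Stands in for a MISCOMPARE status from either copy."""

class LunCopy:
    """One copy of a replicated LUN; the lock models that copy's atomicity."""

    def __init__(self, name: str, blocks: int) -> None:
        self.name = name
        self.data = [b"free"] * blocks
        self.lock = threading.Lock()

def replicated_compare_and_write(local: LunCopy, remote: LunCopy,
                                 lba: int, expected: bytes, new: bytes) -> None:
    """Hypothetical extension: compare on both copies, then write both or neither."""
    # Lock in a fixed order so sites calling with local/remote swapped can't deadlock.
    first, second = sorted((local, remote), key=lambda c: c.name)
    with first.lock, second.lock:
        for copy in (local, remote):
            if copy.data[lba] != expected:
                raise MiscompareError(f"{copy.name} block {lba} holds {copy.data[lba]!r}")
        for copy in (local, remote):
            copy.data[lba] = new

# Usage: a hypervisor at site A claims block 3, then one at site B tries the same.
site_a = LunCopy("site-a", blocks=8)
site_b = LunCopy("site-b", blocks=8)
replicated_compare_and_write(site_a, site_b, 3, b"free", b"esx-a01")
try:
    # Site B addresses its local copy first, but both copies already disagree.
    replicated_compare_and_write(site_b, site_a, 3, b"free", b"esx-b01")
    claimed_by_b = True
except MiscompareError:
    claimed_by_b = False
```

Acquiring the two locks in a fixed order (by name) stops hypervisors at opposite sites, each addressing its own copy first, from deadlocking against each other.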
Of course, the idea seems simple as described, but we have to consider latency: as distance increases, latency is likely to be the reason this solution breaks down. Remote replication would need to be synchronous to guarantee integrity, which means every compare and write would have to wait for the remote array to acknowledge it, so distance has a direct impact on performance. Alternatively, asynchronous replication could be used, with the understanding that data integrity isn’t completely guaranteed.
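A little arithmetic shows why distance bites. Light in optical fibre travels at roughly 200,000 km/s, so each kilometre adds about 5 µs in each direction. The sketch below assumes, for illustration only, that an extended compare and write needs one or more full round trips to the remote array, and it ignores switch and array processing time entirely.

```python
FIBRE_KM_PER_MS = 200.0  # light in fibre: roughly 200,000 km/s, i.e. 200 km per ms

def round_trip_ms(distance_km: float) -> float:
    """Minimum propagation delay for one request/response over fibre."""
    return 2 * distance_km / FIBRE_KM_PER_MS

def added_latency_ms(distance_km: float, round_trips: int = 1) -> float:
    """Extra latency a synchronous write picks up from distance alone (assumed model)."""
    return round_trips * round_trip_ms(distance_km)

for km in (10, 100, 1000):
    print(f"{km:>5} km: +{added_latency_ms(km):.2f} ms per write")
```

On these assumptions, sites 100 km apart add about 1 ms to every write and 1,000 km adds about 10 ms, before any other overhead; if the compare and the write each needed their own round trip, those figures would double.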
I’m looking forward to seeing how the vendors intend to tackle the LUN replication issue, as it’s one we’ve all been waiting to see solved for a long time.