<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Storage Architect &#187; Data Progression</title>
	<atom:link href="http://thestoragearchitect.com/tag/data-progression/feed/" rel="self" type="application/rss+xml" />
	<link>http://thestoragearchitect.com</link>
	<description>Storage, Virtualisation &#38; Cloud</description>
	<lastBuildDate>Tue, 07 Feb 2012 10:08:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Review: Compellent Storage Center &#8211; Part II</title>
		<link>http://thestoragearchitect.com/2010/09/16/review-compellent-storage-center-part-ii/</link>
		<comments>http://thestoragearchitect.com/2010/09/16/review-compellent-storage-center-part-ii/#comments</comments>
		<pubDate>Thu, 16 Sep 2010 09:14:50 +0000</pubDate>
		<dc:creator>Chris M Evans</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Reviews]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Ulitzer]]></category>
		<category><![CDATA[Compellent]]></category>
		<category><![CDATA[Data Progression]]></category>
		<category><![CDATA[Dynamic Block Architecture]]></category>
		<category><![CDATA[Replays]]></category>
		<category><![CDATA[Storage Center]]></category>

		<guid isPermaLink="false">http://www.thestoragearchitect.com/?p=1870</guid>
		<description><![CDATA[<p>This is a series of posts reviewing the Compellent Storage Center SAN Array.  Previous Posts:</p> <a href="http://www.thestoragearchitect.com/2010/09/06/review-compellent-storage-center-part-i/" target="_blank">Review: Compellent Storage Center &#8211; Part I</a> <p>In this post we&#8217;ll discuss the logical configuration, connectivity and protocols available on the Compellent Storage Center array, including the way disks are grouped and LUNs are created from the underlying [...]]]></description>
			<content:encoded><![CDATA[<p>This is a series of posts reviewing the Compellent Storage Center SAN Array.  Previous Posts:</p>
<ul>
<li><a href="http://www.thestoragearchitect.com/2010/09/06/review-compellent-storage-center-part-i/"  target="_blank">Review: Compellent Storage Center &#8211; Part I</a></li>
</ul>
<p>In this post we&#8217;ll discuss the logical configuration, connectivity and protocols available on the Compellent Storage Center array, including the way disks are grouped and LUNs are created from the underlying storage.</p>
<h3>Where&#8217;s My LUN?</h3>
<div id="attachment_1885" class="wp-caption alignleft" style="width: 310px"><a href="http://31.222.189.99/wp-content/uploads/2010/09/Compellent_Post_2_SS_1.png" ><img class="size-medium wp-image-1885" title="Dynamic Block Architecture" src="http://50.57.85.110/wp-content/uploads/2010/09/Compellent_Post_2_SS_1-300x178.png" alt="Dynamic Block Architecture" width="300" height="178" /></a><p class="wp-caption-text">Dynamic Block Architecture</p></div>
<p>The first thing to note as we dive into the detail of how the Compellent array stores data is that it does not operate like traditional storage arrays, with disks in fixed RAID groups and LUN configurations, but uses the previously mentioned Dynamic Block Architecture.  RAID-10, RAID-10 DM (Dual Mirror), RAID-5 and RAID-6 configurations are supported (RAID-5 with 5 or 9 drives in a RAID set and RAID-6 with 6 or 10 drives), but that&#8217;s where the similarity with traditional arrays ends.  The underlying physical disks are simply grouped together to provide raw disk capacity and LUNs are configured from that storage.  RAID is applied to each individual block of a LUN and this can change over the lifetime of that block of data.  At the outset this may seem like a complicated design but in reality it isn&#8217;t.  By breaking down a LUN to the block level and then applying protection and performance criteria, Compellent can achieve higher performance from a system using less cache and, crucially, fewer high-performance drives.  Let&#8217;s start at the basic disk level and work up to define how the Compellent system works.</p>
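<p>To make the capacity trade-off between these protection levels concrete, here is a rough sketch of the usable fraction for each layout.  It assumes the conventional definitions (RAID-10 keeps two copies, RAID-10 DM keeps three, RAID-5 spends one strip per stripe on parity and RAID-6 spends two); the names <code>LAYOUTS</code>, <code>usable_fraction</code> and <code>raw_needed_mb</code> are made up for illustration and are not part of any Compellent tool.</p>

```python
from fractions import Fraction

# (data strips, total strips) per stripe for each layout named in the post.
# Assumption: conventional RAID definitions, not figures taken from Compellent.
LAYOUTS = {
    "RAID-10": (1, 2),             # two mirrored copies
    "RAID-10 DM": (1, 3),          # dual mirror, assumed to mean three copies
    "RAID-5 (5 drives)": (4, 5),   # 4 data + 1 parity
    "RAID-5 (9 drives)": (8, 9),   # 8 data + 1 parity
    "RAID-6 (6 drives)": (4, 6),   # 4 data + 2 parity
    "RAID-6 (10 drives)": (8, 10), # 8 data + 2 parity
}


def usable_fraction(data_strips, total_strips):
    """Fraction of raw capacity left for data after protection overhead."""
    return Fraction(data_strips, total_strips)


def raw_needed_mb(block_mb, layout):
    """Raw capacity consumed when block_mb of data is stored with a layout."""
    data, total = LAYOUTS[layout]
    return block_mb * total / data


if __name__ == "__main__":
    for name, (data, total) in LAYOUTS.items():
        print(f"{name:20s} usable {float(usable_fraction(data, total)):.1%}")
```

<p>Because protection is chosen per block rather than per disk group, the same block can consume different amounts of raw capacity at different points in its life, which is what <code>raw_needed_mb</code> illustrates.</p>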
<h3>Disk Pools</h3>
<p>Physical disks are classified by their rotational speeds, which effectively groups them by performance.  In the review hardware, drives were classified as SLCSSD (for STEC SSD drives), 15K (the fibre channel drives) and 7K (the SAS drives).  By default all disks are added to a single group (or folder) called &#8220;Assigned&#8221;.  Disks that are not in use are assigned to a dummy folder called &#8220;Unassigned&#8221;, from where they can be added to a new or existing disk folder.  Compellent recommend keeping a single disk folder, as spares must be assigned within each group.  Having multiple folders would both waste spares from a capacity perspective and reduce performance, as I/O would be spread over fewer spindles.  Of course it is possible to create separate groups if you wish.</p>
<p>As part of the disk folder definition, either single or dual parity must be specified for each tier.  Screenshots 2 &amp; 3 show the setup of the default &#8220;Assigned&#8221; group and a second &#8220;New Disk Folder 1&#8221;.  There are two other things to say about disk folders.  Firstly, disks can be removed from a folder.  This requires &#8220;evacuating the disk&#8221;, which can be achieved by moving it to another dummy folder.  If the disk contains no data, it can simply be removed from the folder.  Secondly, as disks are added to a folder, there&#8217;s the risk of RAID imbalance, with all of the data existing on the disks already in the disk folder.  Therefore as disks are added, the RAID configuration can be rebalanced to obtain optimum use of all spindles.</p>
<h3>Storage Profiles</h3>
<p>The concept of Storage Profiles is where the Compellent Storage Center &#8220;secret sauce&#8221; is to be found.    These determine how the system writes data to disk for each LUN (or volume as they are known in Compellent terms) and how data ages over time &#8211; a feature called Data Progression.  Let&#8217;s look first at the profiles.</p>
<p>For each volume/LUN in an array, the Storage Profile determines how data is written to disk.  Storage Profiles have two components: a specification of where data should be written and a specification of where Replay data should be located.  It&#8217;s worth taking a moment to understand what Replays are as I&#8217;ve yet to mention them.  Replays are essentially snapshots, used to return a volume to a previous point in time.  By their nature, Replay data blocks are only ever used for reads, as all writes made after a snapshot/replay is taken will be written to a new location in order to preserve the replay for a potential future restore.  Replay blocks are therefore not part of the active write set of data being written to a LUN and don&#8217;t always need to reside on high performance storage; if they are being read frequently then they will reside in cache.  Storage Profiles allow the administrator to indicate what should happen to both writable blocks and replay blocks for a volume.  A high performance LUN could, for example, have its writable data on tier 1 storage with RAID-10 configuration and have replays on RAID-5 SAS.  A medium performance volume could have writable data on tier 2 15K Fibre Channel and replay data on SATA.</p>
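<p>One way to picture a Storage Profile is as a pair of placement rules, one for writable blocks and one for Replay blocks.  The sketch below models the two example profiles from the paragraph above; the class and field names (<code>Placement</code>, <code>StorageProfile</code>, <code>placement_for</code>) are hypothetical and this is not how Storage Center represents profiles internally.  Where the text does not name a RAID level, the field is left as <code>None</code> rather than guessed.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    """Where a class of blocks should live (illustrative model only)."""
    media: str
    raid: Optional[str] = None  # None when the post does not specify a RAID level


@dataclass(frozen=True)
class StorageProfile:
    """Two-part profile: one rule for writable data, one for Replay data."""
    name: str
    writable: Placement
    replay: Placement

    def placement_for(self, is_replay_block):
        # Replay blocks are read-only, so they follow the replay rule.
        return self.replay if is_replay_block else self.writable


# The two examples given in the text, expressed with the hypothetical classes.
high_performance = StorageProfile(
    name="high performance",
    writable=Placement(media="tier 1", raid="RAID-10"),
    replay=Placement(media="SAS", raid="RAID-5"),
)
medium_performance = StorageProfile(
    name="medium performance",
    writable=Placement(media="tier 2 15K Fibre Channel"),
    replay=Placement(media="SATA"),
)

if __name__ == "__main__":
    print(high_performance.placement_for(is_replay_block=False))
    print(high_performance.placement_for(is_replay_block=True))
```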
<p>The use of Storage Profiles provides some very important benefits in optimising the performance of the disks in a Storage Center array.  They allow the exact performance criteria to be specified for each LUN.  In addition, only active data is retained on the highest performance storage, with inactive data moved off to lower performing (and lower cost) devices.  As RAID is established at the block level, writes have a granularity of 2MB in a standard configuration.  If required, volumes can be created using 512KB blocks where writes are small.</p>
<div id="attachment_1886" class="wp-caption alignleft" style="width: 310px"><a href="http://31.222.189.99/wp-content/uploads/2010/09/Compellent_Post2_SS_2.png" ><img class="size-medium wp-image-1886" title="Data Progression" src="http://50.57.85.110/wp-content/uploads/2010/09/Compellent_Post2_SS_2-300x176.png" alt="Data Progression" width="300" height="176" /></a><p class="wp-caption-text">Data Progression</p></div>
<h3>Replays and Data Progression</h3>
<p>Although I&#8217;ve touched on the subject of classifying data into active writes and replays, I haven&#8217;t explained the actual mechanism by which data moves between these groups.  There are two methods by which data is migrated between tiers of storage: Data Instant Replay and Data Progression.  Replays, as we have discussed, are Point-In-Time snapshots of volumes.  When a replay is taken, all of the pages comprising a volume are frozen.  Subsequent writes to the volume are made to new blocks on storage.  This preserves the data at the point the Replay was taken and also, quite helpfully, allows the blocks that are being actively written to be distinguished from those which are inactive.  The Replay blocks can then be moved to a lower tier of storage.  Compellent recommend that every volume has a Replay taken on a daily basis.</p>
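<p>The freeze-then-redirect behaviour described above can be sketched as a small in-memory model.  This is a toy illustration of the general redirect-on-write idea, not Compellent&#8217;s implementation; the <code>Volume</code> class and its method names are invented for the example.</p>

```python
class Volume:
    """Toy volume: logical pages map to physical blocks; replays freeze blocks."""

    def __init__(self):
        self.store = {}      # physical block id -> data
        self.active = {}     # logical page -> physical block id
        self.frozen = set()  # physical blocks referenced by at least one replay
        self.replays = []    # each replay is a frozen copy of the page map
        self._next_block = 0

    def write(self, page, data):
        block = self.active.get(page)
        # A new page, or a page whose block is frozen, gets a fresh block.
        if block is None or block in self.frozen:
            block = self._next_block
            self._next_block += 1
            self.active[page] = block
        self.store[block] = data

    def read(self, page):
        return self.store[self.active[page]]

    def take_replay(self):
        """Freeze every block currently mapped and return the replay id."""
        self.replays.append(dict(self.active))
        self.frozen.update(self.active.values())
        return len(self.replays) - 1

    def read_replay(self, replay_id, page):
        return self.store[self.replays[replay_id][page]]

    def demotable_blocks(self):
        """Frozen blocks are read-only, so they are candidates for a lower tier."""
        return sorted(self.frozen)


if __name__ == "__main__":
    vol = Volume()
    vol.write(0, "monday")
    replay = vol.take_replay()
    vol.write(0, "tuesday")  # redirected to a new block
    print(vol.read(0), vol.read_replay(replay, 0), vol.demotable_blocks())
```

<p>The useful side effect is visible in <code>demotable_blocks</code>: once a Replay exists, the array knows exactly which blocks can no longer be written to.</p>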
<p>Data Progression uses a similar technique to move blocks of data that are less frequently used, down to lower tiers of storage over time.  Initially all writes are made to the highest tier of storage and over time, migrated to lower tiers based on frequency of access.  This occurs at the block level and means Storage Center arrays can be configured with the optimal mix of different drive types.  For instance, if more performance is required, SSD could be added; if more capacity is required, SATA can be added.</p>
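<p>As a rough illustration of demoting blocks by access frequency, the sketch below moves any block that falls under an access threshold down one tier per pass.  The tier names come from the review hardware; the threshold, the one-tier-per-pass rule and the function name <code>progress</code> are assumptions made for the example, not Compellent&#8217;s actual policy.</p>

```python
# Fastest tier first; names follow the disk classes in the review hardware.
TIERS = ["SLCSSD", "15K", "7K"]


def progress(blocks, min_accesses):
    """Return a new placement after one progression pass.

    blocks maps block id -> (tier index, accesses in the last period).
    A block with fewer than min_accesses moves down one tier, unless it is
    already on the lowest tier. Counters are reset for the next period.
    """
    result = {}
    for block_id, (tier, accesses) in blocks.items():
        if accesses < min_accesses and tier < len(TIERS) - 1:
            tier += 1
        result[block_id] = (tier, 0)
    return result


if __name__ == "__main__":
    placement = {"hot": (0, 120), "cooling": (0, 0), "cold": (2, 0)}
    for block_id, (tier, _) in progress(placement, min_accesses=1).items():
        print(block_id, "->", TIERS[tier])
```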
<p>In summary, the Compellent design enables data placement to be optimised at the block level, with less frequently accessed data moved to lower tiers of storage.  In normal circumstances the default settings can be used, but if specific high performance volumes are needed, this can be accommodated too.</p>
<h3>Protocol Support</h3>
<p>One final topic, as this is becoming a long post.  Storage Center supports both Fibre Channel and iSCSI connectivity.  Unusually for storage arrays it allows both protocols to be mixed to a single volume at the same time; so I/O can be actively using both fibre channel and iSCSI.  If you have the right version of switch firmware, Fibre Channel also supports NPIV, which enables the creation of virtual ports on physical ports.  I hope to go over this in a future post.</p>
<p>In the next post I&#8217;ll discuss performance and some of the other miscellaneous features such as replication and clustering.</p>
]]></content:encoded>
			<wfw:commentRss>http://thestoragearchitect.com/2010/09/16/review-compellent-storage-center-part-ii/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Enterprise Computing: Do We Need FAST v1, EMC?</title>
		<link>http://thestoragearchitect.com/2009/10/18/enterprise-computing-do-we-need-fast-v1-emc/</link>
		<comments>http://thestoragearchitect.com/2009/10/18/enterprise-computing-do-we-need-fast-v1-emc/#comments</comments>
		<pubDate>Sun, 18 Oct 2009 09:50:45 +0000</pubDate>
		<dc:creator>Chris M Evans</dc:creator>
				<category><![CDATA[GestaltIT]]></category>
		<category><![CDATA[barry whyte]]></category>
		<category><![CDATA[binfile]]></category>
		<category><![CDATA[Compellent]]></category>
		<category><![CDATA[Data Progression]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[FAST]]></category>
		<category><![CDATA[HiCommand Tiered Storage Manager]]></category>
		<category><![CDATA[hitachi]]></category>
		<category><![CDATA[hyper]]></category>
		<category><![CDATA[ibm]]></category>
		<category><![CDATA[Optimizer]]></category>
		<category><![CDATA[Storage Tiering 1.0]]></category>
		<category><![CDATA[Storage Tiering 2.0]]></category>
		<category><![CDATA[Symmetrix]]></category>

		<guid isPermaLink="false">http://thestoragearchitect.com/?p=772</guid>
		<description><![CDATA[<p>So, here&#8217;s my rash statement from Twitter last night: &#8220;If FAST isn&#8217;t free, I don&#8217;t want it!  All it&#8217;s doing is automating process I could script/do manually&#8221;.  It&#8217;s a bold statement, I know, so is FAST really offering something better than what could be achieved today using EMC&#8217;s <a href="http://uk.emc.com/products/detail/software/symmetrix-optimizer.htm" >Symmetrix Optimizer</a>?</p> <p>Hot Spots</p> <p>EMC&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>So, here&#8217;s my rash statement from Twitter last night: <em><span style="color:#0000ff;">&#8220;If FAST isn&#8217;t free, I don&#8217;t want it!  All it&#8217;s doing is automating process I could script/do manually&#8221;</span></em>.  It&#8217;s a bold statement, I know, so is <strong>FAST</strong> really offering something better than what could be achieved today using EMC&#8217;s <a href="http://uk.emc.com/products/detail/software/symmetrix-optimizer.htm" >Symmetrix Optimizer</a>?</p>
<p><strong>Hot Spots</strong></p>
<p>EMC&#8217;s Symmetrix architecture (18 years old and counting, I believe) uses the concept of disk <strong>hypers</strong> to present LUNs.  Each physical disk is carved into a number of slices, which are then recombined to create LUNs to present to a host.  A mirrored (RAID-1) LUN uses two hypers, a RAID-5 (3+1) LUN uses 4.  EMC ensure general performance by setting standards on how LUNs are created from hypers and that&#8217;s reflected in a <strong>&#8220;binfile&#8221;</strong> layout.  However despite this sensible planning, it is possible (especially as hard drives are now much larger and contain many more hypers) that two hypers on a single physical disk could be highly active and so contend against each other &#8211; in other words <strong>&#8220;hot spots&#8221;</strong> on disk.</p>
<p>Optimizer helps alleviate the issue of hot spots by <strong>exchanging</strong> the high I/O hypers with low I/O ones, distributing busy LUNs across more physical spindles.  This is classic load balancing, where resources are distributed across the available infrastructure in order to obtain better overall generic performance.  EMC have now rebranded Optimizer as part of <strong>Ionix</strong> for Storage Resource Management, but it&#8217;s still effectively the same product.  Hyper swaps can be managed automatically, based on historical performance data.  They can also be user-defined &#8211; a manual swap at the user&#8217;s request.</p>
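<p>The swap idea can be sketched in a few lines: find the busiest and least busy disks, then exchange the hottest hyper on one with the coldest hyper on the other.  This is only a toy model of the load-balancing principle, with invented names (<code>swap_hottest</code>, the disk and hyper labels) and made-up I/O figures; it is not EMC&#8217;s algorithm and ignores everything Optimizer knows about drive speed and data location.</p>

```python
def swap_hottest(disks):
    """Exchange one hot hyper and one cold hyper between two disks.

    disks maps disk name -> {hyper name: I/O rate} and is modified in place.
    Assumes every disk holds at least one hyper. Returns the (hot, cold)
    pair that was swapped, or None if no swap would help.
    """
    load = {disk: sum(hypers.values()) for disk, hypers in disks.items()}
    busiest = max(load, key=load.get)
    idlest = min(load, key=load.get)
    if busiest == idlest:
        return None

    hot = max(disks[busiest], key=disks[busiest].get)
    cold = min(disks[idlest], key=disks[idlest].get)
    if disks[busiest][hot] <= disks[idlest][cold]:
        return None  # swapping would not reduce the imbalance

    # Move each hyper to the other disk, keeping the hyper count per disk fixed.
    disks[idlest][hot] = disks[busiest].pop(hot)
    disks[busiest][cold] = disks[idlest].pop(cold)
    return hot, cold


if __name__ == "__main__":
    layout = {
        "disk1": {"hyperA": 90, "hyperB": 80},  # two busy hypers contending
        "disk2": {"hyperC": 5, "hyperD": 10},
    }
    print(swap_hottest(layout))
    print(layout)
```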
<p>Although tedious (and not as well automated as Hitachi&#8217;s HiCommand Tiered Storage Manager), in theory Optimizer could be used to manually move workload between storage tiers.  In fact, Optimizer is already aware of a tiered storage infrastructure.  Here&#8217;s a quote directly from the ControlCenter 6.1 manual:</p>
<blockquote><p><span style="color:#0000ff;">&#8220;Optimizer is also aware of physical drives that operate at different speeds, as well as location of the data on the physical media, which influences the I/O rate. This information is used when determining which logical devices to move.&#8221;</span></p></blockquote>
<p>So with a little bit of knowledge on the layout of data on a Symmetrix array, it would be possible today to use Optimizer to perform LUN-based FAST.</p>
<p><strong>Load-Balancing Versus Policy</strong></p>
<p>Unfortunately, simple load-balancing of I/O across a storage array doesn&#8217;t offer what should be seen as the next generation of storage tiering.  Where <strong>Storage Tiering 1.0</strong> was about offering multiple layers of storage within the same physical infrastructure and manually placing or moving LUNs to the appropriate tier, <strong>Storage Tiering 2.0</strong> will be about establishing policies that determine more service-based measurements of the performance and availability customers receive. </p>
<p>A policy-based approach would allow rules to be established on how <strong>data at the application layer</strong> moves between tiers.  This is a critical distinction from the load-balancing methodology described earlier.  As an example, where an application was known to require higher performance at a certain time of day or day of the week, data could be moved proactively to a faster tier of storage, returning later once the high I/O workload had completed.  Whilst achievable using Optimizer, there&#8217;s no doubt the process of application migration would be tedious and time consuming.  I expect the v1.0 implementation of FAST will simply package up Optimizer into a tool that automates the migration of related data between tiers.  Don&#8217;t forget, other vendors have been <strong>offering this feature for some time</strong> &#8211; for example Hitachi and Tiered Storage Manager.</p>
<p><strong>Increasing Granularity</strong></p>
<p>Now LUN-based migration has its benefits.  Where large numbers of disks exist in an infrastructure, application data can be placed or moved to the most appropriate location as required.  However with the introduction of <strong>solid state disks</strong> (SSDs), a more granular approach is needed, as the number of SSDs deployed in an array is likely to be low due to their high cost.  Moving an entire application (or even LUN) to SSD will be undesirable unless that application can make full use of the SSD hardware.  There are <strong>very few</strong>, if any, applications that require high-intensity read/write activity from every piece of application data all the time.</p>
<p>Block-level tiering offers a higher level of granularity to the placement of data.  A LUN can be split into blocks and placed across multiple layers of storage technology including traditional HDDs and faster SSDs.  Selective placement will ensure the more efficient use of expensive SSD media by placing only the highly active data onto it.</p>
<p>All of a sudden with increased granularity we&#8217;re back to Storage Tiering 1.0 where data is being placed on faster technology purely based on <strong>increasing overall system performance</strong>.  This is a feature <a href="http://www.compellent.com" >Compellent</a> have been offering for some time.  Data is migrated up or down the tier hierarchy on a <strong>daily basis</strong>, subject to performance figures over a 12-day period.  This level of granular performance management is possible because data is stored in a block-based structure.  Unfortunately for EMC, the <strong>hyper design legacy</strong> represents a technical challenge in making FAST version 2 a reality. </p>
<p><strong>Patent Rights</strong></p>
<p>As just mentioned, Compellent already offer block-based data migration in their products.  At a recent dinner in London with the Compellent team, they highlighted their strong position in the market, protected by patents covering block-level data migration between tiers.  You can find the filed patent <a href="http://www.patentstorm.us/patents/7398418/fulltext.html" >here</a>.  Compellent use the term <strong>&#8220;Data Progression&#8221;</strong> to describe how blocks are moved between tiers based on I/O activity.  As I/O activity is monitored over time, it is possible to determine the most appropriate tier of storage to use when expanding capacity.  Typically these are lower tier SATA drives, as initial performance requirements are usually over-estimated.  This methodology is very much the Storage Tiering 1.0 discussed earlier.</p>
<p>Compellent aren&#8217;t the only people claiming rights to block-level tiering within a storage array.  I&#8217;ve also found the following <a href="http://www.patentstorm.us/patents/7421556/fulltext.html" >patent application</a> from <strong>IBM</strong>, filed by Barry Whyte, Steve Legg and others.  If IBM and Compellent both claim to have invented the FAST concept, how does that position EMC?  Do they have an earlier patent which trumps these two?</p>
<p><strong>Summary</strong></p>
<p>Storage Tiering 1.0 provides performance management of storage arrays.  Storage Tiering 2.0 extends this to offer policy-driven optimisation.  Both of these technologies are available today from existing vendors in one format or another.  EMC will simply be playing catch-up with these vendors once FAST 1 &amp; FAST 2 are released.  I&#8217;d like to be surprised and see EMC offer something the competition currently don&#8217;t.  I&#8217;m not holding my breath&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://thestoragearchitect.com/2009/10/18/enterprise-computing-do-we-need-fast-v1-emc/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
	</channel>
</rss>

