<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Storage Architect &#187; Tuning Manager</title>
	<atom:link href="http://thestoragearchitect.com/tag/tuning-manager/feed/" rel="self" type="application/rss+xml" />
	<link>http://thestoragearchitect.com</link>
	<description>Storage, Virtualisation &#38; Cloud</description>
	<lastBuildDate>Wed, 23 May 2012 22:20:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Problems Problems</title>
		<link>http://thestoragearchitect.com/2007/09/21/problems-problems/</link>
		<comments>http://thestoragearchitect.com/2007/09/21/problems-problems/#comments</comments>
		<pubDate>Fri, 21 Sep 2007 05:26:00 +0000</pubDate>
		<dc:creator>Chris M Evans</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[HDS]]></category>
		<category><![CDATA[NTFS]]></category>
		<category><![CDATA[problems]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Tuning Manager]]></category>

		<guid isPermaLink="false">http://thestoragearchitect.wordpress.com/2007/09/21/problems-problems/</guid>
		<description><![CDATA[<p>This week I&#8217;ve been working on two interesting (ish) problems. Well, one more interesting than the other, one a case of the vendor needing to think about requirements more.</p> <p>Firstly, <a href="http://www.storagewiki.com/ow.asp?Tuning%5FManager" >Tuning Manager</a> (my old software <a rel="nofollow" href="http://en.wikipedia.org/wiki/Nemesis_%28mythology%29" >nemesis</a>) strikes again. Within Tuning Manager it is possible to track performance for all LUNs [...]]]></description>
			<content:encoded><![CDATA[<p>This week I&#8217;ve been working on two interesting (ish) problems.  Well, one more interesting than the other, one a case of the vendor needing to think about requirements more.</p>
<p>Firstly, <a href="http://www.storagewiki.com/ow.asp?Tuning%5FManager" >Tuning Manager</a> (my old software <a rel="nofollow" href="http://en.wikipedia.org/wiki/Nemesis_%28mythology%29" >nemesis</a>) strikes again.  Within Tuning Manager it is possible to track performance for all LUNs in an array.  The gotcha I found this week is that the list of monitored LUNs represents only those allocated to hosts and is a static list which must be refreshed each time an allocation is performed!</p>
<p>This is just lack of thought on the part of the developers: there is no &#8220;track everything&#8221; option, so it is necessary to keep going into the product, selecting the agent, refreshing the LUN list and tagging them all over again.  No wonder allocations can take so long and be fraught with mistakes when Storage Admins have to include in their process the requirement to manually update the tuning product.  I&#8217;m still waiting for confirmation that there isn&#8217;t a way to automatically report on all LUNs.  If there isn&#8217;t then a product enhancement will be required to meet what I want.  In the meantime, I&#8217;ll have to ensure things are updated manually.  So if you configured Tuning Manager and the LUN list when you first installed an array, have a quick look to see if you&#8217;re monitoring everything or not.</p>
<p>I&#8217;m sure some of you out there will point out, with good reason, why HTnM doesn&#8217;t automatically scan all LUNs, but from my perspective, I&#8217;m never asked by senior management to monitor a performance issue *before* it has occurred, so I always prefer to have monitoring enabled for all devices and all subsystems if it doesn&#8217;t have an adverse effect on performance.</p>
<p>Second was an issue with the way NTFS works.  A number of filesystems on our SQL Server machines show high levels of fragmentation, despite there being plenty of freespace on the volumes in question.  This fragmentation issue seems to occur even when a volume is cleared and files are reallocated from scratch.</p>
<p>A quick trawl around the web found me various assertions that NTFS deliberately leaves free clusters between files in order to provide an initial bit of expansion.  I&#8217;m not sure this is true as I can&#8217;t find a trusted source to indicate this is standard behaviour.  In addition I wonder if it is the way in which some products allocate files; for instance if a SQL backup starts to create a backup file it has no real idea how big the file will become.  NTFS (I assume) will choose the largest block of freespace available and allocate the file there.  If another process allocates a file almost immediately, then it will get allocated just after the first file (which may only be a few clusters in size at this stage).  Then the first file gets extended and &#8220;leapfrogs&#8221; the second file, and so on, producing fragmentation in both files.</p>
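<p>As a rough illustration of the leapfrog idea, here is a toy Python model.  It is only a sketch: the function name <code>simulate_leapfrog</code> and the rule that every extension simply takes the next free clusters are assumptions made for illustration, not documented NTFS behaviour.</p>

```python
def simulate_leapfrog(extends_per_file, clusters_per_extend=4, n_files=2):
    """Toy allocator: every extension takes the next free clusters on the volume.

    Hypothetical model only; it is not how NTFS is documented to behave.
    Returns a dict mapping file id to its number of fragments.
    """
    next_free = 0                                   # first unallocated cluster
    last_end = {f: None for f in range(n_files)}    # end of each file's last extent
    fragments = {f: 0 for f in range(n_files)}

    for _ in range(extends_per_file):
        for f in range(n_files):                    # the writers take turns extending
            start = next_free
            next_free += clusters_per_extend
            if last_end[f] != start:                # not contiguous with previous extent
                fragments[f] += 1
            last_end[f] = next_free
    return fragments


print(simulate_leapfrog(1000))              # {0: 1000, 1: 1000}
print(simulate_leapfrog(1000, n_files=1))   # {0: 1}
```

<p>In this model a single writer ends up with one contiguous file, while two interleaved writers get a new fragment on every extension.</p>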
<p>I&#8217;m not sure if this is what is happening, but if this is the way NTFS is working then it would explain the levels of fragmentation we see (some files have 200,000+ fragments in a 24GB file).  In addition, I don&#8217;t know for definite that the fragmentation is having a detrimental impact on performance (these are SAN connected LUNs).  Everything is still speculation.  I guess I need to do more investigation&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://thestoragearchitect.com/2007/09/21/problems-problems/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Performance Part V</title>
		<link>http://thestoragearchitect.com/2007/07/17/performance-part-v/</link>
		<comments>http://thestoragearchitect.com/2007/07/17/performance-part-v/#comments</comments>
		<pubDate>Tue, 17 Jul 2007 20:59:00 +0000</pubDate>
		<dc:creator>Chris M Evans</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[alerting]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Tuning Manager]]></category>

		<guid isPermaLink="false">http://thestoragearchitect.wordpress.com/2007/07/17/performance-part-v/</guid>
		<description><![CDATA[<p>Here&#8217;s the last of the performance measurements for now.</p> <p>Logical Disk Performance &#8211; monitoring of LDEVs. There are three main groups Tuning Manager can monitor; IOPS, throughput (transfer) and response time. The first two are specific to particular environments and the levels for those should be set to local array performance based on historical measurement [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s the last of the performance measurements for now.</p>
<p>Logical Disk Performance &#8211; monitoring of LDEVs.  There are three main groups Tuning Manager can monitor: IOPS, throughput (transfer) and response time.  The first two are specific to particular environments and the levels for those should be set to local array performance based on historical measurement over a couple of weeks.  Normal &#8220;acceptable&#8221; throughput could be anything from 1-20MB/s or 100-1000 IOPS.  It will be necessary to record average responses over time and use these to set preliminary alert figures.  What will be more important is response time.  I would expect reads and writes to 15K drives in a USP to perform at 5-10ms maximum (on average) and for 10K drives to perform up to 15ms maximum.  Obviously synchronous write response will have a dependency on the latency of writing to the remote array and that overhead should be added to the above figures.  Write responses will also be skewed by block size and number of IOPS.</p>
<p>Reporting every bad LDEV I/O response could generate a serious number of alerts, especially if tens of thousands of IOPS are going through a busy array.  It is sensible to set reporting alerts high and reduce them over time until alerts are generated.  These can then be investigated (resolved as required) and the thresholds reduced further.  LDEV monitoring can also benefit from using Damping.  This option on an Alert Definition allows an alert to be generated only if a specific number of occurrences of an alert are received within a number of monitoring intervals.  So, for instance, an LDEV alert could be created when 2 alert occurrences are received within 5 intervals.  Personally I like the idea of Damping as I&#8217;ve seen plenty of host IOSTAT collections where a single bad I/O (or a handful of bad I/Os) is highlighted as a problem when 1000s of good fast IOPS are going through the same host.</p>
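<p>To make the Damping idea concrete, here is a minimal Python sketch of &#8220;2 occurrences within 5 intervals&#8221;.  The class name <code>DampedAlert</code>, the 15ms threshold and the sliding-window interpretation are assumptions for illustration; this is not how HTnM implements it internally.</p>

```python
from collections import deque


class DampedAlert:
    """Fire only when `occurrences` breaches fall within the last `intervals` samples."""

    def __init__(self, threshold_ms, occurrences=2, intervals=5):
        self.threshold_ms = threshold_ms
        self.occurrences = occurrences
        # One True/False flag per monitoring interval; old flags drop off the left.
        self.window = deque(maxlen=intervals)

    def record(self, response_ms):
        """Record one interval's response time; return True if the alert should fire."""
        self.window.append(response_ms > self.threshold_ms)
        return sum(self.window) >= self.occurrences


alert = DampedAlert(threshold_ms=15.0)
samples_ms = [5, 40, 6, 7, 8, 9, 30, 35]      # illustrative response times only
fired = [alert.record(ms) for ms in samples_ms]
print(fired)   # [False, False, False, False, False, False, False, True]
```

<p>The isolated 40ms response never raises an alert; only the two bad intervals close together at the end do.</p>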
<p>This is the last performance post for now.  I&#8217;m going to do some work looking at the agent commands for Tuning Manager, which, as has been pointed out here previously, can provide more granular data and alerting (actually I don&#8217;t think I should have to run commands on an agent host, I think it should all be part of the server product itself, but that&#8217;s another story).</p>
]]></content:encoded>
			<wfw:commentRss>http://thestoragearchitect.com/2007/07/17/performance-part-v/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Performance Part IV</title>
		<link>http://thestoragearchitect.com/2007/07/16/performance-part-iv/</link>
		<comments>http://thestoragearchitect.com/2007/07/16/performance-part-iv/#comments</comments>
		<pubDate>Mon, 16 Jul 2007 21:38:00 +0000</pubDate>
		<dc:creator>Chris M Evans</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[chillis]]></category>
		<category><![CDATA[Port performance]]></category>
		<category><![CDATA[Tuning Manager]]></category>

		<guid isPermaLink="false">http://thestoragearchitect.wordpress.com/2007/07/16/performance-part-iv/</guid>
		<description><![CDATA[<p>Next on the performance hitlist is port tracking. This one is slightly more tricky to collect in Tuning Manager as HTnM uses absolute values for port throughput (Port IOPS and Port Transfer) in alerting, rather than relative values like % busy. This is a problem because the figures of both IOPS and throughput (KB/MB/s) will [...]]]></description>
			<content:encoded><![CDATA[<p>Next on the performance hitlist is port tracking.  This one is slightly more tricky to collect in Tuning Manager as HTnM uses absolute values for port throughput (Port IOPS and Port Transfer) in alerting, rather than relative values like % busy.  This is a problem because the figures of both IOPS and throughput (KB/MB/s) will vary wildly depending on the traffic profile.</p>
<p>For instance, a high number of very small blocksize I/Os will overwhelm the processor managing the port (resulting in seemingly low throughput), whereas a large blocksize could max out the storage port in terms of throughput (i.e. pushing a 2Gb/s FC link to its limit).  As processor busy isn&#8217;t available for alerting either, I&#8217;d set a threshold limit on Port Transfer based on the capacity of the fibre channel interface.  For example, with a 2Gb/s port (roughly 200MB/s), throughput should never be over 50% (if you want 100% redundancy in case of path loss) so I&#8217;d set figures of 40% of capacity for warning and 50% for critical in HTnM alerting.  That&#8217;s 80MB/s and 100MB/s respectively.  These are high figures and sustained throughput at this level may require some load balancing or attention.  If these levels generate no alerting then it may be possible to reduce them to start highlighting peaks.</p>
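<p>The threshold arithmetic can be sketched in a few lines of Python.  This assumes the usual rule of thumb of roughly 100MB/s of usable throughput per 1Gb/s of Fibre Channel link speed (so about 200MB/s on a 2Gb/s port); the function name and the <code>mb_per_gbps</code> parameter are hypothetical, not anything from HTnM.</p>

```python
def port_transfer_thresholds(link_gbps, warn_pct=40, crit_pct=50, mb_per_gbps=100):
    """Return (warning, critical) Port Transfer thresholds in MB/s.

    mb_per_gbps=100 is an assumed rule of thumb for usable Fibre Channel
    throughput, not a figure taken from Tuning Manager.
    """
    capacity_mb = link_gbps * mb_per_gbps
    return capacity_mb * warn_pct / 100, capacity_mb * crit_pct / 100


print(port_transfer_thresholds(2))   # (80.0, 100.0)
print(port_transfer_thresholds(4))   # (160.0, 200.0)
```

<p>On that assumption, 40% of a 2Gb/s port works out to 80MB/s and 50% to 100MB/s.</p>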
<p>Choosing a figure for IOPS is more problematic.  I&#8217;d suggest picking an arbitrary level based on a few weeks&#8217; data from all ports.  Set a limit based on historical data that&#8217;s likely to trigger the occasional alert but will not create critical errors continually.  Alerts can then be monitored and if there are alert trends, action can be taken.  Hopefully it should then be possible to start to reduce alerting thresholds as more issues are solved.</p>
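<p>One possible way to pick such a level from historical data is a simple percentile, sketched below in Python.  The choice of a percentile (and of the 99th in particular) is an assumption of this example rather than anything HTnM prescribes, and <code>iops_threshold</code> is a hypothetical helper name.</p>

```python
def iops_threshold(samples, percentile=99):
    """Return the IOPS value that `percentile` percent of samples fall at or below.

    Uses the nearest-rank method on the sorted historical samples;
    `percentile` is expected to be a whole number between 1 and 100.
    """
    if not samples:
        raise ValueError("need at least one historical sample")
    ordered = sorted(samples)
    rank = (len(ordered) * percentile + 99) // 100   # integer ceiling of n * p / 100
    return ordered[max(rank, 1) - 1]


history = list(range(1, 101))          # illustrative data: 1..100 IOPS
print(iops_threshold(history))         # 99
print(iops_threshold([500, 100, 300], percentile=50))   # 300
```

<p>A limit set this way would be exceeded by only a small share of past samples, which fits the aim of the occasional alert rather than continual critical errors.</p>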
<p>Although TrueCopy initiators can&#8217;t be monitored, RCU Targets can, so the same logic can be applied to throughput values.</p>
<p>On a totally unrelated subject, I have a recommendation to never de-seed chillis without gloves.  I&#8217;ve just removed the seeds from my first crop of the year (including the wonderful looking purple tiger) and my entire face is glowing where I&#8217;ve wiped it.  I have no idea how I&#8217;m now going to get my contact lenses out&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://thestoragearchitect.com/2007/07/16/performance-part-iv/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Tuning Manager for something useful &#8211; Part I</title>
		<link>http://thestoragearchitect.com/2007/06/26/using-tuning-manager-for-something-useful-part-i/</link>
		<comments>http://thestoragearchitect.com/2007/06/26/using-tuning-manager-for-something-useful-part-i/#comments</comments>
		<pubDate>Tue, 26 Jun 2007 20:50:00 +0000</pubDate>
		<dc:creator>Chris M Evans</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[data storage]]></category>
		<category><![CDATA[HDS]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Tuning Manager]]></category>

		<guid isPermaLink="false">http://thestoragearchitect.wordpress.com/2007/06/26/using-tuning-manager-for-something-useful-part-i/</guid>
		<description><![CDATA[<p>Stephen <a href="http://blogs.rupturedmonkey.com/?p=102" >commented</a> about Tuning Manager and doing something useful with the data. I thought I would use this as an opportunity to highlight some of the things I look at on a regular basis (almost daily, including today in fact). Part I &#8211; Write Pending.</p> <p>First a little bit of background; Write Pending [...]]]></description>
			<content:encoded><![CDATA[<p>Stephen <a href="http://blogs.rupturedmonkey.com/?p=102" >commented</a> about Tuning Manager and doing something useful with the data. I thought I would use this as an opportunity to highlight some of the things I look at on a regular basis (almost daily, including today in fact). Part I &#8211; Write Pending.</p>
<p>First a little bit of background;  Write Pending is a measure of write I/Os which are in cache and haven&#8217;t yet been destaged to disk.  A USP/9980V (like most other arrays) stores write I/Os in cache and acknowledges successful completion to the host immediately.  The writes are then destaged to disk asynchronously some time later. </p>
<p>Caching writes (which, though I could be wrong, may have been invented by IBM and called Cache Fast Write or DASD Fast Write; I can&#8217;t remember which was which) provides the host with a consistent response and helps to remove the mechanical delay of writing to disk.  On average, most I/O will be a mixture of reads and writes and therefore an array can even out the write I/O to provide a more consistent I/O response.  I see consistent I/O response as one of the major benefits or features of enterprise arrays. </p>
<p>Tuning Manager collects Write Pending (expressed as a percentage of cache being used for write I/O) in two ways; you can use Performance Reporter and collect real-time data from the agent/command device talking to the array.  This can be plotted on a nice graph down to 10 second intervals.  You can also collect historical WP data, which is retrieved from the agent on an hourly basis, providing an average and maximum figure.  Hourly stats are collected for a maximum of 99 periods (just over 4 days) and can be aggregated into daily, weekly and monthly figures.</p>
<p>A rising and falling WP figure is not a problem and this is to be expected in the normal operation of an array, however WP can reach critical levels at which a USP will perform more aggressive destaging to disk; these figures are 30% and 40%.  Once WP reaches 70%, inflow control takes over and starts to elongate response times to host write requests to limit the amount of data coming into the array.  This obviously has a major impact on performance.  I would look to put monitoring in place within Tuning Manager to alert when WP reaches these figures.   Why would an array see high WP levels?  Clearly, all the issues are write I/O related but they could include high levels of backup, database loads, file transfers and so on.  It could also be an issue with data layout.</p>
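<p>The levels above could be turned into a simple classification for alerting.  The Python below is only a sketch: the function name, the state labels and the decision to treat each level as inclusive are assumptions, not Tuning Manager behaviour.</p>

```python
def write_pending_state(wp_percent):
    """Map a Write Pending percentage to the behaviour described in the post."""
    if wp_percent >= 70:
        return "inflow control"             # host write responses are elongated
    if wp_percent >= 40:
        return "aggressive destage (40%)"
    if wp_percent >= 30:
        return "aggressive destage (30%)"
    return "normal"


for wp in (25, 30, 45, 70):
    print(wp, write_pending_state(wp))
```

<p>An alert definition would then warn at the 30% and 40% states and treat inflow control as critical.</p>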
<p>If an array can&#8217;t cope with WP reaching 30%+ then some tuning may be required;  consider distributing workload over a wider time period (all backups may have been set up to start at the same time, for instance) and investigate how the data receiving the write activity is laid out;  distributing the writes over more LUNs and array groups could be a remedy.  It may also be necessary to choose faster array group disks for certain workloads and/or purchase additional cache.</p>
<p>I think Tuning Manager could do with a few improvements in terms of how Write Pending data is collected.  Firstly, unless I&#8217;m running Performance Reporter, I have no idea within an hourly period when the max WP was reached and for how long it stayed at this level.  Some kind of indicator would be useful &#8211; an hour is a long time to trawl through host data.</p>
<p>I&#8217;d also like to know, when the Max WP occurs, which LUNs had the most write I/O activity.  LDEV activity is also summarised on the hour so it&#8217;s not easy to see which LUNs within an hourly interval were those causing the WP issue;  as an example, an LDEV could have been extremely active for 10 minutes and caused the problem but, on average, could have done much less I/O than another LDEV which was consistently busy (with a low I/O rate) for the whole 1 hour period.  The averaging over 1 hour has a tendency to mask out peaks.</p>
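<p>A small worked example in Python shows how an hourly average can hide the peak.  The IOPS figures are made up purely for illustration and are not measurements from a real array.</p>

```python
def hourly_average(per_minute_iops):
    """Average of one-minute IOPS samples, as an hourly summary would report it."""
    return sum(per_minute_iops) / len(per_minute_iops)


# Illustrative numbers only.
bursty = [3000] * 10 + [0] * 50    # extremely active for 10 minutes, idle otherwise
steady = [600] * 60                # consistently busy at a low rate all hour

print(hourly_average(bursty), max(bursty))   # 500.0 3000
print(hourly_average(steady), max(steady))   # 600.0 600
```

<p>On the hourly figures the steady LDEV looks busier (600 against 500), even though the bursty one peaked at five times that rate.</p>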
<p>Finally, I&#8217;d like to be able to have triggers which can start more granular collection.  So when a critical WP issue occurs, I want to collect 1 minute I/O stats for the next hour to an external file.  I think it is possible to trigger a script from an alert, but I&#8217;m not clear on whether that trigger occurs at the point the max WP value is reached or whether it is triggered when the interval collection occurs on the hour (by which time I could be 59 minutes too late).</p>
<p>Next time, I&#8217;ll cover Side-file usage for those who run TrueCopy async.</p>
]]></content:encoded>
			<wfw:commentRss>http://thestoragearchitect.com/2007/06/26/using-tuning-manager-for-something-useful-part-i/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Tuning Manager CLI</title>
		<link>http://thestoragearchitect.com/2007/05/01/tuning-manager-cli/</link>
		<comments>http://thestoragearchitect.com/2007/05/01/tuning-manager-cli/#comments</comments>
		<pubDate>Tue, 01 May 2007 19:47:00 +0000</pubDate>
		<dc:creator>Chris M Evans</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[HDS]]></category>
		<category><![CDATA[HiCommand]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Tuning Manager]]></category>

		<guid isPermaLink="false">http://thestoragearchitect.wordpress.com/2007/05/01/tuning-manager-cli/</guid>
		<description><![CDATA[<p>I&#8217;ve been working with the HiCommand Tuning Manager CLI over the last few days in order to get more performance information on 9900 arrays. Tuning Manager (5.1 in my case) just doesn&#8217;t let me present data in a format I find useful, and I suppose that&#8217;s not really surprising as, unless you&#8217;re going to add [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working with the HiCommand Tuning Manager CLI over the last few days in order to get more performance information on 9900 arrays.  Tuning Manager (5.1 in my case) just doesn&#8217;t let me present data in a format I find useful, and I suppose that&#8217;s not really surprising as, unless you&#8217;re going to add a complete reporting engine into the product, then you&#8217;ll be wanting to get the data out of the HTnM database and build your own bespoke reports. </p>
<p>So I had high hopes for the HTnM CLI, but I was unfortunately disappointed.  Yes, I can drag out port, LDEV, subsystem (cache etc) and array group details, however I can only extract one time period of records at a time.  I can display all the LDEVs for a specific hour, or a day (if I&#8217;ve been aggregating the data) but I can&#8217;t specify a date or time range.  This means I&#8217;ve had to script extracting and merging the data and the result is, it is sloooow.  Really slow.  One other really annoying feature &#8211; fields that report byte throughput sometimes report as &#8220;4.2 KB&#8221; sometimes as &#8220;5 MB&#8221; &#8211; which programmer thought a comma delimited output would want a unit suffix? </p>
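<p>When scripting around the unit suffixes, a small normalising helper is one option.  The Python below is a sketch under stated assumptions: that the field is always a number, a space and a unit, and that KB and MB are multiples of 1024.  The <code>UNIT_BYTES</code> table and <code>to_bytes</code> name are hypothetical and not part of the HTnM CLI.</p>

```python
# Assumed unit multipliers; the CLI output in the post only shows KB and MB.
UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(field):
    """Convert a value such as '4.2 KB' or '5 MB' to a number of bytes."""
    value, unit = field.strip().split()
    return float(value) * UNIT_BYTES[unit.upper()]


print(to_bytes("5 MB"))     # 5242880.0
print(to_bytes("4.2 KB"))
```

<p>With every throughput field converted to plain bytes, the merged extracts can at least be sorted and summed without special cases.</p>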
<p>I&#8217;m expecting delivery of HTnM 5.5 (I think 5.5.3 to be specific) this week and here&#8217;s what I&#8217;m hoping to find: (a) the ability to report over a date/time range and (b) the database schema exposed so that I can extract data directly.  I&#8217;m not asking much &#8211; nothing much more than other products offer.  Oh, and hopefully something considerably faster than now.</p>
]]></content:encoded>
			<wfw:commentRss>http://thestoragearchitect.com/2007/05/01/tuning-manager-cli/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

