<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DelphiTools.info</title>
	<atom:link href="http://delphitools.info/feed/" rel="self" type="application/rss+xml" />
	<link>http://delphitools.info</link>
	<description>SamplingProfiler and other Delphi tools</description>
	<lastBuildDate>Sat, 06 Feb 2010 15:42:33 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Don&#8217;t abuse FreeAndNil anymore</title>
		<link>http://delphitools.info/2010/02/06/dont-abuse-freeandnil-anymore/</link>
		<comments>http://delphitools.info/2010/02/06/dont-abuse-freeandnil-anymore/#comments</comments>
		<pubDate>Sat, 06 Feb 2010 15:41:21 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Tips]]></category>
		<category><![CDATA[Debug]]></category>
		<category><![CDATA[Delphi]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=433</guid>
		<description><![CDATA[A recurring subject when it comes to freeing objects and preventing their use after release is whether you should just .Free them, thus leaving an invalid reference that should however never be used anymore when the code design is correct, or if you should defensively FreeAndNil() them, thus leaving a nil value that will hopefully trigger AVs more often [...]]]></description>
			<content:encoded><![CDATA[<p>A recurring subject when it comes to freeing objects and preventing their use after release is whether you should just .Free them, thus leaving an invalid reference that should however never be used anymore when the code design is correct, or if you should defensively FreeAndNil() them, thus leaving a nil value that will hopefully trigger AVs more often on improper usage after release.</p>
<p>Allen Bauer recently brought this subject up on his blog in &#8220;<a href="http://www.delphifeeds.com/postings/65203-a_case_against_freeandnil">A case against FreeAndNil</a>&#8221;, arguing that there are better tools than FreeAndNil to diagnose improper usage after release, and that it can hide other issues and lead to other magic-bullet solutions, which only make the problem worse. This is true, and FastMM debug mode can do wonders here. Quite often, however, you don&#8217;t want to rely on debug and diagnostic machinery that needs to be switched ON for problems to be detected early on.</p>
<p>Well, if you&#8217;re using FreeAndNil() for defensive purposes, don&#8217;t abuse it anymore, invest in a few lines of code for a shiny new FreeAndInvalidate():</p>
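<pre>procedure FreeAndInvalidate(var obj);
var
   temp : TObject;
begin
   // Keep the instance aside, then poison the caller's reference before freeing,
   // so the reference is already invalid while the destructor runs
   temp := TObject(obj);
   Pointer(obj) := Pointer(1);
   temp.Free;
end;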
</pre>
<p>This function frees the object and sets the reference to an invalid magic value, which will trigger an AV on improper field or virtual method access after release (just like FreeAndNil). Unlike FreeAndNil, it will also AV on multiple .Free attempts, and will not be stopped by &#8220;if Assigned()&#8221; checks. If you wish for even more defense, you can also &#8220;sabotage&#8221; the VMT pointer of the freed object instance.</p>
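<p>As a rough illustration of the difference, here is a minimal console sketch. It assumes a Windows build with SysUtils in use, so that access violations surface as EAccessViolation exceptions; the program name is made up and the TStringList instance is just a stand-in for any object.</p>
<pre>program FreeAndInvalidateDemo;
{$APPTYPE CONSOLE}

uses SysUtils, Classes;

procedure FreeAndInvalidate(var obj);
var
   temp : TObject;
begin
   temp := TObject(obj);
   Pointer(obj) := Pointer(1);
   temp.Free;
end;

var
   list : TStringList;
begin
   list := TStringList.Create;
   FreeAndInvalidate(list);

   // Assigned() only tests for nil, so the poisoned reference passes the check
   if Assigned(list) then
      Writeln('Assigned(list) is True: the reference is poisoned, not nil');

   try
      // Free sees a non-nil Self and calls the virtual Destroy,
      // which has to read the VMT through the invalid pointer
      list.Free;
   except
      on E: EAccessViolation do
         Writeln('Second Free raised an access violation');
   end;
end.
</pre>
<p>With FreeAndNil() in place of FreeAndInvalidate(), the Assigned() check would fail and the second Free would silently do nothing.</p>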
<p>With a FreeAndInvalidate() added to your bag of tricks, you can now reserve FreeAndNil usage for situations where having a nil reference is truly part of the design, and no longer abuse it for defensive programming. Of course <em>this is still no magic bullet, but it&#8217;s cheap enough that you can use it in release builds</em> (unlike debug and diagnostic tools), and as a bonus, it makes it obvious when reading the code that the object reference is supposed to be invalid after the call.</p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2010/02/06/dont-abuse-freeandnil-anymore/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>SamplingProfiler v1.7.4</title>
		<link>http://delphitools.info/2009/09/08/samplingprofiler-v1-7-4/</link>
		<comments>http://delphitools.info/2009/09/08/samplingprofiler-v1-7-4/#comments</comments>
		<pubDate>Tue, 08 Sep 2009 15:33:38 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Delphi]]></category>
		<category><![CDATA[Download]]></category>
		<category><![CDATA[Profiler]]></category>
		<category><![CDATA[Silent]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=428</guid>
		<description><![CDATA[SamplingProfiler v1.7.4 is now available. This version adds an option for Delphi 2010 paths, and fixes a bug with the silent mode execution that would render it inoperative. There also have been other minor changes, mostly cosmetic.
This release also includes preparation for an &#8220;attach to process&#8221; option, which is currently not enabled, but should hopefully [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://delphitools.info/downloads/samplingprofiler-changelog/">SamplingProfiler v1.7.4</a> is now available. This version adds an option for Delphi 2010 paths, and fixes a bug with the silent mode execution that would render it inoperative. There also have been other minor changes, mostly cosmetic.</p>
<p>This release also includes preparation for an &#8220;attach to process&#8221; option, which is currently not enabled, but should hopefully make it into the next version (available &#8220;when ready&#8221;).</p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2009/09/08/samplingprofiler-v1-7-4/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>ZJDBGPack 2.0 improved &#8220;batch-ability&#8221;</title>
		<link>http://delphitools.info/2009/07/10/zjdbgpack-2-0-improved-batch-ability/</link>
		<comments>http://delphitools.info/2009/07/10/zjdbgpack-2-0-improved-batch-ability/#comments</comments>
		<pubDate>Fri, 10 Jul 2009 14:52:15 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Command]]></category>
		<category><![CDATA[ZJDBGPack]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=419</guid>
		<description><![CDATA[An improved version of ZJDBGPack has been released, with better error messages and non-zero exit codes when an error occurs. This makes it more usable for batches and automated builds.
]]></description>
			<content:encoded><![CDATA[<p>An improved version of <a href="http://delphitools.info/other-tools/zjdbgpack/">ZJDBGPack</a> has been released, with better error messages and non-zero exit codes when an error occurs. This makes it more usable for batches and automated builds.</p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2009/07/10/zjdbgpack-2-0-improved-batch-ability/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Profiling multi-threaded applications</title>
		<link>http://delphitools.info/2009/05/27/profiling-multi-threaded-applications/</link>
		<comments>http://delphitools.info/2009/05/27/profiling-multi-threaded-applications/#comments</comments>
		<pubDate>Wed, 27 May 2009 12:02:28 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Tips]]></category>
		<category><![CDATA[Bottleneck]]></category>
		<category><![CDATA[Command]]></category>
		<category><![CDATA[Monte-Carlo]]></category>
		<category><![CDATA[Multithreading]]></category>
		<category><![CDATA[OutputDebugString]]></category>
		<category><![CDATA[Profiler]]></category>
		<category><![CDATA[threadID]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=397</guid>
		<description><![CDATA[SamplingProfiler has a few options to help profile a multi-threaded application which I&#8217;ll go over here.
In the current version, those options allow identifying CPU-related bottlenecks, as in &#8220;threads taking too much CPU resources or execution time&#8221;. However, they do not provide many clues yet to pinpoint bottlenecks arising from thread synchronization issues or serialization (insufficient [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://delphitools.info/samplingprofiler/">SamplingProfiler</a> has a few options to help profile a multi-threaded application which I&#8217;ll go over here.</p>
<p>In the current version, those options allow identifying <em>CPU-related bottlenecks</em>, as in &#8220;threads taking too much CPU resources or execution time&#8221;. However, they do not provide many clues yet to pinpoint bottlenecks arising from thread synchronization issues or serialization (insufficient parallelism). Hopefully, more support for profiling multi-threaded applications will come in future versions.</p>
<p><strong>Single-threaded profiling</strong></p>
<p>By default, SamplingProfiler only looks at one thread, the main application thread, but you can manually (and dynamically) specify another thread. This is done via <em>OutputDebugString</em> (see <a href="http://delphitools.info/2009/03/02/control-sampling-from-your-code/">Control sampling from your code</a>):</p>
<pre style="padding-left: 30px;">OutputDebugString('SAMPLING THREAD threadID');
</pre>
<p>with <em>threadID</em> being the thread ID (as returned by the WinAPI function <em>GetCurrentThreadID</em>, for instance). If you specify an invalid <em>threadID</em>, or if the thread dies, no more samples will be collected until you specify a new thread or &#8220;return&#8221; the sampling focus to the main thread, which can be accomplished with</p>
<pre style="padding-left: 30px;">OutputDebugString('SAMPLING THREAD 0');</pre>
<p>This command is mostly useful if you already have a clue which thread is proving troublesome, like when a worker thread is used in a GUI application. If you have several worker threads in a <a rel="nofollow" href="http://en.wikipedia.org/wiki/Thread_pool">thread pool</a>, which serve random workloads (or assumed random), you can pick one of those threads (at random) and have it profiled.</p>
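<p>A minimal sketch of how a thread could request the sampling focus for itself is shown below. The helper names are hypothetical, and it assumes the thread ID is expected in decimal form, which the command description above does not spell out.</p>
<pre>uses Windows, SysUtils;

// Hypothetical helper: ask SamplingProfiler to sample the calling thread
procedure SampleCurrentThread;
begin
   OutputDebugString(PChar('SAMPLING THREAD '+IntToStr(GetCurrentThreadId)));
end;

// Hypothetical helper: return the sampling focus to the main thread
procedure SampleMainThread;
begin
   OutputDebugString('SAMPLING THREAD 0');
end;
</pre>
<p>A worker thread would typically call SampleCurrentThread at the start of its Execute method, with SampleMainThread called once the work of interest is done.</p>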
<p>However, this involves a fair amount of bias and guessing where the bottleneck could be, and is not really applicable if you have a high number of threads working (or sleeping) simultaneously on multiple CPUs. This is where Monte-Carlo sampling comes in&#8230;</p>
<p><strong>Monte-Carlo Samples Gathering</strong></p>
<p><a href="http://en.wikipedia.org/wiki/Monte_Carlo_method">Monte-Carlo</a> sampling is specified via the samples gathering mode option. When set, SamplingProfiler will pick a random thread of the profiled application at each sampling, and use it for the sample. Bias and guessing are eliminated.</p>
<p>The good news is that with this method, the sampling load is not increased, and its impact is random: concurrency issues and UI bottlenecks can still be spotted. Hot-spots in a server running at production speed can be spotted too.</p>
<p>The bad news is that if you have a high number of inactive threads, you&#8217;ll have to gather more samples to get meaningful results on the active threads (as each time an inactive thread is picked at random, the sample will be meaningless, and thus lost).</p>
<p>Interpreting the profiling results can however be a little more difficult, as several multi-threading effects can come into play, for instance a drop in CPU cache efficiency (code stressed in highly threaded situations can behave quite differently from how it behaves when stressed in a single-threaded situation). This will be food for future articles.</p>
<p>To decide if a thread is active or not, SamplingProfiler looks at its registers: if all the registers are unchanged between two samplings, the thread is deemed inactive and the sample is dropped. Inactivity can thus result from the thread sleeping or waiting on some event, or just from not having gotten its share of CPU time since the last time it was sampled (this can be quite common if you have a much higher number of threads than you have CPU cores, even if all the threads are busy).</p>
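<p>The inactivity test can be pictured with the sketch below. It is not SamplingProfiler&#8217;s actual code: the TRegisterSnapshot record and both function names are made up for illustration, and capturing the snapshot itself (for instance through the WinAPI function <em>GetThreadContext</em>) is left out.</p>
<pre>uses SysUtils;

type
   // Hypothetical snapshot of a thread's registers at sampling time
   TRegisterSnapshot = record
      Eip, Esp, Ebp : Cardinal;
      Eax, Ebx, Ecx, Edx, Esi, Edi : Cardinal;
   end;

// A thread is deemed inactive if no register changed since its previous sample
function IsInactive(const previous, current : TRegisterSnapshot) : Boolean;
begin
   Result := CompareMem(@previous, @current, SizeOf(TRegisterSnapshot));
end;

// Monte-Carlo pick: every thread has the same chance of being sampled
function PickThreadIndex(threadCount : Integer) : Integer;
begin
   Result := Random(threadCount);
end;
</pre>
<p>A sample taken on the picked thread would then be kept only when IsInactive returns False for that thread&#8217;s previous and current snapshots.</p>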
<p><strong>CPU Affinity</strong></p>
<p>The last set of options is the one for <a rel="nofollow" href="http://en.wikipedia.org/wiki/Processor_affinity">processor affinities</a>. You can choose on which CPUs SamplingProfiler is constrained, and on which CPUs the profiled application is constrained.</p>
<p>Affinities can be used either to further isolate the profiled application from the profiler, or to easily simulate your application running on a machine with fewer cores. In more advanced scenarios, if you have enough CPU cores, you can also leave some cores entirely unused by both the profiler and the profiled application, and thus reserve them for a third application (such as a database server).</p>
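<p>For reference, an affinity constraint boils down to a bit mask in which bit <em>n</em> stands for CPU <em>n</em>. The sketch below applies such a mask from inside an application with the WinAPI function <em>SetProcessAffinityMask</em>. It only illustrates the concept and is not a description of how SamplingProfiler implements its options; the procedure and constant names are hypothetical.</p>
<pre>uses Windows, SysUtils;

// Restrict the current process to the first two CPUs (mask bits 0 and 1)
procedure ConstrainToTwoCores;
const
   cFirstTwoCores = $3;
begin
   if not SetProcessAffinityMask(GetCurrentProcess, cFirstTwoCores) then
      RaiseLastOSError;
end;
</pre>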
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2009/05/27/profiling-multi-threaded-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SamplingProfiler v1.7.3 bug fix</title>
		<link>http://delphitools.info/2009/05/22/samplingprofiler-v173-bug-fix/</link>
		<comments>http://delphitools.info/2009/05/22/samplingprofiler-v173-bug-fix/#comments</comments>
		<pubDate>Fri, 22 May 2009 08:01:29 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Changelog]]></category>
		<category><![CDATA[Download]]></category>
		<category><![CDATA[Profiler]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=409</guid>
		<description><![CDATA[SamplingProfiler v1.7.3 has now been released and should be used in place of 1.7.2 which was pulled.
1.7.2 had a nasty bug in the timings statistics (promptly spotted by Robert Houdart) which should be fixed in 1.7.3. There are no other changes or additions in this version.
]]></description>
			<content:encoded><![CDATA[<p><a href="http://delphitools.info/downloads/samplingprofiler-changelog/">SamplingProfiler v1.7.3</a> has now been released and should be used in place of 1.7.2 which was pulled.</p>
<p>1.7.2 had a nasty bug in the timings statistics (promptly spotted by Robert Houdart) which should be fixed in 1.7.3. There are no other changes or additions in this version.</p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2009/05/22/samplingprofiler-v173-bug-fix/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SamplingProfiler v1.7.2</title>
		<link>http://delphitools.info/2009/05/20/samplingprofiler-v172/</link>
		<comments>http://delphitools.info/2009/05/20/samplingprofiler-v172/#comments</comments>
		<pubDate>Wed, 20 May 2009 15:53:02 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Affinity]]></category>
		<category><![CDATA[Changelog]]></category>
		<category><![CDATA[Download]]></category>
		<category><![CDATA[Profiler]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=404</guid>
		<description><![CDATA[SamplingProfiler v1.7.2 has now been released.
This version includes the following changes:

added an option to display line numbers in the source preview
extended the process CPU affinity options to allow individually selecting up to 16 cores

The UI has been slightly rearranged to accommodate the CPU affinity options (I guess I&#8217;ll need to find something prettier for those [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://delphitools.info/downloads/samplingprofiler-changelog/">SamplingProfiler v1.7.2</a> has now been released.</p>
<p>This version includes the following changes:</p>
<ul>
<li>added an option to display line numbers in the source preview</li>
<li>extended the process CPU affinity options to allow individually selecting up to 16 cores</li>
</ul>
<p>The UI has been slightly rearranged to accommodate the CPU affinity options (I guess I&#8217;ll need to find something prettier for those upcoming <a href="http://software.intel.com/en-us/blogs/2009/01/05/what-does-256-cores-look-like/">256 core CPUs</a>&#8230;). There may be other indirect minor changes.</p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2009/05/20/samplingprofiler-v172/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Printable versions of the articles</title>
		<link>http://delphitools.info/2009/05/07/printable-versions-of-the-articles/</link>
		<comments>http://delphitools.info/2009/05/07/printable-versions-of-the-articles/#comments</comments>
		<pubDate>Thu, 07 May 2009 10:27:29 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Site]]></category>
		<category><![CDATA[WordPress]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=391</guid>
		<description><![CDATA[There is something about WordPress that reminds me of Delphi, not the language or the IDE, but the spirit. On the one hand, it&#8217;s a convenient and comfortable environment out of the box, to which ready-made functionality can be easily added, and on the other hand everything under the hood is still accessible and tweakable [...]]]></description>
			<content:encoded><![CDATA[<p>There is something about <a href="http://wordpress.org/">WordPress</a> that reminds me of Delphi, not the language or the IDE, but the spirit. On the one hand, it&#8217;s a <a href="http://www.amazon.com/gp/product/0470402962?ie=UTF8&amp;tag=httpdelphiinf-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0470402962">convenient and comfortable</a> environment out of the box, to which ready-made functionality can be easily added, and on the other hand everything under the hood is still accessible and <a href="http://codex.wordpress.org/">tweakable</a> without having to fork in a major way (like Delphi, unlike most of the rest of them).</p>
<p>Anyway, I&#8217;ve added support for printable versions of the articles here, thanks to Lester Chan&#8217;s <a href="http://lesterchan.net/portfolio/programming/php/#wp-print">WP-Print</a>; a wee bit of CSS &amp; PHP tweaking is all it took. Hopefully the dead-tree lovers who asked for it will be satisfied, as well as those few who manually tried to append a &#8220;/print&#8221; to the URL <img src='http://delphitools.info/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2009/05/07/printable-versions-of-the-articles/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Code Optimization: Go For the Jugular</title>
		<link>http://delphitools.info/2009/05/06/code-optimization-go-for-the-jugular/</link>
		<comments>http://delphitools.info/2009/05/06/code-optimization-go-for-the-jugular/#comments</comments>
		<pubDate>Wed, 06 May 2009 05:00:04 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Tips]]></category>
		<category><![CDATA[asm]]></category>
		<category><![CDATA[Bottleneck]]></category>
		<category><![CDATA[Breakpoint]]></category>
		<category><![CDATA[CPU]]></category>
		<category><![CDATA[Delphi]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Profiler]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=343</guid>
		<description><![CDATA[Code optimization can sometimes be experienced as a lengthy process, with disruptive effects on code readability and maintainability. For effective optimization, it is crucial to focus efforts on areas where minimal work and minimal changes will have the most impact, i.e. go for the jugular.

The Prey
 
I will illustrate this using SamplingProfiler in a small [...]]]></description>
			<content:encoded><![CDATA[<p>Code optimization can sometimes be experienced as a lengthy process, with disruptive effects on code readability and maintainability. For effective optimization, it is crucial to focus efforts on areas where minimal work and minimal changes will have the most impact, i.e. go for the jugular.</p>
<h4>The Prey</h4>
<p>I will illustrate this using <a href="http://delphitools.info/samplingprofiler/">SamplingProfiler</a> in a small example, taken from a small library that deals with short vectors of varying length (but usually less than 10 dimensions), which I simplified, isolated &amp; anonymized for the purpose of this article.</p>
<pre>uses SysUtils, TypInfo; // SysUtils declares Exception, TypInfo declares GetEnumName

type
   TDoWhat = (dwInc, dwDec);

procedure DoSomething1(var data : array of Integer; what : TDoWhat);
var
   i : Integer;
begin
   for i:=Low(data) to High(data) do
   begin
      case what of
         dwInc : Inc(data[i]);
         dwDec : Dec(data[i]);
      else
         raise Exception.Create('Unsupported: '+GetEnumName(TypeInfo(TDoWhat), Integer(what)));
      end;
   end;
end;
</pre>
<h4>Get Meat into Belly</h4>
<p>Before starting any kind of optimization, one has to define goals and limits, i.e. figure out what &#8220;good enough&#8221; will be, rather than consider &#8220;good enough&#8221; to be the state of the code once one has grown tired of optimizing it!</p>
<p>The sample code above is quite straightforward and simple. It would of course be possible to blow this code up to huge proportions for optimization&#8217;s sake. If you are after getting every last drop of CPU-cycle juice, and allow yourself to use every trick in the book, a fully optimized version could represent several thousand lines of code (I&#8217;m not exaggerating). If it&#8217;s your core business, it <em>might</em> be okay, but if it&#8217;s just a utility library, the added maintenance burden could never be justified.</p>
<p>But since this article is intended more as an illustration than a discussion on the methodology, I&#8217;ll get straight to the buffalo (beef). For further reading on that subject, you can start from the <a href="http://en.wikipedia.org/wiki/Big_O_notation">Big O Notation</a>, <a href="http://en.wikipedia.org/wiki/Benchmarking">Benchmarking</a> and <a href="http://en.wikipedia.org/wiki/Software_metrics">Software metrics</a> articles on Wikipedia; there are also whole <a href="http://www.amazon.com/gp/product/0201729156?ie=UTF8&amp;tag=httpdelphiinf-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0201729156">books</a> on the subject.</p>
<h4>Stalking the Prey</h4>
<p>Looking at the above code, the first obvious optimization that developers suggest seems to be taking the conditional out of the loop, resulting in several case-specific loops. On small vectors, this nets about a 30% speedup. For further speedups, the suggestions are typically to go for loop unrolling, asm, and other heavy-handed solutions that come with a significant development time and code complexity increase.</p>
<p>Of course, readers of this website will know better than to jump straight into the code and apply optimization recipes: they would run the code through a profiler first. And since we&#8217;re dealing with a single procedure, an instrumenting profiler would be of little help, so they would run SamplingProfiler instead, and would get to see something like this:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-349" title="Going For The Jugular - Initial Profiling Results" src="http://delphitools.info/wp-content/uploads/2009/04/jugular-1.png" alt="Going For The Jugular - Initial Profiling Results" width="581" height="281" /></p>
<p>In this run, only the dwInc case was stressed (line 37), and obviously the procedure spends less than 30% of its time doing what was asked of it, and its largest share (33%) on the &#8220;<em>end</em>&#8221;, i.e. cleaning up, plus 8% setting up in &#8220;<em>begin</em>&#8221;. That&#8217;s more than 40% spent on nothing but stack setup and cleanup work!<br />
 The conditional in the loop, which could have looked like the most worrying bit, is eating a bit less than 20% of the time.</p>
<p>What is the source of all that <em>begin/end</em> work? Place a breakpoint on begin, run and hit Ctrl+Alt+C when your breakpoint is reached, go have a look at the CPU view, and you&#8217;ll see this:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-350" title="Going For The Jugular - CPU view near &quot;begin&quot;" src="http://delphitools.info/wp-content/uploads/2009/04/jugular-2.png" alt="Going For The Jugular - CPU view near &quot;begin&quot;" width="546" height="220" /></p>
<p>This is a fairly significant stack setup for such a small procedure, and those instructions with &#8220;<em>fs:</em>&#8221; at the bottom are the setting up of an (implicit) exception frame. An exception frame for what? If you haven&#8217;t guessed already, navigate your CPU view near the &#8220;<em>end</em>&#8221; line.</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-351" title="Going For The Jugular - CPU view near &quot;end&quot;" src="http://delphitools.info/wp-content/uploads/2009/04/jugular-3.png" alt="Going For The Jugular - CPU view near &quot;end&quot;" width="447" height="306" /></p>
<p>No wonder &#8220;<em>end</em>&#8221; was a bottleneck! The call to <em>UStrArrayClr</em> indicates that the exception frame is here to clean up several strings&#8230; these strings are those of the <em>raise Exception</em>: one is the string returned by <em>GetEnumName</em>, the other is the result of the concatenation passed to <em>Exception.Create</em>.</p>
<h4>Isolate and Kill</h4>
<p>How to get rid of that exception frame? One typical way is to use &#8220;Exception.CreateFmt&#8221;, and pass only constant strings to it, but that is not possible here with the call to <em>GetEnumName</em>, which returns a string. The other way is to isolate the exception to its own (nested) procedure:</p>
<pre>procedure RaiseUnsupported(what : TDoWhat);
begin
   raise Exception.Create('Unsupported: '+GetEnumName(TypeInfo(TDoWhat), Integer(what)));
end;</pre>
<p>and call <em>RaiseUnsupported</em> in the &#8220;<em>case else</em>&#8221;. Doing so will move the exception frame to the new procedure, where it&#8217;s irrelevant in terms of performance.<br />
 This simple change nets us a 33% speedup, i.e. we reclaimed most of the time lost in <em>begin/end</em>! We also gained a bit by no longer calling <em>UStrArrayClr</em>, which did essentially nothing, since the strings it was there to clear were never assigned (as long as we did not hit the exception).</p>
<p>Note that if you use a nested procedure for <em>RaiseUnsupported</em>, you can be tempted not to pass it the &#8220;<em>what</em>&#8221; parameter, but use directly the &#8220;<em>what</em>&#8221; from its parent procedure. However by doing so, you&#8217;ll have the compiler use a special stack setup (so that the nested procedure can access the parent procedure&#8217;s variables). This setup will be faster than the exception frame it replaces, but with it, <em>begin/end</em> would still be taking about 18% of the CPU time spent in the procedure.</p>
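<p>Put together, the intermediate version could look like the sketch below. The name <em>DoSomething2</em> is an assumption, chosen to sit between the versions shown in this article, and <em>RaiseUnsupported</em> is declared here as a standalone procedure that receives &#8220;<em>what</em>&#8221; as a parameter.</p>
<pre>uses SysUtils, TypInfo;

type
   TDoWhat = (dwInc, dwDec);

// The string temporaries, and the implicit exception frame
// that cleans them up, now live in this procedure only
procedure RaiseUnsupported(what : TDoWhat);
begin
   raise Exception.Create('Unsupported: '+GetEnumName(TypeInfo(TDoWhat), Integer(what)));
end;

procedure DoSomething2(var data : array of Integer; what : TDoWhat);
var
   i : Integer;
begin
   for i:=Low(data) to High(data) do
   begin
      case what of
         dwInc : Inc(data[i]);
         dwDec : Dec(data[i]);
      else
         RaiseUnsupported(what);
      end;
   end;
end;
</pre>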
<h4>Repeat Until Belly.Full;</h4>
<p>Those first 33% were easily gained. Let&#8217;s go for another round of SamplingProfiler:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-352" title="Going For The Jugular - Further Profiling Results" src="http://delphitools.info/wp-content/uploads/2009/04/jugular-4.png" alt="Going For The Jugular - Further Profiling Results" width="551" height="278" /></p>
<p>Things are more satisfying: the line performing the actual work is now taking up most of the CPU time. Second comes the <em>case of</em> line. For further speed improvements, we now need to move the conditional out of the loop:</p>
<pre>procedure DoSomething3(var data : array of Integer; what : TDoWhat);

   procedure RaiseUnsupported(what : TDoWhat);
   begin
      raise Exception.Create('Unsupported: '+GetEnumName(TypeInfo(TDoWhat), Integer(what)));
   end;

var
   i : Integer;
begin
   case what of
      dwInc :
         for i:=Low(data) to High(data) do
            Inc(data[i]);
      dwDec :
         for i:=Low(data) to High(data) do
            Dec(data[i]);
   else
      RaiseUnsupported(what);
   end;
end;</pre>
<p>We have increased the line count noticeably, but most of those extra lines are still cosmetic. What further makes it a reasonable trade-off is that the execution time has been reduced by 66% from the initial version: it now executes 3 times faster!</p>
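<p>To check such figures on your own machine, a crude timing harness is enough. The one below is only a sketch: the <em>TDoSomething</em> and <em>TimeIt</em> names, the vector length and the iteration count are arbitrary choices, <em>GetTickCount</em> has a coarse resolution (so the run has to last a while), and <em>TDoWhat</em>, <em>DoSomething1</em> and <em>DoSomething3</em> are assumed to be declared as in this article.</p>
<pre>uses Windows;

type
   TDoSomething = procedure (var data : array of Integer; what : TDoWhat);

// Returns the time in milliseconds taken by repeated calls on a short vector
function TimeIt(proc : TDoSomething) : Cardinal;
const
   cIterations = 50000000;
var
   data : array [0..7] of Integer;
   i : Integer;
   start : Cardinal;
begin
   FillChar(data, SizeOf(data), 0);
   start := GetTickCount;
   for i := 1 to cIterations do
      proc(data, dwInc);
   Result := GetTickCount - start;
end;
</pre>
<p>Comparing <em>TimeIt(DoSomething1)</em> with <em>TimeIt(DoSomething3)</em> gives the overall ratio, though only a profiler will tell you where the time actually goes.</p>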
<p>Are there any more easy gains to be had? Let&#8217;s run the last version through SamplingProfiler:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-355" title="Going For The Jugular - Final Profiling Results" src="http://delphitools.info/wp-content/uploads/2009/04/jugular-5.png" alt="Going For The Jugular - Final Profiling Results" width="551" height="296" /></p>
<p>More than 92% of the execution time now goes to the loop and actual work. We have only a wee bit left for stack setup (line 96) and the <em>case of</em> (line 97). At this point, the above makes it clear that if you want to go faster you&#8217;ll have to increase the line count and code complexity significantly, as you&#8217;ll need to replace the two-liner loops with something else, which is bound to be heavier (unrolling, SIMD, etc.).</p>
<h4>Rest Under A Tree</h4>
<p>Some quick final notes to conclude.</p>
<p>When moving an exception to a procedure, there are two things to keep in mind:</p>
<ul>
<li>the exception will be triggered at another place in the code. To know where it was actually triggered, you&#8217;ll have to look up one step in your exception log stack trace&#8230; You do have an exception log stack trace in place, don&#8217;t you?</li>
<li>the compiler won&#8217;t &#8220;know&#8221; about the exception in the called procedure, and will assume execution continues after your <em>RaiseUnsupported</em>. You may thus want to place an <em>Exit</em> after it (which will never be reached), to avoid warnings and allow the occasional register optimization by the compiler.</li>
</ul>
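<p>The second point can be sketched as follows. The <em>ShiftAll</em> procedure is a hypothetical variation made up for this illustration, and <em>TDoWhat</em> and a standalone <em>RaiseUnsupported</em> are assumed to be declared beforehand. Without the <em>Exit</em>, the compiler would typically warn that <em>delta</em> might not have been initialized when the loop is reached.</p>
<pre>procedure ShiftAll(var data : array of Integer; what : TDoWhat);
var
   i, delta : Integer;
begin
   case what of
      dwInc : delta := 1;
      dwDec : delta := -1;
   else
      RaiseUnsupported(what);
      Exit; // never reached, but tells the compiler this path ends here
   end;
   for i:=Low(data) to High(data) do
      Inc(data[i], delta);
end;
</pre>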
<p>In the final version, we gained more than the previous profiling run hinted at: the new code allowed the compiler to make better use of the registers. Ofttimes, getting the fat out of the way is all you need to see improvements.</p>
<p>If you check the CPU view, you&#8217;ll see everything is quite efficient now, but even then, using all the remaining tricks in the book could probably net noteworthy gains, though at the cost of a significant complexity increase. I didn&#8217;t try, but I would guess a 2x or 3x speed up should be about right.</p>
<p>If you need to go that route, SamplingProfiler can still help you there: on ASM code, you get profiling data down to the ASM instruction&#8230; but that&#8217;s food for another article.</p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2009/05/06/code-optimization-go-for-the-jugular/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>ZJDBGPack re-release</title>
		<link>http://delphitools.info/2009/05/04/zjdbgpack-re-release/</link>
		<comments>http://delphitools.info/2009/05/04/zjdbgpack-re-release/#comments</comments>
		<pubDate>Mon, 04 May 2009 08:35:42 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Command]]></category>
		<category><![CDATA[Debug]]></category>
		<category><![CDATA[Delphi]]></category>
		<category><![CDATA[Download]]></category>
		<category><![CDATA[JCL]]></category>
		<category><![CDATA[Profiler]]></category>
		<category><![CDATA[Size]]></category>
		<category><![CDATA[ZJDBGPack]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=376</guid>
		<description><![CDATA[ZJDBGPack is again available, but as an independent download (it used to be bundled with SamplingProfiler).
This is a command-line utility intended for use in a build process or from the Delphi tools menu, whose purpose is to integrate debug information into an executable. The debug information format is a compressed version of JCL&#8217;s JDBG.
As of [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://delphitools.info/other-tools/zjdbgpack/">ZJDBGPack</a> is again available, but as an independent download (it used to be bundled with SamplingProfiler).</p>
<p>This is a command-line utility intended for use in a build process or from the Delphi tools menu, whose purpose is to integrate debug information into an executable. The debug information format is a compressed version of <a href="http://jcl.delphi-jedi.org/">JCL</a>&#8217;s JDBG.</p>
<p>As of now, <a href="http://delphitools.info/samplingprofiler/">SamplingProfiler</a> is the only published utility that understands this format, so you can use it either to reduce the size of the executables you deploy for profiling purposes, or if you do not want to deploy directly-readable debug information files.</p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2009/05/04/zjdbgpack-re-release/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Knowing what and when to optimize&#8230;</title>
		<link>http://delphitools.info/2009/04/20/knowing-what-and-when-to-optimize/</link>
		<comments>http://delphitools.info/2009/04/20/knowing-what-and-when-to-optimize/#comments</comments>
		<pubDate>Mon, 20 Apr 2009 10:11:51 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Tips]]></category>
		<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Bottleneck]]></category>
		<category><![CDATA[Compiler]]></category>
		<category><![CDATA[Delphi]]></category>
		<category><![CDATA[Forums]]></category>
		<category><![CDATA[Methodologies]]></category>
		<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=338</guid>
		<description><![CDATA[&#8230;is as important as knowing how to optimize.
In this thread on the Delphi forums Ante Bonic brought back to attention this excellent Delphi Optimization Guide article by Robert Lee. The article has aged a bit, but many tips remain true with the Delphi 2009 compiler (sadly so). Like many optimization articles, Robert&#8217;s focuses [...]]]></description>
			<content:encoded><![CDATA[<p>&#8230;is as important as knowing how to optimize.</p>
<p>In <a href="https://forums.codegear.com/thread.jspa?threadID=15794">this thread</a> on the Delphi forums Ante Bonic brought back to attention this excellent <a href="http://effovex.com/OptimalCode/opguide.htm">Delphi Optimization Guide</a> article by Robert Lee. The article has aged a bit, but many tips remain true with the Delphi 2009 compiler (sadly so). Like many optimization articles, Robert&#8217;s article focuses mostly on local optimization tips, which can draw in warnings like this one by <a href="http://web.telia.com/~u16122508/anders.htm">Anders Isaksson</a>:</p>
<p style="padding-left: 30px;"><em>Optimization should be done after profiling, not before.</em></p>
<p>Which I couldn&#8217;t agree more with. But to be fair, Robert <a href="http://effovex.com/OptimalCode/general.htm#flex">states so</a> in his article, as do most authors of optimization articles. Recipes and local optimization tips are to be used <em>after</em> all algorithmic and data structure improvements have been taken advantage of.</p>
<p>One can list tips and tricks for local optimization, do&#8217;s and don&#8217;ts that are true often enough to be good tips in many scenarios. However, it&#8217;s practically impossible to come up with a &#8220;reusable&#8221; list of tips for algorithms and data structures. Too many specifics can come together: even when the problems are similar, considerations of scale or reactivity can drastically influence architectural and algorithmic options.</p>
<p>Hence the most visible optimization recipes are often local optimization ones, but mostly because there are few global optimization recipes. You only have global optimization <em>methodologies</em>. But even these methodologies can usually be summarized in a few words:</p>
<ol>
<li>Time, profile, analyze and confirm your bottlenecks.</li>
<li>Improve algorithms &amp; data structures.</li>
<li>Exhaust 1 &amp; 2 before looking at local optimizations, and then don&#8217;t forget 1.</li>
</ol>
<p>To optimize <em>efficiently</em>, i.e. not waste <em>your</em> time, you have to master the first point.<br />
 To optimize <em>effectively</em>, i.e. not waste the <em>machine</em> time, you have to master the second.</p>
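<p>For the first point, even a crude timing harness is a start before reaching for a profiler. The sketch below is only an illustration, not a method from the articles mentioned here: it assumes a Windows target (it uses the <em>QueryPerformanceCounter</em> API from the <em>Windows</em> unit), and <em>SumOfSquares</em> is a made-up workload standing in for whatever code you suspect.</p>
<pre>
program TimingSketch;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical workload, stands in for the code under suspicion.
function SumOfSquares(n : Integer) : Int64;
var
  i : Integer;
begin
  Result := 0;
  for i := 1 to n do
    Result := Result + Int64(i) * i;
end;

var
  freq, t0, t1 : Int64;
  r : Int64;
begin
  // Counter ticks per second, needed to convert ticks to time.
  QueryPerformanceFrequency(freq);

  QueryPerformanceCounter(t0);
  r := SumOfSquares(10000000);
  QueryPerformanceCounter(t1);

  // Print the result too, so the measured work is actually used.
  WriteLn('result  : ', r);
  WriteLn(Format('elapsed : %.3f ms', [(t1 - t0) * 1000 / freq]));
end.
</pre>
<p>Run it several times and after every change: a single measurement tells you little, and a timing like this only confirms a suspicion, it won&#8217;t find the bottleneck for you.</p>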
<p>And the third point, you ask? It&#8217;s a razor&#8217;s edge: when applied effectively, it can be very efficient, with very few changes, like <a href="http://www.bobswart.nl/Weblog/Blog.aspx?RootId=5:3294">in this case</a>; if not, it&#8217;s a good way to end up <a href="http://www.google.fr/url?sa=t&amp;source=web&amp;ct=res&amp;cd=1&amp;url=http%3A%2F%2Fthedailywtf.com%2F&amp;ei=NUfsSdD-M4OOjAeWj6mfCg&amp;usg=AFQjCNETv_WJf9NiC7VH982LdL3oV3PUPQ&amp;sig2=lihwWkBHJ7-_Nr9v2sV85w">there</a>. To be effective, local optimization has to be about taking care of hidden machinery, hidden shortcomings of the compiler, hidden algorithms and data-structures <em>that get in the way</em>.</p>
<p>I&#8217;ll close this post by quoting Robert Lee&#8217;s article on timing:</p>
<p style="padding-left: 30px;"><em>Timing code is generally called &#8220;profiling&#8221;. If you want to improve the performance of your code, you first need to know precisely what that performance <strong>is</strong>. Additionally, you need to re-measure with each change you apply to your code. Do not spend a single second twiddling code to improve performance until you have analytically determined exactly where the application is spending its time. I cannot emphasize this enough.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2009/04/20/knowing-what-and-when-to-optimize/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
