Posts Tagged ‘Delphi’

Don’t abuse FreeAndNil anymore

February 6th, 2010

A recurring subject when it comes to freeing objects and preventing use after release is whether you should just .Free them, thus leaving an invalid reference that should never be used again when the code design is correct, or whether you should defensively FreeAndNil() them, thus leaving a nil value that will hopefully trigger AVs more often on improper usage after release.

Allen Bauer recently brought up this subject in his blog post “A case against FreeAndNil”, arguing that there are better tools than FreeAndNil to diagnose improper usage after release, and that it can hide other issues and lead to other magic bullet solutions, which only further the problem. This is true, and FastMM debug mode can do wonders here. However, quite often you don’t want to rely on debug and diagnostic machinery that needs to be switched ON for problems to be detected early on.

Well, if you’re using FreeAndNil() for defensive purposes, don’t abuse it anymore: invest in a few lines of code for a shiny new FreeAndInvalidate():

procedure FreeAndInvalidate(var obj);
var
   temp : TObject;
begin
   temp := TObject(obj);
   // invalidate the reference first, the same way FreeAndNil sets it to nil first
   Pointer(obj) := Pointer(1);
   temp.Free;
end;

This procedure frees the object and sets the reference to an invalid magic value, which will trigger an AV on improper field or virtual method access after release (just like FreeAndNil), but unlike FreeAndNil, it will also AV on multiple .Free attempts, and will not be stopped by “if Assigned()” checks. If you wish for even more defense, you can also “sabotage” the VMT pointer of the freed object instance.
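To make the difference concrete, here is a minimal console sketch that compares the two after release. The program name and the variables a and b are only there for illustration, and the access violation itself is left commented out so the program runs to completion.

```pascal
program FreeAndInvalidateDemo;
{$APPTYPE CONSOLE}

uses SysUtils;

procedure FreeAndInvalidate(var obj);
var
   temp : TObject;
begin
   temp := TObject(obj);
   Pointer(obj) := Pointer(1);
   temp.Free;
end;

var
   a, b : TObject;   // illustrative instances only
begin
   a := TObject.Create;
   b := TObject.Create;

   FreeAndNil(a);
   FreeAndInvalidate(b);

   // After FreeAndNil, an "if Assigned()" guard silently skips the stale use
   Assert(not Assigned(a));

   // After FreeAndInvalidate, the reference is non-nil, so the guard hides nothing
   Assert(Assigned(b));
   Assert(Pointer(b) = Pointer(1));

   a.Free;      // no-op on nil: a double free goes unnoticed
   // b.Free;   // would read the VMT at address 1 and raise an access violation

   WriteLn('done');
end.
```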

With a FreeAndInvalidate() added to your bag of tricks, you can now reserve FreeAndNil usage for situations where having a nil reference is truly part of the design, and no longer abuse it for defensive programming. Of course this is still no magic bullet, but it’s cheap enough that you can use it in release builds (unlike debug and diagnostic tools), and as a bonus, it makes it obvious when reading the code that the object reference is supposed to be invalid after the call.

Eric Tips

SamplingProfiler v1.7.4

September 8th, 2009

SamplingProfiler v1.7.4 is now available. This version adds an option for Delphi 2010 paths, and fixes a bug with the silent mode execution that would render it inoperative. There have also been other minor changes, mostly cosmetic.

This release also includes preparation for an “attach to process” option, which is currently not enabled, but should hopefully make it into the next version (available “when ready”).

Eric News

Code Optimization: Go For the Jugular

May 6th, 2009

Code optimization can sometimes be experienced as a lengthy process, with disruptive effects on code readability and maintainability. For effective optimization, it is crucial to focus efforts on areas where minimal work and minimal changes will have the most impact, i.e. go for the jugular.


The Prey

I will illustrate this using SamplingProfiler in a small example, taken from a small library that deals with short vectors of varying length (but usually less than 10 dimensions), which I simplified, isolated & anonymized for the purpose of this article.

uses SysUtils, TypInfo;   // SysUtils for Exception, TypInfo for GetEnumName

type
   TDoWhat = (dwInc, dwDec);

procedure DoSomething1(var data : array of Integer; what : TDoWhat);
var
   i : Integer;
begin
   for i:=Low(data) to High(data) do
   begin
      case what of
         dwInc : Inc(data[i]);
         dwDec : Dec(data[i]);
      else
         raise Exception.Create('Unsupported: '+GetEnumName(TypeInfo(TDoWhat), Integer(what)));
      end;
   end;
end;


Get Meat into Belly

Before starting any kind of optimization, one has to define goals and limits, i.e. figure out what “good enough” will be, rather than consider “good enough” to be whatever state the code is in once one has grown tired of optimizing it!

The sample code above is quite straightforward and simple. It would of course be possible to blow this code up to huge proportions for optimization’s sake. If you are after every last drop of CPU-cycle juice, and allow yourself to use every trick in the book, a fully optimized version could represent several thousand lines of code (I’m not exaggerating). If it’s your core business, it might be okay, but if it’s just a utility library, the increased maintenance burden could never be justified.

But since this article is intended more as an illustration than a discussion on the methodology, I’ll get straight to the buffalo (beef). For further reading on that subject, you can start from the Big O Notation, Benchmarking and Software metrics articles on Wikipedia; there are also whole books on the subject.


Stalking the Prey

Looking at the above code, the first obvious optimization that developers suggest seems to be taking the conditional out of the loop, resulting in several case-specific loops. On small vectors, this nets about a 30% speedup. For further speedups, the suggestions are typically to go for loop unrolling, asm, and other heavy-handed solutions that come with a significant development time and code complexity increase.

Of course, readers of this website will know better than to jump straight into the code and apply optimization recipes: they would run the code through a profiler first. And since we’re dealing with a single procedure, an instrumenting profiler would be of little help, so they would run SamplingProfiler instead, and would get to see something like this:

Going For The Jugular - Initial Profiling Results

In this run, only the dwInc case was stressed (line 37), and obviously the procedure spends less than 30% of its time doing what was asked of it, and most of its time (33%) on the “end”, i.e. cleaning up, plus 8% setting up in “begin”. That’s 40%+ doing nothing but stack and setup/cleanup work!
The conditional in the loop that could have looked like the most worrying bit is eating a bit less than 20% of the time.

What is the source of all that begin/end work? Place a breakpoint on begin, run and hit Ctrl+Alt+C when your breakpoint is reached, go have a look at the CPU view, and you’ll see this:

Going For The Jugular - CPU view near "begin"

This is a fairly significant stack setup for such a small procedure, and those instructions with “fs:” at the bottom are the setting up of an (implicit) exception frame. An exception frame for what? If you haven’t guessed already, navigate your CPU view to the “end” line.

Going For The Jugular - CPU view near "end"

No wonder “end” was a bottleneck! The call to UStrArrayClr indicates that the exception frame is here to clean up several strings… these strings are those of the raise Exception: one is the string returned by GetEnumName, the other is the result of the concatenation passed to Exception.Create.


Isolate and Kill

How to get rid of that exception frame? One typical way is to use “Exception.CreateFmt”, and pass only constant strings to it, but that is not possible here with the call to GetEnumName, which returns a string. The other way is to isolate the exception to its own (nested) procedure:

procedure RaiseUnsupported(what : TDoWhat);
begin
   raise Exception.Create('Unsupported: '+GetEnumName(TypeInfo(TDoWhat), Integer(what)));
end;

and call RaiseUnsupported in the “case else”. Doing so will move the exception frame to the new procedure, where it’s irrelevant in terms of performance.
This simple change nets us a 33% speedup, i.e. we reclaimed most of the time lost in begin/end! We also gained a bit from dropping the UStrArrayClr call, which did essentially nothing, since the strings it was there to clear were never assigned (as long as we did not hit the exception).

Note that if you use a nested procedure for RaiseUnsupported, you may be tempted not to pass it the “what” parameter, and to use the “what” from its parent procedure directly. However, by doing so, you’ll have the compiler use a special stack setup (so that the nested procedure can access the parent procedure’s variables). This setup will be faster than the exception frame it replaces, but with it, begin/end would still be taking about 18% of the CPU time spent in the procedure.
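For reference, the intermediate version described above could look like the sketch below. The DoSomething2 name is my assumption (the article only shows versions 1 and 3), and the small main block is only there to make the sketch runnable.

```pascal
program Jugular2;
{$APPTYPE CONSOLE}

uses SysUtils, TypInfo;

type
   TDoWhat = (dwInc, dwDec);

// The string temporaries, and the implicit exception frame that cleans
// them up, now live here instead of in the hot procedure
procedure RaiseUnsupported(what : TDoWhat);
begin
   raise Exception.Create('Unsupported: '+GetEnumName(TypeInfo(TDoWhat), Integer(what)));
end;

// Same loop as DoSomething1, only the raise has moved out
procedure DoSomething2(var data : array of Integer; what : TDoWhat);
var
   i : Integer;
begin
   for i:=Low(data) to High(data) do
   begin
      case what of
         dwInc : Inc(data[i]);
         dwDec : Dec(data[i]);
      else
         RaiseUnsupported(what);
      end;
   end;
end;

var
   v : array [0..2] of Integer;
begin
   v[0] := 1; v[1] := 2; v[2] := 3;
   DoSomething2(v, dwInc);
   Assert((v[0] = 2) and (v[1] = 3) and (v[2] = 4));
   DoSomething2(v, dwDec);
   Assert((v[0] = 1) and (v[1] = 2) and (v[2] = 3));
   WriteLn('done');
end.
```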


Repeat Until Belly.Full;

Those first 33% were easily gained. Let’s go for another round of SamplingProfiler:

Going For The Jugular - Further Profiling Results

Things are more satisfying: the line performing the actual work is now taking up most of the CPU time. Second comes the “case of” line. For further speed improvements, we now need to move the conditional out of the loop:

procedure DoSomething3(var data : array of Integer; what : TDoWhat);

   procedure RaiseUnsupported(what : TDoWhat);
   begin
      raise Exception.Create('Unsupported: '+GetEnumName(TypeInfo(TDoWhat), Integer(what)));
   end;

var
   i : Integer;
begin
   case what of
      dwInc :
         for i:=Low(data) to High(data) do
            Inc(data[i]);
      dwDec :
         for i:=Low(data) to High(data) do
            Dec(data[i]);
   else
      RaiseUnsupported(what);
   end;
end;

We have increased the line count noticeably, but most of those extra lines are still cosmetic. What further makes it a reasonable trade-off is that the execution time has been reduced by 66% from the initial version: it now executes about 3 times faster!

Are there any more easy gains to be had? Let’s run the last version through SamplingProfiler:

Going For The Jugular - Final Profiling Results

More than 92% of the execution time now goes to the loop and actual work. We’ve got only a wee bit left for stack setup (line 96) and the case of (line 97). At this point, the above makes it clear that if you want to go faster, you’ll have to increase the line count and code complexity significantly, as you’ll need to replace the two-liner loops with something else, which is bound to be heavier (unrolling, SIMD, etc.).


Rest Under A Tree

Some quick final notes to conclude.

When moving an exception to a procedure, there are two things to keep in mind:

  • the exception will be triggered at another place in the code, so to know where it was actually triggered, you’ll have to look one step up in your exception log stack trace… You do have an exception log stack trace in place, don’t you?
  • the compiler won’t “know” about the exception in the called procedure, so it will assume execution continues after your RaiseUnsupported. You may want to place an Exit after it (which will never be reached), to avoid warnings and allow the occasional register optimization by the compiler.
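The second point can be sketched as follows. AddDelta and its delta variable are hypothetical, made up for this illustration: without the Exit, the compiler sees a path where delta is read without having been assigned, and will typically warn about it.

```pascal
program ExitAfterRaise;
{$APPTYPE CONSOLE}

uses SysUtils, TypInfo;

type
   TDoWhat = (dwInc, dwDec);

procedure RaiseUnsupported(what : TDoWhat);
begin
   raise Exception.Create('Unsupported: '+GetEnumName(TypeInfo(TDoWhat), Integer(what)));
end;

// Hypothetical variant of the article's procedure, for illustration only
procedure AddDelta(var data : array of Integer; what : TDoWhat);
var
   i, delta : Integer;
begin
   case what of
      dwInc : delta := 1;
      dwDec : delta := -1;
   else
      RaiseUnsupported(what);
      Exit;   // never reached, but tells the compiler this path stops here
   end;
   for i:=Low(data) to High(data) do
      Inc(data[i], delta);
end;

var
   v : array [0..1] of Integer;
begin
   v[0] := 10; v[1] := 20;
   AddDelta(v, dwDec);
   Assert((v[0] = 9) and (v[1] = 19));
   WriteLn('done');
end.
```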

In the final version, we gained more than the previous profiling run hinted at: the new code allowed the compiler to make better use of the registers. Ofttimes, getting the fat out of the way is all you need to see improvements.

If you check the CPU view, you’ll see everything is quite efficient now, but even then, using all the remaining tricks in the book could probably net noteworthy gains, just at the cost of a significant complexity increase. I didn’t try, but I would guess a 2x or 3x speedup should be about right.

If you did need to go that route, SamplingProfiler could still help you there: on ASM code, you get profiling data down to the ASM instruction… but that’s food for another article.

Eric Tips

ZJDBGPack re-release

May 4th, 2009

ZJDBGPack is again available, but as an independent download (it used to be bundled with SamplingProfiler).

This is a command-line utility intended for use in a build process or from the Delphi tools menu, whose purpose is to integrate debug information into an executable. The debug information format is a compressed version of JCL’s JDBG.

As of now, SamplingProfiler is the only published utility that understands this format, so you can use it either to reduce the size of the executables you deploy for profiling purposes, or if you do not want to deploy directly-readable debug information files.

Eric News

Knowing what and when to optimize…

April 20th, 2009

…is as important as knowing how to optimize.

In this thread on the Delphi forums, Ante Bonic brought back to attention this excellent Delphi Optimization Guide article by Robert Lee. The article has aged a bit, but many tips remain true with the Delphi 2009 compiler (sadly so). Like many optimization articles, Robert’s focuses mostly on local optimization tips, which can draw warnings like this one by Anders Isaksson:

Optimization should be done after profiling, not before.

Which I couldn’t agree more with. But to be fair, Robert states so in his article, as do most authors of optimization articles. Recipes and local optimization tips are to be used after all algorithmic and data structure improvements have been taken advantage of.

One can list tips and tricks for local optimization, do’s and don’ts that are true often enough to be good tips in many scenarios. However, it’s practically impossible to come up with a “reusable” list of tips for algorithms and data structures. Too many specifics come into play: even when the problems are similar, considerations of scale or reactivity can drastically influence architectural and algorithmic options.

Hence the most visible optimization recipes are often local optimization ones, but mostly because there are few global optimization recipes. You only have global optimization methodologies. But even these methodologies can usually be summarized in a few words:

  1. Time, profile, analyze and confirm your bottlenecks.
  2. Improve algorithms & data structures.
  3. Exhaust 1 & 2 before looking at local optimizations, and then don’t forget 1.

To optimize efficiently, i.e. not waste your time, you have to master the first point.
To optimize effectively, i.e. not waste the machine’s time, you have to master the second.

And the third point, you ask? It’s a razor’s edge: when applied effectively, it can be very efficient, with very few changes, like in this case, but if not, it’s a good way to end up there. To be effective, local optimization has to be about taking care of hidden machinery, hidden shortcomings of the compiler, hidden algorithms and data structures that get in the way.

I’ll close this post by quoting Robert Lee’s article on timing:

Timing code is generally called “profiling”. If you want to improve the performance of your code, you first need to know precisely what that performance is. Additionally, you need to re-measure with each change you apply to your code. Do not spend a single second twiddling code to improve performance until you have analytically determined exactly where the application is spending its time. I cannot emphasize this enough.
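As a minimal sketch of that kind of measurement, the program below times a few runs of a piece of code with GetTickCount from the Windows unit. The Workload function is a made-up stand-in for whatever you want to measure, and GetTickCount only has a resolution of several milliseconds, so this is only meaningful for runs that last long enough.

```pascal
program TimeIt;
{$APPTYPE CONSOLE}

uses Windows, SysUtils;

// Hypothetical workload, standing in for the code being measured
function Workload(n : Integer) : Int64;
var
   i : Integer;
begin
   Result := 0;
   for i:=1 to n do
      Inc(Result, i and 7);
end;

var
   run : Integer;
   startTicks, elapsedMs : Cardinal;
   checksum : Int64;
begin
   // Measure several runs: timings vary, and a single run can mislead
   for run:=1 to 3 do
   begin
      startTicks := GetTickCount;
      checksum := Workload(50000000);
      elapsedMs := GetTickCount - startTicks;
      // Printing the checksum keeps the result in use
      WriteLn(Format('run %d: %d ms (checksum %d)', [run, elapsedMs, checksum]));
   end;
end.
```

Re-run it after each change, and compare like with like (same build options, same machine load).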

Eric Tips

SamplingProfiler v1.7.1 bugfix release

April 16th, 2009

SamplingProfiler v1.7.1 is now available; it fixes the crash in the paths dialog reported by Kazan in the forums.

Incidentally this was due to a very old Delphi 5 bit of code that somehow survived Delphi 2009 at the compilation level, but bombed at runtime… I dropped the code and made use of the already existing D2009 version, hence the smaller executable.

For further details on this version, see the v1.7.0 post.

Eric News

Delphi 2009 hidden compiler switch?

April 1st, 2009

This morning while debugging a statistical ichthyo-parser I stumbled upon what looked like a Delphi 2009 compiler bug: the compiler was outputting gibberish ASM opcodes… But after further investigations, it appeared this wasn’t completely gibberish, but that it was (somewhat) correct MSIL bytecode!

What’s more, a quick hexadecimal examination of dcc32.exe revealed that this MSIL codegen looks like it can be forced by using an undocumented command-line compiler switch: -af

The resulting exe won’t run because it’s a mismatch of Win32 headers and MSIL bytecode… What do you think?
Did CodeGear plan to support unmanaged code in managed executables or managed code in native executables?

Update: here is a screenshot of the switch in action.

Eric News

How familiar are you with code profiling?

March 30th, 2009

SamplingProfiler was initially released in the Delphi ASM newsgroup, and I’m curious about the audience of this website, so I’ve set up a small poll.

How familiar are you with code profiling and/or Delphi code optimization? Can you tell apart instrumenting and sampling profilers merely by their respective heisenbugs, or is that profiler business sounding like a TV series from the last century?

Poll - Familiarity with Profilers

Eric News

begin…end as bottlenecks?

March 25th, 2009

There will come a time when SamplingProfiler may report to you that begin or end are your bottlenecks. This may sound a little surprising, but it’s actually quite a common occurrence, and something that instrumenting profilers are not going to point out, so it might be worth a little explanation.

This can be illustrated with the minimalistic example of an array property getter. Witness the innocuous-looking code below:

function TMyList<T>.GetItem(index : Integer) : T;
begin
   if (index < 0) or (index >= Count) then
      Error(index);
   Result := FItems[index];
end;

Nothing out of the ordinary there: you can find similar-looking code in practically every array-based collection in the RTL and many third-party libraries. But someday, that GetItem will be a bottleneck, and you could be left looking at code profiling results like these:

begin-end-critical-01

Yes, those are the begin and end lines taking up more than 70% of the CPU time spent inside GetItem.
You knew it! Sampling profilers are unreliable… or are they? Surely the index range checking must be the culprit? Or the assignment and the reference counting business? Well, they could be, but in this case they aren’t.

To understand why, let’s have a look in the CPU view. Place a breakpoint on your begin, run up to there and hit Ctrl+Alt+C. Here is what you could see:

begin-end-critical-02

That’s a whole lot of traffic to the stack: 3 registers saved, 3 copies. Those things aren’t free, they can dwarf what your explicit code does, and in this example, they do. We didn’t even have any local variables, if we did, they would have taken setup and teardown code, and this code would have been “hidden” in begin and end too.

This illustrates a difference of sampling vs instrumenting profilers: the ability to pinpoint an actual bottleneck, even if it is “outside” of your explicit code, so you can find where the actual bottleneck is, and don’t waste time trying to optimize what isn’t critical.

Now what can you do to improve things locally? With generics, an interface type and Delphi 2009 SP2, nothing much, short of going BASM. The bottleneck code is compiler-generated, so optimizing the assignment or the range checking would only provide minimal benefits. If you want to go faster, you’ll have to reduce the number of calls to GetItem, i.e. open that “Show Callers” pane, have a look there, and solve the issue at the higher-level routines that are involved.
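What “reducing the number of calls” can look like in a caller is sketched below. Everything here is an assumption for illustration: the minimal TMyList<T> (with Integer items rather than an interface type, to keep it short), the RaiseIndexError helper and the two CountPairs functions are all made up. The second version reads list[i] once per outer iteration instead of once per inner iteration.

```pascal
program ReduceGetItemCalls;
{$APPTYPE CONSOLE}

uses SysUtils;

type
   // Hypothetical minimal list, standing in for the TMyList of this post
   TMyList<T> = class
   private
      FItems : array of T;
      FCount : Integer;
      function GetItem(index : Integer) : T;
   public
      procedure Add(const item : T);
      property Count : Integer read FCount;
      property Items[index : Integer] : T read GetItem; default;
   end;

procedure RaiseIndexError(index : Integer);
begin
   raise ERangeError.CreateFmt('Index out of bounds: %d', [index]);
end;

function TMyList<T>.GetItem(index : Integer) : T;
begin
   if (index < 0) or (index >= FCount) then
      RaiseIndexError(index);
   Result := FItems[index];
end;

procedure TMyList<T>.Add(const item : T);
begin
   if FCount = Length(FItems) then
      SetLength(FItems, FCount*2+4);
   FItems[FCount] := item;
   Inc(FCount);
end;

// 2*n*n calls to GetItem
function CountPairsNaive(list : TMyList<Integer>; sum : Integer) : Integer;
var
   i, j : Integer;
begin
   Result := 0;
   for i:=0 to list.Count-1 do
      for j:=0 to list.Count-1 do
         if list[i]+list[j] = sum then
            Inc(Result);
end;

// n + n*n calls to GetItem: list[i] is read once per outer iteration
function CountPairsHoisted(list : TMyList<Integer>; sum : Integer) : Integer;
var
   i, j, vi : Integer;
begin
   Result := 0;
   for i:=0 to list.Count-1 do
   begin
      vi := list[i];
      for j:=0 to list.Count-1 do
         if vi+list[j] = sum then
            Inc(Result);
   end;
end;

var
   list : TMyList<Integer>;
begin
   list := TMyList<Integer>.Create;
   try
      list.Add(1); list.Add(2); list.Add(3); list.Add(4);
      // Both versions must agree: (1,4), (2,3), (3,2), (4,1)
      Assert(CountPairsNaive(list, 5) = 4);
      Assert(CountPairsHoisted(list, 5) = 4);
      WriteLn('done');
   finally
      list.Free;
   end;
end.
```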

But there are other situations in which you can influence the auto-generated begin/end code, the solutions then typically revolve around distributing the code across smaller local functions or methods, tweaking your variable usage, separating branches, or if all else fails, going BASM… but that is food for future posts!

Eric Tips

Sampling Profiler

February 25th, 2009

SamplingProfiler is a sampling profiler for Delphi 5 to Delphi 2009. Its purpose is to help locate bottlenecks, even in final, optimized code running at full-speed.

Downloads and changelog
News, Tips and posts about SamplingProfiler

Main options screen

Profiling results analysis

Though it may be able to profile applications compiled by many other compilers, the focus is (currently) solely on Delphi applications.

What is a sampling profiler?

There are basically two kinds of profiling tools: instrumenting profilers (source or binary) and sampling profilers. Instrumenting profilers work by altering an application’s code or binary, adding calls to functions that will count how many times each procedure was called and how much time was spent inside. This approach allows an exhaustive analysis of which code called which code, and how much time was spent in each procedure. However, it will typically incur a significant execution speed and memory penalty, which can only be avoided by spending time and insight on limiting instrumentation to a subset of an application’s functions. This makes instrumenting profilers more suitable when you already know where the issue is (see GpProfile for a free instrumenting Delphi profiler).

Sampling profilers on the other hand do not require instrumentation and proceed by a statistical analysis by periodically looking at which code is currently being executed by the profiled application. The statistical nature means that not all code may be seen by the profiler (only code that takes time to execute), profiling information may also vary statistically between executions, and context information for bottlenecks is typically more limited.

By focusing on what code is actually taking execution time, and not being as intrusive, they can be used to pinpoint actual bottlenecks in production code, a feat instrumenting profilers aren’t capable of. They also provide bottleneck information down to the code line, and can point to issues that aren’t in your explicit code (such as call convention overhead, local values initialization/cleanup, etc.)
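The statistical side can be illustrated with a toy tally. This is not how SamplingProfiler is implemented, only a sketch of the principle: the Samples array holds made-up values standing for the source line found executing at each periodic look, and the share of samples per line is the estimate of where the time goes.

```pascal
program SampleTally;
{$APPTYPE CONSOLE}

uses SysUtils;

procedure Tally;
const
   FirstLine = 34;
   LastLine = 40;
   // Made-up samples: the source line seen executing at each periodic look
   Samples : array [0..9] of Integer = (37, 37, 40, 34, 37, 40, 40, 36, 40, 34);
var
   hits : array [FirstLine..LastLine] of Integer;
   i, line : Integer;
begin
   for line:=FirstLine to LastLine do
      hits[line] := 0;

   // One hit per sample, attributed to the line that was executing
   for i:=Low(Samples) to High(Samples) do
      Inc(hits[Samples[i]]);

   // Lines that were never caught executing simply do not show up
   for line:=FirstLine to LastLine do
      if hits[line] > 0 then
         WriteLn(Format('line %d: %d%% of samples',
                        [line, hits[line]*100 div Length(Samples)]));
end;

begin
   Tally;
end.
```

With so few samples the percentages are very coarse, which is why profiling runs need to last long enough to collect many of them.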

Why should I use a sampling profiler?

Using a sampling profiler has benefits:

  • it will not affect the execution speed significantly, neither because of its own execution times, nor because it affects the CPU instruction or data cache by its instrumenting code (i.e. you get a measure of actual performance, as if there were no profiler running)
  • it is immune to the heisenbug of instrumenting profilers, which disproportionately inflate the execution time of small procedures invoked in tight loops or from many contexts in an application’s code.
  • it is able to measure the time spent in other OS components or DLLs (like the video driver, OpenGL, etc.), not just the time spent in your application
  • profiling latencies won’t hide your application’s latencies (hard disk accesses, network accesses, video driver waits…), which can be particularly significant if your application makes asynchronous accesses.
  • it can pinpoint bottlenecks at the code-line level (not just procedure level), for the entire application.
  • it can be used to profile over long periods of time, like a full batch run of computations or a complete game level, you can literally have an application being profiled for days
  • being lightweight, you can profile multiple applications simultaneously (like a client and a server running on the same development machine)

RealTime Monitor

With version 1.7.0, SamplingProfiler includes a small HTTP web server which can be used for real-time monitoring of the profiled application. The monitor provides code hot-spot information in real-time, in HTML or XML form.
This feature can help diagnose infrequent usage spikes or near-freezes (like infinite loops). It can also be used for monitoring long-running processes executing on other machines.

Eric Uncategorized