SamplingProfiler has a few options to help profile a multi-threaded application, which I’ll go over here.
In the current version, those options allow identifying CPU-related bottlenecks, as in “threads taking too much CPU resources or execution time”. However, they do not yet provide many clues to pinpoint bottlenecks arising from thread synchronization issues or serialization (insufficient parallelism). Hopefully, more support for profiling multi-threaded applications will come in future versions.
By default, SamplingProfiler only looks at one thread, the main application thread, but you can manually (and dynamically) specify another thread. This is done via OutputDebugString (see Control sampling from your code):
OutputDebugString('SAMPLING THREAD threadID');
where threadID is the thread ID (as returned by the WinAPI function GetCurrentThreadId, for instance). If you specify an invalid threadID, or if the thread dies, no more samples will be collected until you specify a new thread or “return” the sampling focus to the main thread, which can be accomplished with:
OutputDebugString('SAMPLING THREAD 0');
This command is mostly useful if you already have a clue which thread is proving troublesome, such as when a worker thread is used in a GUI application. If you have several worker threads in a thread pool that serve random (or assumed random) workloads, you can pick one of those threads at random and have it profiled.
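As a minimal sketch of how a worker thread might point the profiler at itself: the TWorker class and its workload below are hypothetical, and the code assumes the thread ID is expected as a plain decimal number in the command string, which the text above does not spell out.

```delphi
program SamplingThreadDemo;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

type
  // Hypothetical worker thread, for illustration only.
  TWorker = class(TThread)
  private
    FTotal: Int64;
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
var
  i: Integer;
begin
  // Ask the profiler to sample this thread
  // (assumption: the ID is sent as a decimal number).
  OutputDebugString(PChar('SAMPLING THREAD ' + IntToStr(GetCurrentThreadId)));
  try
    // Stand-in for the real workload.
    for i := 1 to 100000000 do
      FTotal := FTotal + (i mod 7);
  finally
    // Return the sampling focus to the main thread.
    OutputDebugString('SAMPLING THREAD 0');
  end;
end;

var
  worker: TWorker;
begin
  worker := TWorker.Create(False);
  try
    worker.WaitFor;
  finally
    worker.Free;
  end;
end.
```

Wrapping the workload in try..finally is a design choice here: it hands the focus back to the main thread even if the worker raises an exception, so sample collection does not stall on a dead thread.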
However, picking a thread by hand involves a fair amount of bias and guessing about where the bottleneck could be, and is not really applicable if you have a high number of threads working (or sleeping) simultaneously on multiple CPUs. This is where Monte-Carlo sampling comes in…
Monte-Carlo Samples Gathering
Monte-Carlo sampling is enabled via the samples gathering mode option. When it is set, SamplingProfiler picks a random thread of the profiled application at each sampling and uses it for the sample. Bias and guessing are eliminated.
The good news is that with this method, the sampling load is not increased, and its impact is random: concurrency issues and UI bottlenecks can still be spotted. Hot-spots in a server running at production speed can be spotted too.
The bad news is that if you have a high number of inactive threads, you’ll have to gather more samples to get meaningful results on the active threads (as each time an inactive thread is picked at random, the sample will be meaningless, and thus lost).
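To get a rough feel for how many more samples that means, here is a back-of-the-envelope sketch. It assumes every thread is equally likely to be picked at each sampling and that a fixed number of threads is active; SamplesNeeded is a hypothetical helper, not part of SamplingProfiler.

```delphi
program MonteCarloBudget;

{$APPTYPE CONSOLE}

// Under uniform random thread selection, only activeThreads/totalThreads
// of the samples land on an active thread. This returns the total number
// of samples to gather (rounded up) to expect usefulSamples useful ones.
function SamplesNeeded(usefulSamples, totalThreads, activeThreads: Integer): Integer;
begin
  Result := (usefulSamples * totalThreads + activeThreads - 1) div activeThreads;
end;

begin
  // 4 busy threads among 64: 16 times more samples are needed.
  Writeln(SamplesNeeded(10000, 64, 4));  // prints 160000
end.
```

In practice the number of active threads varies over time, so this is only an order-of-magnitude guide.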
Interpreting the profiling results can, however, be a little more difficult, as several multi-threading effects can come into play, for instance a drop in CPU cache efficiency (code stressed in highly threaded situations can behave quite differently from how it behaves when stressed in a single-threaded situation). This will be food for future articles.
To decide if a thread is active or not, SamplingProfiler looks at its registers: if all the registers are unchanged between two samplings, the thread is deemed inactive and the sample is dropped. Inactivity can thus result from the thread sleeping or waiting on some event, or just from not having gotten its share of CPU time since the last time it was sampled (this can be quite common if you have many more threads than CPU cores, even if all the threads are busy).
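The "unchanged registers" test can be sketched as follows. This is an illustration of the idea only, not SamplingProfiler's actual code: TRegisterSnapshot is a hypothetical, reduced register set, and the values are made up.

```delphi
program InactivityCheck;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical, reduced register snapshot (a real thread context
  // holds more registers than this).
  TRegisterSnapshot = record
    Eip, Esp, Ebp, Eax, Ebx, Ecx, Edx, Esi, Edi: Cardinal;
  end;

// True when nothing changed between two samplings, in which case
// the thread would be deemed inactive and the sample dropped.
function SameRegisters(const a, b: TRegisterSnapshot): Boolean;
begin
  Result := CompareMem(@a, @b, SizeOf(TRegisterSnapshot));
end;

var
  previous, current: TRegisterSnapshot;
begin
  FillChar(previous, SizeOf(previous), 0);
  previous.Eip := $00401000;

  current := previous;                        // thread did not run
  Writeln(SameRegisters(previous, current));  // prints TRUE

  current.Eip := $00401234;                   // instruction pointer moved
  Writeln(SameRegisters(previous, current));  // prints FALSE
end.
```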
The last set of options is the one for processor affinities. You can choose which CPUs SamplingProfiler is constrained to, and which CPUs the profiled application is constrained to.
Affinities can be used either to further isolate the profiled application from the profiler, or to easily simulate your application running on a machine with fewer cores. In more advanced scenarios, if you have enough CPU cores, you can also leave some cores entirely unused by both the profiler and the profiled application, and thus reserve them for a third application (such as a database server).
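The profiler's options take care of this for you, but for reference, the underlying Windows notion is an affinity bitmask where bit n stands for logical CPU n. The sketch below restricts the current process to two CPUs through the WinAPI; it is not a description of how SamplingProfiler applies its settings, and the mask value is just an example.

```delphi
program AffinityDemo;

{$APPTYPE CONSOLE}

uses
  Windows;

const
  // Bit n set = logical CPU n allowed. $03 = binary 0011 = CPUs 0 and 1.
  TwoCoreMask = $03;

begin
  if SetProcessAffinityMask(GetCurrentProcess, TwoCoreMask) then
    Writeln('Process restricted to CPUs 0 and 1')
  else
    Writeln('SetProcessAffinityMask failed, error ', GetLastError);
end.
```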
ZJDBGPack is again available, but as an independent download (it used to be bundled with SamplingProfiler).
This is a command-line utility intended for use in a build process or from the Delphi tools menu, whose purpose is to integrate debug information into an executable. The debug information format is a compressed version of JCL’s JDBG.
As of now, SamplingProfiler is the only published utility that understands this format, so you can use it either to reduce the size of the executables you deploy for profiling purposes, or if you do not want to deploy directly readable debug information files.
This morning, while debugging a statistical ichthyo-parser, I stumbled upon what looked like a Delphi 2009 compiler bug: the compiler was outputting gibberish ASM opcodes… But after further investigation, it appeared that this wasn’t completely gibberish, but (somewhat) correct MSIL bytecode!
What’s more, a quick hexadecimal examination of dcc32.exe revealed that this MSIL codegen can apparently be forced by using an undocumented command-line compiler switch: -af
The resulting exe won’t run because it’s a mismatch of Win32 headers and MSIL bytecode… What do you think?
Did CodeGear plan to support unmanaged code in managed executables, or managed code in native executables?
Update: here is a screenshot of the switch in action.
One issue when trying to profile a “live” application is that you may get a lot of noise, resulting from a particular library or section of code being executed from multiple contexts. You may also want to profile only one particular case, and want some reproducibility between runs… in short: you want finer-grained control over when and for what the profiling takes place.
In those cases, you can control SamplingProfiler’s samples collection from your code with the following:
OutputDebugString('SAMPLING ON');
...whatever needs to be profiled...
OutputDebugString('SAMPLING OFF');
Those calls to OutputDebugString() are understood as commands to turn sampling ON or OFF. Usually you’ll want to use this in conjunction with the “Start sampling on command only” option, but it can also be used in reverse to “pause” sample collection. OutputDebugString() is declared in the Windows.pas unit.
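A small, self-contained sketch of this pattern follows. DoWorkToProfile is a hypothetical stand-in for your own code, and the try..finally wrapper is a suggestion rather than something the profiler requires.

```delphi
program SamplingToggleDemo;

{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical stand-in for the code you actually want to profile.
procedure DoWorkToProfile;
var
  i: Integer;
  total: Int64;
begin
  total := 0;
  for i := 1 to 100000000 do
    total := total + (i mod 7);
  Writeln(total);
end;

begin
  OutputDebugString('SAMPLING ON');
  try
    DoWorkToProfile;
  finally
    // Runs even if DoWorkToProfile raises, so sampling is not left on.
    OutputDebugString('SAMPLING OFF');
  end;
end.
```

With the “Start sampling on command only” option enabled, only the time spent between the two calls would be sampled.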
As of version 1.5.2, another command accepted via OutputDebugString() is ‘SAMPLING THREAD threadID’, which defines the thread from which samples must be collected. This is useful when you want to profile a particular thread in a multi-threaded application… but that’s another can o’worms for another day!