Archive for the ‘Tips’ Category

Defeat “Print Screen” with Visual Cryptography

August 1st, 2012

Time for some summer fun! Someone asked in the delphi win32 newsgroup how to prevent users from doing a “Print Screen”. The answer to that is that you can’t really, since screen capture can be invoked from other applications, or your app can be hosted in a Virtual Machine, accessed over RDP, or a frame can be grabbed directly from the screen cable.

However all hope is not lost! Even if you can’t defeat a “Print Screen” or screen capture, it’s possible to make such a capture useless.

Defeating any casual PrintScreen

Any worthy accomplishment requires sacrifices, and this one is no different. Assuming inducing end-user headaches or epileptic seizures is acceptable, you can defeat any single casual “Print Screen”, screen capture or frame grab with the help of visual cryptography and retinal persistence.

The trick is that a “Print Screen” will capture exactly what’s on the screen at a given time, so if you ensure that at any time, you only have random noise on the screen, then the screen capture is possible, but useless as only random noise will be captured. This is where visual cryptography comes in.

The idea is to generate two (or more) images that individually are random noise, carrying no information about the content they encrypt, but that reveal the information again when combined.

Classic Visual Cryptography

Classic visual cryptography does it by superposing two (or more) transparent slides (see the wikipedia article for details). Since it relies on transparency, that approach isn’t directly applicable to our “Print Screen” problem, but a variation can be devised where instead of using transparency (an “OR” operator), we’ll be using retinal persistence (an “AVERAGE” operator).

Show me the code!

To achieve that, we only need to slightly tweak the Visual Cryptography image generation. For instance, the following code will take a black/white image in bmpOrig and generate two “crypted” images in bmpOne and bmpTwo:

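const
   BoolToColor : array [False..True] of TColor = ( clBlack, clWhite );
...
// bmpOne and bmpTwo are assumed to have the same size as bmpOrig
for y := 0 to bmpOrig.Height-1 do begin
   for x := 0 to bmpOrig.Width-1 do begin
      // first image: a random black or white pixel
      b := (Random(255) and 8) <> 0;
      bmpOne.Canvas.Pixels[x, y] := BoolToColor[b];
      // second image: same pixel, flipped where the original is white
      if bmpOrig.Canvas.Pixels[x, y] = clWhite then
         b := not b;
      bmpTwo.Canvas.Pixels[x, y] := BoolToColor[b];
   end;
end;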

Basically the first image is purely random, and the second is a variation of the first with colors flipped depending on the original image. So taken in isolation, both images are effectively random and provide zero information about the original image (assuming your Random function is unpredictable enough, which isn’t the case with Delphi’s Random, but let’s ignore that).

What happens in detail is that:

  • White becomes either Black + White or White + Black
  • Black becomes either Black + Black or White + White

So White becomes perceptual Gray, and Black becomes a static White or static Black. Obviously, that’s not too pretty and will only gain you appreciation from the denizens of the MoMA, but that’s enough information for the eye to figure things out.
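To make the “AVERAGE” operator concrete, here is a minimal console sketch of the same rule, without any bitmap involved. The names (TBitImage, MakeShares, Perceived) and the tiny 8×2 image are assumptions made up for the illustration, and brightness is expressed as a percentage rather than as a TColor.

```pascal
program VisualCryptoSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
   W = 8;
   H = 2;

type
   // True = white pixel, False = black pixel
   TBitImage = array [0..H-1, 0..W-1] of Boolean;

// Splits orig into two noise images using the rule described above:
// a white source pixel gets opposite colors in the two images,
// a black source pixel gets the same color in both.
procedure MakeShares(const orig : TBitImage; out one, two : TBitImage);
var
   x, y : Integer;
   b : Boolean;
begin
   for y := 0 to H-1 do
      for x := 0 to W-1 do begin
         b := Random(2) = 1;
         one[y, x] := b;
         if orig[y, x] then
            b := not b;
         two[y, x] := b;
      end;
end;

// Perceived brightness in percent when the two images alternate quickly:
// the average of the two pixels, so 0, 50 or 100.
function Perceived(a, b : Boolean) : Integer;
begin
   Result := (Ord(a) + Ord(b)) * 50;
end;

var
   orig, one, two : TBitImage;
   x, y : Integer;
begin
   Randomize;
   // test pattern: left half white, right half black
   for y := 0 to H-1 do
      for x := 0 to W-1 do
         orig[y, x] := (x < W div 2);
   MakeShares(orig, one, two);
   for y := 0 to H-1 do begin
      for x := 0 to W-1 do
         Write(Perceived(one[y, x], two[y, x]):4);
      WriteLn;
   end;
end.
```

The left (white) half always prints 50, the perceptual gray, while the right (black) half prints a random mix of 0 and 100, the static black or white.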

What does it look like?

Original image (bmpOrig)

This will thus result in the following two “crypted” images:

bmpOne

bmpTwo

These are the only things a PrintScreen could capture if you were to flip the display madly between bmpOne and bmpTwo.

The perceptual image, assuming you flip at a high enough frequency, will be:

Perceptual image, visible but never displayed

If your hardware and screen display (LCD) are fast enough, the above image is what your eyes will see. In practice, you’re more likely to have at least occasional frame skips, which will make the image flicker.

Since the technique only requires generating two images and flipping between them at high frequency, it will “work” on any platform, native apps, web apps, etc. and will defeat casual PrintScreen even when hosted in a VM or viewed over RDP.

Making it work in the real world

The key to making it work is to be able to flip fast enough between the images, preferably at each screen refresh (VSync) to minimize user discomfort. On Windows, you have access to VSync through OpenGL or DirectX; in web browsers, that’s through requestAnimationFrame.

When you don’t have access to VSync (such as in a normal VCL app), you’ll have to use a timer, but that won’t be very precise and can be subject to stroboscopic effects, so you may have to aim for a lower flipping frequency than theoretically possible, which will increase discomfort.

I’ve made a small SmartMS app for your consumption’s, er… pleasure?

Consider yourself warned against epileptic seizures and headaches, then click on the link below, and hit Print Screen to your heart’s content!

VisualCryptoGraphy Demo
http://bit.ly/vcrypto

My PC and iPad don’t seem to have any trouble displaying a rather stable perceptual image. If you don’t see it working on mobile Android browsers, try touching and dragging the screen (it allows higher screen refresh rates).

As a final note, remember that this technique does not work against eye-like devices, like… cameras. It defeats PrintScreen only, and only against casual print-screeners.

As an exercise left to the astute reader, there is a very simple way to decrypt with non-casual Print-screening and the help of a Paint program, but shhh! ;-)


Casting an Interface to a Class, the efficient way

July 4th, 2012

Delphi 2010 added support for using “as” to cast an interface reference back to its implementation class.

Cast interface as class

type
   IFoo = interface ... end;
   TFoo = class (TInterfacedObject, IFoo) ... end;
...
var intf : IFoo;
var foo : TFoo;
...
intf := TFoo.Create;
...
foo := intf as TFoo; // get back the implementation class

However, if “as” can be convenient in certain scenarios, it’s alas not implemented very efficiently: the compiler and RTL go through several hoops to perform it (cf. this article by Arnaud Bouchez). One of those hoops f.i. gets slower the more interfaces are implemented by the underlying class.

For instance in this benchmark, the “as” loop takes 5.9 ms when operating on a class implementing 2 interfaces, and 7.1 ms (20% more) when operating on a class implementing 8 interfaces (benchmark code adapted from this one in the comments by Chris Rolliston)
Not visible in the benchmark is also the poor cache efficiency of that scanning, should you be dealing with an interface that is implemented by many different classes.

Potion of Speed-Casting

A faster way to go at it (about 4 to 6 times faster, even when not under stress), which is incidentally compatible with older Delphi versions, is to use something like

type
   IGetSelf = interface
      function GetSelf : TObject;
   end;
   IFoo = interface (IGetSelf)
      ...
   end;
...
function TFoo.GetSelf : TObject;
begin
   Result := Self;   
end;
...
foo := intf.GetSelf as TFoo;

and parent your Delphi interfaces to some base interface that provides a GetSelf or similar method, and implement it in a root class (in DWScript f.i., it is introduced by a TInterfacedSelfObject).

With the above code, a similar loop completes in 1.23 ms. That time is constant: it doesn’t increase when classes implement many interfaces or when many classes implement the same interface. So unlike “as”, it won’t fail you the more you stress it (I first bumped into the issue in a pathological case for “as” when multi-threading, where it ended up on top of profiling results for no good reason).

The limitation is that intf.GetSelf will fail if intf is nil (while “as” would just return a nil), though IME, when you’re casting back to the implementation, you’re likely to have filtered against nil far earlier in the code.
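If the nil case does matter to you, a small wrapper can restore the “nil in, nil out” behavior. The sketch below is a complete console program. SelfOrNil and the Hello method are hypothetical names introduced for the illustration; they are not part of the RTL or of DWScript.

```pascal
program GetSelfSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
   IGetSelf = interface
      function GetSelf : TObject;
   end;

   IFoo = interface (IGetSelf)
      procedure Hello;
   end;

   TFoo = class (TInterfacedObject, IFoo)
      function GetSelf : TObject;
      procedure Hello;
   end;

function TFoo.GetSelf : TObject;
begin
   Result := Self;
end;

procedure TFoo.Hello;
begin
   WriteLn('Hello from ', ClassName);
end;

// Hypothetical helper: returns nil for a nil interface
// instead of failing on the GetSelf call.
function SelfOrNil(const intf : IGetSelf) : TObject;
begin
   if intf = nil then
      Result := nil
   else Result := intf.GetSelf;
end;

var
   intf : IFoo;
   foo : TFoo;
begin
   intf := TFoo.Create;
   foo := SelfOrNil(intf) as TFoo;   // back to the implementation class
   WriteLn(foo.ClassName);           // prints TFoo

   intf := nil;                      // releases the TFoo instance
   foo := SelfOrNil(intf) as TFoo;   // "as" on a nil object gives nil
   WriteLn(foo = nil);
end.
```

The extra nil test is a plain pointer comparison, so it should not change the constant-time nature of the approach, though I haven’t benchmarked this variant.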

Another option would be Arnaud Bouchez’s ObjectFromInterface, which is constant-time and faster than “as”, but slightly slower than using an IGetSelf (about 7%), and you would be dealing with internal structures magic.

Beyond performance

A last benefit of the IGetSelf approach, beyond any performance considerations, is that it makes the cast part of the design.

Casting interfaces to classes is relevant only for Delphi-implemented classes and Delphi-oriented interfaces. Going through an IGetSelf focuses the purpose and scope of those interfaces that are susceptible to be cast back to classes, while “as” is more of a death-match trip-mine weapon, since you can invoke it on any interface.

Let’s not forget that casting an interface back to a class isn’t exactly a benign implementation choice: interfaces are often intended to isolate the implementation from the interface, so if that isolation can be broken, that has to be a conscious design choice imho, more than an implementor’s shortcut.

As a bonus, using GetSelf makes it easy to find where those casts are made in the code: mark the GetSelf method as deprecated in IGetSelf, and the compiler will give you a complete list of the places where it’s used. So IGetSelf usage is easily diagnosable, and thus easily refactor-able. Try doing that with “as”…


Buffered Image for SmartMS

May 24th, 2012

Here is a small class to facilitate working with off-screen dynamic images in Smart MS: w3BufferedImage.zip (1kb)
It’ll be in the next Smart update, but you can already use it. It was introduced as part of WarTrail, as a way to optimize graphic elements that are complex and don’t change over several frames (text, tiled background, etc.).

You can use it to bundle graphic layer elements: for instance in WarTrail the top & bottom areas (with scores & buttons) are in two distinct buffered images, and when you bring up the “menu” for a tower upgrade, that’s another buffered image.

Setup

To use it, you simply create it, and specify its size, f.i. a 200×50 buffer is created with

buffer := TBufferedImage.Create(200, 50);

Then you need to set up its OnRedraw event, which is a procedure that is passed the buffered image itself. This is where you’ll have to (re)draw whatever your buffered image is meant to contain:

buffer.OnRedraw :=
   procedure (Sender : TBufferedImage)
   begin
      Sender.Canvas... your code here ...
   end;

In place of the anonymous method, you can also use a standalone procedure or a regular method (and it might be preferable if what you draw isn’t trivial). They’re all compatible: the compiler takes care of everything that would differentiate a “procedure” from a “procedure of object” or a “reference to procedure” in Delphi.

Usage

When you need to draw the buffered image, you can use one of its Draw() or DrawScaled() methods, basically telling it to redraw itself on another canvas. The available methods are currently:

// top-left anchor
procedure Draw(dest : TW3Canvas; x, y : Float); overload;
procedure Draw(dest : TW3Canvas; x, y : Float; const sourceRect : TRect); overload;

// center anchor
procedure DrawScaled(dest : TW3Canvas; x, y, scale : Float);

The first two are top-left anchored, ie. the x & y are the top-left of where the image will be drawn, and you can optionally specify a sourceRect, so that only part of your buffered image will be drawn (note that for tiled images, the w3SpriteSheet class is preferable, it offers more features like rotations and automatic sourceRect computations)

For DrawScaled, the x & y are the center of where the image will be drawn.

When you need the image invalidated, you just call its Invalidate method, then the next Draw call will first invoke OnRedraw. You can also use the Prepare method to have OnRedraw be invoked at a time of your choosing instead.
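The Invalidate / Prepare / Draw interplay boils down to a dirty flag. The sketch below is not the SmartMS class: TCachedLayer is a hypothetical stand-in written in plain Delphi, with no canvas at all, that only counts how often the expensive redraw would run. It also assumes a freshly created buffer starts out invalid.

```pascal
program BufferedSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
   TCachedLayer = class;
   TRedrawProc = procedure (Sender : TCachedLayer);

   // Hypothetical stand-in for a buffered image: it only tracks
   // when its content would have to be redrawn.
   TCachedLayer = class
      private
         FValid : Boolean;          // False right after creation
         FRedrawCount : Integer;
         FOnRedraw : TRedrawProc;
      public
         procedure Invalidate;
         procedure Prepare;
         procedure Draw;
         property OnRedraw : TRedrawProc read FOnRedraw write FOnRedraw;
         property RedrawCount : Integer read FRedrawCount;
   end;

procedure TCachedLayer.Invalidate;
begin
   FValid := False;
end;

procedure TCachedLayer.Prepare;
begin
   if FValid then Exit;
   if Assigned(FOnRedraw) then
      FOnRedraw(Self);
   Inc(FRedrawCount);
   FValid := True;
end;

procedure TCachedLayer.Draw;
begin
   Prepare;   // redraws only if the content was invalidated
   // ...the cached content would be copied to the target canvas here...
end;

procedure RedrawLayer(Sender : TCachedLayer);
begin
   WriteLn('redrawing (expensive)');
end;

var
   layer : TCachedLayer;
   frame : Integer;
begin
   layer := TCachedLayer.Create;
   try
      layer.OnRedraw := RedrawLayer;
      for frame := 1 to 3 do
         layer.Draw;          // only the first frame triggers a redraw
      layer.Invalidate;
      layer.Draw;             // redraws once more
      WriteLn('frames drawn: 4, redraws: ', layer.RedrawCount);
   finally
      layer.Free;
   end;
end.
```

Four frames are drawn but the redraw procedure only runs twice, which is the whole point of buffering elements that don’t change at every frame.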

Last thing, if you need to draw with alpha-transparency (f.i. to fade-in or out), just use the GlobalAlpha property of the target canvas, f.i.:

myCanvas.GlobalAlpha:=0.5;  // 50% opacity
bufferedImage.Draw(myCanvas, 50, 50); // drawn at top, left = 50,50
myCanvas.GlobalAlpha:=1;  // restore 100% opacity

That’s about it! Now you can efficiently make use of custom dynamic elements, such as multi-layered text with shadow, that would otherwise be too complex to redraw from scratch at each frame!


OptimalCode – “Delphi Optimization Guidelines”

April 26th, 2012

If you recognize the title of this article by Robert Lee, then chances are you’ve been around Delphi for a while! :-)

Alas the optimalcode.com website and Robert Lee disappeared years ago without a trace, but the “Delphi Optimization Guidelines” (dating back to 2002-2003) have been safeguarded and preserved. Recently someone pointed out to me that the mirror I had in my Links section had disappeared too…

I guess it’s my turn to host the sacred relics, so I’ve updated the link and placed not one, but three copies on this site:

The guide is, at the time of this writing, nearly 10 years old, so read it with that in mind!
That said, many tips still apply to the 32bit Delphi XE2 compiler, and quite a few tips are valid regardless of compiler and programming language.


Getting Rid of the Middleman

March 28th, 2012

On this StackOverflow question David Heffernan asked about a hack I’m using in DWScript’s UnifyAssignString.

TStringListCracker

The code of the function looks like:

procedure UnifyAssignString(const fromStr : UnicodeString; var toStr : UnicodeString);
var
   i : Integer;
   sl : TUnifierStringList;
begin
   if fromStr = '' then
      toStr := ''
   else begin
      i := Ord(fromStr[1]) and High(vCharStrings);
      sl := vCharStrings[i];
      sl.FLock.Enter;
      i := sl.AddObject(fromStr, nil);
      toStr := TStringListCracker(sl).FList[i].FString; // HACK HERE
      sl.FLock.Leave;
   end;
end;

The non-hacky variant would be to have that line just be

toStr := sl[i];

but if you use that simpler form, you could see a performance drop by up to 45% (Delphi XE and XE2 32bit) or 38% (Delphi XE2 64bit).

Ouch! Why is that?

The hacky version

The hacky version compiles to a single UStrAsg, the internal function for string assignment which takes care of the reference-counting. So it’s just:

dwsUtils.pas.487: toStr:=TStringListCracker(sl).FList[i].FString;
00511594 8B0424           mov eax,[esp]
00511597 8B532C           mov edx,[ebx+$2c]
0051159A 8B14F2           mov edx,[edx+esi*8]
0051159D E8A659EFFF       call @UStrAsg

i having been returned by AddObject, no range-check is required.

(note that TStrings.AddObject is called directly because TStrings.Add is just an indirection to AddObject)

What about the “toStr := sl[i]” version?

The property is just syntax sugar on a getter method, so what is actually compiled is  “toStr := sl.Get(i)“, which itself, being a virtual call, looks like

toStr := sl.VMT[ TStrings_Get ]( sl, i );

However the virtual call to TStringList.Get on its own costs only a few extra CPU cycles, though it also prevents inlining (at least, until the compiler gets de-virtualisation optimizations).

So there are other factors at work: the virtual call on its own can’t come close to explaining the performance differential, as after all, the rest of the code in UnifyAssignString is far from trivial (with a critical section and a binary search…).

First, it’s a function returning a string, which is a managed type, so it’s actually compiled to something like:

try
   temp := sl.Get( i );      // hidden temporary string holding the function result
   UStrAsg( toStr, temp );
finally
   UStrClr( temp );
end;

Second, TStringList.Get itself is compiled to

TStringList.Get:
00448204 53               push ebx
00448205 56               push esi
00448206 57               push edi
00448207 8BF9             mov edi,ecx
00448209 8BF2             mov esi,edx
0044820B 8BD8             mov ebx,eax
0044820D 3B7330           cmp esi,[ebx+$30]
00448210 720F             jb $00448221
00448212 8B1544EA5100     mov edx,[$0051ea44]
00448218 8BCE             mov ecx,esi
0044821A 8BC3             mov eax,ebx
0044821C E8C3E3FFFF       call TStrings.Error
00448221 8BC7             mov eax,edi
00448223 8B532C           mov edx,[ebx+$2c]
00448226 8B14F2           mov edx,[edx+esi*8]
00448229 E81AEDFBFF       call @UStrAsg
0044822E 5F               pop edi
0044822F 5E               pop esi
00448230 5B               pop ebx
00448231 C3               ret

That’s quite a lot of code, a fair share of it is spent juggling registers because TStringList.Get is implemented as

function TStringList.Get(Index: Integer): string;
begin
   if Cardinal(Index) >= Cardinal(FCount) then
      Error(@SListIndexError, Index);
   Result := FList^[Index].FString;
end;

If the range check fails, Error triggers an exception and the last line won’t ever be reached, but the compiler can’t know that, so it compiles everything assuming the last line can be reached even after an Error, thus has to save the parameters, hence the stack juggling.

(the fix to the juggling? use an “else”)
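For reference, here is what the “else” variant of such a getter looks like. TMiniList is a hypothetical miniature list written for this illustration, not the RTL’s TStringList, and whether a given compiler version actually emits less register juggling for it is something to verify in the CPU view rather than take for granted.

```pascal
program GetWithElse;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses SysUtils;

type
   // Hypothetical miniature list, only meant to show the shape of the getter.
   TMiniList = class
      private
         FList : array of String;
         FCount : Integer;
         procedure Error(index : Integer);
      public
         procedure Add(const s : String);
         function Get(index : Integer) : String;
   end;

procedure TMiniList.Error(index : Integer);
begin
   raise ERangeError.CreateFmt('List index out of bounds (%d)', [index]);
end;

procedure TMiniList.Add(const s : String);
begin
   SetLength(FList, FCount + 1);
   FList[FCount] := s;
   Inc(FCount);
end;

function TMiniList.Get(index : Integer) : String;
begin
   // With the "else", the assignment is visibly unreachable after Error,
   // so the parameters need not be preserved across that call.
   if Cardinal(index) >= Cardinal(FCount) then
      Error(index)
   else Result := FList[index];
end;

var
   list : TMiniList;
begin
   list := TMiniList.Create;
   try
      list.Add('hello');
      WriteLn(list.Get(0));
      try
         list.Get(5);
      except
         on E : ERangeError do
            WriteLn('range error caught');
      end;
   finally
      list.Free;
   end;
end.
```

The Cardinal casts fold the two tests (index below zero, index past the end) into a single unsigned comparison, just like in the RTL code quoted above.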

Apart from that issue, there remains in TStringList.Get a range check and an extra UStrAsg call.

And UStrAsg itself includes an atomic increment instruction (bus locking):

lock inc dword ptr [edx-$08]

Even though modern CPUs handle them far more efficiently than their ancestors, atomic instructions still cost quite a bit more than their non-atomic counterparts, and the more multi-threaded you are, the more expensive they get.

Summary

So when all is said and done, the hacky version has:

  • 1 atomic instruction (1 UStrAsg)
  • 1 call

While the non-hacky version has

  • 3 atomic instructions (2 UStrAsg, 1 UStrClr)
  • 4 calls (2 UStrAsg, 1 UStrClr, 1 TStringList.Get)
  • 1 exception frame (free in 64bit, not free in 32bit)
  • higher register pressure in UnifyAssignString
  • 1 range check
  • a lot of register juggling (especially in 32bit)

So while “cracking” TStringList isn’t safe, depending on the circumstances, you can use it to achieve a boost for very little extra complexity.


Small is Beautiful

February 6th, 2012

Small JavaScript that is. Or how to go from 350 kB down to just… 25 kB (now 23 kB).

Smaller JavaScript can help in up to three ways:

  • faster download: faster application installation or startup.
  • faster parsing for the browser: faster startup.
  • smaller identifiers: faster execution for non-JITting JavaScript engines.

And smaller also means you can have far more complex applications for a given size budget.

Using “Nickel Iron” as illustration


Nickel Iron is now available in the Chrome Web Store and in the Android marketplace. It’s built with Smart Mobile Studio, using the DWScript JS CodeGen.

The Pascal source for Nickel Iron is made of about 10k lines of code (most of it from the VJL), and a “normal” build results in 350 kB of JavaScript, well formatted, readable and debug-able with clear variable & class names. That’s larger than the Pascal source size by about 50%.

Starting from that, if you enable obfuscation, optimization for size and smart linking, the 350 kB go down to 100 kB of a JavaScript source (not really readable anymore), and when that source is packaged in an Android app or sent with HTTP compression, you’ll be looking at a 25 kB file.

For comparison, jQuery is 229 kB raw, and 31 kB minified & compressed, and jQuery UI is about twice as large as jQuery. And when you’ve taken on all that baggage, you haven’t done anything yet!

Obfuscation

Obfuscation isn’t just to make your code more annoying to reverse engineer: it can also help make your JavaScript smaller.

When obfuscation is active, the CodeGen will replace most identifiers with shorter versions, usually 1 to 3 characters in length.

Since JavaScript is case sensitive, each extra character added to an identifier can take 62 different values (the CodeGen reserves “$” and “_” for special uses). So obfuscated identifiers are typically short, which saves space.
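To get a feel for how far 1 to 3 characters go, here is a back-of-the-envelope count. It assumes the first character is a letter (52 choices, since a JavaScript identifier can’t start with a digit) and that each following character is one of the 62 letters and digits. CountIdentifiers is a made-up helper, and the count ignores the handful of short reserved words (“do”, “if”, “in”, “for”…) that an obfuscator has to skip.

```pascal
program IdentifierBudget;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Number of distinct identifiers of length 1..maxLen, assuming
// 52 choices for the first character and 62 for each extra one.
function CountIdentifiers(maxLen : Integer) : Int64;
var
   len : Integer;
   perLength : Int64;
begin
   Result := 0;
   perLength := 52;                  // length 1: a..z, A..Z
   for len := 1 to maxLen do begin
      Result := Result + perLength;
      perLength := perLength * 62;   // one more character: letters + digits
   end;
end;

begin
   WriteLn(CountIdentifiers(1));   // 52
   WriteLn(CountIdentifiers(2));   // 3276
   WriteLn(CountIdentifiers(3));   // 203164
end.
```

So under those assumptions, three characters are enough for about two hundred thousand names, far more than a 10k lines program is likely to need.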

JavaScript is also quite heavy on hash-table name lookups, and smaller names help make hash computations faster. On a Desktop JavaScript engine, that advantage quickly fades away as the browser’s hot-spot profiler decides to JIT, but on your typical smartphone browser, the difference can be felt.

Optimize for Size

Optimize for size triggers two mechanisms in the CodeGen:

  • a JavaScript “minifier” is run on the output, it will strip away comments, useless spaces, tabs and other characters.
  • alternative code generation templates are used, which spit out less readable but smaller code. At this point, there is no choice between size and performance, only between more and less human-readable.

The minifier is applied to “asm” sections too, and performs “safe” minifications only.

Smart Linking

Just like in Delphi, smart linking will eliminate functions, methods and classes you have in the Pascal source, but never use in your program.

This is where things break away from other JavaScript libraries in terms of size. At best, they offer manual smart-linking like  jQuery UI’s “Build Your Download“, or plain old plug-ins. But if you want to use those, it means you’ll be dealing with manually managing hundreds of different builds (given all the possible combinations), and will probably just be bundling useless stuff sooner or later, because life’s too short and/or time is money.

However, just like in Delphi, Smart Linking works best if your code is well decoupled, if you use dependency-injection and other light-coupling design approaches. So avoid coding spaghetti plates ;-)

At the time this article was written, the DWScript Smart-Linker limitations are:

  • virtual or interfaced methods of a class you use aren’t eliminated (same limitation as in Delphi). Update 02/08: now supported, was simpler than anticipated.
  • there is no de-virtualization just yet (same limitation as in Delphi).
  • cross-referencing functions aren’t eliminated (procA calls procB, and procB calls procA), though as this may be more of a sign of a code smell, it might just be getting a compiler “hint” rather than smart-linker support.

Finally, Pascal being declarative and statically-typed (as long as you’re not abusing RTTI/asm stuff), the Smart-Linker will be able to go further than other optimizers that start from JavaScript (like Google’s closure), and thus have to accommodate for all potential dynamic tricks.


Good Practices for JavaScript “asm” sections in DWS/OP4JS

January 16th, 2012

The compiler supports writing “asm” (aka JavaScript) sections in the middle of Object Pascal. There are a few good practices as well as tips to keep in mind, so let’s review the menu:

  1. Name conflicts and obfuscation support
  2. Do you really need an “asm” section?
  3. Don’t rely on implicit parameters structure
  4. Avoid declaring variables in “asm” sections
  5. Handling callbacks with “Variant” methods
  6. Handling callbacks in an “asm” section
  7. Current limitations

1. Name conflicts and obfuscation support

This should be a point zero actually, but the first thing to have in mind is that you are allowed in Pascal to use as names identifiers that are reserved in JavaScript. Those can be language keywords (“this”, “delete”, etc.) or common DOM objects and properties (“document”, “window”).

The compiler automatically protects you from such conflicts by transparently renaming your identifiers (currently by adding a “$”+number at the end).

Then there is the obfuscator, which will basically rename everything to short, meaningless names. That’s good for more than obfuscation: it reduces the size of the JavaScript, improves the parsing and lookup-based performance in the browser.

The consequence is that in an “asm” section, you should prefix all Pascal identifiers with an ‘@’, so the compiler can correctly compile your asm. For instance in:

var window : String;
...
asm
   @window = window.name
end;

The ‘@window’ refers to the ‘window’ string variable (which the compiler will rename), while ‘window.name’ will be compiled “as is”, as it reads the ‘name’ property of the global ‘window’ JavaScript object.

2. Do you really need an “asm” section?

Though for some weird cases you might (like this gem), there are many cases in which you don’t need “asm”, as the language supports a “Variant” type which is a raw JavaScript object, and upon which you can call methods, read properties directly or via indexes.

For instance, with v a Variant, the following code:

v := v.getNext();
v['hello'] := v.space + 'world';

will get compiled (almost) straight into

v = v.getNext();
v['hello'] = v.space + 'world';

When using Variant, you don’t have strong compile-time checks (it’s just you vs JavaScript), and property and function names are case-sensitive, so use them with care. This is similar in syntax and essence to using OLE Variants in Delphi.

On the other hand, you have compiler support, and you get automatic casts when assigning a variant to a strong type (Integer, String, etc.), and you also get name conflict protection & obfuscation support without having to ‘@’ your Pascal references.

3. Don’t rely on implicit parameters structure

Because they may change in future compiler revisions!

For instance, methods are currently invoked with an implicit “Self” parameter, with the others behind it, so currently “arguments[0]” is Self, and everything else is after that. But don’t rely on it.

Future compiler revisions may change that parameter’s name, may obfuscate it, may remove it entirely in favor of an implicit “this”, may inline your function, etc.

So if you need explicit parameters, declare them, if you’re in a method and need to access the object (Self), use “@Self”, if you need to access a field of the current object use “@Self.FieldName”, etc. That will keep working.

4. Avoid declaring variables in “asm” sections

Declare them in the parent function/method instead, and reference them with the ‘@’ prefix.

There are three main reasons for that: the first is that doing so means they’ll be case-insensitive, the second is that it will allow the obfuscator to obfuscate them, and the third is that you’ll get compiler warnings if you declare a variable but do not use it (or if you forgot to @-prefix it).

So don’t write that:

asm
   var myTemp;
   myTemp = ...whatever...;
   ...
end;

But write this instead:

var myTemp : Variant;
...
asm
   @myTemp = ...whatever...;
   ...
end;

5. Handling callbacks with “Variant” methods

A common occurrence is to register a callback to a JavaScript object. When that object is hosted in a Variant, that’s fairly simple to achieve:

procedure DoImageLoaded;
begin
   ...
end;
...
var myImage : Variant; // will refer to an image object
...
myImage.onload(@DoImageLoaded);

There we use the ‘@’ operator Pascal-side, to make it explicit that we want a function pointer, and not call the function. The ‘@’ isn’t necessary when the function is declared Pascal-side, as the compiler can figure it out, but when invoking a Variant method, it doesn’t know the parameters type.

Note that since function pointers are unified, you can get a function pointer from an object method or an interface method in the same fashion:

myImage.onload(@myObject.DoImageLoaded);
myImage.onload(@myInterface.DoImageLoaded);

6. Handling callbacks in an “asm” section

If you want to register the callback in an “asm” section, the situation is a little more complex, as “@myObject.myMethod” will refer to the function prototype, outside of its context. It means it’s okay for standalone functions or procedures, but may not do what you’re expecting for object or interface methods.

The solution is to acquire the function pointer outside of the “asm” section:

var myCallback : Variant;
...
myCallback := @myObject.DoImageLoaded;
asm
   @myImage.onload(@myCallback);
end;

7. Current limitations

Currently the parser for “asm” sections doesn’t really understand JavaScript:

  • it’s still treating JS as a weird invalid form of Pascal, and notably {} define comments for it, so it will pass whatever is inside curlies “as is”, and will annoyingly ignore @ signs inside curlies
  • some weird operator combos (but valid JS)  may throw off the parser, if that happens, place that code in between curlies, and post a bug report

Hopefully in time, there will be a proper JS parser, but currently the focus is more on the Pascal side, and “asm” sections are intended for handling corner cases more than as a main workhorse.


Fixing TCriticalSection

November 30th, 2011

TCriticalSection (along with TMonitor*) suffers from a severe design flaw in which entering/leaving different TCriticalSection instances can end up serializing your threads, and the whole can even end up performing worse than if your threads had been serialized.

This is because it’s a small, dynamically allocated object, so several TCriticalSection instances can end up in the same CPU cache line, and when that happens, you’ll have cache conflicts aplenty between the cores running the threads.

How severe can that be? Well, it depends on how many cores you have, but the more cores you have, the more severe it can get. On a quad core, a bad case of contention can easily result in a 200% slowdown on top of the serialization. And it won’t always be reproducible, since it’s related to dynamic memory allocation.

There is thankfully a simple fix for that, use TFixedCriticalSection:

type
   TFixedCriticalSection = class(TCriticalSection)
      private
         FDummy : array [0..95] of Byte;
   end;

That’s it folks. This makes sure the instance size is larger than 96 bytes, which means that it’ll be larger than the cache line in all current CPUs, so no more serialization across distinct critical section instances.

As a bonus, it also ends up using one of the larger, more aligned, FastMM buckets, which seems to improve critical section code performance by about 7%. The downside is you use more RAM… but how many critical sections do you really have?
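If you’re curious about the actual sizes on your compiler and platform, InstanceSize will tell you. The sketch below just prints both sizes. The numbers vary with compiler version and 32/64 bit target, so none are quoted here.

```pascal
program FixedCSSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses SyncObjs;   // System.SyncObjs in XE2 and later

type
   TFixedCriticalSection = class(TCriticalSection)
      private
         FDummy : array [0..95] of Byte;   // padding only, never accessed
   end;

begin
   WriteLn('TCriticalSection      : ', TCriticalSection.InstanceSize, ' bytes');
   WriteLn('TFixedCriticalSection : ', TFixedCriticalSection.InstanceSize, ' bytes');
end.
```

Whatever the base size turns out to be, the padded class is at least 96 bytes larger, which is what pushes each instance past a cache line.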

* (11-12-01): as noted by Allen Bauer in the comments, the issue is fixed for TMonitor in XE2.


Don’t publish your .dproj/.groupproj

November 10th, 2011

Just a quick reminder to everyone publishing Delphi projects with source:

Please don’t publish your .dproj & .groupproj, only publish the .dpr & .dpk

The reason? Those files include machine specific settings, such as paths, DCU/DCP/BPL/EXE output directories, along with your favorite debug & release options, which are likely different from that of your fellow developer.

It’s possible to clean them up, but that’s tedious and error-prone, short of checking their XML content by hand.

Pretty much every single project with a .dproj out there has issues, from major open-source projects to Embarcadero’s own samples. None of them (of you) got all of them cleaned up right.

But even getting the published .dproj right doesn’t matter: .dproj is where compile options are stored, options you’re just bound to change and adjust. When those .dproj are in a project you synchronize with via version control (SVN, GIT, etc.), your locally modified .dproj will likely conflict next time you synchronize, sometimes in unintended and not immediately obvious ways.

Hopefully in a future version, Embarcadero will split the .dproj, so that machine-specific settings are in a distinct file from the non-machine specific settings, which would essentially be per-project relative paths to the source files.

Ad interim, .dproj are just a kludge by design.


Rendering semi-transparent objects in FireMonkey

November 4th, 2011

The question has (predictably) popped up several times now, so here is a recap post with a workaround.

FireMonkey (as of now) doesn’t support rendering semi-transparent objects in 3D.

FireMonkey only supports blending of semi-transparent objects (either through the Opacity property or because of their texture, for instance a semi-transparent PNG image), but blending alone is not enough to get it right in 3D with a Z-Buffer (which is what FMX, and most 3D apps are using).

For a technical explanation, you can read about Transparency sorting, the article is about OpenGL, but applies to DirectX too.

A solution (not always quick or simple) is to manually sort the objects back to front, ie. have the objects farthest from the camera be rendered first, using an approach similar to that posted by Peter Söderman:

You should be able to sort them in distance from camera order with something like this (a bit sloppy and you may have to tweak the sorting but it’s a start): [...]

To get access to the children list I do an override of the TViewport3D class.

type
  TMyViewPort = class(TViewport3D)
  end;

procedure TForm1.Button2Click(Sender: TObject);
var
  myv: TMyViewPort;
begin
  myv :=  TMyViewPort( Viewport3D1);

  myv.FChildren.SortList(
  function (i1,i2: Pointer): Integer
  var
    o1,o2: TControl3D;
  begin
    // start from nil so the test below also rejects non-TControl3D children
    o1 := nil;
    o2 := nil;
    if TfmxObject(i1) is TControl3D then o1 := TControl3D(i1);
    if TfmxObject(i2) is TControl3D then o2 := TControl3D(i2);
    if (o1 <> nil) and (o2 <> nil) then
    begin
      Result := trunc(  VectorDistance2(myv.Camera.Position.Vector,o1.Position.Vector)
                      - VectorDistance2(myv.Camera.Position.Vector,o2.Position.Vector));
    end else
      Result := 0;
  end
  );
end;

You’ll have to call the above sorting every time the objects or camera position changes.
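Stripped of anything FireMonkey-specific, the ordering itself can be sketched as below. TVec3, Vec3, DistanceSquared and SortBackToFront are hypothetical names for this illustration, not FMX types or functions. Distances are compared directly rather than through a truncated difference, so objects whose squared distances differ by less than 1 are still ordered.

```pascal
program BackToFrontSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
   TVec3 = record
      X, Y, Z : Double;
   end;
   TVec3Array = array of TVec3;

function Vec3(ax, ay, az : Double) : TVec3;
begin
   Result.X := ax;
   Result.Y := ay;
   Result.Z := az;
end;

// Squared distance is enough for ordering and avoids a square root.
function DistanceSquared(const a, b : TVec3) : Double;
begin
   Result := Sqr(a.X - b.X) + Sqr(a.Y - b.Y) + Sqr(a.Z - b.Z);
end;

// Insertion sort, farthest from the camera first (back to front).
procedure SortBackToFront(var objs : TVec3Array; const camera : TVec3);
var
   i, j : Integer;
   current : TVec3;
begin
   for i := 1 to High(objs) do begin
      current := objs[i];
      j := i - 1;
      while (j >= 0)
            and (DistanceSquared(objs[j], camera) < DistanceSquared(current, camera)) do begin
         objs[j + 1] := objs[j];
         Dec(j);
      end;
      objs[j + 1] := current;
   end;
end;

var
   objs : TVec3Array;
   camera : TVec3;
   i : Integer;
begin
   camera := Vec3(0, 0, 0);
   SetLength(objs, 3);
   objs[0] := Vec3(0, 0, 1);
   objs[1] := Vec3(0, 0, 5);
   objs[2] := Vec3(0, 0, 3);
   SortBackToFront(objs, camera);
   for i := 0 to High(objs) do
      WriteLn(objs[i].Z:0:1);   // prints 5.0, 3.0, 1.0
end.
```

An insertion sort is used here because the order usually changes little from one frame to the next, in which case it does very little work. For large scenes, any comparison sort with the same criterion would do.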

This is just a workaround

Ideally this should be handled by the scene-graph, as it is rendering-dependent, otherwise, as in the code above, you end up having to change the scene-graph structure, which can have various other side-effects, and is even more problematic if you have more than one camera looking at the same scene.

Another downside is that this approach will only work with convex objects that don’t intersect, and for which you don’t have triple-overlap.

When intersection or triple overlap happens, there is no object that is fully closer or farther from the camera, and the simple sorting approach fails.

The sorting approach also won’t solve transparency issues that happen for a mesh with itself. For that and the overlap case, you need to involve more advanced techniques that FireMonkey currently doesn’t support, like sorting mesh sub-elements, using tessellation, depth peeling, BSP, etc.

Implementing the more advanced techniques mentioned above in a reusable and user-friendly fashion would involve standardizing materials and textures, standardizing mesh structures, and generalizing the FMX scene graph, thus hitting the top three architectural weaknesses mentioned previously.
