Delphi array constructors performance (or lack of)

In Delphi you can initialize a dynamic array in two ways, either manually or via the Create magic constructor

type
   TIntegerArray = array of Integer;

procedure Test;
var 
   a : TIntegerArray;
begin
   // magic constructor
   a := TIntegerArray.Create(1, 2);

   // manual creation
   SetLength(a, 2);
   a[0] := 1;
   a[1] := 2;
end;

The outcome in both cases is the same, are all things equal?

Some array initializations are more equal than others

The first method is less verbose in code, but quite a bit less efficient, if you check the CPU view, that becomes obvious

TestUnit.pas.32: a := TIntegerArray.Create(1, 2);
00511335 8D45F8           lea eax,[ebp-$08]
00511338 8B15F0125100     mov edx,[$005112f0]
0051133E E89576EFFF       call @DynArrayClear   // anybody knows why?
00511343 6A02             push $02
00511345 8D45F8           lea eax,[ebp-$08]
00511348 B901000000       mov ecx,$00000001
0051134D 8B15F0125100     mov edx,[$005112f0]
00511353 E87476EFFF       call @DynArraySetLength
00511358 83C404           add esp,$04
0051135B 8B45F8           mov eax,[ebp-$08]
0051135E C70001000000     mov [eax],$00000001
00511364 8B45F8           mov eax,[ebp-$08]
00511367 C7400402000000   mov [eax+$04],$00000002
0051136E 8B55F8           mov edx,[ebp-$08]
00511371 8D45FC           lea eax,[ebp-$04]
00511374 8B0DF0125100     mov ecx,[$005112f0]
0051137A E89576EFFF       call @DynArrayAsg

// Manual initialization
TestUnit.pas.35: SetLength(a, 2);
0051137F 6A02             push $02
00511381 8D45FC           lea eax,[ebp-$04]
00511384 B901000000       mov ecx,$00000001
00511389 8B15F0125100     mov edx,[$005112f0]
0051138F E83876EFFF       call @DynArraySetLength
00511394 83C404           add esp,$04
TestUnit.pas.36: a[0] := 1;
00511397 8B45FC           mov eax,[ebp-$04]
0051139A C70001000000     mov [eax],$00000001
TestUnit.pas.37: a[1] := 2;
005113A0 8B45FC           mov eax,[ebp-$04]
005113A3 C7400402000000   mov [eax+$04],$00000002

Now before you complain on the compiler capability, you’ve got to realize that the two ways of initializing a dynamic arrays are not equivalent:

  • the magic constructor creates an array, then assigns it, so the array variable is always in a well-defined state
  • the manual initialization mutates the array in several steps, so the array during the intermediate state is in an unfinished step

Of course, in the limited Test procedure, the compiler could figure out the array isn’t visible from the outside, and thus use the shorter form, but that’s an optimization that would apply only to a local variables.

A more generic optimization would be to have the compiler waive the temporary array when the array that is initialized isn’t referenced anywhere else (so intermediate states don’t matter), that’s possible given that dynamic arrays are reference-counted.

Next: Overhead in detail. Building a better Array constructor.

8 thoughts on “Delphi array constructors performance (or lack of)

  1. DynArrayClear call should be an optimization. Without it the subsequent DynArraySetLength call will copy old `a` contents to a new location if `a nil`.

  2. Take a look about how DynArraySetLength() in System.pas is implemented.

    Checking Length(a)length(Value) will not be much faster, since you will just bypass simple RTTI lookup, some adds, one mul and one div, then call reallocmem which is a no op for FastMM4 if the length is the same. And in most cases, Length(a) is indeed not equal to length(Value), so your optimization will slow down everything.

    Dynamic arrays are reference counted values, with “Copy On Write” pattern.

    DynArrayClear is there to handle the reference counting of any previous dynamic array instance stored in the variable. Here, your local variable is nil, so it will return with no delay. But if a should be nil, it will release any previous instance (if not copied anywhere).

    This is exactly the same for a string, a variant or an interface (or a record containing any of those types).

    What is slow with dynamic arrays is not the initialization, but the change of size. If you use SetLength(a,length(a)+1) to add items, it will be dead slow.
    This is why in our TDynArray wrapper, we allow to define an external “Count: integer” variable, so the array length is not the actual count number, but the array capacity.

    In our enhanced RTL for Delphi 7 and 2007, we rewrote some part of the DynArraySetLength() low-level function. But speed increase is not huge.

    Note that DynArraySetLength() current implementation in System.pas is not thread-safe about the reference counting (whereas it is thread safe e.g. for strings since Delphi 5).

    What may render dynamic arrays faster in Delphi should be the emission of code instead of using the RTTI (similar for record copy). Up to now, optimization exists if there is no RTTI for records, but dynamic arrays or records containing reference counts uses RTTI, which is slower. By the way, this is a reason similar to why TValue/variant types are slower than some optimized versions (like TOmniValue) using proper inlining features, which allows some compile-time optimization, by passing RTTI.

  3. A. Bouchez :

    Checking Length(a)length(Value) will not be much faster

    This is not what benchmarking reports, the 10x speedup us easy to observe in that case.

    And in most cases, Length(a) is indeed not equal to length(Value), so your optimization will slow down everything.

    A simple comparsion like that one that is correctly branch-predicted is very cheap, you’ll be hard-pressed to get measurable a difference in benchmarks.
    And if it’s not correctly branch-predicted, then it means you’re in the same size-situation, and will avoid branch mispredicts that happen in DynSetArrayLength.

    Dynamic arrays are reference counted values, with “Copy On Write” pattern.

    They’re not, Strings are copy-on-write, dynamic arrays are just reference-counted types, which happen to be duplicated by SetLength when there is more than one reference to them. They’re hybrids.

    Here, your local variable is nil, so it will return with no delay.

    It doesn’t in the real world, the reason is that this test is hard to branch-predict, from the POV of the CPU it’s close to random as you have a mix of nil and non-nil arrays which all go through the same test. And a branch mispredict is very costly on modern architectures.

    What is slow with dynamic arrays is not the initialization, but the change of size. If you use SetLength(a,length(a)+1) to add items, it will be dead slow.

    It’s dead-slow even when the size isn’t changed, benchmark it if you doubt it, the RTL source doesn’t give the whole story.

    Btw, the gains in your DynArray wrapper with a capacity come from that aspect as well.

    When rewriting DynArraySetLength, you don’t get around the branch mispredictions (since there is still only one routine, with only one location for the jumps), but with your wrapper, there are multiple source locations for the Count comparison, which means they are a bit more context-specific, and thus better branch predicted.

    Branch misprediction can be spotted when profiling: if the most expensive line is an innocuous-looking trivial test, chances are this is a misprediction. Normal bottlenecks are arithmetic, read/write from memory, complex calls, etc. When a simple boolean test comes on top, it’s because it’s mispredicted.

    What may render dynamic arrays faster in Delphi should be the emission of code instead of using the RTTI (similar for record copy).

    Yes, this was reported way back in D5 or D7, alongside being able to have custom constructors/destructors for records, rather than the RTTI-based RTL ones (which are structurally slow).

    By the way, this is a reason similar to why TValue/variant types are slower

    You may want to check this 🙂
    http://www.cromis.net/blog/2013/02/tanyvalue-an-attempt-to-make-the-best-variable-data-container/

  4. You are right: they are not COW values, just reference counted. But not 100% thread safe!

    Of course, in your case, checking length() before SetLength() will be 10x faster, but I suspect it won’t exist in real world, unless your code is very bad written.

    The “a” value is always initialized to nil, during the function/method local stack initialization, just like any reference-counted variable. Check the generated asm code by the compiler.

    In all cases, branch mis-prediction is nothing in comparison to a ReallocMem(), a InitializeArray() or a move() – as run within DynArraySetLength(). TDynArray is not much faster because of branch prediction and CPU cache issues, but because it bypassed the reallocation. This is what a profiler (like your great sample profiler) shows on real code, not dedicated benchmarks like calling SetLength() in loop. Then accessing a dynamic array is a O(1) opcode, just like any array.

    I know the link in cromis.net – I just did not want to report the whole story.

    DWS array process is very good. But I still miss auto-detection of optimized HTML5 byte/ord/integer/int64 arrays in JavaScript compilation, which is not yet the case. An “array of integer” will be slower than with Delphi, I suspect, about memory use and data access.

  5. A. Bouchez :

    DWS array process is very good. But I still miss auto-detection of optimized HTML5 byte/ord/integer/int64 arrays in JavaScript compilation, which is not yet the case. An “array of integer” will be slower than with Delphi, I suspect, about memory use and data access.

    I’m monitoring progress with JS typed arrays, but at the moment they’re still slower than regular JS arrays, so they’re only beneficial for memory space (when large) or to talk with WebGL. Arrays benefit from quite a lot of optimizations in Chrome V8.

    With the JS compiler you already avoid issues of mixed type arrays (integers & floats f.i.) which are detrimental to v8 optimizations, and the compiler also always issues variable declaration with an initialization (from which V8 infers typing), but there are some other things to be aware of, like try..except/finally which turns off some JITing features.

  6. @Eric
    Thanks for the feedback about V8 inferring typing.

    I did read this some time ago, but it was out of my mind.

    I was wondering why all JavaScript code generated by DWS/SMS do initialize all arrays. Sounded like something not mandatory, and even worse – overweighting the code. But in fact, from the JIT point of view, such initialization is warmly welcome!

    Great work, Eric!

Comments are closed.