Optimizing for memory: a tighter TList

One of the memory hogs when you have object trees or graphs can be good old TList and its many variants (TObjectList, TList<T>, etc.).

This is especially an issue when you have thousandths of lists that typically hold very few items, or are empty.
In the DWS expression tree f.i. there are quickly thousandths of such lists, for local symbol tables, parameter lists, etc.

How much does a TList cost in terms of memory?

A TList holding a single item already costs you:

  • 4 bytes for the field in the owner object
  • 20 bytes for the TList instance
    • 8 hidden bytes: Monitor + VMT pointer
    • 12 field bytes: data pointer + Count + Capacity
  • 4 bytes for the data

So that’s 28 bytes, with two dynamically allocated blocks which, and those dynamic allocations, depending on your allocation alignment settings, can cost you something like an extra dozen of bytes (even with FastMM).

What about the other TList variants?

  • TObjectList has an extra boolean field, which with alignment, costs an extra 4 bytes.
  • TList<T> has an instance size of 28 bytes, and a dynamic array storage with 8 hidden extra bytes (4 for length, 4 for the refcount).

Neither of these are better candidates for memory-efficient small lists.

Enter the TTightList

You can find TTightList in DWS’s dwsUtils unit.
For an empty list, the cost is 8 bytes, no dynamic memory, and for a list with a single item, 8 bytes still with no dynamic memory.
For a n-items list, the cost is 8 bytes plus one n*4 bytes dynamic block.

To achieve that, the TTightList makes use of two tricks:

  • it’s designed to be composed, and hosted as an object field
    • it’s a record-with-methods, not a class, but retains classic-looking use semantics (.Add, .IndexOf, .Clear etc.).
    • eliminates the need for a pointer to the instance in the host object
    • eliminates the dynamically allocated storage for the TTightList itself
  • if the Count is one, the array pointer itself points to the only item, rather than to a dynamically allocated block holding a pointer to the item.

The second trick is where we sacrifice a bit of performance, to save one dynamic allocation for lists holding a single item. Though if you benchmark the TTightList, you’ll see it holds its own fairly well against TList for the smaller item counts, which is what it was designed for.
That’s partly thanks to TList‘s own inefficiencies, and FastMM’s in-place reallocation (on which TTightLight relies, since it doesn’t maintain a capacity field).

13 thoughts on “Optimizing for memory: a tighter TList

  1. Good point.

    I used the record/object trick in some projects.
    It scales pretty well, especially in order to maintain classic-looking methods (like Add/IndexOf/Clear…).
    See http://synopse.info/forum/viewtopic.php?id=80

    As I commented in a recent post of your Blog, I already saw that all these TList creations were stressing FastMM4, in the DWS expression tree.

    Resizing is a bit aggressive, because it relies on FastMM4, and its in-place reallocation… a capacity field be not a bad idea, for example by storing the capacity as integer(FList[0]). Therefore, you could use a more efficient reallocation mechanism, depending on DWS requirements.

    Congratulations about taking the time in order to increase speed and reduce memory usage of your library!
    It’s not so common these days, when every one speak about DotNet, Java and garbage collector.
    🙂

  2. Is this really necessary? If your TList uses 28 bytes, plus an extra dozen because of dynamic block allocation, that’s 40 bytes per instance. You mentioned using thousands of lists. If you use 1024 such lists, that’s only 40 KB of overhead. That might have been significant 15 years ago, but today it really feels like premature optimization, especially because you lose some performance and a lot of scalability, and you’re no longer able to pass your list to a routine that expects a TList.

  3. @Mason Wheeler

    I fully agree. The overhead may be a bit of a problem for a small list, but lists are not really designed to contain only one element. And 40 bytes are not much, these days. Of course, it also depends on what elements the list contains.

  4. Well, it is quite significant in practice, and more than the byte count alone would indicate. I introduced it after diagnosing issues in server-side usage scenarios, where IIRC the gains were about 30% in memory usage and 50% in execution speed.
    Keep in mind that in DWS, every function call means at the very least one list, and most functions have very few parameters (most setters have only one f.i.). This means that an IntToStr call f.i. can be less than half the size with a TTightList than with a regular TList. You typically have multiple functions calls per script line.
    A second thing is that those dynamic allocations aren’t just not free in terms of memory usage, but you have to pay in indirections, multiple cache lines with a higher waste ratio and extra memory accesses.
    Another thing to keep in mind is that for performance, you have to fit in L1/L2 cache, on a server, that L2 is shared across all threads and processes. That you have 16 GB of RAM isn’t very relevant if your code (across all threads) doesn’t fit in L2, for script calls, those lists are effectively par of the code.
    The last point is that you actually don’t lose any scalability compared to TList, because, well, TList and its variants aren’t very scalable in the first place (they’re actually quite poor as scalable containers go, but that’s another story), and a TTightList, in the conditions here (allocated once, accessed many times), can actually run circles around TList. Also, you can pass a TTightList around, either directly (as a var, as a pointer to a record, as an enumerator…), and DWS actually does that in a few places.

  5. The initial capacity of a TList is 0 (zero). Adding an initial item to it triggers an initial “Grow” operation. The allocation scheme of a TList means that the Capacity of a TList with a single item (that has not been specifically subsequently sized using the Capacity property) will allocate 16 bytes for the data but be using only 4 bytes of that, making the total memory use: 52 bytes, of which only 40 are actually needed.

  6. As an “on the spot” illustration of “why memory size matters”, I’m currently experimenting with constant unification in the DWS code.

    On the recent benchmark, where there is only one constant (the 2 in the “*2”), merely unifiying that integer results in the higher line count scenarios a memory usage reduction of 14%, and a runtime speedup of 7% (thanks to improved cache efficiency).

  7. I always try to keep the memory usage at minimum in all my projects not to mention the execution speed of a method. We(as developers) need to keep in mind that our customers do NOT own a similar system, i.e. I have a i7 @ 3 Ghz with 6 GB(soon 12) of ram, some my customers have a Intel Pentium 4 with 512 Mb or 1 Gb of ram, therefore I need to take into consideration how much memory and processor my application will consume. Feel free to disagree with me, BUT keep in mind that company owners invest at each 4/5 years in equipment — therefore most of our clients have somewhere between 512 Mb and 1 GB of ram.

  8. @Dorin Duminica
    I recently bought a netbook, slow Atom processor and 1 GB of RAM, shared with the video card. It works okay with all the well designed applications, and helps you quickly sort out the well designed ones from the hogs 😉

  9. Hi Eric,

    it sounds great to replace TList with TThightList for small lists!
    I wanna make a replacement test but keep the code compatible with existing TList for future versions. It would be nice to keep TThigtList compatible as much as possible with TList. My enhancement proposals are:

    TTightList = record
    private
    function Get(Index: Integer): Pointer;
    procedure Put(Index: Integer; Item: Pointer);
    public
    procedure Destroy; // dummy destructor, just cleanup
    function IndexOf(item : Pointer) : Integer; // would be nice to see
    property Items[Index: Integer]: Pointer read Get write Put; default; // this works for D 2007
    end;

    implementation

    uses
    RtlConst;

    // use SListIndexError for localisation

    procedure TTightList.RaiseIndexOutOfBounds(Index: Integer);
    begin
    raise Exception.CreateFmt(SListIndexError, Index);
    end;

    function TTightList.Get(Index: Integer): Pointer;
    begin
    if Cardinal(index)>=Cardinal(Count) then
    RaiseIndexOutOfBounds(Index)
    else
    Result:= FList[Index];
    end;

    procedure TTightList.Put(Index: Integer; Item: Pointer);
    begin
    if Cardinal(index)>=Cardinal(Count) then
    RaiseIndexOutOfBounds(Index)
    else
    FList[Index]:= Item;
    end;

    procedure TTightList.Destroy;
    begin
    Clear;
    end;

    procedure TTightList.Sort(Compare: TListSortCompare);
    begin
    if (FList nil) and (Count > 0) then
    QuickSort(FList, 0, Count – 1, Compare); // Quicksort from Classes.pas
    end;

    {
    I have many Lists with few items and happy to see what replacemant brings. Let me know what do you think about..

    btw: This code crashes:
    MyThightList.Add(nil);
    MyThightList[0]:= MyTestPointer; // crash here FList is nil

    Best regards
    Dirk
    }

Comments are closed.