Memory Usage
It is also interesting to compare memory usage.
- FastMM4 is the winner here, as it used only 23.7 MB.
- SynScaleMM is a big memory hog; it used 61.6 MB.
- NN used about 50% less than SynScaleMM (31.5 MB), which is good but still a lot more than FastMM.
Conclusion
So, what to choose? Or, rather, what not to choose.
SynScaleMM can be really fast but uses lots of memory and is, sadly, very unstable. It even managed to crash this simple benchmark once. I would definitely not use it in production.
NN can be very fast but it is also extremely slow while concatenating strings. It will also work only in 32-bit applications because of small parts of assembler sprinkled through the code. Both those deficiencies can be easily fixed once it is released as open source.
For the time being I’ll stick with FastMM. It also has the best debugging capabilities of them all.
My dream scenario? Make NN open source and then convince Pierre to run a Kickstarter to finance integrating the good features of NN into FastMM4 (which could then probably be called FastMM5). I know I would shell out a few hundred bucks to see this happen.
Some of the benchmark screens are missing the title so it is hard to tell what they are benchmarking…
You could also include this memory manager in your test: https://code.google.com/p/scalemm/ (not the same as SynScaleMM).
BTW: nice idea with Pierre on a Kickstarter!:)
An important thing you need to know is that SynScaleMM is actually the same as ScaleMMv1, which runs on top of FastMM (which explains the similar performance). However, my initial Proof of Concept (v1) only contained a memory manager per thread for small memory (1mb) is always directly requested from Windows (same as FastMM).
(I also tried to make ScaleMM3 with a very different approach, but it turned out to be much slower…)
I was also busy with testing on my quad core, and ScaleMM2 scaled (almost) linearly!
I also tested it with Google’s TCmalloc, which has similar performance to SMM2 (but never releases its memory to Windows!)
The MSVCRT MM (Win7) seems to scale fine with stringbuilder (although it is slower than all the others). But with trivial string it performs very badly! I think it has the same problem as NN: it does a full realloc every time, whereas FastMM, ScaleMM and TCmalloc do some kind of “smart capacity” expansion (e.g. alloc 25% more space, so there is no need to do a full realloc+move for every byte!).
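To make the “smart capacity” point concrete, here is a minimal sketch that only counts how often a buffer would have to be reallocated while it grows one byte at a time. It is a simulation, not the code of any of the memory managers mentioned; the helper name CountReallocs and the fixed 25% slack are assumptions made for illustration.

```pascal
program GrowthSketch;
{$APPTYPE CONSOLE}

// Hypothetical helper (not taken from any memory manager): counts how many
// reallocations are needed while a buffer grows byte by byte to TargetSize.
// SlackPercent = 0 models an exact-fit realloc on every append;
// SlackPercent = 25 models the "alloc 25% more space" idea.
function CountReallocs(TargetSize, SlackPercent: Integer): Integer;
var
  Capacity, Size: Integer;
begin
  Result := 0;
  Capacity := 0;
  for Size := 1 to TargetSize do
    if Size > Capacity then
    begin
      // The buffer is full: reserve the needed size plus the extra slack.
      Capacity := Size + (Size * SlackPercent) div 100;
      Inc(Result);
    end;
end;

begin
  Writeln('exact fit: ', CountReallocs(100000, 0), ' reallocations');
  Writeln('25% slack: ', CountReallocs(100000, 25), ' reallocations');
end.
```

With exact-fit growth the count equals the number of bytes appended, while the over-allocating variant needs only a few dozen reallocations for the same final size. A real memory manager may also be able to grow a block in place, which this sketch does not model.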
I will mail you the necessary sources (and my results so far)
At least it shows that strings are memory manager bound in Delphi: with the default MM (FastMM) it stays at 25% CPU on my quad core with 8 threads, due to the global lock of FastMM. But with other MMs it will reach 100% CPU, making full use of all cores!
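A rough way to reproduce this kind of observation is sketched below: several threads append to their own string at the same time, and the elapsed time (or the CPU load in Task Manager) can be compared between memory managers. This is not the benchmark source; TConcatThread, RunConcatThreads and the Iterations constant are hypothetical names, and the sketch assumes Delphi on Windows.

```pascal
program MMScalingSketch;
{$APPTYPE CONSOLE}

uses
  // Assumption: a replacement memory manager unit would be listed first here.
  SysUtils, Classes;

const
  Iterations = 200000; // arbitrary workload per thread

type
  // Hypothetical worker: builds one string by repeated concatenation.
  TConcatThread = class(TThread)
  public
    ResultLength: Integer;
  protected
    procedure Execute; override;
  end;

procedure TConcatThread.Execute;
var
  i: Integer;
  s: string;
begin
  s := '';
  for i := 1 to Iterations do
    s := s + 'x'; // each append may have to go through the memory manager
  ResultLength := Length(s);
end;

// Starts ThreadCount workers, waits for all of them and returns the
// total number of characters they produced.
function RunConcatThreads(ThreadCount: Integer): Integer;
var
  Threads: array of TConcatThread;
  i: Integer;
begin
  Result := 0;
  SetLength(Threads, ThreadCount);
  for i := 0 to ThreadCount - 1 do
    Threads[i] := TConcatThread.Create(False); // False = start immediately
  for i := 0 to ThreadCount - 1 do
  begin
    Threads[i].WaitFor;
    Inc(Result, Threads[i].ResultLength);
    Threads[i].Free;
  end;
end;

var
  Started: TDateTime;
  Total: Integer;
begin
  Started := Now;
  Total := RunConcatThreads(8);
  Writeln('total characters: ', Total);
  Writeln('elapsed seconds:  ', (Now - Started) * SecsPerDay:0:2);
end.
```

Timings from such a loop depend heavily on the machine and compiler settings, so they are only meaningful when the same build is compared across memory managers.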
Fascinating! I can make a guess about what NN is – a well-known memory manager that begins with N? 😉
I have been working on a new memory manager myself for some time, although it’s been on the back-burner for a few months while traveling. It aims to have good multithreaded performance, i.e. it’s designed from the outset for a situation where many threads allocate and free at the same time. Unfortunately it’s not done yet, not even to a beta state. However, I will try to find time to continue working on it and run your performance tests using it…
Nice article!
BUT there is another factor that is important: fragmentation. For long-running / memory-hungry applications this can also be a problem (it can cause OutOfMemory errors even when there is plenty of free memory).
Unfortunately I don’t know how to measure it…
Please do not use SynScaleMM, which is a Proof of Concept and should never be used in production.
Try ScaleMM2, which is much more stable and also fast/tuned.
I just checked the source code.
I would have rather written in this case:
function UseTextWriter : String;
var
  i : Integer;
  tw : TTextWriter;
  st : TRawByteStringStream;
begin
  st := TRawByteStringStream.Create;
  tw := TTextWriter.Create(st,65536);
  try
    for i := 1 to NB do begin
      tw.AddString(#13#10'Eating apple #');
      tw.Add(Int64(i));
    end;
    tw.Flush;
    Result := Ansi7ToString(st.DataString);
  finally
    tw.Free;
    st.Free;
  end;
end;
Since the default buffer may be too small for such generation.
What is pretty “unfair” in the comparison is that you include a UTF-8 to Unicode conversion during the test, only for TTextWriter!
Perhaps using Ansi7ToString() may be a bit faster (even if our UTF-8/Unicode conversion is pretty optimized).
But in all cases, other classes DID NOT do any such conversion.
You are comparing apples with oranges, here.
All those drawings are pretty nice, but…
Which kind of program will do a fixed pattern of string + number concatenation in loop in all threads at once?
A benchmark. Only a benchmark.
More meaningful are general tests such as those we use in our regression and performance tests (including JSON creation from several kinds of data, JSON parsing, HTTP client/server, RTTI access, caching, search, database backend with disk read/write, logging, with up to 50,000 concurrent clients, IOCP and a thread pool).
What I like very much is feedback from mORMot users running it in production – like http://synopse.info/forum/viewtopic.php?pid=4732#p4732
My current challenge is to provide some code to http://www.techempower.com/benchmarks/
I’m adding MVC support to mORMot currently, using JavaScript BTW.
Here we will see how it works. In the real world…
🙂
Thanks for taking the time to do the comparisons and write them up. Unfortunately, using different scales on your graphs makes it difficult to appreciate the actual differences. This is a cardinal sin of visually representing quantitative information. If you’re interested in how to present information visually, then I highly recommend reading some of Edward Tufte’s books, such as “The Visual Display of Quantitative Information”. Link below.
http://www.edwardtufte.com/tufte/books_vdqi
Interesting stuff. We use the NexusDB memory manager in FinalBuilder and Automise. We have a bunch of benchmark/test FinalBuilder projects (which exercise the stepping engine with multiple threads), and for those projects the Nexus memory manager typically performs twice as fast as FastMM4.
I found a small bug in the test source: in
function UseWOBS : String;
the line
wobs.WriteString(i);
won’t compile since i is an integer, and in what I think is the latest version of DWS, which I just downloaded, there is no overload for integers, only strings. I replaced it with
wobs.WriteString(IntToStr(i));
instead.
@A. Bouchez IMHO TTextWriter cannot be used because it’s not UTF-16 ready; therefore TStringBuilder is still the clear choice for us until Embarcadero improves it.