Why no bytecode format?

A compiled script (a TdwsProgram) cannot be saved to a file, and never will be. Why is that?

This question dates back to the first DWS. Since it was asked again recently, I will lay out the rationale here.

  • DWS has a very fast compiler, so compiling scripts carries no real performance penalty compared to loading a binary representation that has to be deserialized. How fast is it? See below.
  • DWS lets you define custom filters, which make it easy to encrypt your scripts if hiding the script source is what you were after with bytecode.
  • The DWS compiler/parser portion is quite light (currently less than 75 kB), especially compared to the size of the Delphi libraries you will be using for the runtime. You probably will not notice it in the EXE size once you expose more than a few trivial libraries.
  • Last but not least, when loading a binary representation of a script, you have to make sure that all libraries are compiled into the application that loads and executes the script, and that they are fully backward-compatible with what was exposed to the script when it was compiled. That concern disappears when recompiling, because the script is always resolved against the libraries that are actually present.
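The custom-filter idea amounts to a text-to-text step that runs before the compiler ever sees the source. The Python sketch below is illustrative only. `encrypt_script`, `make_decrypt_filter` and `run_filters` are hypothetical names, not the DWS filter API. XOR is used only to keep the example short: it is obfuscation, not real encryption.

```python
from typing import Callable, List

# A filter maps one source text to another; only the final output is compiled.
Filter = Callable[[str], str]


def xor_bytes(data: bytes, key: bytes) -> bytes:
    # Symmetric: applying it twice with the same key restores the input.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def encrypt_script(source: str, key: bytes) -> str:
    # Done once, at distribution time; the shipped file holds only hex text.
    return xor_bytes(source.encode("utf-8"), key).hex()


def make_decrypt_filter(key: bytes) -> Filter:
    # Hypothetical filter that turns the shipped hex text back into source code.
    def decrypt(shipped: str) -> str:
        return xor_bytes(bytes.fromhex(shipped), key).decode("utf-8")
    return decrypt


def run_filters(text: str, filters: List[Filter]) -> str:
    # Filters run in order, each one receiving the previous filter's output.
    for f in filters:
        text = f(text)
    return text
```

With this design the host application ships only the encrypted text plus the filter. The compiler still receives plain source, so nothing about compilation or library binding changes.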

How fast is the DWS compiler?

I did some quick benchmarking against PascalScript and Delphi itself.
I generated a script based on the following template:

var myvar : Integer;
begin
   myVar:=2*myvar-StrToInt(IntToStr(myvar));
end;

The assignment line was repeated once, 100 times, 1000 times, and so on, and the result was saved to a file. For DWS, the benchmark consisted of loading the file, compiling it, and then running it. For PascalScript, the times are broken down into compiling, loading the bytecode output from a file, and running that bytecode; disk size indicates the size of the generated bytecode.
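The test scripts described above can be regenerated with a few lines of code. This Python sketch is only an illustration. The function names and the `bench_<n>.pas` file naming are hypothetical, and the original benchmark harness is not shown in this post.

```python
from pathlib import Path

# The assignment line from the template, indented as in the template.
ASSIGNMENT = "   myVar:=2*myvar-StrToInt(IntToStr(myvar));\n"


def make_benchmark_script(lines: int) -> str:
    """Return the template script with the assignment line repeated `lines` times."""
    return "var myvar : Integer;\nbegin\n" + ASSIGNMENT * lines + "end;\n"


def write_benchmark_scripts(directory: str, counts=(1, 100, 1000, 10_000, 100_000)) -> list:
    """Write one script per line count and return the paths (file names are hypothetical)."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n in counts:
        path = out / f"bench_{n}.pas"
        path.write_text(make_benchmark_script(n), encoding="utf-8")
        paths.append(path)
    return paths
```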
All times are in milliseconds (and have been updated; see the Post-Scriptum below):

For the line counts expected in typical scripts (fewer than 1000), the cost of not being able to save to bytecode, compared to PascalScript, is a one-time hit of under 15 milliseconds on the first run.
This illustrates why maintaining a bytecode format is not really worth the trouble for scripting purposes, which also matches my practical experience.

For larger scripts, execution time can be expected to dwarf compile time. The benchmark code tested here has no loops, whereas real-life code will have loops and will therefore likely have a greater runtime/compile-time ratio.
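The one-time nature of the compile cost can be made concrete with a small calculation. If a program is compiled once and then executed `runs` times (or its loops repeat the same work), the compile share of the total time shrinks accordingly. The inputs below are illustrative, not measurements from the benchmark.

```python
def compile_share(compile_ms: float, run_ms: float, runs: int = 1) -> float:
    """Fraction of total time spent compiling, for one compile followed by `runs` executions."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    total = compile_ms + run_ms * runs
    return compile_ms / total if total > 0 else 0.0


# Illustrative inputs only: the same 10 ms compile amortized over more and more work.
for runs in (1, 10, 100):
    print(f"{runs:>4} runs: {compile_share(10.0, 10.0, runs):.1%} of the time is compiling")
```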

What of Delphi?

For reference, I also compiled the larger line-count versions with Delphi XE, from the IDE.

  • the 100k-line case took 3 minutes 27 seconds to compile (ouch!), evidently hitting some Delphi parser or compiler limitation. Runtime was 63 ms.
  • the 10k-line case compiled in a more reasonable 2400 ms, and ran in 4 ms (50% faster than DWS).

What else? The DWS compiler has a higher initial setup cost than PascalScript, but it pulls ahead as code size grows. That setup overhead nevertheless deserves some investigation 😉.
Once compiled, the 10x execution speed advantage of DWS over PascalScript is consistent with other informal benchmarks.

Post-Scriptum

I took a quick look at the setup overhead with SamplingProfiler and found two bottlenecks/bugs. Fixing them shaved 3 ms off the DWS compile times: the compile times for the 1-, 100- and 1000-line cases are now 0.95 ms, 2.85 ms and 19.1 ms respectively.
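If the three updated compile times are taken at face value, a least-squares line through them separates a fixed per-compilation overhead from a per-line cost. Three data points only support a rough estimate, not a model of the compiler. The fit comes out at roughly 1 ms of fixed overhead plus about 18 µs per line.

```python
# Updated DWS compile times (ms) for the 1-, 100- and 1000-line benchmark scripts.
LINES = [1, 100, 1000]
COMPILE_MS = [0.95, 2.85, 19.1]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


intercept, per_line = fit_line(LINES, COMPILE_MS)
print(f"fixed overhead ~ {intercept:.2f} ms, per-line cost ~ {per_line * 1000:.1f} us")
```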

5 thoughts on “Why no bytecode format?”

  1. Yeah, this all makes sense if you only want to run a script once. But I have scripts that need to run in a loop, and scripts that may need to be called multiple times, concurrently, from multiple threads, and this is where the compiler overhead really starts to show.

    In your tests, the PascalScript Load phase was consistently about 50x faster than DWS’s compile phase, including disk access. If bytecode representing the compiled script were already loaded into memory, it would be almost instantaneous.

    It’s hard to tell just from looking at the code of TdwsProgram whether it’s safe to run the same script in a loop on the same thread. It’s pretty obvious, though, that it’s *not* safe to run the same program in multiple threads at once. So that means a recompile every time. For larger scripts, that can be a noticeable performance hit.

    You might want to reconsider those numbers in light of real-world usage…

  2. You don’t have to recompile a script from scratch each time; the same TdwsProgram can be re-run multiple times.

    The constraint is indeed that a particular TdwsProgram instance can only be run by one thread at a time (and that the libraries you expose to the script must be thread-safe, of course). That may or may not be relaxed later; so far I have not felt the need for it in real-world cases.

    If you want to run multiple instances of the same script at the same time, you can compile an extra TdwsProgram when needed and keep the instances around in a pool (that’s what I do).

  3. Nice results.

    But your code was fairly simple.
    Just one integer assignment and two external function calls.

    What about a loop?
    What about string processing, i.e. concatenation and calls to pos() and the like (string processing is a common use of scripting)?

    The compilation speed is good.

    But what about memory usage?
    As far as I remember, DWS execution is done by calling a multitude of class instances via their Execute method. For every op, a class instance is created. I guess this consumes much more memory than bytecode.
    FastMM4 helped DWS speed a lot, according to pascal script. 🙂

  4. Loops would only have increased the runtime/compiletime ratio, the point here was to look at compile times.
    IME string processing is where you currently have the lowest performance delta between Delphi and DWS, and integer maths where you have the greatest. Generally speaking, the more complex the functions involved, the smaller the penalty of using a script vs using Delphi.

    I’m not sure I understand what you mean about FastMM4 and ‘according to pascal script’. The expression tree is essentially static, and during compilation the highest allocation load still comes from the string temporaries (which I hope to get rid of at some point).

    In terms of memory usage, classes are a bit more expensive, sure, and there is indeed a bit more waste (e.g. the hidden monitor field didn’t help), though things have improved since the old DWS and will likely improve further.

  5. FWIW, it seems that with the current SVN version, the static memory usage for the compiled benchmark script (TdwsProgram vs. a loaded TPSExec) is lower for DWS on larger scripts, by about 7%. This is according to FastMM’s GetMemoryManagerState.
    I’m a bit surprised; it could be a measurement artifact, though it is worth further investigation…
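The pooling approach described in the second comment (compile an extra program only when every existing one is busy, then reuse it) can be sketched as follows, in Python for brevity. `ProgramPool` and `compile_fn` are hypothetical names. In a Delphi host the pooled objects would be compiled TdwsProgram instances, and the exposed libraries would still need to be thread-safe.

```python
import queue
import threading
from contextlib import contextmanager
from typing import Callable, Generic, TypeVar

P = TypeVar("P")


class ProgramPool(Generic[P]):
    """Hands out compiled programs so that each one is used by at most one thread at a time.

    `compile_fn` is a hypothetical stand-in for compiling the script source into a program.
    """

    def __init__(self, compile_fn: Callable[[], P]):
        self._compile = compile_fn
        self._idle: "queue.SimpleQueue[P]" = queue.SimpleQueue()
        self._lock = threading.Lock()
        self.compiled = 0  # number of programs compiled so far

    def acquire(self) -> P:
        try:
            return self._idle.get_nowait()  # reuse an idle program if there is one
        except queue.Empty:
            with self._lock:
                self.compiled += 1
            return self._compile()  # otherwise compile an extra one on demand

    def release(self, program: P) -> None:
        self._idle.put(program)  # make it available to the next caller

    @contextmanager
    def program(self):
        prog = self.acquire()
        try:
            yield prog
        finally:
            self.release(prog)
```

Sequential callers keep reusing a single compiled program, so the compile cost is paid once. Only genuinely concurrent demand causes extra compilations, and the pool never grows beyond the peak number of simultaneous runs.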
