Posts Tagged ‘JavaScript’

First look at XE2 floating point performance

September 2nd, 2011

With XE2 now officially out, it’s time for a first look at Delphi XE2 compiler floating point performance (see previous episode).

For a first look I’ll reuse a Mandelbrot benchmark, based on this code: Mandelbrot Set in HTML 5 Canvas. It tests basic double-precision floating-point operations (add, sub, mult) in a tight loop; there is relatively little in the way of memory accesses (or shouldn’t be, to be more accurate).
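For reference, the kind of inner loop being timed can be sketched in JavaScript as follows. This is a minimal escape-time sketch, not the benchmark’s actual source: the function name `mandelIterations`, the `maxIter` parameter and the bailout test against 4 are assumptions, while `x0`, `y0`, `p` and `q` mirror the names used in the Pascal lines examined further down.

```javascript
// Escape-time iteration count for the point (p, q) of the complex plane.
// Hypothetical helper: maxIter and the bailout value 4 are assumptions.
function mandelIterations(p, q, maxIter) {
  let x0 = 0;
  let y0 = 0;
  for (let i = 0; i < maxIter; i++) {
    // Same two expressions as the Pascal lines discussed in the article.
    const x = x0 * x0 - y0 * y0 + p;
    const y = 2 * x0 * y0 + q;
    // Stop as soon as the point leaves the circle of radius 2.
    if (x * x + y * y > 4) {
      return i;
    }
    x0 = x;
    y0 = y;
  }
  // Never escaped within maxIter iterations.
  return maxIter;
}
```

Each iteration is only a few multiplications, additions and subtractions on doubles, so the benchmark mostly measures how well the compiler (or JIT) keeps those values in registers.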

You can find the source code there; it compiles pretty much straight away in XE2 (just comment out the asm for Win64).

NOTE: when this article was originally posted, I had stumbled upon an XE2 Trial version “trap” (or feature?) which basically deactivated Win64 optimizations as defined through the project options. Kenji Matumoto pointed out the issue, and this is an updated article where I used {$O+} in the code to “force” optimizations. The outcome is a *much* prettier picture, I’m happy to say! Reservations from the initial article are gone, good job Embarcadero!

Edit 05/09: after further tests, I’m adding one reservation: single-precision floating point doesn’t look so hot. More on the subject there.

Benchmark results

Without further ado, here are the raw figures on my machine for the 480 x 480 case. Keep in mind the Delphi versions do NOT use Canvas.Pixels[], but direct memory access into an array:

Chart: execution time in milliseconds, lower is better

Or if you prefer hard figures:

  • Delphi XE2 – 32 bits: 193 ms
  • Delphi XE2 – 64 bits: 67 ms — fastest Delphi
  • Delphi XE: 196 ms
  • FireFox 6: 121 ms
  • Chrome 13: 74 ms
  • (out of competition: XE 32bit hand-made assembly: 57 ms)
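To put those figures in perspective, here is a small JavaScript snippet that expresses each time as a ratio to the XE2 64 bits result. The object keys and the `ratiosTo` helper are made up for the illustration; the timings are the ones listed above.

```javascript
// Timings in milliseconds, copied from the list above.
const timingsMs = {
  "Delphi XE2 32 bits": 193,
  "Delphi XE2 64 bits": 67,
  "Delphi XE": 196,
  "FireFox 6": 121,
  "Chrome 13": 74,
  "XE 32bit hand-made assembly": 57,
};

// Ratio of each timing to a reference entry, rounded to two decimals.
// A value above 1 means slower than the reference.
function ratiosTo(reference, timings) {
  const base = timings[reference];
  const result = {};
  for (const [name, ms] of Object.entries(timings)) {
    result[name] = Math.round((ms / base) * 100) / 100;
  }
  return result;
}

const ratios = ratiosTo("Delphi XE2 64 bits", timingsMs);
console.log(ratios);
```

On these numbers the 32 bits compiler takes roughly 2.9 times as long as the 64 bits one, Chrome 13 about 1.1 times, and the hand-made assembly about 0.85 times.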

So what gives?

  • The XE2 32bit compiler still uses the old FPU code. The performance delta with XE is minimal and could just be an alignment issue (pseudo-random, since the compiler doesn’t pro-actively align). Let’s hope the SSE2 codegen will be retrofitted in XE3.
  • The XE2 64bit compiler gets a nice boost from using SSE2, allowing it to catch up with and overtake all JavaScript JITters.
  • Chrome V8 makes a good showing in this benchmark, but loses the crown: native Delphi is back on top!

A peek under the hood

What does the compiler generate for the two following lines?

x := x0 * x0 - y0 * y0 + p;
y := 2 * x0 * y0 + q;

Once you pop up the CPU view, you’ll see:

FMandelTest.pas.193: x := x0 * x0 - y0 * y0 + p;
00000000005A1452 660F28C4         movapd xmm0,xmm4
00000000005A1456 F20F59C4         mulsd xmm0,xmm4
00000000005A145A 660F28CD         movapd xmm1,xmm5
00000000005A145E F20F59CD         mulsd xmm1,xmm5
00000000005A1462 F20F5CC1         subsd xmm0,xmm1
00000000005A1466 F20F58C2         addsd xmm0,xmm2
FMandelTest.pas.194: y := 2 * x0 * y0 + q;
00000000005A146A 660F28CC         movapd xmm1,xmm4
00000000005A146E F20F590DA2000000 mulsd xmm1,qword ptr [rel $000000a2]
00000000005A1476 F20F59CD         mulsd xmm1,xmm5
00000000005A147A F20F58CB         addsd xmm1,xmm3

And further down the code, the compiler makes use of xmm8, so it’s really aware of the 16 xmm registers you have in x86-64, and finally keeps floating point values in registers, something the 32bit compilers (both XE & XE2) don’t do.

Where does it lose to the hand-made asm version? Well, a handful of minor things:

  • even though it used up to 9 xmm registers, it didn’t use a 10th, leaving some memory accesses
  • with more careful allocation, it could have fit everything in 8 xmm registers, which would have cut unnecessary traffic
  • it zeroes a register with a move from memory, and didn’t do constant unification or propagation.

Still, those are mostly nitpicks compared to the massive issues of the old FPU code generation (which, alas, XE2 Win32 still suffers from).

Conclusion

SSE2 support in the XE2 64bit compiler is a significant step ahead for Delphi floating point performance. XE2 32bit is still the same old story.

If you’re doing heavy floating point maths, the XE2 64bit compiler is a simple ticket to much better performance.

Hopefully in Delphi XE3 they will retrofit the SSE2 codegen into the 32bit compiler. In the interim, this should quell the “we don’t need no 64bit” critics: if you do any significant floating-point maths, Delphi XE2 64bit is a must!


DWScript to JavaScript

July 26th, 2011

Not exactly breaking news for those following the OP4JS news or the DWS SVN, but a new experimental set of classes is available for DWScript, which allows compiling DWScript source into JavaScript.

This allows Pascal code like this one, for instance, to be compiled into this html page (or see the outcome in jsfiddle), and be executed client-side by any modern browser (the demo uses HTML5’s Canvas). In the DWS source repository, you’ll find it in “MandelbrotJS” (requires Delphi Chromium Embedded to run).

The goal is to allow using a strongly typed, compile-time checked language in a Web-client environment. The code generation is also intended to be as lightweight as possible, without depending on a huge framework, and to generate quite readable JavaScript.

In the classic Delphi spirit, it’s all about allowing high-level usage while still being open to low-level usage whenever you wish or need to.

The compiled Pascal functions can be used for DOM events or called from JS, or vice-versa. You can, for instance, use it only for the more complex routines or libraries, where straight JavaScript’s lack of strong typing and its prototype-based objects would make for a developer-intensive and bug-prone approach.

This is still a work in progress: only a (growing) subset of the DWS runtime library functions is supported at the moment, but most of the language is in working condition, including var parameters, classes, meta-classes, virtual methods & constructors, exceptions, bounds checking, contracts, etc. Currently, more than 85% of the DWS language & rtl unit tests pass (most of those not passing are related to Variants, destructors & ExceptObject).

The JS CodeGen can be invoked directly or via DWS filters, so you can have a single-source DWS code with portions running either server-side or browser-side.

FWIW, the DWS CodeGen classes were originally intended for compiling to SSE2-optimized floating point, either directly to x86-64 or via LLVM, but JavaScript is at the moment opening more opportunities, and modern JS engines are making decent use of SSE2 already. Last but not least, in the near term, it’s probably best to let the dust of the upcoming Delphi XE2 settle a bit ;-)


Delphi ChromiumEmbedded

May 12th, 2011

The ChromiumEmbedded project just released r231 and DelphiChromiumEmbedded has been updated by Henri Gourvest almost immediately (!), and you can grab the code directly from the SVN.

In this post 2 days ago, I was remarking that Chrome 11 was now even faster than FireFox 4. Well, it appears that further progress has been made in the WebKit and V8 engine source CEF is based on, as CEF r231 runs the Mandelbrot benchmark another 20% faster than Chrome 11, meaning that JavaScript can now handle that floating point computation almost 3 times faster than vanilla Delphi XE code (not using TCanvas, which would make Delphi look horribly worse).

FWIW you can find in the DWS SVN the first very early prototype of a DWScript to JavaScript CodeGen, whose ultimate goal is to allow running Object Pascal code in a JavaScript environment. More on that sub-project in the next weeks/months as things unfold.


Delphi for JavaScript

May 10th, 2011

A while back, I posted about the FireFox 4 JavaScript engine running circles around Delphi when it came to floating point performance on the Mandelbrot set. Since then, Chrome got updated to version 11, and further raised the bar by beating FireFox by about 20% in that benchmark. That’s no mean feat: current generation JavaScript engines run faster not just than Delphi, but also than .Net and a slew of other compilers, native or not, when it comes to floating point. Only state-of-the-art native compilers still resist.

The figures for Delphi 64 are still unknown, but it’ll face a challenge merely matching the floating point performance of JavaScript, and if the VCL’s TCanvas hasn’t been revamped from the ground up, chances are that out of the box, Delphi 64 won’t be able to beat the HTML5 Canvas on performance (not to mention in features, where HTML5 Canvas is also leading by a few miles).

Jon L Aasenden is investigating an Object Pascal For JavaScript (OP4JS), with mobile devices in sight (if you don’t already know about it, you may also want to check PhoneGap). My experiments with the mobile WebKit that powers iPhone & Android browsers have been very positive. Though some libraries are still a bit bloated for current hardware, using CSS3, HTML5 & libraries like XUI, it’s possible to design some excellent interactive UIs in reasonably little time. Given the rate of improvements, in 1-2 years, libraries like jQuery Mobile should run smoothly on all the hardware being sold.

Add to that upcoming goodies like WebGL, and JavaScript + HTML5 is, step by step, with little fuss and despite all its shortcomings, becoming a universal platform with high performance potential. One could only wish JavaScript weren’t a dynamic language, but hey, after all, the x86 instruction set became prevalent despite its shortcomings too, and will still be serving in the 64bit era for the foreseeable future.

Even on the Windows desktop, it is IMHO becoming increasingly questionable to base your UIs on anything other than HTML5 & CSS: the alternatives are not only more proprietary, but either responsive-looking but dated (like unskinned VCL, WinAPI controls), or outright messy and sluggish (WPF).

Right now, ChromiumEmbedded allows you to embed WebKit + the Chrome V8 engine, which will work across the board with no update or dependency issues (unlike IE9). Using Henri Gourvest’s Delphi ChromiumEmbedded, you can integrate it into your Delphi applications, and use it as an alternative to VCL-based controls for many aspects of an application’s UI.


Kudos to the Firefox 4 TraceMonkey team!

March 24th, 2011

I’ve been quite impressed with the JavaScript floating point performance in FireFox 4, which puts the Delphi compiler to shame. See for yourself this fractal rendering demo:

Mandelbrot Set in HTML 5 Canvas

I’ve made a version of the same code in Delphi XE (source + pre-compiled executable, 331 kB ZIP). On my machine here, at the 480×480 resolution, FireFox 4 gets the default view rendered in 124 ms, whereas the “regular” Delphi version, which is limited to the old FPU, takes about 200 ms.

It takes manually SSE-enhanced Delphi code to get back on top with an 87 ms render time. It’s quick, non-optimized scalar SSE code, sure, and could likely be improved, but the point remains that without asm, Delphi XE’s native compiler trails TraceMonkey in the floating point department…
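The render times above are given without the measurement code. A minimal way to take such a measurement in JavaScript might look like this; `timeRender`, its parameters and the best-of-N policy are assumptions for illustration, not what the original demo does.

```javascript
// Hypothetical helper: runs `render` `runs` times and returns the best
// (smallest) wall-clock time in milliseconds.
function timeRender(render, runs) {
  let best = Infinity;
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    render();
    const elapsed = Date.now() - start;
    if (elapsed < best) {
      best = elapsed;
    }
  }
  return best;
}
```

Taking the best of several runs reduces noise from JIT warm-up and other processes. Date.now() only has millisecond resolution, which is enough for timings in the 100 ms range.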

So Embarcadero, how is that Delphi 64 version coming? Is it properly SSE-enabled?
