- DelphiTools - https://www.delphitools.info -

The Mischievous Case-Insensitive Hash

In a comment to the previous article on a case insensitive hash code [1], Stefan Glienke pointed to an approach used in Spring4D’s comparers [2], which is a delightful hack.

Rather than converting the string to a “proper lower case”, it converts the string to an “approximate lower case” using an “or $20”, which happens to be good enough for a hash on string identifiers.

To figure the trick, one needs to check the ASCII Table [3].

ASCII Table

ASCII characters range in codes from 0 to 127, 7 bits, and were arranged in 4 groups of 32 characters ($20 in hexadecimal).

[3]

What’s happening

So what an “or $20” does is that on a character code is go from the control group to the numbers group, and from upper case group to lower case group. It leaves numbers & lower case groups characters unchanged.

Since we’re talking whole groups here, it’s not a “proper” lower case conversion:

It’s not proper but it doesn’t matter

However when looking at identifier strings, there will be no control characters, so invalid control group conversions are moot. And in the upper case group, there are only two potentially encountered characters in identifiers:

Given than the upper case conversion is meant to be fed to a hash function, and there is no collision, the “or $20” is as good as a proper upper case from an hash code collision point of view!

And it’s branchless, trivial to vectorize  and computationally as cheap as it gets.

The only “downside” is that after the previous round of optimization, the case insensitive hash was already no longer anywhere near being a bottleneck in DWScript, so while this implementation is much faster, it has not measurable effect on script compilation… at least for now! 🙂