Efficient File Enumeration


Getting around the 8dot3 names

To avoid the overhead of 8dot3 names, the Windows API function to use is FindFirstFileEx, available since WinXP and Win2003, which lets you specify that you don’t care about 8dot3 names through the FindExInfoBasic option. Be aware that FindExInfoBasic itself is only supported from Windows 7 and Windows Server 2008 R2 onward.

Note that this won’t solve the filtering issue, so Delphi-side masking is still necessary.
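
As a rough illustration of what application-side masking means, here is a minimal sketch, written in Python rather than Delphi purely for brevity. It is not the dwsXPlatform implementation, and `collect_files` is a hypothetical helper name. It enumerates every entry without asking the OS to filter, then applies the mask itself, case-insensitively, so that `*.pas` only matches names that really end in `.pas`.

```python
import fnmatch
import os


def collect_files(root, mask):
    """Hypothetical helper: walk `root` and apply `mask` in application code."""
    pattern = mask.lower()  # compare case-insensitively, as Windows file names are
    matches = []
    pending = [root]  # explicit stack instead of recursion
    while pending:
        folder = pending.pop()
        with os.scandir(folder) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif fnmatch.fnmatchcase(entry.name.lower(), pattern):
                    # The whole name must match the mask, so "unit.pas~"
                    # is rejected for the mask "*.pas".
                    matches.append(entry.path)
    return matches
```

Calling `collect_files("D:\\Projects", "*.pas")` (the path is only an example) would return the full paths of the matching files, in no particular order.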

Interestingly enough, you can also get rid of 8dot3 names at the volume level; Jose Barreto’s blog post describes the process.

This won’t just speed up file enumeration, but will also drastically speed up other file system operations, like creating or moving lots of files.

Whether the OS will generate 8dot3 names is defined by both a system (registry) and a volume setting. To check if it’s active on a volume you can use the FSUTIL command:

FSUTIL.EXE 8dot3name query D:

It’ll tell you whether the setting is active in the registry and for the volume. You can turn it off for the volume with

FSUTIL.EXE 8dot3name set D: 1

This will only affect new files. For existing 8dot3 file name aliases, you can strip them with

FSUTIL.EXE 8dot3name strip /s /v D:\

That will make your volume incompatible with prehistoric software though, and while the newer server versions of Windows have 8dot3 names off by default for new volumes, you can’t rely on them being off in the wild.

Final remarks

FindFirstFileEx also supports the FIND_FIRST_EX_LARGE_FETCH option, which is described in the WinAPI documentation as increasing performance when there are many files. But in my testing, I couldn’t find any case in which it was beneficial, and it even decreased performance when there were few files to be enumerated.

Another option I investigated was FindExSearchLimitToDirectories, which is said to be an advisory flag to only enumerate directories. It’s said to work only on some file systems, but I couldn’t find any on which it did work.

When all is said and done, I’ve found dwsXPlatform.CollectFiles can be from two to ten times faster than the XE version of TDirectory.GetFiles. The lower ratio is when everything is already in cache and the CPU is the limiting factor; the higher ratio is on busy volumes where 8dot3 names are active.

 

7 thoughts on “Efficient File Enumeration”

  1. FIND_FIRST_EX_LARGE_FETCH should help if you access a network share and the directory has many entries. But it only works for Windows 7 / 2008R2 or newer.

  2. For what it may be worth, I just tried using FSUTIL to strip 8dot3names from partitions on an external hard drive on Win7 64. It was busy for a time, then I got a dialog reporting it had stopped working. Happened in all attempts, on four different partitions, on two drives. And yes, I ran CMD as administrator. I do not have time to pursue this at the moment, but just wanted to let you know that there may be issues….

  3. For me, using a lot of threads (more than the number of available cores) to crawl directories was beneficial. Currently I use up to (number of available cores) * 4 threads.

    When finding all *.pas files on my disk, for example, the 8dot3 names didn’t really make a significant difference, but for sure any improvement is welcome…

  4. @Andreas Hausladen I suppose network performance comes into play, I tested on a 1 Gb LAN and large fetches weren’t beneficial there either.

    @Bill Meyer I had that happen on my main system drive, but data drives were okay once applications got closed.

    @Andreas Dorn how many pas files and folders do you have? Using “dir *.pas /s”, for instance, should give you that at the end. Here I’m testing against 6400 files in 2500 folders (for the main branch).

  5. The threading and caching wobble probably skewed my unscientific measurements… I tested with 10000/3500 and 1000/100 pas-files/folders. Now that I took some more time to test:

    For 10000 files, removing 8dot3 takes me down from about 250 ms to 220 ms, so it’s definitely a measurable improvement. The 1000 files are very fast, so it’s difficult for me to measure something noticeable there.

    Searching still takes a lot of time, I hope there is some more room for improvements…

  6. @Eric These were data drives. So unless it’s a requirement that no apps at all be open, I am still at a loss.

  7. Wouldn’t it be (much) faster if, instead of using a TMask instance, you used “if (ExtractFileExt(filename) = MaskFileExt)”, where “MaskFileExt := ExtractFileExt(filemask)” is calculated once beforehand?

    (It won’t work when there’s a wildcard in the extension – perhaps use your implementation for that scenario?)
