Remove readDirStreamWithPtr; replace readdir_r() with readdir() #349
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This replaces all use of
readdir_r()
withreaddir()
, and removesSystem.Posix.Directory.Internals.readDirStreamWithPtr
as a consequence. The rationale is 4-fold: safety, portability, performance, and maintainability.Safety
According to the
readdir_r(3)
manpage, it's unsafe because it's impossible to compute the maximum filename length, hence the size of the buffer to be allocated. Currently we callpathconf(".", _PC_NAME_MAX)
whenNAME_MAX
isn't defined. However, that's the longest filename a process is allowed to create, not the longest than can already be there. Thefpathconf(3)
manpage explicitly statesEven if
NAME_MAX
is defined, the glibc manual statesThe
readdir_r(3)
manpage goes on to say that glibc may fail withENAMETOOLONG
, but implies there's no way to detect that until the final dirent is read. On other systems,d_name
may be truncated, or worse, not null-terminated.The original reason for
readdir_r()
was to provide a reentrant version ofreaddir()
. However, "in modern implementations"readdir()
is thread-safe, except for concurrent readers to the sameDIR *
stream. In that specific case it says the user should do their own synchronization. This PR adds a warning to that effect. It's expected (by whom? I don't know) that a future POSIX.1 version will makereaddir()
thread-safe (including to the sameDIR *
, I guess?).Portability
readdir_r()
is deprecated by glibc for the reasons listed above.Performance
I compiled directory-ospath-streaming against unix before and after these changes and executed a simple benchmark counting files recursively under my home directory (about 500K). (Here is the change in the streaming library for the after benchmark.)
readDirStreamWithPtr
readdir_r()
readDirStreamWith
readdir()
That 1% improvement was oddly consistent for me, even though tasty-bench claimed ±15ms of error. What's important here is that it isn't worse. This makes sense. You're not supposed to free the pointer returned by
readdir()
because it "may be statically allocated". My guess is a buffer is allocated byopendir()
or the first call toreaddir()
, and is freed when the kernel cleans up the file descriptor. If byopendir()
, then it's a sunk cost—either way, a user isn't going to do any better by attempting to manage their ownstruct dirent
(notwithstanding the safety issues).For
readdir_r()
, Haskell adds another small cost by allocating thePtr (Ptr CDirent)
out param.Maintainability
When you glance at
readDirStreamWith
versusreadDirStreamWithPtr
and their types, you might think: "ThePtr
one is more low-level for when I want to manage memory myself for extra performance/control; the other one probably does that for me each time, trading control for simplicity." Indeed, currentlyreadDirStreamWith
does do that, and is defined in terms ofreadDirStreamWithPtr
.That intuition is understandable. But the thing being allocated is simply the extra bookkeeping for reentrancy required by
readdir_r()
. Currently the note says,This is not true. The pointer to that pointer is freed, but the
Ptr CDirent
itself is not. It's moot anyway since this PR reimplementsreadDirStreamWith
in terms ofreaddir()
. The note is replaced accordingly.Also note that the above change to directory-ospath-streaming means the "cache" mechanism there can likely be removed. It's only there for the
Ptr (Ptr CDirent)
argument. I'm curious if this leads to more small performance gains.What I'm getting at in terms of code maintainability is
readdir()
iscc @sergv