Description
Context
It's possible for the thread to enter a signaled state and stop its execution without reaching the point where is drops its packet (or return a result at all).
On the other hand, the implementation of JoinHandle::join
is the following:
// https://github.com/rust-lang/rust/blob/master/library/std/src/thread/mod.rs#L1586
impl<'scope, T> JoinInner<'scope, T> {
fn join(mut self) -> Result<T> {
self.native.join();
Arc::get_mut(&mut self.packet).unwrap().result.get_mut().take().unwrap()
}
}
The first .unwrap()
assumes that the thread stopped execution normally, which leads to cryptic panic errors it eventually fails.
How to reproduce this?
One way of reproducing this is to have a Windows DLL with a libc::atexit
handler that joins thread created in the main application. If the application exits before the thread finishes execution, the there is a high change the above .unwrap()
fails. Otherwise no panic will be observed:
// lib.rs (cdylib)
pub static HANDLE: Mutex<OnceLock<JoinHandle<()>>> = Mutex::new(OnceLock::new());
#[no_mangle]
pub extern "C" fn init(n: libc::c_int) {
let handle = thread::spawn(move || thread::sleep(Duration::from_millis(n as u64)));
HANDLE.lock().unwrap().set(handle).unwrap();
unsafe { libc::atexit(handler) };
}
extern "C" fn handler() {
let handle = HANDLE.lock().unwrap().take().unwrap();
handle.join().unwrap();
}
// test.c
#include <stdlib.h>
#include <Windows.h>
extern void init(int);
int main(const int argc, const char *argv[argc])
{
init(atoi(argv[1]));
Sleep(atoi(argv[2]));
return 0;
}
Compile the above test.c
and link to the lib.rs
DLL, then run the following:
RUST_BACKTRACE=1 ./test 1000 100 || echo "info: test with 1s sleep in main and 0.1s sleep in thread failed"
./test 100 1000 && echo "info: test with 0.1s sleep in main and 1s sleep in thread succeeded"
Unfold output of the above script
thread '<unnamed>' panicked at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04\library\std\src\thread\mod.rs:1517:40:
called `Option::unwrap()` on a `None` value
stack backtrace:
0: std::panicking::begin_panic_handler
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\panicking.rs:647
1: core::panicking::panic_fmt
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\core\src\panicking.rs:72
2: core::panicking::panic
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\core\src\panicking.rs:144
3: core::option::unwrap_failed
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\core\src\option.rs:1978
4: std::thread::JoinInner<tuple$<> >::join<tuple$<> >
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04\library\std\src\thread\mod.rs:1517
5: std::thread::JoinHandle<tuple$<> >::join
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04\library\std\src\thread\mod.rs:1650
6: rust_thread_race_condition::handler
at .\src\lib.rs:19
7: execute_onexit_table
8: execute_onexit_table
9: execute_onexit_table
10: dllmain_crt_process_detach
at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:180
11: dllmain_dispatch
at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:293
12: RtlActivateActivationContextUnsafeFast
13: LdrShutdownProcess
14: RtlExitUserProcess
15: ExitProcess
16: <unknown>
17: <unknown>
18: <unknown>
19: BaseThreadInitThunk
20: RtlUserThreadStart
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
info: test with 1s sleep in main and 0.1s sleep in thread failed
info: test with 0.1s sleep in main and 1s sleep in thread succeeded
This example is available in https://github.com/fuzzypixelz/rust-thread-race-condition. There is a simple workflow that can reproduce the above output on windows-2022
runners.
What to do about it?
I think a low hanging fruit here is replacing the .unwrap()
with a terse .expect(...)
error message.