Description
Affects: PythonCall
Describe the bug
The current implementation allows more than one Julia thread to attempt to lock the GIL directly. If the GIL is already held, that attempt blocks the entire Julia OS thread, and Julia can then hang forever: the blocked thread never reaches a safepoint, so GC can never proceed.
Here is a very simple reproduction, adapted ever so slightly from the documentation:
- The docs suggest running this example: https://juliapy.github.io/PythonCall.jl/stable/pythoncall/#jl-multi-threading
- Here, we adapt it by only adding a call to trigger GC while the GIL is held:
julia> using PythonCall
julia> PythonCall.GIL.@unlock Threads.@threads for i in 1:4
           PythonCall.GIL.@lock (GC.gc(); pyimport("time").sleep(5))
       end
..........................
.... hanging forever .....
..........................
Your system
julia> versioninfo()
Julia Version 1.10.2+RAI
Commit 52dcbad168* (2025-03-27 18:26 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin24.3.0)
CPU: 12 × Apple M2 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 4 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
JULIA_SSL_CA_ROOTS_PATH =
julia> Pkg.status()
Status `~/.julia/environments/v1.10/Project.toml`
[69d22d85] About v1.0.2
[1520ce14] AbstractTrees v0.4.5
[6e4b80f9] BenchmarkTools v1.6.0
[a2441757] Coverage v1.6.1
⌅ [f68482b8] Cthulhu v2.16.5
[31a5f54b] Debugger v0.7.13
[ab62b9b5] DeepDiffs v1.2.0
[fb4d412d] FixedPointDecimals v0.6.3
[c27321d9] Glob v1.3.1
[92ed2492] HeapSnapshotUtils v0.1.0 `https://github.com/RelationalAI/HeapSnapshotUtils.jl#main`
[7ec9b9c5] Humanize v1.0.0
[5903a43b] Infiltrator v1.9.1
⌅ [70703baa] JuliaSyntax v0.4.10
[1fcbbee2] LookingGlass v0.3.3 `~/.julia/dev/LookingGlass`
[bdcacae8] LoopVectorization v0.12.172
[1914dd2f] MacroTools v0.5.16
[85b6ec6f] MethodAnalysis v0.4.13
[e4faabce] PProf v3.2.0
[14b8a8f1] PkgTemplates v0.7.56
[c46f51b8] ProfileView v1.10.1
[92933f4c] ProgressMeter v1.10.4
[9c30249a] RAI v0.2.9
[817f1d60] ReTestItems v1.31.0
[295af30f] Revise v3.8.0
[aa65fe97] SnoopCompile v3.0.2 `~/work/jl_depots/raicode2/dev/SnoopCompile`
[e2b509da] SnoopCompileCore v3.0.0 `~/.julia/dev/SnoopCompile/SnoopCompileCore`
[ac92255e] Speculator v0.2.0
[1e6cf692] TestEnv v1.102.1 `~/work/jl_depots/raicode3/dev/TestEnv`
[e689c965] Tracy v0.1.4
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated`
I believe the correct solution is to adjust the lock and unlock functions so that access to the GIL is guarded by a Julia ReentrantLock. This is a cooperative lock that works with the runtime: a Task waiting on it yields instead of blocking its OS thread.
Lines 23 to 30 in 08157cc
This way, only one Julia OS thread will ever attempt to hold the GIL, since only one Task is allowed to proceed to call C.PyGILState_Ensure().
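To make the proposal concrete, here is a minimal sketch of the idea, not the actual PythonCall code. The names GIL_GATE, with_gil, fake_ensure and fake_release are hypothetical; the two fake functions are stand-ins for C.PyGILState_Ensure and C.PyGILState_Release so that the sketch runs without Python. It also ignores details a real fix would need to consider, such as a Task migrating to a different OS thread while it holds the GIL.

```julia
using Base.Threads

# Hypothetical gate: a cooperative Julia lock taken *before* touching the GIL.
const GIL_GATE = ReentrantLock()

# Run `f` while holding the gate and then the (stand-in) GIL.
# `ensure` and `release` are assumptions standing in for
# C.PyGILState_Ensure and C.PyGILState_Release.
function with_gil(f; ensure, release)
    lock(GIL_GATE)              # waiting Tasks yield here instead of blocking the OS thread
    try
        state = ensure()        # at most one Task ever reaches this call at a time
        try
            return f()
        finally
            release(state)
        end
    finally
        unlock(GIL_GATE)
    end
end

# Stand-ins that record how many Tasks "hold the GIL" at once.
const holders = Atomic{Int}(0)
const max_holders = Atomic{Int}(0)

function fake_ensure()
    n = atomic_add!(holders, 1) + 1   # atomic_add! returns the old value
    atomic_max!(max_holders, n)
    return n
end

fake_release(_) = atomic_sub!(holders, 1)

# Same shape as the reproduction: trigger GC while the "GIL" is held.
work() = (GC.gc(); sleep(0.01); :done)

tasks = Task[]
for _ in 1:4
    push!(tasks, @spawn with_gil(work; ensure = fake_ensure, release = fake_release))
end
results = fetch.(tasks)
```

With the gate in place, the other Tasks wait on the ReentrantLock, which is a yield point the runtime knows about, rather than inside a blocking C call.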