A GC triggered while a thread is blocking on the GIL lock can freeze julia forever #627

Open
@NHDaly

Affects: PythonCall

Describe the bug

The current implementation allows more than one Julia thread to attempt to lock the GIL directly. If the GIL is already held, that attempt blocks the entire Julia OS thread, and Julia can hang forever: a GC started on another thread waits for every thread to reach a safepoint, and the thread blocked on the GIL never does.

Here is a very simple reproduction, adapted ever so slightly from the documentation:

julia> using PythonCall

julia> PythonCall.GIL.@unlock Threads.@threads for i in 1:4
         PythonCall.GIL.@lock (GC.gc(); pyimport("time").sleep(5))
       end
..........................
.... hanging forever .....
..........................

Your system

julia> versioninfo()
Julia Version 1.10.2+RAI
Commit 52dcbad168* (2025-03-27 18:26 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin24.3.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 4 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_SSL_CA_ROOTS_PATH =

julia> Pkg.status()
Status `~/.julia/environments/v1.10/Project.toml`
  [69d22d85] About v1.0.2
  [1520ce14] AbstractTrees v0.4.5
  [6e4b80f9] BenchmarkTools v1.6.0
  [a2441757] Coverage v1.6.1
⌅ [f68482b8] Cthulhu v2.16.5
  [31a5f54b] Debugger v0.7.13
  [ab62b9b5] DeepDiffs v1.2.0
  [fb4d412d] FixedPointDecimals v0.6.3
  [c27321d9] Glob v1.3.1
  [92ed2492] HeapSnapshotUtils v0.1.0 `https://github.com/RelationalAI/HeapSnapshotUtils.jl#main`
  [7ec9b9c5] Humanize v1.0.0
  [5903a43b] Infiltrator v1.9.1
⌅ [70703baa] JuliaSyntax v0.4.10
  [1fcbbee2] LookingGlass v0.3.3 `~/.julia/dev/LookingGlass`
  [bdcacae8] LoopVectorization v0.12.172
  [1914dd2f] MacroTools v0.5.16
  [85b6ec6f] MethodAnalysis v0.4.13
  [e4faabce] PProf v3.2.0
  [14b8a8f1] PkgTemplates v0.7.56
  [c46f51b8] ProfileView v1.10.1
  [92933f4c] ProgressMeter v1.10.4
  [9c30249a] RAI v0.2.9
  [817f1d60] ReTestItems v1.31.0
  [295af30f] Revise v3.8.0
  [aa65fe97] SnoopCompile v3.0.2 `~/work/jl_depots/raicode2/dev/SnoopCompile`
  [e2b509da] SnoopCompileCore v3.0.0 `~/.julia/dev/SnoopCompile/SnoopCompileCore`
  [ac92255e] Speculator v0.2.0
  [1e6cf692] TestEnv v1.102.1 `~/work/jl_depots/raicode3/dev/TestEnv`
  [e689c965] Tracy v0.1.4
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated`

I believe the correct solution is to adjust the lock and unlock functions so that access to the GIL is protected by a Julia ReentrantLock, which is a cooperative lock: a Task waiting on it yields to the Julia scheduler instead of blocking its OS thread. This is the lock function as it stands, calling C.PyGILState_Ensure() with no Julia-level lock around it:

function lock(f)
    state = C.PyGILState_Ensure()
    try
        f()
    finally
        C.PyGILState_Release(state)
    end
end

With the ReentrantLock in place, only one Julia OS thread will ever attempt to acquire the GIL at a time, since only one Task is allowed to proceed to call C.PyGILState_Ensure().
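To make the proposal concrete, here is a minimal sketch of the pattern, not a patch against PythonCall. It assumes a module-level lock (the name `JL_GIL_LOCK` is hypothetical) and replaces the real `C.PyGILState_Ensure` / `C.PyGILState_Release` calls with counting stand-ins (`fake_ensure`, `fake_release`, also hypothetical) so that it runs without Python and shows how many Tasks are inside the guarded region at once.

```julia
# Counting stand-ins for the C API calls, so the sketch runs without Python.
# In PythonCall these would be C.PyGILState_Ensure / C.PyGILState_Release.
const holders = Threads.Atomic{Int}(0)      # Tasks currently "holding the GIL"
const max_holders = Threads.Atomic{Int}(0)  # highest concurrency observed

function fake_ensure()
    n = Threads.atomic_add!(holders, 1) + 1  # atomic_add! returns the old value
    Threads.atomic_max!(max_holders, n)
    return n
end

fake_release(_) = Threads.atomic_sub!(holders, 1)

# Hypothetical name: a Julia-level lock taken before touching the GIL.
const JL_GIL_LOCK = ReentrantLock()

function lock_gil(f)
    # Tasks that lose the race wait here cooperatively inside the Julia
    # runtime, so they are not stuck in a foreign call when a GC is requested.
    Base.@lock JL_GIL_LOCK begin
        state = fake_ensure()
        try
            return f()
        finally
            fake_release(state)
        end
    end
end

# Same shape as the reproduction above, with the Python call replaced by sleep.
Threads.@threads for i in 1:4
    lock_gil() do
        GC.gc()
        sleep(0.01)
    end
end
```

After the loop finishes, `max_holders[]` records the largest number of Tasks that were between the stand-in ensure and release calls at the same time. One thing this sketch does not address: `PyGILState_Ensure` and `PyGILState_Release` must be paired on the same OS thread, so a real fix would probably also need to keep the Task from migrating between threads while it holds the GIL.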
