Skip to content

Commit 83dfcad

Browse files
authored
feat: Add time.Sleep to mitigate race condition. (#1923)
The ShuffleQueue scheduler strategy has an infrequent race condition, as explained by the comment: ``` // A race condition is possible when the last active table asynchronously // queues a relation. The table finishes (calling `.Done()`) a moment // before the queue receives the `.Push()`. At this point, the queue is // empty and there are no active workers. // // A moment later, the queue receives the `.Push()` and queues a new task. // // This is a very infrequent case according to tests, but it happens. ``` After many attempts at a more elegant solution, I finally yielded: ``` time.Sleep(10 * time.Millisecond) ``` Looks ugly, but after running the tests 300 times (so around 3000 syncs), it works 🤷 ``` ✓ cloudquery/plugin-sdk main* $ go test ./scheduler -count=100 -run TestScheduler ⏱ 15:04:12 ok github.com/cloudquery/plugin-sdk/v4/scheduler 143.523s ✓ cloudquery/plugin-sdk main* $ go test ./scheduler -count=100 -run TestScheduler ⏱ 15:06:56 ok github.com/cloudquery/plugin-sdk/v4/scheduler 142.796s ✓ cloudquery/plugin-sdk main* $ go test ./scheduler -count=100 -run TestScheduler ⏱ 15:09:22 ok github.com/cloudquery/plugin-sdk/v4/scheduler 144.304s ```
1 parent d51e172 commit 83dfcad

File tree

1 file changed

+11
-0
lines changed

1 file changed

+11
-0
lines changed

scheduler/queue/active_work_signal.go

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@ package queue
33
import (
44
"sync"
55
"sync/atomic"
6+
"time"
67
)
78

89
// activeWorkSignal is a thread-safe coordinator for awaiting a worker pool
@@ -58,6 +59,16 @@ func (s *activeWorkSignal) IsIdle() bool {
5859

5960
// Wait blocks until the count of active workers changes.
6061
func (s *activeWorkSignal) Wait() {
62+
// A race condition is possible when the last active table asynchronously
63+
// queues a relation. The table finishes (calling `.Done()`) a moment
64+
// before the queue receives the `.Push()`. At this point, the queue is
65+
// empty and there are no active workers.
66+
//
67+
// A moment later, the queue receives the `.Push()` and queues a new task.
68+
//
69+
// This is a very infrequent case according to tests, but it happens.
70+
time.Sleep(10 * time.Millisecond)
71+
6172
if s.activeWorkUnitCount.Load() <= 0 {
6273
return
6374
}

0 commit comments

Comments
 (0)