Skip to content

Publisher stops publishing in performance tool #27

Closed
@gerhard

Description

@gerhard

We - @acogoluegnes @Gsantomaggio - started looking at why stream-max-length.publish.java-6c7cbd7d98-dv42n Java Publisher stopped publishing since:

2021-09-02 23:08:23 | 135, published 1877 msg/s, confirmed 1805 msg/s, consumed 0 msg/s, latency min/median/75th/95th/99th 0/0/0/0/0 µs, chunk size 0

We ssh-ed into the pod, downloaded https://github.com/adoptium/temurin11-binaries/releases/download/jdk-11.0.12%2B7/OpenJDK11U-jdk_x64_linux_hotspot_11.0.12_7.tar.gz (requires curl or wget) and ran the following commands:

cd jdk-11.0.12+7/bin

./jps
1 stream-perf-test.jar
3305 Jps

./jstack 1
2021-09-06 14:49:27
Full thread dump OpenJDK 64-Bit Server VM (11.0.12+7-LTS mixed mode):

Threads class SMR info:
_java_thread_list=0x00007fb98c001f00, length=20, elements={
0x00007fb9b0027000, 0x00007fb9b010b000, 0x00007fb9b010f000, 0x00007fb9b0124000,
0x00007fb9b0126000, 0x00007fb9b0128000, 0x00007fb9b012a800, 0x00007fb9b012c800,
0x00007fb9b01af000, 0x00007fb9b0438800, 0x00007fb9b095f000, 0x00007fb960001800,
0x00007fb9b087d800, 0x00007fb9b03f0000, 0x00007fb95c001000, 0x00007fb9b03f2800,
0x00007fb9b03f4000, 0x00007fb9b0970800, 0x00007fb96000e800, 0x00007fb98c001000
}

"main" #1 prio=5 os_prio=0 cpu=1721.24ms elapsed=319407.51s tid=0x00007fb9b0027000 nid=0x7 waiting on condition  [0x00007fb9b688e000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.12/Native Method)
	- parking to wait for  <0x00000000fae9aae8> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.12/LockSupport.java:194)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.12/AbstractQueuedSynchronizer.java:885)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@11.0.12/AbstractQueuedSynchronizer.java:1039)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@11.0.12/AbstractQueuedSynchronizer.java:1345)
	at java.util.concurrent.CountDownLatch.await(java.base@11.0.12/CountDownLatch.java:232)
	at com.rabbitmq.stream.perf.StreamPerfTest.call(StreamPerfTest.java:743)
	at com.rabbitmq.stream.perf.StreamPerfTest.call(StreamPerfTest.java:84)
	at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
	at picocli.CommandLine.executeHelpRequest(CommandLine.java:1929)
	at picocli.CommandLine.executeHelpRequest(CommandLine.java:1909)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2176)
	at picocli.CommandLine.execute(CommandLine.java:2078)
	at com.rabbitmq.stream.perf.StreamPerfTest.run(StreamPerfTest.java:353)
	at com.rabbitmq.stream.perf.StreamPerfTest.main(StreamPerfTest.java:344)
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.12/Native Method)
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.12/NativeMethodAccessorImpl.java:62)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.12/DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(java.base@11.0.12/Method.java:566)
	at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
	at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
	at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
	at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88)

"Reference Handler" #2 daemon prio=10 os_prio=0 cpu=5.42ms elapsed=319407.48s tid=0x00007fb9b010b000 nid=0x9 waiting on condition  [0x00007fb9b4104000]
   java.lang.Thread.State: RUNNABLE
	at java.lang.ref.Reference.waitForReferencePendingList(java.base@11.0.12/Native Method)
	at java.lang.ref.Reference.processPendingReferences(java.base@11.0.12/Reference.java:241)
	at java.lang.ref.Reference$ReferenceHandler.run(java.base@11.0.12/Reference.java:213)

"Finalizer" #3 daemon prio=8 os_prio=0 cpu=0.68ms elapsed=319407.48s tid=0x00007fb9b010f000 nid=0xa in Object.wait()  [0x00007fb9a0ffe000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(java.base@11.0.12/Native Method)
	- waiting on <0x00000000fab3bd68> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(java.base@11.0.12/ReferenceQueue.java:155)
	- waiting to re-lock in wait() <0x00000000fab3bd68> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(java.base@11.0.12/ReferenceQueue.java:176)
	at java.lang.ref.Finalizer$FinalizerThread.run(java.base@11.0.12/Finalizer.java:170)

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 cpu=0.83ms elapsed=319407.43s tid=0x00007fb9b0124000 nid=0xb runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Service Thread" #5 daemon prio=9 os_prio=0 cpu=0.07ms elapsed=319407.43s tid=0x00007fb9b0126000 nid=0xc runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #6 daemon prio=9 os_prio=0 cpu=13131.35ms elapsed=319407.43s tid=0x00007fb9b0128000 nid=0xd waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   No compile task

"C1 CompilerThread0" #7 daemon prio=9 os_prio=0 cpu=3724.80ms elapsed=319407.43s tid=0x00007fb9b012a800 nid=0xe waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   No compile task

"Sweeper thread" #8 daemon prio=9 os_prio=0 cpu=0.22ms elapsed=319407.43s tid=0x00007fb9b012c800 nid=0xf runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Common-Cleaner" #9 daemon prio=8 os_prio=0 cpu=382.06ms elapsed=319407.24s tid=0x00007fb9b01af000 nid=0x11 in Object.wait()  [0x00007fb9a0610000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(java.base@11.0.12/Native Method)
	- waiting on <no object reference available>
	at java.lang.ref.ReferenceQueue.remove(java.base@11.0.12/ReferenceQueue.java:155)
	- waiting to re-lock in wait() <0x00000000fab3c558> (a java.lang.ref.ReferenceQueue$Lock)
	at jdk.internal.ref.CleanerImpl.run(java.base@11.0.12/CleanerImpl.java:148)
	at java.lang.Thread.run(java.base@11.0.12/Thread.java:829)
	at jdk.internal.misc.InnocuousThread.run(java.base@11.0.12/InnocuousThread.java:134)

"stream-perf-test-env-0" #13 prio=5 os_prio=0 cpu=168292.99ms elapsed=319399.64s tid=0x00007fb9b0438800 nid=0x14 waiting on condition  [0x00007fb979d46000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.12/Native Method)
	- parking to wait for  <0x00000000fad11488> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.12/LockSupport.java:194)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.12/AbstractQueuedSynchronizer.java:2081)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.12/ScheduledThreadPoolExecutor.java:1177)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.12/ScheduledThreadPoolExecutor.java:899)
	at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.12/ThreadPoolExecutor.java:1054)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.12/ThreadPoolExecutor.java:1114)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.12/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.12/Thread.java:829)

"nioEventLoopGroup-2-1" #15 prio=10 os_prio=0 cpu=4500.71ms elapsed=319399.32s tid=0x00007fb9b095f000 nid=0x15 runnable  [0x00007fb979c45000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPoll.wait(java.base@11.0.12/Native Method)
	at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.12/EPollSelectorImpl.java:120)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.12/SelectorImpl.java:124)
	- locked <0x00000000fad55fa8> (a io.netty.channel.nio.SelectedSelectionKeySet)
	- locked <0x00000000fad55f50> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(java.base@11.0.12/SelectorImpl.java:136)
	at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
	at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:814)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(java.base@11.0.12/Thread.java:829)

"pool-3-thread-1" #16 prio=5 os_prio=0 cpu=535.18ms elapsed=319399.23s tid=0x00007fb960001800 nid=0x16 waiting on condition  [0x00007fb979b44000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.12/Native Method)
	- parking to wait for  <0x00000000fae9aef0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.12/LockSupport.java:194)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.12/AbstractQueuedSynchronizer.java:2081)
	at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.12/LinkedBlockingQueue.java:433)
	at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.12/ThreadPoolExecutor.java:1054)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.12/ThreadPoolExecutor.java:1114)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.12/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.12/Thread.java:829)

"stream-perf-test-env-1" #17 prio=5 os_prio=0 cpu=169655.49ms elapsed=319399.14s tid=0x00007fb9b087d800 nid=0x17 waiting on condition  [0x00007fb979a43000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.12/Native Method)
	- parking to wait for  <0x00000000fad11488> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.12/LockSupport.java:234)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.12/AbstractQueuedSynchronizer.java:2123)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.12/ScheduledThreadPoolExecutor.java:1182)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.12/ScheduledThreadPoolExecutor.java:899)
	at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.12/ThreadPoolExecutor.java:1054)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.12/ThreadPoolExecutor.java:1114)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.12/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.12/Thread.java:829)

"nioEventLoopGroup-2-2" #18 prio=10 os_prio=0 cpu=3078.23ms elapsed=319399.13s tid=0x00007fb9b03f0000 nid=0x18 runnable  [0x00007fb979942000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPoll.wait(java.base@11.0.12/Native Method)
	at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.12/EPollSelectorImpl.java:120)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.12/SelectorImpl.java:124)
	- locked <0x00000000fad56540> (a io.netty.channel.nio.SelectedSelectionKeySet)
	- locked <0x00000000fad564e8> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(java.base@11.0.12/SelectorImpl.java:136)
	at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
	at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:814)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(java.base@11.0.12/Thread.java:829)

"pool-4-thread-1" #19 prio=5 os_prio=0 cpu=612.73ms elapsed=319399.12s tid=0x00007fb95c001000 nid=0x19 waiting on condition  [0x00007fb979841000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.12/Native Method)
	- parking to wait for  <0x00000000fae9b490> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.12/LockSupport.java:194)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.12/AbstractQueuedSynchronizer.java:2081)
	at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.12/LinkedBlockingQueue.java:433)
	at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.12/ThreadPoolExecutor.java:1054)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.12/ThreadPoolExecutor.java:1114)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.12/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.12/Thread.java:829)

"pool-5-thread-1" #20 prio=5 os_prio=0 cpu=3517.46ms elapsed=319399.01s tid=0x00007fb9b03f2800 nid=0x1a waiting on condition  [0x00007fb979740000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.12/Native Method)
	- parking to wait for  <0x00000000faead570> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.12/LockSupport.java:194)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.12/AbstractQueuedSynchronizer.java:2081)
	at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.12/LinkedBlockingQueue.java:433)
	at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.12/ThreadPoolExecutor.java:1054)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.12/ThreadPoolExecutor.java:1114)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.12/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.12/Thread.java:829)

"pool-5-thread-2" #21 prio=5 os_prio=0 cpu=3510.34ms elapsed=319399.01s tid=0x00007fb9b03f4000 nid=0x1b waiting on condition  [0x00007fb97963f000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.12/Native Method)
	- parking to wait for  <0x00000000faead570> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.12/LockSupport.java:194)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.12/AbstractQueuedSynchronizer.java:2081)
	at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.12/LinkedBlockingQueue.java:433)
	at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.12/ThreadPoolExecutor.java:1054)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.12/ThreadPoolExecutor.java:1114)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.12/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.12/Thread.java:829)

"pool-6-thread-1" #22 prio=5 os_prio=0 cpu=19099.76ms elapsed=319398.92s tid=0x00007fb9b0970800 nid=0x1c waiting on condition  [0x00007fb97953e000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.12/Native Method)
	- parking to wait for  <0x00000000fae9bb38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.12/LockSupport.java:234)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.12/AbstractQueuedSynchronizer.java:2123)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.12/ScheduledThreadPoolExecutor.java:1182)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.12/ScheduledThreadPoolExecutor.java:899)
	at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.12/ThreadPoolExecutor.java:1054)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.12/ThreadPoolExecutor.java:1114)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.12/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.12/Thread.java:829)

"pool-7-thread-1" #26 prio=5 os_prio=0 cpu=316.80ms elapsed=318800.33s tid=0x00007fb96000e800 nid=0x1f waiting on condition  [0x00007fb97813b000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.12/Native Method)
	- parking to wait for  <0x00000000fafd1a50> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.12/LockSupport.java:194)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.12/AbstractQueuedSynchronizer.java:2081)
	at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.12/LinkedBlockingQueue.java:433)
	at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.12/ThreadPoolExecutor.java:1054)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.12/ThreadPoolExecutor.java:1114)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.12/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.12/Thread.java:829)

"Attach Listener" #27 daemon prio=9 os_prio=0 cpu=0.39ms elapsed=0.10s tid=0x00007fb98c001000 nid=0xce8 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"VM Thread" os_prio=0 cpu=13002.25ms elapsed=319407.49s tid=0x00007fb9b0102800 nid=0x8 runnable

"VM Periodic Task Thread" os_prio=0 cpu=249760.35ms elapsed=319407.34s tid=0x00007fb9b0171000 nid=0x10 waiting on condition

JNI global refs: 17, weak refs: 0

We would first like to understand if the publishing thread crashes (we can't find it in the thread list above), and how does it crash (we are currently discarding the exception).

Originally posted by @gerhard in https://github.com/rabbitmq/opportunities/issues/99 (private repository)

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions