Description
This is not a bug report; it is a discussion about performance optimization.
What did you do?
I am a user of the Flink Kubernetes Operator, and I found a potential performance issue when the number of resources is very large.
Details:
The Flink Kubernetes Operator uses UpdateControl to update each custom resource's state at a 1-minute interval. When the number of custom resources reaches 10000, the worker threads and the task queue of the reconcile thread pool are filled up by these periodic UpdateControl tasks (from TimerEventSource).
At that point, when we create or delete a CR, the operator cannot handle the event in a timely manner. It may take several tens of seconds before the operator handles it.
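To illustrate the effect, here is a minimal sketch that uses only `java.util.concurrent`, not the java-operator-sdk API. The class `SharedPoolDemo` and the method `positionOfCreateEvent` are hypothetical names made up for this example. It assumes the shared pool behaves like a fixed thread pool with a FIFO queue, which may not match the SDK's real executor exactly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: one shared pool serves both timer tasks and a new CR event.
public class SharedPoolDemo {

    // Returns the 0-based position at which the "create" event was executed.
    static int positionOfCreateEvent(int timerTasks, int workers) {
        ExecutorService sharedPool = Executors.newFixedThreadPool(workers);
        List<String> executionOrder = Collections.synchronizedList(new ArrayList<>());

        // Periodic tasks (standing in for the TimerEventSource triggers) fill the queue first.
        for (int i = 0; i < timerTasks; i++) {
            sharedPool.execute(() -> executionOrder.add("timer"));
        }
        // The CR create event arrives last and has to wait behind everything queued before it.
        sharedPool.execute(() -> executionOrder.add("create"));

        sharedPool.shutdown();
        try {
            if (!sharedPool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return executionOrder.indexOf("create");
    }

    public static void main(String[] args) {
        // With a single worker the order is strictly FIFO, so this prints 10000.
        System.out.println(positionOfCreateEvent(10_000, 1));
    }
}
```

With more workers the exact position varies, but the create event still cannot start until the tasks queued ahead of it have been picked up.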
What did you expect to see?
TimerEventSource and InformerEventSource should be independent of each other, so that the operator can handle CR create, update, and delete events promptly.
What did you see instead? Under which circumstances?
A newly created CR takes a long time to be reconciled when there are many timer tasks from UpdateControl.
Environment
Kubernetes cluster type:
$ Mention java-operator-sdk version from pom.xml file
4.4.4
$ java -version
OpenJDK 11
Possible Solution
TimerEventSource and InformerEventSource could each use a separate thread pool, with the size of each pool configurable independently.
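A rough sketch of the idea, again using only `java.util.concurrent`. The class `SeparatePoolsSketch`, the method `createHandledWhileTimersBusy`, and the pool-size parameters are hypothetical and do not correspond to existing java-operator-sdk configuration. How this would be wired into the SDK's event processing is left open.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: timer-triggered and informer-triggered work run on separate pools.
public class SeparatePoolsSketch {

    // Returns true if a CR event completes while the timer pool is still fully occupied.
    static boolean createHandledWhileTimersBusy(int timerThreads, int informerThreads, int timerTasks) {
        ExecutorService timerPool = Executors.newFixedThreadPool(timerThreads);
        ExecutorService informerPool = Executors.newFixedThreadPool(informerThreads);
        CountDownLatch releaseTimers = new CountDownLatch(1);
        try {
            // Saturate the timer pool: every task blocks until the latch is released.
            for (int i = 0; i < timerTasks; i++) {
                timerPool.execute(() -> awaitQuietly(releaseTimers));
            }
            // The CR event goes to its own pool, so it does not queue behind the timer tasks.
            Future<String> created = informerPool.submit(() -> "reconciled");
            return "reconciled".equals(created.get(5, TimeUnit.SECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false;
        } finally {
            releaseTimers.countDown(); // unblock the timer tasks so both pools can shut down
            timerPool.shutdown();
            informerPool.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(createHandledWhileTimersBusy(4, 4, 10_000));
    }
}
```

The trade-off is a second pool to size and monitor, and reconciliations of the same resource triggered from the two sources would still need to be serialized somewhere.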