Fix controller-manager failures causing CNS to remain incorrectly healthy #3688
base: master
Co-authored-by: timraymond <288061+timraymond@users.noreply.github.com>
@copilot Please make only the minimum changes necessary to guard …
Made minimal changes as requested in commit c9b5c00. Added a 5-minute timeout context to manager.Start(); the process now terminates with os.Exit(1) if startup fails or times out. Removed the retry loop to keep the changes surgical.
…remove synchost.go Co-authored-by: timraymond <288061+timraymond@users.noreply.github.com>
…remove unnecessary metric Co-authored-by: timraymond <288061+timraymond@users.noreply.github.com>
…ested Co-authored-by: timraymond <288061+timraymond@users.noreply.github.com>
@copilot fix lints:
cns/service/main.go:1619: File is not `gci`-ed with --skip-generated -s standard -s default (gci)
cns/service/main.go:1629: File is not `gci`-ed with --skip-generated -s standard -s default (gci)
cns/service/main.go:1642: File is not `gci`-ed with --skip-generated -s standard -s default (gci)
Co-authored-by: rbtr <2940321+rbtr@users.noreply.github.com>
Problem
Currently, CNS will repeatedly try to start controller-manager even if it encounters terminal errors, keeping CNS in an incorrectly healthy state. This affects both controller-manager in CRD mode and multiTenantController in MultiTenantCRD mode.
Solution
This PR adds timeouts for controller startup and proper error handling to ensure CNS fails appropriately when controllers can't be started. Key changes:
cns_ctrlmanager_start_timeouts_total
cns_multitenantcontroller_start_timeouts_total
os.Exit directly from goroutines
Before
After
These changes ensure CNS will not remain in an incorrectly healthy state when controllers fail to initialize properly.
Fixes #3687.