
Failures in controller-manager leave CNS in an incorrectly healthy state #3687

Open
@timraymond

Description

Currently, CNS repeatedly retries starting controller-manager, even though any error it returns effectively leaves controller-manager in a terminal state, so the retries cannot succeed. This operation should have a timeout that causes CNS to terminate.

In CNS versions after v1.6.23, this defect is ameliorated by the health check improvements added in #3269. In some cases, CNS will then be reported as unhealthy because it cannot perform list requests for NNCs against the API server. Since controller-manager performs the same requests to the API server, the health check can catch this specific case. However, because controller-manager is left in a terminal state, the health check does not account for transient API server unavailability: once the API server recovers, the list requests succeed again and CNS reports healthy even though controller-manager is no longer running.
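A minimal simulation of that gap, assuming a health check that only tests whether an NNC list request succeeds. The names `fakeAPIServer`, `ListNNCs`, and `healthCheck` are hypothetical and do not come from CNS; the point is only that a list-based check says nothing about whether controller-manager is still running.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeAPIServer is a hypothetical stand-in for the Kubernetes API server.
// Toggling available simulates a transient outage.
type fakeAPIServer struct {
	available bool
}

// ListNNCs mimics a list request for NNCs; it fails while the server is down.
func (s *fakeAPIServer) ListNNCs() error {
	if !s.available {
		return errors.New("apiserver unavailable")
	}
	return nil
}

// healthCheck is a hypothetical list-based health check: it reports only
// whether the list request succeeds, not whether controller-manager is alive.
func healthCheck(s *fakeAPIServer) error {
	return s.ListNNCs()
}

func main() {
	api := &fakeAPIServer{available: false}

	// During the outage, controller-manager.Start fails and controller-manager
	// is left in a terminal state; nothing will bring it back.
	managerRunning := false
	fmt.Println("during outage, healthy:", healthCheck(api) == nil)

	// The API server recovers, but controller-manager stays stopped.
	api.available = true
	fmt.Println("after recovery, healthy:", healthCheck(api) == nil, "manager running:", managerRunning)
}
```

Running it prints `during outage, healthy: false` followed by `after recovery, healthy: true manager running: false`, which is the "incorrectly healthy" state from the title.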

We should avoid retries of controller-manager.Start entirely, set a reasonable timeout for the context passed to controller-manager.Start, and ensure that the application is terminated when controller-manager.Start returns an error.

Labels: stale (Stale due to inactivity.)