Merge operator history into master #1

Closed
wants to merge 830 commits into from

Conversation

Julien-Ben
Collaborator

No description provided.

MaciejKaras and others added 30 commits October 30, 2024 16:29
…tion (#3884)

# Summary

CLOUDP-278184 - e2e_sharded_cluster_pv_resize test multi-cluster
adoption

## Documentation changes

* [ ] Add an entry to [release notes](.../RELEASE_NOTES.md).
* [ ] When needed, make sure you create a new [DOCSP
ticket](https://jira.mongodb.org/projects/DOCSP) that documents your
change.

## Changes to CRDs

* [ ] Add `slaskawi`(Sebastian) and `@giohan` (George) as reviewers.
* [ ] Make sure any changes are reflected on `/public/samples`
directory.
# Summary

Our agent images were failing because:
1. we were copying the licenses into the wrong directory; it should have
been `/licenses`
2. the version label was empty, because the inventory files were not
passing the `version` build arg

Example of preflight that passes:
* matrix agent:
https://spruce.mongodb.com/task/ops_manager_kubernetes_preflight_release_images_preflight_mongodb_agent_image_patch_9dceeba0fdfc11c9155e4baa0166ca5860210eb4_6721067dcdb68800074feafc_24_10_29_15_59_58/logs?execution=0
* non-matrix agent:
https://spruce.mongodb.com/task/ops_manager_kubernetes_preflight_release_images_preflight_mongodb_agent_image_patch_9dceeba0fdfc11c9155e4baa0166ca5860210eb4_67210bc82ed9c60007add4c8_24_10_29_16_22_33/logs?execution=0&sortBy=STATUS&sortDir=ASC

For now, the preflight for agent images will pass until we decide how to
treat the old agent images.

https://spruce.mongodb.com/task/ops_manager_kubernetes_preflight_release_images_preflight_mongodb_agent_image_patch_9dceeba0fdfc11c9155e4baa0166ca5860210eb4_6721124b5f03380007caa140_24_10_29_16_50_20/logs?execution=0

# Summary

This brings down the `periodic_agent_build` task's runtime from more
than six hours to just over half an hour. We also drop the 1.25 agent
image combo because we no longer support it after releasing 1.28.

…er unit test (#3899)

# Summary

`createClusterSpecList` function was creating `ClusterSpecList` with
non-deterministic order, because it was iterating over map entries.
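
A minimal sketch of the usual remedy for this kind of non-determinism: collect the map keys, sort them, and build the list in that order. The `ClusterSpecItem` type and the `buildClusterSpecList` helper here are simplified, hypothetical stand-ins, not the operator's actual `createClusterSpecList`.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterSpecItem is a simplified, hypothetical stand-in for the operator's type.
type ClusterSpecItem struct {
	ClusterName string
	Members     int
}

// buildClusterSpecList (hypothetical) turns a map of cluster name -> member
// count into a slice. Go randomizes map iteration order, so the keys are
// sorted first to make the resulting order stable across calls.
func buildClusterSpecList(members map[string]int) []ClusterSpecItem {
	names := make([]string, 0, len(members))
	for name := range members {
		names = append(names, name)
	}
	sort.Strings(names)

	list := make([]ClusterSpecItem, 0, len(names))
	for _, name := range names {
		list = append(list, ClusterSpecItem{ClusterName: name, Members: members[name]})
	}
	return list
}

func main() {
	list := buildClusterSpecList(map[string]int{"cluster-b": 2, "cluster-a": 1, "cluster-c": 3})
	for _, item := range list {
		fmt.Println(item.ClusterName, item.Members)
	}
}
```

With sorted keys, a unit test can compare the result against a fixed expected slice without flaking.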

_Opened by Private Cloud Tools (PCT)_.

# Ticket
[CLOUDP-281977](https://jira.mongodb.org/browse/CLOUDP-281977)

# Description
Bump Ops Manager container image version to 8.0.1.

# Reviewer Checklist

Before merging this PR, verify the following:
- [ ] the following tasks are passing in Evergreen:
  - `publish_ops_manager` task (variant: `publish_om80_images`)
- [ ] the `agent_version` was updated correctly
- [ ] the `tools_version` was updated correctly

---------

Co-authored-by: Mircea Cosbuc <mircea.cosbuc@mongodb.com>
# Summary

This pull request improves the way we do certificate rotation by keeping
the previous certificate available in the pods. This fixes the deadlock
that can appear between the agents and the automation config (the agents
are not ready because they don't have the latest certificate from the
automation config, and the automation config is not updated because the
agents are not ready).

Changes:
* The main change is in `CreatePemSecretClient`. This method now has a flag
which enables or disables the "safeRotation" functionality. This flag
exists because certificates for the agent don't benefit from this since
they are not stored under a certificate hash, but under a hardcoded key.
* The "safeRotation" functionality resides in
`updateSecretDataWithLatestHash`. This method compares the new
certificate received with the data already present in the `-pem` secret.
Using a new key `latestHash` we can determine which certificates to put
in the final `-pem` secret. Also added a `previousHash` key in the
secret to use for vault.
* Created e2e tests which simulate this deadlock by triggering a
statefulset restart. Tests have been created for every component:
replicasets, sharded clusters (mongos, configsrv, shard), ops manager,
appdb.
* Updated some e2e tests that were verifying certificate rotation. These
tests were instantly passing because the mdb resource stays in a
`Running` phase for a while after a rotation happens. Added an
`assert_abandons_phase` check to make sure that the rotation happens
successfully.
* Removed leftover code and fixed tests from the time when TLS
certificates were allowed to be generic. TLS certificates have to be of
the `kubernetes.io/tls` type (and have been for a while), but there were
some leftovers and some unit tests.
* Updated hashicorp vault annotations to not mount all the values in a
secret, but only the key we require.
* Added new hashicorp vault annotations to mount the previous
certificate when using this as the secret backend.
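
The `latestHash` / `previousHash` bookkeeping described above can be sketched roughly as follows. This is only an illustration under assumptions, not the operator's `updateSecretDataWithLatestHash`: the `rotateSecretData` helper, the plain `map[string]string` layout, and the rule of keeping exactly one previous certificate are all hypothetical.

```go
package main

import "fmt"

const (
	latestHashKey   = "latestHash"
	previousHashKey = "previousHash"
)

// rotateSecretData is a hypothetical helper. It takes the current contents of
// a "-pem" secret (certificate hash -> PEM, plus the two bookkeeping keys) and
// a new certificate, and returns data that holds the new certificate together
// with the one that was the latest before this call.
func rotateSecretData(current map[string]string, newHash, newPem string) map[string]string {
	oldHash, hadLatest := current[latestHashKey]
	if hadLatest && oldHash == newHash {
		return current // same certificate, nothing to rotate
	}

	next := map[string]string{
		newHash:       newPem,
		latestHashKey: newHash,
	}
	// Keep the previously latest certificate so pods that have not picked up
	// the new one yet can still find the certificate they expect.
	if oldPem, ok := current[oldHash]; hadLatest && ok {
		next[oldHash] = oldPem
		next[previousHashKey] = oldHash
	}
	return next
}

func main() {
	data := rotateSecretData(map[string]string{}, "hash1", "pem1")
	data = rotateSecretData(data, "hash2", "pem2")
	fmt.Println(data[latestHashKey], data[previousHashKey], len(data))
}
```

Keeping the previous entry next to the new one is what lets both sides of the deadlock make progress in this sketch: agents on the old certificate stay ready while the automation config moves to the new one.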

## Documentation changes

* [x] Add an entry to [release notes](.../RELEASE_NOTES.md).
* [ ] When needed, make sure you create a new [DOCSP
ticket](https://jira.mongodb.org/projects/DOCSP) that documents your
change.

## Changes to CRDs

* [ ] Add `slaskawi`(Sebastian) and `@giohan` (George) as reviewers.
* [ ] Make sure any changes are reflected on `/public/samples`
directory.
# Summary

We decided not to support `ExternalAccessConfiguration` in shard
overrides.
This PR removes the field, and adds it in the unit tests.
…3903)

# Summary

Before the fix, `status.Merge(other Status)` was not properly merging
statuses. The method accepted both a pointer and a value:
```go
status := pendingStatus{}
_ = status.Merge(&pendingStatus{})
_ = status.Merge(pendingStatus{})
```

Because of this, when a pointer was passed as the argument (and that was
the case 99% of the time), we never entered a switch branch and went
straight to the last return statement:
```go
func (p pendingStatus) Merge(other Status) Status {
	switch v := other.(type) {
	// Pending messages are just merged together
	case pendingStatus:
		return mergedPending(p, v)
	case failedStatus:
		return v
	}
	return p // <-- this was always executed
}
```

The root cause was that the `Status` interface was implemented by value
receiver methods and not pointer receiver methods, which allowed passing
both a pointer and a value as `Status`. After the change it only accepts
pointers:

![Screenshot 2024-11-05 at 10 32
40](https://github.com/user-attachments/assets/ebcd6710-ca97-4aa5-bdb7-a80ed5de1dec)
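
The behaviour can be reproduced in isolation. The sketch below uses simplified stand-ins for `Status` and `pendingStatus` (the `msg` field and the string concatenation are assumptions made for illustration) to show that a type switch with only a `case pendingStatus` silently skips a `*pendingStatus` argument.

```go
package main

import "fmt"

// Status and pendingStatus are simplified stand-ins for the operator's types.
type Status interface {
	Merge(other Status) Status
}

type pendingStatus struct{ msg string }

// Merge has a value receiver, so both pendingStatus and *pendingStatus
// satisfy Status and the compiler accepts either as an argument.
func (p pendingStatus) Merge(other Status) Status {
	switch v := other.(type) {
	case pendingStatus:
		return pendingStatus{msg: p.msg + "; " + v.msg}
	}
	// A *pendingStatus has a different dynamic type and lands here.
	return p
}

func main() {
	base := pendingStatus{msg: "a"}

	merged := base.Merge(pendingStatus{msg: "b"}).(pendingStatus)
	fmt.Println(merged.msg) // a; b

	notMerged := base.Merge(&pendingStatus{msg: "b"}).(pendingStatus)
	fmt.Println(notMerged.msg) // a
}
```

If the methods are declared on pointer receivers instead, only `*pendingStatus` satisfies the interface, so passing a value fails at compile time rather than being dropped at runtime.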

# Proof of Work

Passing CI tests. I had to fix one place, because `status.Merge` started
to work properly ->
https://github.com/10gen/ops-manager-kubernetes/pull/3903/files#r1829028015

…n (#3893)

# Summary

CLOUDP-278184 - e2e_sharded_cluster_secret test multi-cluster adoption

`e2e_sharded_cluster_secret` looks like a subset of
`e2e_sharded_cluster_recovery` ->

https://github.com/10gen/ops-manager-kubernetes/blob/c1e4e11773f4b47b69ec38a2ee013b92a3ab5a14/docker/mongodb-enterprise-tests/tests/shardedcluster/sharded_cluster_secret.py#L22-L39

https://github.com/10gen/ops-manager-kubernetes/blob/c1e4e11773f4b47b69ec38a2ee013b92a3ab5a14/docker/mongodb-enterprise-tests/tests/shardedcluster/sharded_cluster_recovery.py#L22-L44

…ster adoption (#3901)

# Summary

CLOUDP-278184 - e2e_sharded_cluster_statefulset_status test
multi-cluster adoption

Additional fixes:
- merged statuses for statefulsets in controller so that we will have
multiple resources in `status.resourcesNotReady` array

## Kubernetes Enterprise Operator Release 1.29.0

This release patch has been created and it should be reviewed now.

You can add new commits to this patch if needed. After changes have been
reviewed, merge this PR for PCT to continue the release process.

### Next steps

1. Find the SHA of the merge commit resulting from merging this patch.
2. Execute `/pct k8s set-release-sha <merge-commit-sha>`
3. Execute `/pct k8s ok-to-publish CLOUDP-265121`

---------

Co-authored-by: Yavor Georgiev <yavor.georgiev@mongodb.com>
# Summary

There's no point in preflighting older images that were already
published on Red Hat as we can't replace them anyway. This also fixes
the version build arg to the agent image in the daily build which was
causing it to fail preflight.

…ter adoption (#3905)

# Summary

CLOUDP-278184 - e2e_sharded_cluster_upgrade_downgrade test multi-cluster
adoption

…doption (#3892)

# Summary

CLOUDP-278184 - e2e_sharded_cluster_scale_shards test multi-cluster
adoption

Additional changes:

- This test is a superset of `e2e_sharded_cluster_scale_down_shards`,
with the only difference being the initial number of mongods per shard,
mongos and config_srv. Removed it and closed previous PR
10gen/ops-manager-kubernetes#3890
- Allowed specifying a ClusterSpecItem with 0 members. This is needed for
scaling down clusters, and not allowing it was a bug. Additional e2e
tests for that will be provided in
https://jira.mongodb.org/browse/CLOUDP-279047


---------

Co-authored-by: Lucian Tosa <49226451+lucian-tosa@users.noreply.github.com>
# Summary

The Pyxis API doesn't accept submissions from preflight 1.10.0 anymore
so the periodic build preflight wasn't actually uploading anything. See
https://spruce.mongodb.com/task/ops_manager_kubernetes_preflight_images_preflight_images_patch_e20227cbf2fc05f07df83110d1d4f8047207884c_672cde61088a9e0007987f51_24_11_07_15_36_10/logs
for the exact error output.

…ion (#3889)

# Summary

CLOUDP-278184 - e2e_sharded_cluster_recovery test multi-cluster adoption

# Summary

Due to a bug introduced in #3883 Sonar was only actually building the
last image in the matrix, over and over. The fix is to always copy the
`args` dict that contains the requested version image before submitting
it to a concurrent Executor for pickling so that the next loop iteration
doesn't update it in place.

…ter adoption (#3913)

# Summary

CLOUDP-278184 - om_ops_manager_backup_sharded_cluster test multi-cluster
adoption

…895)

_Opened by Private Cloud Tools (PCT)_.

# Ticket
[CLOUDP-281900](https://jira.mongodb.org/browse/CLOUDP-281900)

# Description
Bump Ops Manager container image version to 7.0.12.

# Reviewer Checklist

Before merging this PR, verify the following:
- [ ] the following tasks are passing in Evergreen:
  - `publish_ops_manager` task (variant: `publish_om70_images`)
- [ ] the `agent_version` was updated correctly
- [ ] the `tools_version` was updated correctly

---------

Co-authored-by: Yavor Georgiev <yavor.georgiev@mongodb.com>
Co-authored-by: Mircea Cosbuc <mircea.cosbuc@mongodb.com>
Co-authored-by: Sebastian Łaskawiec <sebastian.laskawiec@mongodb.com>
# Summary

As raised in this conversation:
https://mongodb.slack.com/archives/CGLP6R2PQ/p1732112676842479

We're close to hitting the limit on unique span attributes as Honeycomb
indexes them and we generate new Org and Project IDs in every test run
where we make OM requests.

# Summary

Works towards
[CLOUDP-286015](https://jira.mongodb.org/browse/CLOUDP-286015).
Temporarily pin older appdb images that don't have the issues that make
many of our e2e tests fail.


---------

Co-authored-by: Maciej Karaś <maciej.karas@mongodb.com>
# Summary

According to the [Evergreen
team](https://mongodb.slack.com/archives/C0V896UV8/p1732115443309179?thread_ts=1732114761.691759&cid=C0V896UV8)
and the
[docs](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#task-groups),
failures in setup and teardown tasks do not mark the Evergreen task as
failed by default. To make this happen we need to set
`setup_group_can_fail_task`, `setup_task_can_fail_task` or
`teardown_task_can_fail_task` to `true`.

# Summary

- **Add CGO_ENABLED=0 for multi-cluster-kube-config-creator**
- **Fix issue with empty message when multicluster tool fails**
- **Raise exception when multicluster tool fails**

# Summary

Previously
[max_hosts](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#task-groups)
was set to match the number of tasks in each task group. This allowed
all tasks in a task group to start simultaneously and immediately.
This had a couple of downsides:

- you have to update `max_hosts` each time you update the list of tasks.
If you don't, not all tasks may start immediately
- some task groups like `e2e_operator_task_group` or
`e2e_static_operator_task_group` didn't specify `max_hosts` at all and
their tasks were run sequentially on a single host. This made our CI runs
unnecessarily long.

The solution is to set `max_hosts` to `-1`, which makes Evergreen
automatically adjust `max_hosts` to the number of tasks. **For
`e2e_mdb_openshift_ubi_cloudqa_task_group` we still need to run on a
single host to prevent any interference between tests**

Additionally, we had to increase the timeout for
`e2e_om_ops_manager_backup_tls_custom_ca` to `400 sec` due to flakiness.

Reverts #3883 and just increases the timeout for the agent build.
…ng in AC (#3934)

# Summary

This PR introduces an E2E test for a simple disaster recovery scenario:
we lose one cluster without losing the majority.
We ensure that the operator correctly ignores the unhealthy cluster in
the subsequent reconciliation, and we can still scale.

While writing this test, I discovered a bug in the way we update the
automation configuration. We hadn't kept track of the _id fields, so
when we rescaled, we changed a host's unique _id, and the agent doesn't
support that.
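
To illustrate the `_id` problem, here is a minimal sketch of one way to keep identifiers stable while rescaling: reuse the ID of every host that already exists and hand out fresh IDs only to new hosts. The `member` type and the `assignIDs` helper are hypothetical and are not taken from the operator's automation config code.

```go
package main

import "fmt"

// member is a hypothetical, simplified entry with a unique ID per host.
type member struct {
	ID   int
	Host string
}

// assignIDs (hypothetical) keeps the ID of every host that is already known
// and gives previously unseen hosts the next unused ID, so an existing host
// never changes its ID when the list is rebuilt.
func assignIDs(existing []member, desiredHosts []string) []member {
	idByHost := make(map[string]int, len(existing))
	nextID := 0
	for _, m := range existing {
		idByHost[m.Host] = m.ID
		if m.ID >= nextID {
			nextID = m.ID + 1
		}
	}

	result := make([]member, 0, len(desiredHosts))
	for _, host := range desiredHosts {
		id, known := idByHost[host]
		if !known {
			id = nextID
			nextID++
		}
		result = append(result, member{ID: id, Host: host})
	}
	return result
}

func main() {
	existing := []member{{0, "host-a"}, {1, "host-b"}, {2, "host-c"}}
	// host-b disappears, host-d is added; host-c must keep ID 2.
	fmt.Println(assignIDs(existing, []string{"host-a", "host-c", "host-d"}))
}
```

A naive rebuild that numbers members by their position in the new list would give `host-c` ID 1 here, which is the kind of change the agent does not support.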

---------

Co-authored-by: Łukasz Sierant <lukasz.sierant@mongodb.com>
MaciejKaras and others added 23 commits April 4, 2025 10:42
# Summary

From time to time the assertion failed when deleting the
MongoDBMultiCluster resource:
```
[2025/04/01 10:09:39.672]         tests/multicluster/multi_cluster_replica_set_deletion.py::test_deployment_has_been_removed_from_automation_configassert 5 == 0

[2025/04/01 10:09:39.672]  +  where 5 = len([{'args2_6': {'net': {'port': 27017, 'tls': {'mode': 'disabled'}}, 'replication': {'replSetName': 'multi-replica-set'}, 'storage': {'dbPath': '/data'}, 'systemLog': {'destination': 'file', 'path': '/var/log/mongodb-mms-automation/mongodb.log'}}, 'auditLogRotate': {'sizeThresholdMB': 1000.0, 'timeThresholdHrs': 24}, 'authSchemaVersion': 5, 'featureCompatibilityVersion': '6.0', ...}, {'args2_6': {'net': {'port': 27017, 'tls': {'mode': 'disabled'}}, 'replication': {'replSetName': 'multi-replica-set'}, 'storage': {'dbPath': '/data'}, 'systemLog': {'destination': 'file', 'path': '/var/log/mongodb-mms-automation/mongodb.log'}}, 'auditLogRotate': {'sizeThresholdMB': 1000.0, 'timeThresholdHrs': 24}, 'authSchemaVersion': 5, 'featureCompatibilityVersion': '6.0', ...}, {'args2_6': {'net': {'port': 27017, 'tls': {'mode': 'disabled'}}, 'replication': {'replSetName': 'multi-replica-set'}, 'storage': {'dbPath': '/data'}, 'systemLog': {'destination': 'file', 'path': '/var/log/mongodb-mms-automation/mongodb.log'}}, 'auditLogRotate': {'sizeThresholdMB': 1000.0, 'timeThresholdHrs': 24}, 'authSchemaVersion': 5, 'featureCompatibilityVersion': '6.0', ...}, {'args2_6': {'net': {'port': 27017, 'tls': {'mode': 'disabled'}}, 'replication': {'replSetName': 'multi-replica-set'}, 'storage': {'dbPath': '/data'}, 'systemLog': {'destination': 'file', 'path': '/var/log/mongodb-mms-automation/mongodb.log'}}, 'auditLogRotate': {'sizeThresholdMB': 1000.0, 'timeThresholdHrs': 24}, 'authSchemaVersion': 5, 'featureCompatibilityVersion': '6.0', ...}, {'args2_6': {'net': {'port': 27017, 'tls': {'mode': 'disabled'}}, 'replication': {'replSetName': 'multi-replica-set'}, 'storage': {'dbPath': '/data'}, 'systemLog': {'destination': 'file', 'path': '/var/log/mongodb-mms-automation/mongodb.log'}}, 'auditLogRotate': {'sizeThresholdMB': 1000.0, 'timeThresholdHrs': 24}, 'authSchemaVersion': 5, 'featureCompatibilityVersion': '6.0', ...}])
```

This was probably because the `mongodb_multi` fixture was always updating
the resource at the end, thus recreating it. If the recreation was fast
enough and new processes were added to the AC in OpsManager, the test
would fail. The solution was to add a `try_load` statement and return if
the resource was found.

## Proof of Work

Passing e2e test.

## Checklist
- [ ] Have you linked a jira ticket and/or is the ticket in the title?
- [ ] Have you checked whether your jira ticket required DOCSP changes?
- [ ] Have you checked for release_note changes?

## Reminder (Please remove this when merging)
- Please try to Approve or Reject Changes the PR, keep PRs in review as
short as possible
- Our Short Guide for PRs:
[Link](https://docs.google.com/document/d/1T93KUtdvONq43vfTfUt8l92uo4e4SEEvFbIEKOxGr44/edit?tab=t.0)
- Remember the following Communication Standards - use comment prefixes
for clarity:
  * **blocking**: Must be addressed before approval.
  * **follow-up**: Can be addressed in a later PR or ticket.
  * **q**: Clarifying question.
  * **nit**: Non-blocking suggestions.
  * **note**: Side-note, non-actionable. Example: Praise
  * --> no prefix is considered a question
# Summary

This is a really small follow up to another PR, regarding this comment:
10gen/ops-manager-kubernetes#4232 (comment)
…er (#4245)

# Summary

Although the field `spec.clusterSpecList.externalConnectivity` existed,
it was not used in the operator code. Based on the
[documentation](https://www.mongodb.com/docs/kubernetes-operator/current/reference/k8s-operator-om-specification/#mongodb-opsmgrkube-opsmgrkube.spec.clusterSpecList.externalConnectivity)
we should replace the common `spec.externalConnectivity` config if a
cluster-specific configuration is present:

![Screenshot 2025-04-04 at 13 42
59](https://github.com/user-attachments/assets/4d0ea130-2b87-4d2e-810c-cfb5e6ec61f6)
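
The override rule amounts to "use the cluster-specific value if it is set, otherwise fall back to the common one". A minimal sketch, assuming simplified hypothetical types (`Spec`, `ClusterSpecItem`, `ExternalConnectivity`) and a hypothetical helper name rather than the operator's real API:

```go
package main

import "fmt"

// The types below are simplified, hypothetical stand-ins for the CRD types.
type ExternalConnectivity struct {
	Type string
	Port int
}

type ClusterSpecItem struct {
	ClusterName          string
	ExternalConnectivity *ExternalConnectivity
}

type Spec struct {
	ExternalConnectivity *ExternalConnectivity
	ClusterSpecList      []ClusterSpecItem
}

// effectiveExternalConnectivity (hypothetical) returns the cluster-specific
// configuration when one is present and the common one otherwise.
func effectiveExternalConnectivity(spec Spec, clusterName string) *ExternalConnectivity {
	for _, item := range spec.ClusterSpecList {
		if item.ClusterName == clusterName && item.ExternalConnectivity != nil {
			return item.ExternalConnectivity
		}
	}
	return spec.ExternalConnectivity
}

func main() {
	spec := Spec{
		ExternalConnectivity: &ExternalConnectivity{Type: "LoadBalancer", Port: 8443},
		ClusterSpecList: []ClusterSpecItem{
			{ClusterName: "cluster-1", ExternalConnectivity: &ExternalConnectivity{Type: "NodePort", Port: 30100}},
			{ClusterName: "cluster-2"},
		},
	}
	fmt.Println(effectiveExternalConnectivity(spec, "cluster-1").Type)
	fmt.Println(effectiveExternalConnectivity(spec, "cluster-2").Type)
}
```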

## Proof of Work

Passing new unit test
`TestOpsManagerInKubernetes_ClusterSpecificExternalConnectivity` and
`e2e_multi_cluster_om_networking_clusterwide` assertion
`test_external_services_are_created`

## Checklist
- [x] Have you linked a jira ticket and/or is the ticket in the title?
- [x] Have you checked whether your jira ticket required DOCSP changes?
- [x] Have you checked for release_note changes?

# Summary

Allows configuring the debug port for OpsManager deployments using the
`spec.configuration.mms.k8s.debuggingPort` property. Wiki page
containing the relevant guide ->
https://wiki.corp.mongodb.com/spaces/MMS/pages/346198535/Debugging+Ops+Manager+running+in+cluster

## Proof of Work

Tested this locally with other JVM options in `e2e_om_jvm_params` test:
![Screenshot 2025-03-28 at 14 18
53](https://github.com/user-attachments/assets/28c4f0ff-e1b4-4498-ac63-ca3132b4a843)
![Screenshot 2025-03-28 at 14 28
50](https://github.com/user-attachments/assets/e99cd1ea-120c-44f0-b9b1-e66caeceae53)
![Screenshot 2025-03-28 at 14 28
55](https://github.com/user-attachments/assets/5b5a46e5-7ba8-4d09-9464-d71b1970416e)

## Checklist
- [ ] Have you linked a jira ticket and/or is the ticket in the title?
- [ ] Have you checked whether your jira ticket required DOCSP changes?
- [ ] Have you checked for release_note changes?

_Opened by Private Cloud Tools (PCT)_.

# Ticket
[CLOUDP-310795](https://jira.mongodb.org/browse/CLOUDP-310795)

# Description
Bump Ops Manager container image version to 8.0.6.
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from
0.33.0 to 0.37.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/golang/crypto/commit/959f8f3db0fb8c3fb1f9507101058dda21e1fdcf"><code>959f8f3</code></a>
go.mod: update golang.org/x dependencies</li>
<li><a
href="https://github.com/golang/crypto/commit/769bcd6997ac6f3154e27b73b3587295f7720e66"><code>769bcd6</code></a>
ssh: use the configured rand in kex init</li>
<li><a
href="https://github.com/golang/crypto/commit/d0a798f774735c176ed0d3500ac986957a02660f"><code>d0a798f</code></a>
cryptobyte: fix typo 'octects' into 'octets' for asn1.go</li>
<li><a
href="https://github.com/golang/crypto/commit/acbcbef23f9b1b3b7c64673f0ed8baa83475edbe"><code>acbcbef</code></a>
acme: remove unnecessary []byte conversion</li>
<li><a
href="https://github.com/golang/crypto/commit/376eb1400636d0d687bee5520daadb5fdeac3311"><code>376eb14</code></a>
x509roots: support constrained roots</li>
<li><a
href="https://github.com/golang/crypto/commit/b369b723c8ad46b179f3a49d57bfc7d6a2740cdf"><code>b369b72</code></a>
crypto/internal/poly1305: implement function update in assembly on
loong64</li>
<li><a
href="https://github.com/golang/crypto/commit/6b853fbea37a941d918ac0760a5492802df42b9b"><code>6b853fb</code></a>
ssh/knownhosts: check more than one key</li>
<li><a
href="https://github.com/golang/crypto/commit/49bf5b80c8108983f588ecabd7bf996e6e63a515"><code>49bf5b8</code></a>
go.mod: update golang.org/x dependencies</li>
<li><a
href="https://github.com/golang/crypto/commit/24852b6b3fe89f0f239f5e7181473a28e39ae814"><code>24852b6</code></a>
ssh: add decode support for banners</li>
<li><a
href="https://github.com/golang/crypto/commit/bbc689cf5cfb1b9f9ea88939690590d3521c2487"><code>bbc689c</code></a>
ssh: use a more straightforward return value</li>
<li>Additional commits viewable in <a
href="https://github.com/golang/crypto/compare/v0.33.0...v0.37.0">compare
view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=golang.org/x/crypto&package-manager=go_modules&previous-version=0.33.0&new-version=0.37.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
Bumps [types-pyyaml](https://github.com/typeshed-internal/stub_uploader)
from 6.0.2 to 6.0.12.20250402.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/typeshed-internal/stub_uploader/commits">compare
view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=types-pyyaml&package-manager=pip&previous-version=6.0.2&new-version=6.0.12.20250402)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

…(#4251)

# Summary

Opening this in place of
10gen/ops-manager-kubernetes#4227 .
Additional investigation with the agent team is required to understand
the race condition when enabling cluster authentication. Opened
https://jira.mongodb.org/browse/CLOUDP-311366 for this investigation.
This patch also removes the assertions for certificate rotation as they
were incorrectly waiting for the resource to leave `Running` state. The
assertion was only accidentally correct, because the resource would
occasionally transition to `Failed` state when waiting for the agents to
be ready timed out.

When the certificate rotation check is re-enabled, it should use a different
mechanism to verify that certificates were rotated and consumed correctly (for
example, by checking the automation config and the agents' goal state).
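
One possible shape for such a check, as a minimal sketch only: compare each process's last achieved goal version with the goal version of the automation config. The payload layout (`goalVersion`, `processes[].lastGoalVersionAchieved`) is assumed to follow the Ops Manager automation status response, and `agents_reached_goal` is a hypothetical helper, not code from this patch.

```python
from typing import Any, Dict


def agents_reached_goal(status: Dict[str, Any]) -> bool:
    """Return True when every process reports the current goal version.

    `status` is assumed to look like an automation status payload:
    {"goalVersion": int, "processes": [{"lastGoalVersionAchieved": int, ...}]}
    """
    goal = status["goalVersion"]
    processes = status.get("processes", [])
    # An empty process list is treated as "not ready" to avoid a vacuous pass.
    return bool(processes) and all(
        p.get("lastGoalVersionAchieved") == goal for p in processes
    )
```

A rotation test could record `goalVersion` before rotating the certificates, then wait until the goal version has increased and this helper returns `True`, instead of waiting for the resource to leave `Running`.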

## Proof of Work

(Hopefully passing) test here:
https://spruce.mongodb.com/task/ops_manager_kubernetes_e2e_multi_cluster_kind_e2e_multi_cluster_tls_with_x509_patch_aadda155316f40bada5be3c13c9674e143be62f6_67f3958c729c090007eb2875_25_04_07_09_06_22/logs?execution=0

## Checklist
- [x] Have you linked a jira ticket and/or is the ticket in the title?
- [x] Have you checked whether your jira ticket required DOCSP changes?
- [x] Have you checked for release_note changes?

## Reminder (Please remove this when merging)
- Please try to Approve or Reject Changes on the PR, and keep PRs in review
for as short a time as possible
- Our Short Guide for PRs:
[Link](https://docs.google.com/document/d/1T93KUtdvONq43vfTfUt8l92uo4e4SEEvFbIEKOxGr44/edit?tab=t.0)
- Remember the following Communication Standards - use comment prefixes
for clarity:
  * **blocking**: Must be addressed before approval.
  * **follow-up**: Can be addressed in a later PR or ticket.
  * **q**: Clarifying question.
  * **nit**: Non-blocking suggestions.
  * **note**: Side-note, non-actionable. Example: Praise
  * --> no prefix is considered a question
# Summary

CLOUDP-301237: Ops Manager (OM) no-service-mesh implementation. This PR adds
support for the following fields:
- `spec.applicationDatabase.externalAccess`
- `spec.applicationDatabase.clusterSpecList.externalAccess`

## Proof of Work

New e2e tests:
- `multicluster_om/multicluster_om_appdb_no_mesh.py`
  - This test uses an nginx server to emulate a "real world"
    LoadBalancer for the Ops Manager processes running across all clusters.
    To run nginx in TLS passthrough mode, we had to use the
    https://github.com/macbre/docker-nginx-http3 nginx build with the `stream`
    plugin enabled (the flag is required at build time).
- `om_appdb_external_connectivity.py`
  - Single-cluster test for external access in AppDB.

New unit tests:
- `externalDomain` uniqueness validation: `opsmanager_validation_test.go`
- `externalDomain` merging and hostname generation test:
`appdbreplicaset_controller_multi_test.go`
- service creation test: `appdbreplicaset_controller_test.go`
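
To illustrate what a uniqueness validation of this kind checks, here is a small sketch. The operator's validation is written in Go; this Python version is only an analog, `duplicate_external_domains` is a hypothetical name, and the rule itself (external domains of different clusters must be distinct) is inferred from the test description above.

```python
from collections import Counter
from typing import List, Optional


def duplicate_external_domains(domains: List[Optional[str]]) -> List[str]:
    """Return external domains that appear more than once, sorted.

    Entries that are None or empty (no external domain set) are ignored.
    """
    counts = Counter(d for d in domains if d)
    return sorted(d for d, n in counts.items() if n > 1)
```

A validator built on this would reject the resource whenever the returned list is non-empty and report the offending domains in the error message.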

## Checklist
- [x] Have you linked a jira ticket and/or is the ticket in the title?
- [x] Have you checked whether your jira ticket required DOCSP changes?
- [x] Have you checked for release_note changes?


---------

Co-authored-by: Lucian Tosa <lucian.tosa@mongodb.com>
Co-authored-by: Lucian Tosa <49226451+lucian-tosa@users.noreply.github.com>
# Summary

Added a no-mesh reference architecture:
* deploy OM no-mesh
* deploy MCRS no-mesh
* deploy MCSC no-mesh

Notable changes:
* We use GKE load balancers to distribute traffic between the OM
replicas across clusters.
* We added externalDNS and a private DNS zone to register the domains.
* **Changed the operator to non-static, since static is still in preview
(and therefore changed the cert-manager setup to add the certificate chain
of downloads.mongodb.com).**
* Teardowns have been changed to ensure we don't leave resources behind
in our GCP account

Minor changes:
* Stopped forwarding the OM service, since we can use the external IP of the
load balancer
* For MC Sharded, we now set external access immediately, since the bug was
fixed in [CLOUDP-294373](https://jira.mongodb.org/browse/CLOUDP-294373)
* Improved teardowns: a new environment variable `code_snippets_reset`
will delete all resources but will keep the clusters
* Enabled backup for mongodb deployments
* We first install the operator and then Istio. This is because we will
now label the namespaces in the Istio section. This is to reuse the
operator for the no-mesh architecture
* Changed the image-registries-secret creation: we now create a YAML file and
then apply it, so it no longer fails if the secret already exists
* Added a random number in the suffix of cluster names, since we will
now run tests in parallel and the `version_id` was not enough.

## Proof of Work

https://spruce.mongodb.com/version/67f3bd15aef79b0007e608de/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC

## Checklist
- [x] Have you linked a jira ticket and/or is the ticket in the title?
- [x] Have you checked whether your jira ticket required DOCSP changes?
- [x] Have you checked for release_note changes?


---------

Co-authored-by: Maciej Karaś <maciej.karas@mongodb.com>
# Summary

- Keep only the last three OM versions and their related agent versions.
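
As a rough illustration of this pruning rule, the sketch below keeps the highest OM versions from a mapping of OM version to agent version. The mapping layout and the names `keep_latest` and `_version_key` are hypothetical, and the text does not say whether "last three" is counted overall or per major version; this sketch counts overall.

```python
from typing import Dict, List, Tuple


def _version_key(version: str) -> Tuple[int, ...]:
    # "7.0.15" -> (7, 0, 15); assumes purely numeric dotted versions.
    return tuple(int(part) for part in version.split("."))


def keep_latest(om_to_agent: Dict[str, str], keep: int = 3) -> Dict[str, str]:
    """Keep the `keep` highest OM versions and their mapped agent versions."""
    latest: List[str] = sorted(om_to_agent, key=_version_key)[-keep:]
    return {om: om_to_agent[om] for om in latest}
```

Sorting on the parsed tuple rather than the raw string matters here: as strings, `"7.0.9"` would sort after `"7.0.15"`.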

## Proof of Work

- New test suite
[test_update_release.py](https://github.com/10gen/ops-manager-kubernetes/pull/4249/files#diff-a91ab110a4fcb8cca8a7a591d31e30a2f29eee8a1ed78b493637d3003d4497e8)
passing.
- image building is green in
[patch](https://spruce.mongodb.com/version/67f4e0faee9ccf00078be755/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC)
- daily rebuilds are green in
[patch](https://spruce.mongodb.com/version/67f38d4d30c9d50007d809c2/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC)

## Checklist
- [x] Have you linked a jira ticket and/or is the ticket in the title?
- [x] Have you checked whether your jira ticket required DOCSP changes?
- [ ] Have you checked for release_note changes?
# Summary
A test started failing after the OM bump; this PR fixes it.
[CLOUDP-311483](https://jira.mongodb.org/browse/CLOUDP-311483)

## Proof of Work

Relevant test passes again.

https://spruce.mongodb.com/task/ops_manager_kubernetes_e2e_om80_kind_ubi_e2e_om_remotemode_patch_059fe28d0c9068ec2b3e1d8ff8ce4aeb82b26011_67f56b840e743000070d4d01_25_04_08_18_31_36?execution=0&sortBy=STATUS&sortDir=ASC

## Checklist
- [x] Have you linked a jira ticket and/or is the ticket in the title?
- [x] Have you checked whether your jira ticket required DOCSP changes?
- [ ] Have you checked for release_note changes?

# Summary

Added telemetry for external domain with one of four options:
```
	ExternalDomainMixed           = "Mixed"
	ExternalDomainClusterSpecific = "ClusterSpecific"
	ExternalDomainUniform         = "Uniform"
	ExternalDomainNone            = "None"
```
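
How a deployment maps to one of these four values is not spelled out here. The sketch below shows one plausible reading, in Python rather than the operator's Go: `Uniform` when only a top-level external domain is set, `ClusterSpecific` when only per-cluster domains are set, `Mixed` when both are present, and `None` otherwise. Both that mapping and the function name `classify_external_domain` are assumptions made for illustration.

```python
from typing import List, Optional


def classify_external_domain(
    top_level: Optional[str], per_cluster: List[Optional[str]]
) -> str:
    """Classify external domain usage into one of the four telemetry values."""
    has_top = bool(top_level)
    has_cluster = any(per_cluster)  # True if any cluster sets a domain
    if has_top and has_cluster:
        return "Mixed"
    if has_cluster:
        return "ClusterSpecific"
    if has_top:
        return "Uniform"
    return "None"
```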

Additionally, fixed an issue in the telemetry collector that duplicated events
whenever more than one deployment type was managed by the operator. An
example of the code with the issue is linked below: events are appended to the
events passed to the `addEvents` function, and the result is then appended to
the events slice again:

https://github.com/10gen/ops-manager-kubernetes/blob/1299c06bcdea1d9d6fe59069aca66082abc17ce5/pkg/telemetry/collector.go#L261-L263
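
The pattern behind this kind of duplication can be reproduced in a few lines. The following is a Python analog of the description above, not the collector's Go code; `add_events`, `collect_buggy`, and `collect_fixed` are hypothetical names.

```python
from typing import List


def add_events(events: List[str], new: List[str]) -> List[str]:
    """Mimic a helper that returns the existing events plus the new ones."""
    return events + new


def collect_buggy(batches: List[List[str]]) -> List[str]:
    events: List[str] = []
    for batch in batches:
        # Bug: the result already contains `events`, so extending re-adds them.
        events.extend(add_events(events, batch))
    return events


def collect_fixed(batches: List[List[str]]) -> List[str]:
    events: List[str] = []
    for batch in batches:
        # Fix: take the helper's result as the new list instead of appending it.
        events = add_events(events, batch)
    return events
```

With a single batch both versions agree, which matches the observation that duplicates only appeared once more than one deployment type contributed events.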

## Proof of Work

The new unit and e2e tests pass. Also tested this locally: the config map
is populated correctly.

## Checklist
- [x] Have you linked a jira ticket and/or is the ticket in the title?
- [x] Have you checked whether your jira ticket required DOCSP changes?
- [x] Have you checked for release_note changes?


---------

Co-authored-by: Lucian Tosa <lucian.tosa@mongodb.com>
Co-authored-by: Lucian Tosa <49226451+lucian-tosa@users.noreply.github.com>
…ster (#4192)

Bumps [golang.org/x/net](https://github.com/golang/net) from 0.34.0 to
0.36.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/golang/net/commit/85d1d54551b68719346cb9fec24b911da4e452a1"><code>85d1d54</code></a>
go.mod: update golang.org/x dependencies</li>
<li><a
href="https://github.com/golang/net/commit/cde1dda944dcf6350753df966bb5bda87a544842"><code>cde1dda</code></a>
proxy, http/httpproxy: do not mismatch IPv6 zone ids against hosts</li>
<li><a
href="https://github.com/golang/net/commit/fe7f0391aa994a401c82d829183c1efab7a64df4"><code>fe7f039</code></a>
publicsuffix: spruce up code gen and speed up PublicSuffix</li>
<li><a
href="https://github.com/golang/net/commit/459513d1f8abff01b4854c93ff0bff7e87985a0a"><code>459513d</code></a>
internal/http3: move more common stream processing to genericConn</li>
<li><a
href="https://github.com/golang/net/commit/aad0180cad195ab7bcd14347e7ab51bece53f61d"><code>aad0180</code></a>
http2: fix flakiness from t.Log when GOOS=js</li>
<li><a
href="https://github.com/golang/net/commit/b73e5746f64471c22097f07593643a743e7cfb0f"><code>b73e574</code></a>
http2: don't log expected errors from writing invalid trailers</li>
<li><a
href="https://github.com/golang/net/commit/5f45c776a9c4d415cbe67d6c22c06fd704f8c9f1"><code>5f45c77</code></a>
internal/http3: make read-data tests usable for server handlers</li>
<li><a
href="https://github.com/golang/net/commit/43c2540165a4d1bc9a81e06a86eb1e22ece64145"><code>43c2540</code></a>
http2, internal/httpcommon: reject userinfo in :authority</li>
<li><a
href="https://github.com/golang/net/commit/1d78a085008d9fedfe3f303591058325f99727d7"><code>1d78a08</code></a>
http2, internal/httpcommon: factor out server header logic for
h2/h3</li>
<li><a
href="https://github.com/golang/net/commit/0d7dc54a591c12b4bd03bcd745024178d03d9218"><code>0d7dc54</code></a>
quic: add Conn.ConnectionState</li>
<li>Additional commits viewable in <a
href="https://github.com/golang/net/compare/v0.34.0...v0.36.0">compare
view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=golang.org/x/net&package-manager=go_modules&previous-version=0.34.0&new-version=0.36.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

This is more realistic: this is what we have in operatorhub.io and Red
Hat certified catalogs.
Currently we build a bundle with pinned images for Red Hat certified OLM
catalog and non-pinned (tagged) images for operatorhub.io.

With this change, we stop building bundles with non-pinned images and
unify the bundle.
…243)

_Opened by Private Cloud Tools (PCT)_.

# Ticket
[CLOUDP-310790](https://jira.mongodb.org/browse/CLOUDP-310790)

# Description
Bump Ops Manager container image version to 7.0.15.

---------

Co-authored-by: Evergreen <kubernetes-hosted-team@mongodb.com>
Co-authored-by: Anand <13899132+anandsyncs@users.noreply.github.com>
Co-authored-by: Lucian Tosa <lucian.tosa@mongodb.com>
Co-authored-by: Anand Singh <anand.singh@mongodb.com>
@Julien-Ben Julien-Ben changed the title Master test pr Merge operator history into master Apr 10, 2025
@Julien-Ben Julien-Ben closed this May 20, 2025
lucian-tosa added a commit that referenced this pull request Jun 16, 2025