For the past five days, one of our managed Kubernetes clusters has been malfunctioning, and we are unsure how to address it. Initially, some workloads on the cluster had trouble verifying node certificates, which we resolved by rotating all existing nodes in our node pools. However, a more troubling issue remains: the cluster no longer automatically creates or updates Endpoints objects for Service objects. When we deploy a new Deployment with a Service, for example, it is unreachable from other pods because the cluster never creates or binds an Endpoints object for it. Deleting a Service occasionally deletes the corresponding Endpoints object, but even that behavior is not reliable.
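To illustrate, a minimal pair of objects that reproduces the symptom might look like the following (the `echo-test` name and `nginx` image are just examples, not what we actually run):

```yaml
# Illustrative Deployment + Service; on the affected cluster, applying
# this yields a Service whose Endpoints are never populated.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: echo-test
  template:
    metadata:
      labels:
        app: echo-test
    spec:
      containers:
        - name: echo
          image: nginx:stable
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: echo-test
spec:
  selector:
    app: echo-test
  ports:
    - port: 80
      targetPort: 80
```

On a healthy cluster, `kubectl get endpoints echo-test` would show the pod IP within a few seconds of the pod becoming ready; on the affected cluster, the Endpoints object is missing or empty.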
The cluster does not log or report any errors that we can relate to this issue, and on the surface everything appears to be functioning normally. Has anyone experienced a similar issue and found the best way to address it?
[Edit] Corrected spelling and grammar.
I'm seeing that the ovhcloud-konnectivity-agent is logging failed connections: