For the past five days, one of our managed Kubernetes clusters has been malfunctioning, and we do not know how to address the issue. The first symptom was that some workloads on the cluster had trouble verifying node certificates, which we resolved by rotating all existing nodes in our node pools. A more troubling issue remains: the cluster no longer automatically creates or updates Endpoints objects for Service objects. When we deploy something new, e.g. a Deployment with a Service, we cannot reach it from other pods, because no endpoint is ever created and bound for it. Deleting a Service still occasionally deletes the corresponding endpoint, but not reliably.
The cluster does not log or report any errors that we could relate to this issue; on the surface everything looks green. Has anyone had the same or a similar experience and knows how best to address it?
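To pin down which Services are affected, something like the following could help. It is only a minimal sketch: it assumes `kubectl` is configured for the cluster, and the function names (`kubectl_json`, `services_without_endpoints`, `main`) are made up for illustration. It compares the output of `kubectl get services` and `kubectl get endpoints` and lists every selector-based Service that has no Endpoints object with at least one address.

```python
import json
import subprocess


def kubectl_json(resource):
    """Return the parsed output of `kubectl get <resource> --all-namespaces -o json`."""
    out = subprocess.run(
        ["kubectl", "get", resource, "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def services_without_endpoints(services, endpoints):
    """List (namespace, name) of selector-based Services lacking any endpoint address."""
    # Collect every Endpoints object that carries at least one ready address.
    ready = set()
    for ep in endpoints.get("items", []):
        meta = ep["metadata"]
        if any(subset.get("addresses") for subset in ep.get("subsets") or []):
            ready.add((meta["namespace"], meta["name"]))

    missing = []
    for svc in services.get("items", []):
        meta = svc["metadata"]
        spec = svc.get("spec", {})
        # Services without a selector (or of type ExternalName) do not get
        # automatically managed endpoints, so they are skipped here.
        if not spec.get("selector") or spec.get("type") == "ExternalName":
            continue
        key = (meta["namespace"], meta["name"])
        if key not in ready:
            missing.append(key)
    return sorted(missing)


def main():
    """Print affected Services as namespace/name; requires cluster access."""
    affected = services_without_endpoints(
        kubectl_json("services"), kubectl_json("endpoints")
    )
    for namespace, name in affected:
        print(f"{namespace}/{name}")
```

Calling `main()` on a machine with cluster access prints one `namespace/name` per line. A Service whose selector matches no ready pods will also show up, so the list is a starting point for comparison against running pods rather than proof of a broken controller.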
I do see that the ovhcloud-konnectivity-agent is logging failed connections: