Kubernetes service endpoints are no longer automatically created, updated, or deleted. How to fix?
Question

by
Uniqbit
Created on 2025-06-03 08:50:00 (edited on 2025-06-09 15:24:29) in Containers and Orchestration

For the past five days, one of our managed Kubernetes clusters has been malfunctioning, and we do not know how to address the issue. The first symptom was that some workloads on the cluster had trouble verifying node certificates, which we solved by rotating all existing nodes in our node pools. A more troubling issue remains: the cluster no longer automatically creates or updates Endpoints objects for Service objects. When we deploy, for example, a new Deployment with a Service, we cannot reach it from other pods, because no endpoint is ever created and bound for it. Deleting a Service still occasionally deletes the corresponding endpoint, but not reliably.

The cluster does not log or report any errors that we could relate to this issue; on the surface, everything looks green. Has anyone had the same or a similar experience and knows how best to address it?
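
To make the symptom concrete, here is a minimal sketch that lists Services which have a selector but no populated Endpoints object. It assumes the input is the parsed JSON from `kubectl get services -A -o json` and `kubectl get endpoints -A -o json`; the function name `services_missing_endpoints` is hypothetical and only for illustration. Note that a Service whose pods are simply not ready would also show up here, so a hit is a hint, not proof of a broken controller.

```python
def services_missing_endpoints(services, endpoints):
    """Return "namespace/name" for selector-based Services without endpoint addresses.

    Both arguments are dicts shaped like `kubectl get <kind> -A -o json` output,
    i.e. a List object with an "items" array (assumption of this sketch).
    """
    # An Endpoints object shares namespace and name with its Service.
    by_key = {
        (e["metadata"]["namespace"], e["metadata"]["name"]): e
        for e in endpoints.get("items", [])
    }
    missing = []
    for svc in services.get("items", []):
        spec = svc.get("spec", {})
        # Services without a selector (including ExternalName) do not get
        # automatically managed Endpoints, so they are skipped.
        if not spec.get("selector") or spec.get("type") == "ExternalName":
            continue
        key = (svc["metadata"]["namespace"], svc["metadata"]["name"])
        ep = by_key.get(key)
        # "subsets" may be absent or null when no ready address exists.
        subsets = (ep or {}).get("subsets") or []
        if not any(s.get("addresses") for s in subsets):
            missing.append("/".join(key))
    return sorted(missing)
```

The two inputs can be loaded with `json.load` from files written by the `kubectl` commands above; the result is a sorted list such as `["default/web"]` for a Service named `web` in namespace `default` that has no endpoint addresses.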


1 Reply (latest reply on 2025-06-09 15:30:39 by Uniqbit)

I'm seeing that the ovhcloud-konnectivity-agent is logging failed connections:

I0603 14:13:34.506695 1 client.go:354] "Received DIAL_REQ" serverID="b121eaba-7112-443c-b6a7-686026918cc2" agentID="63a50384-0369-4b5e-9fd0-a263fadc4ea5" dialID=7498280033314238644 dialAddress="10.2.1.33:443"
I0603 14:13:34.510103 1 client.go:354] "Received DIAL_REQ" serverID="b121eaba-7112-443c-b6a7-686026918cc2" agentID="63a50384-0369-4b5e-9fd0-a263fadc4ea5" dialID=5599432136654785947 dialAddress="10.2.1.33:443"
I0603 14:13:34.512266 1 client.go:483] "received CLOSE_REQ" connectionID=0
I0603 14:13:34.512281 1 client.go:489] "Failed to find connection context for close" connectionID=0
I0603 14:13:35.397189 1 client.go:420] "error dialing backend" error="dial tcp 10.2.3.24:10250: i/o timeout" dialID=6139996188483838982 connectionID=5782 dialAddress="10.2.3.24:10250"
I0603 14:13:35.406239 1 client.go:483] "received CLOSE_REQ" connectionID=0
I0603 14:13:35.406258 1 client.go:489] "Failed to find connection context for close" connectionID=0
I0603 14:13:36.093710 1 client.go:354] "Received DIAL_REQ" serverID="b121eaba-7112-443c-b6a7-686026918cc2" agentID="63a50384-0369-4b5e-9fd0-a263fadc4ea5" dialID=8667963574853404645 dialAddress="10.2.3.101:9443"
I0603 14:13:36.114566 1 client.go:420] "error dialing backend" error="dial tcp 10.2.1.33:443: i/o timeout" dialID=6571743257240054316 connectionID=5783 dialAddress="10.2.1.33:443"
I0603 14:13:36.124114 1 client.go:483] "received CLOSE_REQ" connectionID=0
I0603 14:13:36.124130 1 client.go:489] "Failed to find connection context for close" connectionID=0
I0603 14:13:36.262956 1 client.go:420] "error dialing backend" error="dial tcp 10.2.1.33:443: i/o timeout" dialID=8826262292868951193 connectionID=5784 dialAddress="10.2.1.33:443"
I0603 14:13:36.272076 1 client.go:483] "received CLOSE_REQ" connectionID=0
I0603 14:13:36.272089 1 client.go:489] "Failed to find connection context for close" connectionID=0
I0603 14:13:36.455660 1 client.go:354] "Received DIAL_REQ" serverID="b121eaba-7112-443c-b6a7-686026918cc2" agentID="63a50384-0369-4b5e-9fd0-a263fadc4ea5" dialID=6633832615179440962 dialAddress="10.2.3.24:10250"
I0603 14:13:36.504801 1 client.go:420] "error dialing backend" error="dial tcp 10.2.1.33:443: i/o timeout" dialID=53346797863079866 connectionID=5785 dialAddress="10.2.1.33:443"
I0603 14:13:36.513935 1 client.go:483] "received CLOSE_REQ" connectionID=0
I0603 14:13:36.513953 1 client.go:489] "Failed to find connection context for close" connectionID=0
I0603 14:13:36.524846 1 client.go:420] "error dialing backend" error="dial tcp 10.2.1.33:443: i/o timeout" dialID=762598440534119604 connectionID=5786 dialAddress="10.2.1.33:443"
I0603 14:13:36.524852 1 client.go:420] "error dialing backend" error="dial tcp 10.2.1.33:443: i/o timeout" dialID=1150912048355849252 connectionID=5787 dialAddress="10.2.1.33:443"
I0603 14:13:36.533900 1 client.go:483] "received CLOSE_REQ" connectionID=0
I0603 14:13:36.533915 1 client.go:489] "Failed to find connection context for close" connectionID=0
I0603 14:13:36.534034 1 client.go:483] "received CLOSE_REQ" connectionID=0
I0603 14:13:36.534049 1 client.go:489] "Failed to find connection context for close" connectionID=0
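
To see which backends the agent fails to reach most often, the log can be tallied by `dialAddress`. The following is a rough sketch that assumes the line format shown above; the regular expression and the function name `count_dial_failures` are my own illustration, not part of the konnectivity agent.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... "error dialing backend" error="dial tcp ...: i/o timeout" ... dialAddress="10.2.1.33:443"
# The pattern is an assumption based on the log excerpt above.
DIAL_ERROR = re.compile(
    r'"error dialing backend" error="(?P<error>[^"]*)".*?dialAddress="(?P<addr>[^"]+)"'
)


def count_dial_failures(lines):
    """Count "error dialing backend" log lines per dialAddress."""
    counts = Counter()
    for line in lines:
        match = DIAL_ERROR.search(line)
        if match:
            counts[match.group("addr")] += 1
    return counts
```

Fed with the lines of a saved log file (for example `count_dial_failures(open("agent.log"))`, where `agent.log` is a hypothetical file name), it returns a `Counter` keyed by address. Run against the excerpt above, it reports five timeouts for 10.2.1.33:443 and one for 10.2.3.24:10250.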