Metrics Server
For my local K8S cluster of 4 nodes, I wanted to get resource usage metrics. The official documentation at kubernetes.io says:
These metrics can be accessed either directly by the user with the kubectl top command, or by a controller in the cluster, for example Horizontal Pod Autoscaler, to make decisions.
I deployed Metrics Server v0.4.1:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.4.1/components.yaml
serviceaccount/metrics-server configured
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader configured
clusterrole.rbac.authorization.k8s.io/system:metrics-server configured
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader configured
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator configured
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server configured
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io configured
Once the installation is complete, check the node stats with kubectl top (k is aliased to kubectl here).
k top node
Instead of the stats, I got:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
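This error usually means the aggregated metrics API is registered but its backend is not ready yet. A minimal sketch for checking that directly, assuming kubectl is on the PATH and pointed at the cluster; the helper names metrics_api_status and metrics_api_ready are hypothetical:

```shell
#!/bin/sh
# Print the "Available" condition of the metrics APIService: True or False.
# Assumes kubectl is on PATH and configured for the cluster.
metrics_api_status() {
  kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
}

# Hypothetical helper: succeeds only when the aggregated metrics API is ready.
metrics_api_ready() {
  [ "$(metrics_api_status)" = "True" ]
}

# Usage:
#   metrics_api_ready && kubectl top node
```

While the condition reports False, kubectl top keeps returning ServiceUnavailable, so the next place to look is the metrics-server pod itself.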
The pod and the deployment were present, but not ready:
kube-system pod/metrics-server-5d5c49f488-8dhdf 0/1 Running 0 16s
kube-system deployment.apps/metrics-server 0/1 1 0 16s
A quick describe of the pod revealed failing liveness and readiness probes:
k describe pod/metrics-server-5d5c49f488-8dhdf -n kube-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 37s default-scheduler Successfully assigned kube-system/metrics-server-5d5c49f488-8dhdf to i01.samarthya.me
Normal Pulling 36s kubelet Pulling image "k8s.gcr.io/metrics-server/metrics-server:v0.4.1"
Normal Pulled 25s kubelet Successfully pulled image "k8s.gcr.io/metrics-server/metrics-server:v0.4.1" in 11.718855147s
Normal Created 24s kubelet Created container metrics-server
Normal Started 24s kubelet Started container metrics-server
Warning Unhealthy 7s (x2 over 17s) kubelet Liveness probe failed: HTTP probe failed with statuscode: 500
Warning Unhealthy 5s (x2 over 15s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
Looking at the logs:
k logs pod/metrics-server-5d5c49f488-8dhdf -n kube-system
E1209 07:23:57.351296 1 server.go:132] unable to fully scrape metrics: [unable to fully scrape metrics from node mymachine.cluster.samarthya.me: unable to fetch metrics from node mymachine.cluster.samarthya.me: Get "https://10.80.241.70:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 10.80.241.70 because it doesn't contain any IP SANs, unable to fully scrape metrics from node i01.samarthya.me: unable to fetch metrics from node i02.samarthya.me: Get "https://10.80.120.149:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 10.80.120.149 because it doesn't contain any IP SANs, unable to fully scrape metrics from node ibndev003277samarthya.me: unable to fetch metrics from node ibndev003277samarthya.me: Get "https://10.80.120.148:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 10.80.120.148 because it doesn't contain any IP SANs, unable to fully scrape metrics from node i03.samarthya.me: unable to fetch metrics from node i04.samarthya.me: Get "https://10.80.241.80:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 10.80.241.80 because it doesn't contain any IP SANs, unable to fully scrape metrics from node i03.samarthya.me: unable to fetch metrics from node i03.samarthya.me: Get "https://10.80.241.78:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 10.80.241.78 because it doesn't contain any IP SANs]
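The log line is long, so it can help to pull out just the kubelet addresses that failed validation. A small sketch, assuming the log text arrives on stdin; the function name failed_san_ips is hypothetical:

```shell
#!/bin/sh
# Read metrics-server log text on stdin and print each kubelet IP that
# failed x509 validation, one per line, without duplicates.
failed_san_ips() {
  grep -oE 'cannot validate certificate for [0-9.]+' \
    | awk '{ print $NF }' \
    | sort -u
}

# Usage:
#   kubectl logs pod/metrics-server-5d5c49f488-8dhdf -n kube-system | failed_san_ips
```

If every node shows up in the list, the problem is the certificate setup cluster-wide rather than a single misconfigured kubelet.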
After some time, the pod went into CrashLoopBackOff:
kube-system pod/metrics-server-5d5c49f488-8dhdf 0/1 CrashLoopBackOff 4 2m48s
The kubelet serving certificates are self-signed and contain no IP SANs, so Metrics Server could not validate them when scraping the nodes by IP. I edited the deployment:
k edit deployment.apps/metrics-server -n kube-system
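I added --kubelet-insecure-tls to the container args and saved the deployment. The flag makes Metrics Server skip verification of the kubelet serving certificates, so it suits a local test cluster rather than production.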
containers:
- args:
  - --cert-dir=/tmp
  - --secure-port=4443
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-use-node-status-port
  - --kubelet-insecure-tls
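The same change can be made without opening an editor. The following is a sketch using kubectl patch with a JSON Patch, assuming metrics-server is the first container (index 0) in the pod template, as it is in the stock components.yaml; the function names are hypothetical:

```shell
#!/bin/sh
# JSON Patch that appends one flag to the first container's args list.
# Assumes metrics-server is the container at index 0 of the pod template.
insecure_tls_patch() {
  printf '%s' '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
}

# Apply the patch to the Deployment; this triggers a new rollout.
patch_metrics_server() {
  kubectl patch deployment metrics-server -n kube-system \
    --type=json -p "$(insecure_tls_patch)"
}

# Usage:
#   patch_metrics_server
#   kubectl rollout status deployment/metrics-server -n kube-system
```

A patch like this is easier to repeat or script than an interactive edit, at the cost of depending on the container's position in the list.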
Saving the change rolled out a new pod:
kube-system pod/metrics-server-56c59cf9ff-qcxlc 0/1 ContainerCreating 0 5s
k describe pod/metrics-server-56c59cf9ff-qcxlc -n kube-system
Normal Created 23s kubelet Created container metrics-server
Normal Started 23s kubelet Started container metrics-server
k logs pod/metrics-server-56c59cf9ff-qcxlc -n kube-system
I1209 07:27:03.832542 1 serving.go:325] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I1209 07:27:04.575200 1 secure_serving.go:197] Serving securely on [::]:4443
I1209 07:27:04.575282 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I1209 07:27:04.575295 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I1209 07:27:04.575372 1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I1209 07:27:04.575450 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I1209 07:27:04.575546 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1209 07:27:04.575551 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1209 07:27:04.575563 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1209 07:27:04.575567 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1209 07:27:04.675539 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I1209 07:27:04.675584 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1209 07:27:04.675731 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
With everything apparently running fine, it was time to check the node stats again:
k top node
NAME               CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
i01.samarthya.me   131m         0%     7004Mi          21%
i02.samarthya.me   85m          0%     5542Mi          17%
i03.samarthya.me   163m         1%     13011Mi         40%
i04.samarthya.me   395m         2%     7191Mi          22%
i05.samarthya.me   94m          0%     11536Mi         36%
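Once the stats are flowing, the output is easy to filter. As a rough sketch, assuming the five-column layout shown above, this lists the nodes at or above a memory threshold; the function name nodes_over_mem is hypothetical:

```shell
#!/bin/sh
# Read `kubectl top node` output on stdin and print the names of nodes whose
# MEMORY% (5th column) is at or above the given threshold. Skips the header row.
nodes_over_mem() {
  awk -v limit="$1" 'NR > 1 {
    pct = $5
    sub(/%$/, "", pct)          # strip the trailing percent sign
    if (pct + 0 >= limit + 0) print $1
  }'
}

# Usage:
#   kubectl top node | nodes_over_mem 36
```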
Help
- https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
- https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/metrics-server.md