Kubernetes, an open-source container orchestration engine, provides well-defined interfaces for automating container deployment, scheduling, and scaling.
With thousands of contributors across the globe, adoption of Kubernetes and its multi-layer architecture (containers, pods, services) has increased steadily, simplifying the deployment of containerized workloads.
Despite the popularity Kubernetes has gained in the container orchestration space, monitoring a Kubernetes cluster remains challenging because its multiple layers of abstraction introduce many complexities.
To understand these complexities, it helps to know how Kubernetes monitoring works. In this blog, we discuss the Kubernetes monitoring structure and how to use it to monitor Kubernetes components and implement an end-to-end observability strategy for your workloads.
Kubernetes monitoring design stems directly from the platform’s extensible, pluggable architecture, which provides the freedom to integrate plugins or extensions to gain insights into Kubernetes resources.
Kubernetes ships with some built-in monitoring features that are implemented as extensions and plugins running on top of Kubernetes cluster components.
The first of these is cAdvisor, an open-source container monitoring tool developed by Google to monitor and analyze container resource usage and performance.
It collects and exports metrics such as CPU, memory, and network usage for all containers on a node. The kubelet pulls these metrics, which then become available to control plane components for various orchestration tasks.
The kubelet is the primary node agent that deploys and runs containers on each node in Kubernetes. It handles communication between the master and worker nodes and also acts as a collector of node resource usage. Container metrics pulled by the kubelet from cAdvisor can also be exposed as aggregate pod resource usage through a REST API.
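The idea of rolling container metrics up into pod-level usage can be sketched in a few lines of Python. This is only an illustration: the sample records and their field names are hypothetical and do not reflect cAdvisor’s or the kubelet’s actual schema.

```python
from collections import defaultdict

# Hypothetical container-level samples; the field names are illustrative
# assumptions, not the real cAdvisor or kubelet data format.
SAMPLES = [
    {"pod": "web-1", "container": "app", "cpu_millicores": 120, "memory_bytes": 200_000_000},
    {"pod": "web-1", "container": "sidecar", "cpu_millicores": 30, "memory_bytes": 50_000_000},
    {"pod": "db-0", "container": "postgres", "cpu_millicores": 400, "memory_bytes": 900_000_000},
]


def aggregate_by_pod(samples):
    """Sum per-container CPU and memory into one total per pod."""
    totals = defaultdict(lambda: {"cpu_millicores": 0, "memory_bytes": 0})
    for sample in samples:
        pod_total = totals[sample["pod"]]
        pod_total["cpu_millicores"] += sample["cpu_millicores"]
        pod_total["memory_bytes"] += sample["memory_bytes"]
    return dict(totals)


pod_usage = aggregate_by_pod(SAMPLES)
```

Here `pod_usage["web-1"]` holds the combined usage of the `app` and `sidecar` containers, which is the kind of per-pod figure a monitoring tool would typically chart.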
The scheduler and controllers are master components that manage the pod lifecycle using the available pod spec or pod metrics. The scheduler, kube-scheduler, uses pod specifications to decide which node a pod should run on. Controllers such as the kube-controller-manager or cloud-controller-manager are responsible for node failure detection, replication, and interaction with underlying cloud providers.
Probes are also built into the Kubernetes monitoring paradigm, and they actively monitor the health of a container. Through liveness probes, Kubernetes detects and restarts unhealthy containers, and through readiness probes it decides whether a pod should receive traffic, which keeps applications functioning properly.
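On the application side, HTTP probes only need an endpoint that answers with a status code; the kubelet treats any code from 200 up to (but not including) 400 as success. The sketch below shows one way an application might serve such endpoints with Python’s standard library. The `/healthz` and `/readyz` paths, the `STATE` flag, and port 8080 are assumptions chosen for illustration; the real paths and port are whatever the pod spec’s probe configuration points at.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical readiness flag; the application would set it to True once
# start-up work (cache warm-up, database connections, ...) has finished.
STATE = {"ready": False}


def probe_response(path, ready):
    """Return (status_code, body) for a probe request path."""
    if path == "/healthz":  # liveness: the process is up and responsive
        return 200, b"ok"
    if path == "/readyz":  # readiness: is it safe to send traffic?
        return (200, b"ready") if ready else (503, b"not ready")
    return 404, b"not found"


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = probe_response(self.path, STATE["ready"])
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve the probe endpoints until the process is stopped."""
    HTTPServer(("", port), ProbeHandler).serve_forever()
```

Calling `serve()` starts the server. Keeping the decision logic in `probe_response` separate from the handler makes the liveness and readiness behavior easy to check without opening a socket.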
The last major built-in monitoring component is the Kubernetes Dashboard, an add-on that provides an overview of the resources running on your cluster through an interactive user interface.
Although the built-in Kubernetes monitoring tools provide enough capabilities for recovering from pod failures and crashes, there are contexts where these tools cannot provide enough monitoring.
To address these situations, Kubernetes has, since its 1.6 and 1.8 releases, moved toward an API-based structure built on the resource metrics API and the custom metrics API. These APIs were created so that third-party monitoring solutions can easily access Kubernetes metrics generated by node and master components and provide more advanced monitoring capabilities.
These APIs should be implemented by monitoring pipeline vendors on top of their metrics storage solutions, which can store different types of core and service metrics. Well-known monitoring solutions that have implemented the Kubernetes metrics API architecture include Prometheus, Datadog, and Heapster (Heapster has since been deprecated in favor of metrics-server).
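To give a feel for what a consumer of the resource metrics API has to do, the following sketch parses a payload shaped like a single pod entry from that API and totals its usage. The payload values are made up, and the quantity parsers are deliberately minimal: they handle only the handful of suffixes listed in the code, which is an assumption of this example rather than full coverage of the Kubernetes quantity format.

```python
import json

# Sample payload shaped like one pod entry from the resource metrics API
# (metrics.k8s.io); the names and values are made up for illustration.
POD_METRICS = json.loads("""
{
  "metadata": {"name": "web-1", "namespace": "default"},
  "containers": [
    {"name": "app", "usage": {"cpu": "250m", "memory": "128Mi"}},
    {"name": "sidecar", "usage": {"cpu": "1500000n", "memory": "2048Ki"}}
  ]
}
""")

# Only a subset of the quantity suffixes is handled in this sketch.
CPU_SUFFIXES = {"n": 1e-9, "u": 1e-6, "m": 1e-3}
MEM_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "k": 10**3, "M": 10**6, "G": 10**9}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '250m' or '1500000n' to cores."""
    if quantity[-1] in CPU_SUFFIXES:
        return float(quantity[:-1]) * CPU_SUFFIXES[quantity[-1]]
    return float(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as '128Mi' to bytes."""
    # Try two-character suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(MEM_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * MEM_SUFFIXES[suffix])
    return int(float(quantity))


def pod_totals(pod):
    """Sum CPU (cores) and memory (bytes) across a pod's containers."""
    cpu = sum(parse_cpu(c["usage"]["cpu"]) for c in pod["containers"])
    memory = sum(parse_memory(c["usage"]["memory"]) for c in pod["containers"])
    return {"cpu_cores": cpu, "memory_bytes": memory}


totals = pod_totals(POD_METRICS)
```

In a real pipeline the payload would come from the cluster’s API server rather than a string literal, and a production parser would need to cover every suffix the quantity format allows.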
Monitoring Kubernetes Components
After understanding the architecture of Kubernetes monitoring, it’s important to discuss how this architecture provides insights at different layers of the Kubernetes environment.
Clusters represent the high-level layer of Kubernetes, which includes the pods, nodes, and applications to monitor. When monitoring clusters, some metrics to watch are CPU usage, disk usage, failed pods, and the nodes’ network resources.
Monitoring the CPU and disk usage of a failed pod helps determine why it is not running properly. For example, it may be bottlenecked by storage, which shows up as low CPU usage alongside high disk read and write activity.
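That reasoning can be written down as a simple rule of thumb. The function below is a rough sketch, not a diagnostic tool: the two inputs are assumed to be utilization percentages you already collect, and the thresholds are illustrative defaults rather than recommended values.

```python
def diagnose(cpu_percent, disk_busy_percent, cpu_low=20.0, cpu_high=80.0, disk_high=80.0):
    """Guess a pod's likely bottleneck from CPU and disk utilization.

    All thresholds are illustrative assumptions, not recommended values.
    """
    if cpu_percent < cpu_low and disk_busy_percent > disk_high:
        # The CPU is mostly idle while the disk is saturated.
        return "storage-bound"
    if cpu_percent > cpu_high:
        return "cpu-bound"
    return "no obvious bottleneck"


verdict = diagnose(cpu_percent=10, disk_busy_percent=95)
```

With these inputs the rule reports a storage-bound pod, matching the pattern described above.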
Network resource monitoring helps reveal whether a compromised system is consuming all the bandwidth or whether data consumption patterns have changed over the months.
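One simple way to spot unusual bandwidth consumption is to compare the current reading against a baseline built from recent history. The sketch below assumes you already have throughput samples in Mbps; the multiplier of 3 is an arbitrary illustrative choice.

```python
from statistics import mean


def bandwidth_alert(history_mbps, current_mbps, factor=3.0):
    """Return True when current traffic exceeds `factor` times the baseline mean.

    `factor` is an illustrative assumption; tune it to your own traffic.
    """
    if not history_mbps:
        return False  # no baseline yet, so nothing to compare against
    return current_mbps > factor * mean(history_mbps)


spike = bandwidth_alert([10, 12, 11, 9], current_mbps=50)
```

A mean-based baseline is easy to reason about but sensitive to outliers, so a median or a percentile may be a better fit for bursty traffic.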
Pods work best when they are assigned the right amount of resources. When monitoring your pods, watch their pattern of deployment. Check whether they are running on the desired nodes according to their specs, because misconfigured nodes can create performance bottlenecks.
Pods can also be tracked to check whether they are configured with the right resources for high availability without consuming an excessive share of them.
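A right-sizing check can be as simple as comparing what a pod actually uses with what it requested. In this sketch the pod names, usage figures, and the 30% and 90% thresholds are all hypothetical.

```python
def sizing_status(usage, request, low=0.3, high=0.9):
    """Classify a pod by the ratio of observed usage to its resource request.

    The `low` and `high` thresholds are illustrative assumptions.
    """
    if request <= 0:
        return "no request set"
    ratio = usage / request
    if ratio < low:
        return "over-provisioned"
    if ratio > high:
        return "under-provisioned"
    return "ok"


# Hypothetical pods: (observed CPU millicores, requested CPU millicores).
PODS = {"web-1": (100, 1000), "db-0": (950, 1000), "cache-0": (500, 1000)}

report = {name: sizing_status(used, requested) for name, (used, requested) in PODS.items()}
```

The resulting `report` marks `web-1` as over-provisioned, `db-0` as under-provisioned, and `cache-0` as fine.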
Many monitoring factors have to be taken into consideration when running Kubernetes in the cloud. The Kubernetes cloud-controller-manager configuration has to match the cloud vendor’s API for efficient metrics integration. Identities have to be monitored to prevent unauthorized access as a root user.
Charges for cloud resources can multiply very easily, so it is necessary to monitor charges for every instance you deploy on the cloud.
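A minimal version of per-instance cost tracking multiplies each instance’s running hours by its hourly rate and flags anything over a budget. The instance names, rates, and budget below are invented for the example; real figures would come from your cloud provider’s billing data.

```python
# Hypothetical hourly rates per instance type, in dollars.
HOURLY_RATES = {"small": 0.05, "large": 0.40}

# Hypothetical instances: (name, instance type, hours run this month).
INSTANCES = [("node-a", "small", 720), ("node-b", "large", 100)]


def cost_by_instance(instances, rates):
    """Return the month-to-date charge for each instance, rounded to cents."""
    return {name: round(rates[kind] * hours, 2) for name, kind, hours in instances}


def over_budget(costs, limit):
    """List the instances whose charge exceeds `limit`, in sorted order."""
    return sorted(name for name, cost in costs.items() if cost > limit)


costs = cost_by_instance(INSTANCES, HOURLY_RATES)
flagged = over_budget(costs, limit=38.0)
```

Running this on a schedule and alerting on a non-empty `flagged` list is one straightforward way to catch runaway charges early.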
Monitoring network performance is essential for retrieving application data from the cloud. Any bottleneck in the network may signal a DoS attack or poorly configured network resources.
Depending on the amount of code and the number of services you have implemented, you might need to configure your cloud provider’s load balancing service to maintain high performance under heavy loads.
Also, if nodes are running intensive services, the cloud storage may run out of disk space, leading to slow write speeds. Implementing dynamic storage provisioning in this context lets storage be allocated on demand, though you should still keep an eye on storage bottlenecks.