Kubernetes Monitoring: A Beginner’s Guide

What Is Kubernetes Monitoring?

Kubernetes monitoring involves tracking application performance and resource utilization across cluster components such as pods, containers, and services. The goal is to gain visibility into the health and security of your clusters. Kubernetes provides two built-in monitoring paths. The first is the resource metrics pipeline, which exposes a limited set of metrics, such as CPU and memory usage for nodes and pods, through the Metrics API. The second is a full metrics pipeline, which gives you access to a much richer set of metrics.
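The resource metrics pipeline reports usage as Kubernetes quantity strings (for example "250m" CPU or "128Mi" memory), usually served by metrics-server under the metrics.k8s.io API, which is also what `kubectl top` reads. The sketch below assumes you have already fetched a PodMetrics object as a dict, for example with `kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods/<pod>`. The helper names `parse_quantity` and `pod_usage` are hypothetical, and the parser only handles the common suffixes.

```python
from decimal import Decimal

# Power-of-two suffixes, used mostly for memory values.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
# Decimal (SI) suffixes; "n", "u" and "m" are common in CPU usage values.
_DECIMAL = {"n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
            "k": Decimal("1e3"), "M": Decimal("1e6"), "G": Decimal("1e9"),
            "T": Decimal("1e12"), "P": Decimal("1e15"), "E": Decimal("1e18")}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity string such as '250m', '12500000n' or '128Mi' to a plain number."""
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    if q[-1] in _DECIMAL:
        return Decimal(q[:-1]) * _DECIMAL[q[-1]]
    return Decimal(q)  # no suffix: already in base units


def pod_usage(pod_metrics: dict) -> tuple[float, float]:
    """Sum container usage in a PodMetrics object; returns (CPU cores, memory bytes)."""
    containers = pod_metrics["containers"]
    cpu = sum((parse_quantity(c["usage"]["cpu"]) for c in containers), Decimal(0))
    mem = sum((parse_quantity(c["usage"]["memory"]) for c in containers), Decimal(0))
    return float(cpu), float(mem)


# Hypothetical PodMetrics payload, trimmed to the fields used here.
sample = {
    "metadata": {"name": "web-7d4b9c", "namespace": "default"},
    "containers": [
        {"name": "app", "usage": {"cpu": "250m", "memory": "128Mi"}},
        {"name": "sidecar", "usage": {"cpu": "12500000n", "memory": "16Mi"}},
    ],
}
print(pod_usage(sample))  # (0.2625, 150994944.0)
```

Decimal arithmetic is used so that nanocore values such as "12500000n" are summed exactly before the final conversion to float.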

This visibility is critical for managing clusters proactively, effectively, and efficiently. Since each Kubernetes environment is unique, you may need to track different metrics and configure alerts that notify specific stakeholders of certain events. However, almost every environment needs visibility into resource utilization, misconfigurations, failures, and security.

Why Is Kubernetes Monitoring Important?

Containers are immutable. In traditional software development you can update a program in place as needed, but you cannot take the same approach with containers. To update the code, you retire a container and replace it with a new one built from an updated image. This means you need to handle and monitor numerous deployments to keep your applications up to date, and doing this manually is inefficient, verging on impossible.

Kubernetes monitoring features and tools enable you to gain visibility into your cluster performance, but visibility is only one advantage of monitoring. Here are additional benefits of Kubernetes monitoring:

Reporting—Kubernetes monitoring features and tools provide reports that offer insights into clusters, Deployments, pods, nodes, and containers. Each feature and tool offers a different level of granularity and control over which metrics you can track and how much visibility you get.
Insights—monitoring tools and reports use tracked metrics and display this data as meaningful insights. You can use this information to optimize your cluster health, performance, and security configurations. Resource utilization insights are also useful in optimizing costs.
Alerts—most monitoring tools provide alerting capabilities, which you can configure to push notifications to specific stakeholders when certain events occur. Alerting is critical to ensure teams can respond promptly to performance and security events.


8 Key Kubernetes Metrics to Monitor

Each Kubernetes environment has different characteristics and requires its own set of metrics. Start by assessing your project's requirements, then choose the metrics that matter most for it. Here are key performance metrics you can start with:

Deployments and DaemonSets—a Deployment is a Kubernetes controller that manages ReplicaSets to keep the desired number of pod replicas running, and a DaemonSet ensures that a copy of a pod runs on every node (or on a selected subset of nodes). Comparing the desired, current, and available pod counts of your Deployments and DaemonSets shows whether your workloads are actually running as intended.
Storage volume health monitoring—CSI volume health monitoring lets CSI drivers detect abnormal volume conditions in the underlying storage systems and report these anomalies as events on PersistentVolumeClaims (PVCs) or pods.
Missing and failed pods—compare the pods that exist against the number you expect, and watch for pods in the Failed phase, which means all of their containers have terminated and at least one terminated in failure.
Pod restarts—monitor how many times your pods' containers have restarted; a steadily rising restart count usually points to repeated crashes.
Pods in the CrashLoopBackOff state—this state means a container keeps crashing and Kubernetes is waiting before restarting it again. The cause may be an application that keeps crashing or a misconfiguration that makes the container exit.
Running vs desired pods—use this metric to compare the number of instances actually ready for each service with the desired number.
Pod resource usage vs requests and limits—comparing actual CPU and memory usage with the configured requests and limits shows whether those values are set appropriately.
Available and unavailable pods—a pod can be running but still unavailable, for example when it fails its readiness checks, in which case it cannot accept traffic. Spikes in the number of unavailable pods can indicate a misconfiguration.
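Most of the pod-level metrics above can be derived from Pod objects, for example the `items` returned by `kubectl get pods -o json`. The sketch below is a minimal illustration, assuming the standard JSON field names (`status.phase`, `status.conditions`, `status.containerStatuses[].restartCount`, `state.waiting.reason`). The function `summarize_pods` and its `desired_replicas` parameter are hypothetical; in practice the desired count would come from a Deployment's `spec.replicas`. "Unavailable" is simplified here to "running but not Ready".

```python
from collections import Counter


def _is_ready(pod: dict) -> bool:
    # A pod is Ready when its "Ready" condition has status "True".
    return any(c.get("type") == "Ready" and c.get("status") == "True"
               for c in pod["status"].get("conditions", []))


def summarize_pods(pods: list[dict], desired_replicas: int) -> dict:
    """Derive simple pod health signals from Pod objects in JSON form."""
    phases = Counter(p["status"].get("phase", "Unknown") for p in pods)
    restarts = 0
    crashlooping = []
    running_unready = 0
    for pod in pods:
        statuses = pod["status"].get("containerStatuses", [])
        restarts += sum(cs.get("restartCount", 0) for cs in statuses)
        reasons = {cs.get("state", {}).get("waiting", {}).get("reason") for cs in statuses}
        if "CrashLoopBackOff" in reasons:
            crashlooping.append(pod["metadata"]["name"])
        # Running but not Ready: the pod exists but should not receive traffic.
        if pod["status"].get("phase") == "Running" and not _is_ready(pod):
            running_unready += 1
    ready = sum(1 for p in pods if _is_ready(p))
    return {
        "failed": phases.get("Failed", 0),
        "missing": max(desired_replicas - len(pods), 0),
        "total_restarts": restarts,
        "crashlooping": crashlooping,
        "ready_vs_desired": (ready, desired_replicas),
        "running_but_unready": running_unready,
    }


# Hypothetical pods, trimmed to the fields used above.
pods = [
    {"metadata": {"name": "web-1"},
     "status": {"phase": "Running", "conditions": [{"type": "Ready", "status": "True"}],
                "containerStatuses": [{"restartCount": 0, "state": {"running": {}}}]}},
    {"metadata": {"name": "web-2"},
     "status": {"phase": "Running", "conditions": [{"type": "Ready", "status": "False"}],
                "containerStatuses": [{"restartCount": 7,
                                       "state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
    {"metadata": {"name": "web-3"}, "status": {"phase": "Failed", "conditions": []}},
]
print(summarize_pods(pods, desired_replicas=4))
```

Note that a crash-looping pod typically still reports the Running phase, so it shows up both in `crashlooping` and in `running_but_unready`.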

Kubernetes Monitoring Best Practices

Set Up On-Call Notifications

Kubernetes monitoring is not useful unless you have an effective way to notify cluster administrators. Adopt tooling that lets you define the staff responsible for each Kubernetes cluster and push high-priority alerts to them through redundant channels such as email, SMS, Slack, or dedicated notification apps. It is also valuable to define escalation paths, so that if a responder is unavailable or cannot resolve the issue, the notification is immediately passed to the next person in the escalation path, such as their manager.
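To make the escalation idea concrete, here is a small simulation of an escalation path: every responder is paged on several channels at once, and if nobody acknowledges within a timeout, the alert moves to the next level. It is a hypothetical model, not the API of OnPage or any other alerting product; `Responder`, `escalate`, and the timeout semantics are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Responder:
    name: str
    # Several channels per person provide message redundancy.
    channels: list[str] = field(default_factory=lambda: ["sms", "email"])


def escalate(chain: list[Responder], ack_minutes: dict[str, int],
             timeout_min: int) -> tuple[Optional[str], list[tuple[int, str, str]]]:
    """Simulate paging an escalation chain.

    chain:       responders in escalation order, primary on-call first.
    ack_minutes: minutes each responder needs to acknowledge; missing means never.
    timeout_min: how long to wait before escalating to the next responder.
    Returns (who acknowledged or None, list of (minute, name, channel) pages sent).
    """
    pages = []
    for level, responder in enumerate(chain):
        sent_at = level * timeout_min
        for channel in responder.channels:
            pages.append((sent_at, responder.name, channel))
        delay = ack_minutes.get(responder.name)
        if delay is not None and delay < timeout_min:
            return responder.name, pages  # acknowledged in time: stop escalating
    return None, pages  # nobody acknowledged: every level has been paged


chain = [Responder("alice", ["sms", "slack"]), Responder("bob"), Responder("carol")]
acked, pages = escalate(chain, {"alice": 20, "bob": 3}, timeout_min=5)
print(acked, pages)
```

Here the primary responder does not acknowledge within 5 minutes, so the alert escalates to the second level, where it is acknowledged and escalation stops.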

Automated alerting tools integrate with Kubernetes and streamline the incident detection-to-resolution process for response teams through:

Configurable fail-safe schedules
Immediate, prioritized mobile alerting
Historic and real-time visibility into incidents
Real-time, cross-team collaboration
Encrypted mobile messaging with persistent alerts that bypass the mute switch

Alerting systems help critical incidents rise above the noise by routing each alert to the right people at the right time. That way, critical incidents are resolved promptly, before they grow into larger problems.

Monitoring Kubernetes in a Cloud Environment

Cloud environments present unique challenges. Here are important metrics to monitor when deploying Kubernetes in a cloud environment:

IAM events—you should monitor identity and access management (IAM) events related to permission changes and logins. Monitoring IAM events can help you identify and block insider threats and external attackers trying to misuse credentials.
Cloud APIs—each cloud vendor provides a different set of APIs, and Kubernetes uses these vendor APIs to request cloud resources. Monitoring cloud APIs can help you ensure these connections remain secure and do not introduce threats or cause performance issues.
Costs—cloud providers offer various pricing models, including on-demand resources and discounts for reserved instances. Cost monitoring helps you track these different costs and keep billing on budget.
Network performance—the performance of your Kubernetes workloads relies heavily on the network performance provided by the cloud vendor. Monitor your cloud network regularly to ensure data moves as quickly as required, and watch for malicious traffic. Use Kubernetes networking features such as NetworkPolicies to set security policies and define network segmentation.
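For the network segmentation point, a common starting pattern is a default-deny ingress NetworkPolicy per namespace, plus narrow policies that allow only the traffic you expect. The sketch below builds such manifests as Python dicts and prints them as JSON, which `kubectl apply -f -` accepts. The helper names, namespaces, labels, and port are hypothetical. NetworkPolicies are only enforced if your cluster's network plugin supports them.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """NetworkPolicy that blocks all ingress to every pod in `namespace`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},           # empty selector: applies to all pods in the namespace
            "policyTypes": ["Ingress"],  # no ingress rules listed, so all ingress is denied
        },
    }


def allow_from_namespace(namespace: str, app: str, source_ns: str, port: int) -> dict:
    """Allow TCP traffic on `port` to pods labelled app=<app>, only from pods in `source_ns`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-{source_ns}-to-{app}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": app}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                # kubernetes.io/metadata.name is set automatically on namespaces.
                "from": [{"namespaceSelector": {
                    "matchLabels": {"kubernetes.io/metadata.name": source_ns}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    for policy in (default_deny_ingress("payments"),
                   allow_from_namespace("payments", "api", "frontend", 8080)):
        print(json.dumps(policy, indent=2))
```

Starting from default-deny means any traffic you have not explicitly allowed is blocked, which also makes unexpected connections easier to spot in your network monitoring.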


Track the API Gateway for Microservices

API metrics can help you gain visibility into the performance of your microservices. For example, latency, request rate, and error rate metrics can reveal degraded performance in a specific component of a specific service.

You can derive service-level metrics from the API requests that pass through each service's load balancer, and detect anomalies in them automatically. An ingress controller such as NGINX, or a service mesh such as Istio, can give you service-agnostic metrics that you can use to measure all of your Kubernetes services in the same way.
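The three API metrics mentioned above can be computed per time window from request records. The sketch assumes you already collect per-request data (timestamp, latency, HTTP status) from your ingress controller or service mesh; the `Request` type, the 2x threshold, and the error-rate floor are hypothetical choices, and the anomaly rule is a deliberately simple baseline comparison rather than a production detector.

```python
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float   # seconds since some epoch
    latency_ms: float
    status: int        # HTTP status code


def window_metrics(requests: list[Request], window_s: float) -> dict:
    """Request rate, error rate and p95 latency for one service over one window."""
    n = len(requests)
    if n == 0:
        return {"rps": 0.0, "error_rate": 0.0, "p95_ms": 0.0}
    errors = sum(1 for r in requests if r.status >= 500)  # count server errors only
    latencies = sorted(r.latency_ms for r in requests)
    rank = (95 * n + 99) // 100            # nearest-rank p95, using integer ceiling
    return {"rps": n / window_s, "error_rate": errors / n, "p95_ms": latencies[rank - 1]}


def is_anomalous(current: dict, baseline: dict, factor: float = 2.0) -> bool:
    """Flag the window if error rate or p95 latency exceeds `factor` times the baseline."""
    # The 1% floor avoids alerting on a handful of errors when the baseline is near zero.
    error_limit = factor * max(baseline["error_rate"], 0.01)
    return current["error_rate"] > error_limit or current["p95_ms"] > factor * baseline["p95_ms"]


# 20 hypothetical requests over 10 seconds, two of which returned 503.
reqs = [Request(i * 0.5, 10.0 * (i + 1), 503 if i in (3, 7) else 200) for i in range(20)]
m = window_metrics(reqs, window_s=10.0)
print(m, is_anomalous(m, {"rps": 2.0, "error_rate": 0.01, "p95_ms": 180.0}))
```

Comparing each window against a per-service baseline, rather than a fixed global threshold, lets the same rule work across services with very different normal latencies.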

Always Alert on High Disk Usage

High disk usage (HDU) is a common issue in Kubernetes workloads, and HDU alerts typically indicate a problem that can affect an application's end users. Monitor all of your disk volumes, including the root file system. Ideally, set HDU alert thresholds at around 75%-80% utilization.
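A disk usage check against the 75%-80% thresholds suggested above can be as simple as the sketch below, which uses Python's standard `shutil.disk_usage`. In a cluster this logic would normally live in your metrics agent or alerting rules; the function names, the warning/critical split, and the list of paths are hypothetical.

```python
import shutil


def disk_alert_level(used: int, total: int, warn: float = 0.75, critical: float = 0.80) -> str:
    """Classify disk utilization against warning and critical thresholds."""
    ratio = used / total
    if ratio >= critical:
        return "critical"
    if ratio >= warn:
        return "warning"
    return "ok"


def check_mounts(paths: list[str]) -> dict[str, str]:
    """Check each mount point, including the root file system."""
    results = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        results[path] = disk_alert_level(usage.used, usage.total)
    return results


if __name__ == "__main__":
    print(check_mounts(["/"]))
```

Using two thresholds gives the on-call team an early warning at 75% and a higher-priority alert at 80%, before the volume fills up and starts affecting end users.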

Conclusion

In this article, we explained the basics of Kubernetes monitoring, covered several critical Kubernetes metrics, including failed pods, pod restarts, and CrashLoopBackOff, and provided four best practices that can help you monitor your Kubernetes clusters more effectively:

Set up on-call notifications—ensure you have an effective way to notify on-call Kubernetes cluster administrators.
Track metrics relevant for cloud environments—if you are running Kubernetes in the cloud, pay attention to critical cloud-specific metrics.
Track API gateways—an API gateway can provide early warning of problems in downstream microservices.
Always alert on high disk usage (HDU)—HDU is an issue that usually affects end users and can result in service disruption.

We hope this will be useful as you improve the observability of your Kubernetes environment.