What is Kubernetes?
Kubernetes (K8s) is an open-source project designed to manage a cluster of Linux containers as a single system. It runs and manages containers across a large number of hosts and provides co-location and replication of large numbers of containers. The project was started by Google and is now supported by many companies, including Microsoft, Red Hat, IBM, and Docker.
Google has been using container technology for over a decade, launching more than two billion containers per week. Through the Kubernetes project, the company shares its experience in building an open platform designed to run containers at scale.
The project addresses two questions that arise once you use Docker containers: how to scale and run containers on many Docker hosts at once, and how to balance load between them. It offers a high-level API that defines a logical grouping of containers, letting you define container pools, balance load, and control container placement.
How Kubernetes appeared
The book Site Reliability Engineering describes an internal Google project – the Borg cluster management system. In 2014, Google released the source code of Kubernetes, a system built on its experience with Borg. In 2015, in partnership with the Linux Foundation, Google organized the Cloud Native Computing Foundation (CNCF) and transferred Kubernetes to it as a technical contribution. The foundation develops open-source projects – utilities and libraries for building applications around cloud-native architecture models.
Kubernetes has since graduated from the Cloud Native Computing Foundation: it reached a stable version and received the status of Graduated Project in CNCF terminology.
The first versions of Kubernetes were more monolithic and tied to Docker under the hood. Within the CNCF program, Kubernetes became a stable and extensible product, and swapping technologies became possible at almost every level of the virtual infrastructure. Kubernetes is now highly customizable: you can choose any technology for working with containers, storage, or networking.
This approach to development has made Kubernetes a popular solution for production systems and enterprises: it has gained more security-related components and more stable resource- and process-management algorithms.
Main components of Kubernetes
- Node. Nodes are the virtual or physical machines on which containers are deployed and run. A collection of nodes forms a Kubernetes cluster. The controller (control-plane) node manages the cluster using the controller manager and the scheduler. It handles user interaction through the API server and holds the storage with the cluster configuration, metadata, and object statuses.
- Namespace. An object designed to divide cluster resources between teams and projects. Namespaces act as several virtual clusters running on top of one physical cluster.
- Pod. The basic unit of deployment and the main logical unit in K8s. A pod is a set of one or more containers deployed together on a node. Grouping containers of different types is required when they are interdependent and must run on the same node; this speeds up their interaction. For example, one container might hold a web application and another a caching service for it.
- ReplicaSet. An object responsible for describing and managing multiple instances (replicas) of pods created on a cluster. Having more than one replica improves the resiliency and scalability of the application. In practice, ReplicaSets are usually created through a Deployment. ReplicaSet is a more advanced version of the older way of organizing replica creation in K8s – the Replication Controller.
- Deployment. An object that stores a description of the pods, the number of replicas, and the algorithm for replacing them when parameters change. The Deployment controller lets you perform declarative updates (using a desired-state description) on objects such as pods and ReplicaSets.
- StatefulSet. Like other objects, such as ReplicaSet or Deployment, a StatefulSet allows you to deploy and manage one or more pods. Unlike them, however, it gives each pod a predictable, persistent identity that survives restarts.
- DaemonSet. An object that ensures one instance of the selected pod runs on each node (or on a selected subset of nodes).
- Job/CronJob. Objects that regulate one-time or recurring launches of selected pods and track their completion. The Job controller is responsible for a single run; CronJob launches Jobs on a schedule.
- Label/Selector. Labels mark resources and simplify group operations on them. Selectors let you select and filter objects based on label values. Labels and selectors are not independent Kubernetes objects, but the system cannot function fully without them.
- Service. A tool for publishing an application as a network service. Among other things, Services balance traffic and load between pods.
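To make several of these objects concrete, here is a minimal sketch of a Deployment and a Service. The names, labels, and the nginx image are illustrative, not taken from any real system:

```yaml
# Deployment: declares three replicas of a pod labelled app=web.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web            # selector ties the ReplicaSet to pods with this label
  template:
    metadata:
      labels:
        app: web          # label stamped onto every pod replica
    spec:
      containers:
      - name: web
        image: nginx:1.25 # illustrative image
        ports:
        - containerPort: 80
---
# Service: publishes the pods selected by app=web and balances traffic between them.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 80
```

Applying this manifest with `kubectl apply -f` would create the Deployment, which in turn creates a ReplicaSet and three pods; the Service then balances traffic across those pods by matching the `app: web` label.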
How Kubernetes works
Kubernetes consists of two large parts:
- Control Plane – orchestrator, API, and configuration base.
- Node Pools – servers with available resources.
The Kubernetes controller node is responsible for the Control Plane. Kubernetes worker nodes are grouped into node pools. As a rule, one node pool corresponds to a group of servers with the same characteristics: a pool of Windows servers, a pool of Linux servers, a pool of GPU servers.
On each node (a separate physical server or virtual machine), a kubelet agent is installed; it receives instructions from the controller node, along with various components, drivers, and extensions for networking and security monitoring. Together, all this forms a platform for deploying applications.
In summary, the application itself is described through a Deployment resource containing several pods, each of which has one or more containers. Pods are the units of scaling in Kubernetes – the molecules that make up the system. For example, when choosing the node on which a given application component will run, the minimum unit of resource accounting is the pod, not a separate container or physical server.
Hierarchy of Kubernetes components. Node Pool consists of nodes (Worker Node). An Application consists of a Deployment, which consists of Pods containing containers.
The principle of operation of Kubernetes is similar to classic clusters. The brain of the system is the Kubernetes Controller Node, which is responsible for the Control Plane and contains the following:
- API for administrators and developers;
- Configuration base with parameters of containers, applications, deployment, network, and storage;
- An orchestrator or scheduler that runs containers.
Let me remind you that Kubernetes unites node pools – servers connected by common characteristics, for example, a pool of Windows servers, a pool of Linux servers, and a pool of servers with a GPU. Nodes are combined into collections according to these characteristics. The administrator then tells Kubernetes what an application needs (computing power, memory, storage), and Kubernetes allocates resources on its own, finds nodes in the cluster that satisfy these requirements, and launches application pods on them.
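This scheduling contract can be sketched in a pod spec: resource requests tell the scheduler what the workload needs, and a node selector restricts it to nodes with matching characteristics. The names and values below are illustrative; `kubernetes.io/os` is a standard well-known node label:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-worker
spec:
  nodeSelector:
    kubernetes.io/os: linux   # only schedule onto Linux nodes
  containers:
  - name: worker
    image: busybox:1.36       # illustrative image
    command: ["sleep", "3600"]
    resources:
      requests:               # what the scheduler reserves on the chosen node
        cpu: "500m"
        memory: "256Mi"
      limits:                 # hard caps enforced at runtime
        cpu: "1"
        memory: "512Mi"
```

The scheduler will only place this pod on a node that both matches the selector and has at least the requested CPU and memory unreserved.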
You can set a minimum and a maximum number of pods for an application, and Kubernetes will try to maintain that number. Maintaining and scaling the number of nodes (individual physical servers or virtual machines in a pool), however, is the responsibility of the administrator, hypervisors, or cloud platforms like Azure, GCP, and AWS, not of Kubernetes itself.
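The minimum/maximum pod count described above is typically expressed with a HorizontalPodAutoscaler. A minimal sketch, assuming a Deployment named `web` exists; the thresholds are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:            # the Deployment whose replica count is managed
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2             # Kubernetes keeps at least this many pods
  maxReplicas: 10            # and never scales beyond this many
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add pods when average CPU use exceeds 70%
```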
If any infrastructure element fails, Kubernetes automatically tries to fix the problem: for example, it restarts a pod or deployment so that the state of the system as a whole matches the configuration loaded via the API and saved on the controller node. As a result, the application, with its components divided into pods and containers, runs in a pool of resources united into a Kubernetes cluster – a kind of container cloud.
Advantages of Kubernetes
- Service discovery and load balancing. Containers can be reached at their own IP addresses or through a common DNS name for the whole group. K8s can balance and distribute network traffic to keep the deployment stable.
- Automatic storage management. The user can set which storage a deployment uses by default – local, from an external cloud provider (GKE, Amazon EKS, AKS), or other options.
- Automatic rollout and rollback of changes. The user can change the current container configuration on the fly. If the change breaks the stability of the deployment, K8s automatically rolls back to a stable working version.
- Automatic resource allocation. Kubernetes allocates storage and RAM from a dedicated cluster of nodes to provide each container with everything it needs.
- Management of passwords and settings. K8s can securely store and handle sensitive information related to running applications – passwords, OAuth tokens, and SSH keys. Data and settings can be updated without re-creating the container, depending on the application.
- Self-healing when a failure occurs. The system can quickly identify corrupted or unresponsive containers using specific metrics and tests. Failed containers are re-created and restarted within the same pod.
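The "specific metrics and tests" behind self-healing are liveness and readiness probes declared on a container. A minimal sketch; the image, endpoint paths, and timings are illustrative assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probed-app
spec:
  containers:
  - name: app
    image: nginx:1.25          # illustrative image
    livenessProbe:             # failure here makes the kubelet restart the container
      httpGet:
        path: /healthz         # hypothetical health endpoint
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:            # failure here removes the pod from Service endpoints
      httpGet:
        path: /ready           # hypothetical readiness endpoint
        port: 80
      periodSeconds: 5
```

The distinction matters: a failed liveness probe triggers a restart, while a failed readiness probe only stops traffic from being routed to the pod until it recovers.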
Kubernetes is a convenient container orchestration tool. However, the solution does not work entirely on its own: it requires preparation and additional configuration. For example, users must still deal with database schema migrations or API backward-compatibility issues themselves.
Disadvantages of Kubernetes
- Security. One technology contains many different components whose security must be closely monitored: external, at the level of clusters and nodes, and internal, at the level of specific images. Unlike a container, a virtual machine is completely abstracted from the host server that runs it: it has its own operating system, so the risk of penetrating the server from a virtual machine is much lower than from a container. Special anti-malware solutions for Kubernetes have already appeared that scan everything happening inside containers. The problem can be addressed by scanning activity, encrypting network traffic, and installing security components and solutions for authenticating containers and application components. This makes life harder for Kubernetes administrators, but something has to be done about it – it is part of the job.
- Kubernetes is still not a PaaS, so you also have to package middleware and application dependencies into containers. This task is simplified, though, by the huge number of ready-made, pre-configured container images published in public registries.
- The requirement that applications be containerized. This limits the options of companies that do not develop their applications themselves, so a legacy infrastructure often has to be maintained in parallel with the main one. Gradual replacement solves this problem: old applications are rewritten for new architecture models with support for new infrastructure technologies, or the company moves entirely to SaaS applications, which require no infrastructure management.
- The complexity of administration. This is solved by using managed cloud services. I do not recommend setting up Kubernetes on your own: it requires highly specialized engineers, and maintaining a cluster independently is not easy.
Future of Kubernetes
According to CNCF, Kubernetes is currently the second-largest open-source project in the world after Linux. Since the advent of Kubernetes, it’s safe to say that almost all other orchestrators are either irrelevant or have faded into the background compared to Kubernetes. Typically, every major public cloud provider has a managed Kubernetes service or is in the process of developing one.
Kubernetes continues to grow rapidly. Initially focused primarily on basic container scheduling, it soon added additional capabilities to address production concerns such as security, stateful applications, cloud integration, and batch processing, to name a few.
As the platform matured, the rate of fundamental change slowed down. While improvements in scalability and accessibility will continue indefinitely, the underlying form of the platform has become fairly stable. The platform has become very extensible, and much of the interesting work in the future will be based on Kubernetes rather than “inside” Kubernetes itself. This is a sign of success. For many, Kubernetes will all but disappear or become taken for granted as essential plumbing.
Exciting Kubernetes-based systems for networking, serverless, IoT, and edge computing are currently being researched and built that leverage the scalability, flexibility, and efficiency of microservices/container-based architectures.
Conclusion
Kubernetes is the most advanced container orchestration tool available today. It not only automates the deployment process but also greatly simplifies further work with large sets of containers.