How Storage Works in a Kubernetes Container Cluster

Organizations worldwide are shifting from traditional, legacy applications to a more modern approach built on cloud-native microservices. The technology making this shift possible is containers. Kubernetes is the de facto standard for container orchestration and is helping to drive the push to containerized infrastructure.

One of the struggles that many find as they move to containerized workloads is understanding the differences in container architecture compared to virtual machines. Let’s look at Kubernetes container storage and detail the differences between ephemeral and persistent storage, use cases, and implementation.

What is Kubernetes?

Before diving into the storage concepts mentioned, let’s take a step back and understand Kubernetes, why it is essential to containerized workloads, and how businesses are using it today. Kubernetes is an open-source platform, originally developed by Google and now maintained under the Cloud Native Computing Foundation (CNCF), that manages and orchestrates container workloads.

It also allows businesses to automate and configure their containerized workloads using a declarative model. This means your automation or configuration management code declares how the workloads should look, and Kubernetes brings the environment in line with this desired state.
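
To make the declarative model concrete, here is a minimal sketch of a Deployment manifest. The name web-frontend, the app label, and the nginx image tag are assumptions chosen for illustration and do not come from any particular environment. The manifest only states that three replicas should exist; Kubernetes is responsible for creating or replacing Pods until the running state matches.

```yaml
# Hypothetical Deployment: declares the desired state rather than the steps to reach it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend            # illustrative name
spec:
  replicas: 3                   # desired number of identical Pods
  selector:
    matchLabels:
      app: web-frontend         # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
        - name: nginx
          image: nginx:1.25     # assumed image tag
          ports:
            - containerPort: 80
```

If one of the three Pods disappears, for example because its host fails, the Deployment controller notices the difference between the declared and actual replica count and starts a replacement.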

Kubernetes helps to solve the problem of making containers highly available. Going back in time, as organizations started to move to use containers, there was no mechanism to ensure a specific container was highly available if the container host failed. Similar to how hypervisors today protect virtual machine workloads in a hypervisor cluster, containers need the same type of protection preventing a single point of failure.

While containers are lightweight and easy to spin up on another container host, manual processes do not scale well if you are running hundreds or even thousands of containers. This is where Kubernetes comes into play. It manages containerized workloads for you, automatically scaling them and failing them over to healthy hosts on behalf of your application.

Note the following benefits:

Built-in load balancing – If traffic to a container is high, Kubernetes can automatically balance the load to distribute the network traffic to stabilize the workloads.
Storage management and orchestration – Kubernetes can automatically mount storage for your containerized workloads, including public cloud and local storage
Implement desired state in your containerized infrastructure – You can determine the desired state of your Kubernetes infrastructure, including creating new containers for a Kubernetes deployment, removing existing containers, and adopting resources to the new containers
Automatic bin packing – Tell Kubernetes how much CPU and memory each container needs, and it will fit containers onto nodes to make the best use of resources
Automatic self-healing – If a container begins to fail or have issues, Kubernetes can automatically restart containers and even replace containers if needed
Built-in secrets management – You can store and manage sensitive information, including passwords, OAuth tokens, and SSH keys. You can also deploy and update secrets in your application configuration without rebuilding container images
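
Two of the benefits above, bin packing and secrets management, show up directly in a Pod specification. The following is a rough sketch only: the Secret name db-credentials, its password key, the Pod name, and the resource figures are all hypothetical values picked for illustration.

```yaml
# Hypothetical Secret holding a database password.
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials          # illustrative name
type: Opaque
stringData:
  password: change-me           # placeholder value, not a real credential
---
# Hypothetical Pod that declares its resource needs and reads the Secret.
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "1000000"]
      resources:
        requests:               # what the scheduler uses to place the Pod on a node
          cpu: 250m
          memory: 64Mi
        limits:                 # upper bound enforced at runtime
          cpu: 500m
          memory: 128Mi
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:       # injected from the Secret, not baked into the image
              name: db-credentials
              key: password
```

Because the password lives in the Secret object, it can be rotated without rebuilding the container image.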

Bringing our focus back to storage, Kubernetes provides storage management and orchestration, as listed above. First, let’s consider how it works with two different types of storage, ephemeral and persistent, and the use cases for both.

Kubernetes ephemeral storage

Containers, in general, are ephemeral. This means they can be provisioned and destroyed quickly and with ease. Unless explicitly configured otherwise, container storage is also ephemeral, and data does not persist once the container is stopped or deleted. What are some examples of ephemeral storage with Kubernetes?

Kubernetes ephemeral storage volumes:

emptyDir volume – An initially empty directory created when the Pod starts; its storage comes from the kubelet base directory (usually on the root disk) or from RAM
configMap, downwardAPI, secret – These volume types are used to inject Kubernetes data into Pods
CSI ephemeral volumes – These are provided by special Container Storage Interface (CSI) drivers, typically from third-party storage vendors, that specifically support inline ephemeral volumes
Generic ephemeral volumes – These can be provided by any storage driver, including third-party drivers, that supports dynamic provisioning of persistent volumes
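
As a point of comparison for the simplest entry in this list, here is a minimal sketch of a Pod using an emptyDir volume. The Pod name, container name, mount path, and size limit are assumptions for illustration.

```yaml
# Hypothetical Pod with scratch space that lives only as long as the Pod does.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo            # illustrative name
spec:
  containers:
    - name: worker
      image: busybox
      command: ["sleep", "1000000"]
      volumeMounts:
        - name: scratch
          mountPath: /cache     # where the empty directory appears in the container
  volumes:
    - name: scratch
      emptyDir:
        sizeLimit: 500Mi        # optional cap on how much data the volume may hold
        # medium: Memory        # uncomment to back the volume with RAM instead of disk
```

Everything written to /cache is deleted when the Pod is removed from its node.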

Below is an example of creating a Kubernetes Pod with a CSI ephemeral volume, declared inline in the Pod specification:

kind: Pod
apiVersion: v1
metadata:
  name: my-new-app
spec:
  containers:
    - name: my-web
      image: busybox
      volumeMounts:
        - mountPath: "/data"
          name: my-web-inline-vol
      command: [ "sleep", "1000000" ]
  volumes:
    - name: my-web-inline-vol
      csi:
        driver: inline.storage.kubernetes.io
        volumeAttributes:
          foo: info
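
For contrast, a generic ephemeral volume is declared with an embedded claim template rather than a csi block. The sketch below assumes a storage class named scratch-storage-class that supports dynamic provisioning; that class, the Pod name, and the volume name are hypothetical.

```yaml
# Hypothetical Pod using a generic ephemeral volume.
kind: Pod
apiVersion: v1
metadata:
  name: my-scratch-app          # illustrative name
spec:
  containers:
    - name: my-frontend
      image: busybox
      volumeMounts:
        - mountPath: "/scratch"
          name: scratch-volume
      command: ["sleep", "1000000"]
  volumes:
    - name: scratch-volume
      ephemeral:
        volumeClaimTemplate:    # Kubernetes creates a claim from this template for the Pod
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: "scratch-storage-class"   # assumed to exist in the cluster
            resources:
              requests:
                storage: 1Gi
```

The claim created from the template is owned by the Pod, so the volume is cleaned up when the Pod is deleted.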

When it comes to your applications, the ephemeral volumes are the easiest to implement since you don’t have to worry about their persistence or losing data. Their whole purpose is to provide a location for data to live temporarily. When the container is stopped or deleted, the data is gone. However, what if you have an application that needs to write data to a location that does not go away and needs to persist?

Persistent volumes

Suppose you have a frontend application that writes data to a MySQL database. You decide to spin up a frontend Nginx container and a backend MySQL container. You create new MySQL users and tables in the MySQL database running in the MySQL container. However, to your surprise, when your containers restart, all the changes to your MySQL database are gone. This is because the storage Kubernetes provides to containers by default is tied to the container lifecycle. In this scenario, we need persistent storage that doesn’t go away when the Pod is stopped or deleted. With that in place, a new Pod can be created and attached to the existing storage left behind by the previous Pod.

Kubernetes, by default, does not give you the data persistence you need for these types of use cases. It is also essential that persistent storage be reachable from every Kubernetes host on which a Pod might run. In our example above, if Kubernetes schedules a new MySQL Pod, it might land on a different host than the previous MySQL Pod. If persistent storage exists but is only available on one of the Kubernetes worker hosts, a newly scheduled Pod on another host cannot access it.

Most are familiar with the way traditional “shared storage” works in a virtualization cluster with a hypervisor. If shared storage is not available to one of the hosts, the host will not be able to run the VM located in the shared storage since it would be inaccessible to the host. Similarly, the Kubernetes hosts will all need to access the persistent storage to ensure Pods can be scheduled across all the nodes in the cluster.

Figure: Kubernetes Pods using local storage that is only available on each Kubernetes host.

Much like shared storage presented to a hypervisor, the persistent storage presented to Kubernetes may be managed outside the cluster itself. For example, a storage area network (SAN) device may have its own management and provisioning tools outside the virtual environment.

PersistentVolume plugins

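PersistentVolume types in Kubernetes operate like “plugins” to the cluster. The following PersistentVolume plugins have been supported for use in Kubernetes. Note that several of the built-in (in-tree) plugins listed here, such as awsElasticBlockStore, azureDisk, gcePersistentDisk, and glusterfs, have been deprecated or removed in newer Kubernetes releases in favor of CSI drivers, so check the documentation for your cluster version: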

awsElasticBlockStore – AWS Elastic Block Store (EBS) configured as part of AWS Storage services
azureDisk – Azure Disk storage
azureFile – Azure File storage
cephfs – CephFS volume storage
csi – Container Storage Interface (CSI) – Allows third-party storage plugin interoperability
fc – Fibre Channel (FC) storage provides PV storage using Fibre Channel
gcePersistentDisk – GCE Persistent Disk in Google Compute Engine
glusterfs – Glusterfs volume storage
hostPath – HostPath volume (for single-node testing only). Note: this will not work in a multi-node cluster; consider using a local volume instead
iscsi – iSCSI (SCSI over IP) storage that works with most SAN devices
local – local storage devices mounted on nodes.
nfs – Network File System (NFS) storage from SAN or NAS devices
portworxVolume – Portworx volume
rbd – Rados Block Device (RBD) volume
vsphereVolume – Volume storage backed by vSphere VMDK files
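
To illustrate how one of these plugin types is selected, here is a minimal sketch of a PersistentVolume that uses the local plugin, the suggested alternative to hostPath. The volume name, disk path, storage class name, and node hostname are all assumptions for illustration.

```yaml
# Hypothetical PersistentVolume backed by a disk mounted on one specific node.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv        # illustrative name
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage   # assumed class name
  local:
    path: /mnt/disks/ssd1       # assumed mount point on the node
  nodeAffinity:                 # required for local volumes: tells the scheduler where the disk is
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-node-1     # assumed node name
```

The plugin is chosen simply by which key appears under spec (local here, nfs or csi elsewhere). Because a local volume exists on only one node, any Pod using it is scheduled to that node, which is the availability trade-off discussed earlier.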

PersistentVolumeClaim (PVC)

PersistentVolumes in Kubernetes are allocated using a PersistentVolumeClaim (PVC). A PVC is a request for storage that Kubernetes satisfies by binding it to a matching PersistentVolume, which a Pod can then mount and use. PVCs allow users to allocate storage for a Pod without knowing the exact details of the underlying storage. The PVC serves as an abstraction of these underlying complexities.
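
A claim and a Pod that consumes it might look like the sketch below. The claim name mysql-data, the storage class slow, the requested size, and the MySQL image tag are assumptions for illustration; the claim only binds if the cluster has a PersistentVolume (or a provisioner) that satisfies it.

```yaml
# Hypothetical claim: asks for 4Gi of ReadWriteOnce storage from the "slow" class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data              # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  storageClassName: slow        # assumed class; must match an available PersistentVolume
  resources:
    requests:
      storage: 4Gi
---
# Hypothetical MySQL Pod that mounts the claimed storage.
apiVersion: v1
kind: Pod
metadata:
  name: mysql
spec:
  containers:
    - name: mysql
      image: mysql:8.0          # assumed image tag
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: change-me      # placeholder only; a Secret is the better choice
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql   # MySQL's data directory
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mysql-data   # the Pod references the claim, not the underlying storage
```

The Pod never names the NFS server, disk, or cloud volume behind the claim, which is the abstraction the PVC provides.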

Figure: Allocating a PersistentVolume using a PersistentVolumeClaim. Three Kubernetes hosts run MySQL Pods that have access to the same persistent volume.

Creating a PersistentVolume

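What does a PersistentVolume look like in the YAML configuration? Each PersistentVolume object contains a spec, which you define, and a status, which Kubernetes populates. The following example defines an NFS-backed volume: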

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv2022
spec:
  capacity:
    storage: 4Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  # Recycle is deprecated upstream; Retain is a common alternative
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: slow
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /tmp
    server: 192.168.60.30

Persistent volume states

A PersistentVolume can exist in a few different states, including:

Available – The PersistentVolume is free and not yet bound to a claim
Bound – The PersistentVolume is currently bound to a PVC
Released – The PVC has been deleted, but the volume has not yet been reclaimed by the cluster
Failed – The PersistentVolume is in an error state, for example because its automatic reclamation failed
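
The current state is reported in the status section of the PersistentVolume object. As a rough sketch, the abbreviated output of a command such as kubectl get pv pv2022 -o yaml might resemble the following for a bound volume; the claim name and namespace shown are hypothetical, and most fields are omitted.

```yaml
# Abbreviated, illustrative view of a bound PersistentVolume.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv2022
spec:
  claimRef:                     # filled in when the volume is bound to a claim
    kind: PersistentVolumeClaim
    namespace: default          # assumed namespace
    name: mysql-data            # assumed claim name
  # remaining spec fields omitted
status:
  phase: Bound                  # one of Available, Bound, Released, Failed
```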

Wrapping Up

Organizations today are pivoting application architecture away from traditional monolithic applications to cloud-native microservices. Kubernetes is the driving technology enabling businesses to rearchitect applications and utilize containers in their new microservices architectures. In addition, Kubernetes allows managing and orchestrating containers in an intelligent, scalable, and highly available way.

One of the challenges encountered with the shift from traditional, stateful applications to cloud-native architectures is understanding how storage works within a Kubernetes cluster operating Pods. Kubernetes can make use of both ephemeral and persistent storage. It is essential to understand that, by default, Kubernetes does not natively provision and manage persistent storage.

However, organizations can create persistent storage and create PersistentVolumes (PV) in Kubernetes that are claimed using a PersistentVolumeClaim (PVC). By using Persistent Volumes with Kubernetes, organizations can create robust applications with persistent data, using cloud-native containerized microservices on the frontend.
