Container storage is ephemeral: it goes away when the container does. Because of this, storage needs to be independent of the container in order to outlive it.
In Kubernetes, volumes provide the abstraction that decouples storage from the pod's containers. When we attach a volume to a pod, it provides a directory mounted inside the pod's containers through which we can access files.
There are many different volume types.
emptyDir
The emptyDir volume type is the simplest. It is backed by storage on the node currently running the pod: when the pod is scheduled on a node, an empty directory is created on that node. It is not permanent; if the pod is removed from the node and rescheduled on another one, the data is deleted. However, the data does persist beyond the life of the containers. It allows multiple containers within a pod to have both read and write access to the files in the emptyDir volume.
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: busybox1
    image: busybox
    command: [ "/bin/sh", "-c", "while true; do sleep 3600; done" ]
    volumeMounts:
    - mountPath: /data
      name: data
  - name: busybox2
    image: busybox
    command: [ "/bin/sh", "-c", "while true; do sleep 3600; done" ]
    volumeMounts:
    - mountPath: /data
      name: data
  volumes:
  - name: data
    emptyDir: {}
After creating the above pod in our cluster, we can validate that a file created in the first container can be accessed from the second container.
$ kubectl exec -it my-pod -c busybox1 -- /bin/sh
$ cd /data
$ echo "hello" > hello.txt
$ kubectl exec -it my-pod -c busybox2 -- /bin/sh
$ cat /data/hello.txt
hello
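By default the emptyDir is backed by the node's disk. If we need scratch space backed by memory instead, we can set the medium field. Below is a minimal sketch; the pod name and the sizeLimit value are just illustrative choices, not part of the original example.
apiVersion: v1
kind: Pod
metadata:
  name: my-pod-tmpfs          # hypothetical name, for illustration only
spec:
  containers:
  - name: busybox
    image: busybox
    command: [ "/bin/sh", "-c", "while true; do sleep 3600; done" ]
    volumeMounts:
    - mountPath: /cache
      name: cache
  volumes:
  - name: cache
    emptyDir:
      medium: Memory          # back the volume with tmpfs (RAM) instead of node disk
      sizeLimit: 64Mi         # illustrative cap; exceeding it leads to pod eviction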
PersistentVolume and PersistentVolumeClaim
A PersistentVolume, or PV, represents a storage resource. If a Node in Kubernetes represents CPU and memory resources for a pod, a PersistentVolume represents a storage resource for a pod.
A PersistentVolumeClaim, or PVC, is an abstraction between the pod and the PV. When you create your pod you don't need to worry about where the storage is located or how it is implemented; you only need to specify the PVC with the required storageClassName and accessModes (more about these later).
First we create our local Kubernetes cluster. With kind this is very easy:
$ kind create cluster --name kubernetes-storage-demo
Now we are ready to create a PV and consume that storage resource within a pod.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  storageClassName: localdisk
  hostPath:
    path: /mnt/data
The hostPath type allocates storage from a directory on the actual node in the cluster where the pod is running.
After creating the PV in our cluster, we can see that its status is Available.
$ kubectl get pv
NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
mysql-pv   1Gi        RWO            Retain           Available           localdisk               3s
Next we create a PVC.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: localdisk
  resources:
    requests:
      storage: 500Mi
After creating the PVC in our cluster, we can see that the status of both the PV and the PVC is Bound.
$ kubectl get pvc
NAME        STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mysql-pvc   Bound    mysql-pv   1Gi        RWO            localdisk      7s
$ kubectl get pv
NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
mysql-pv   1Gi        RWO            Retain           Bound    default/mysql-pvc   localdisk               91s
Finally, we create a pod referencing the PVC:
apiVersion: v1
kind: Pod
metadata:
  name: mysql-pod
spec:
  containers:
  - name: mysql
    image: mysql:5.6
    ports:
    - containerPort: 3306
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: password
    volumeMounts:
    - name: mysql-volume
      mountPath: /var/lib/mysql
  volumes:
  - name: mysql-volume
    persistentVolumeClaim:
      claimName: mysql-pvc
After creating the pod in our cluster, we can verify that the storage was created on the node. With our one-node Kubernetes cluster this is easy:
$ docker exec -it kubernetes-storage-demo-control-plane du -sh /mnt/data/mysql
7.0M /mnt/data/mysql
In order to delete our local one-node Kubernetes cluster we can use the following command.
$ kind delete cluster --name kubernetes-storage-demo
gcePersistentDisk
Since pods come and go and get scheduled on different nodes, we need a more robust solution for data persistence. In this section we are going to use a GKE cluster and delegate the storage to a GCE PersistentDisk.
First let's create a GKE cluster:
$ gcloud container clusters create kubernetes-storage-demo-cluster
$ kubectl get nodes
NAME                                                  STATUS   ROLES    AGE     VERSION
gke-kubernetes-storage-d-default-pool-378cb5bc-5tq5   Ready    <none>   7m14s   v1.16.15-gke.6000
gke-kubernetes-storage-d-default-pool-378cb5bc-dq78   Ready    <none>   7m14s   v1.16.15-gke.6000
gke-kubernetes-storage-d-default-pool-378cb5bc-z1dm   Ready    <none>   7m14s   v1.16.15-gke.6000
Next we create a GCE Persistent Disk:
$ gcloud compute disks create --size=10GB --zone=europe-west6-a mongodb
We create a PV using the previously created PersistentDisk and reference it using gcePersistentDisk.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongodb-pv
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  storageClassName: ssd-disk
  gcePersistentDisk:
    pdName: mongodb
After this object is created in our cluster we have a 1Gi PersistentVolume available for consumption that will outlive the lifecycle of any pod that uses it.
To consume the volume we create a PVC.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ssd-disk
  resources:
    requests:
      storage: 500Mi
Kubernetes will match our claim to an available PV that meets our requirements (accessModes, storageClassName, resources.requests.storage). For example, if the claim asks for more storage capacity than the PV provides, it will not bind and will sit there pending. Once a claim is bound, nothing else can claim the same PV object unless we free it up.
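To illustrate the matching, a hypothetical claim like the one below, asking for more capacity than our 1Gi mongodb-pv offers, would simply stay Pending until a suitable PV appears. The claim name is made up for this sketch.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-pvc-too-big    # hypothetical claim, for illustration only
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ssd-disk
  resources:
    requests:
      storage: 5Gi             # more than the 1Gi PV offers, so it will not bind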
Finally, we need to consume the PVC in our pod:
apiVersion: v1
kind: Pod
metadata:
  name: mongodb
spec:
  containers:
  - name: mongo
    image: mongo
    ports:
    - containerPort: 27017
    volumeMounts:
    - name: mongodb-persistent-storage
      mountPath: /data/db
  volumes:
  - name: mongodb-persistent-storage
    persistentVolumeClaim:
      claimName: mongodb-pvc
After the pod is created on the cluster, let's create some data, for example a document, using the MongoDB shell:
$ kubectl exec -it mongodb -- mongo
> use mydb
switched to db mydb
> db.customers.insert({name:'John Doe'})
WriteResult({ "nInserted" : 1 })
> db.customers.find()
{ "_id" : ObjectId("6002be707b7ff91b50cb9fe7"), "name" : "John Doe" }
We verify which node the pod is running on:
$ kubectl get pods -o wide
NAME      READY   STATUS    RESTARTS   AGE     IP           NODE                                                  NOMINATED NODE   READINESS GATES
mongodb   1/1     Running   0          3m11s   10.100.1.8   gke-kubernetes-storage-d-default-pool-378cb5bc-dq78   <none>           <none>
Next we would like to see that, if the pod is scheduled on another node, we can still access the data. For this we first mark the current node as unschedulable:
$ kubectl cordon gke-kubernetes-storage-d-default-pool-378cb5bc-dq78
Then we delete and recreate the pod.
$ kubectl delete -f mongodb-pod.yaml
$ kubectl create -f mongodb-pod.yaml
Next, we verify that the pod is running on a different node now:
$ kubectl get pods -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP           NODE                                                  NOMINATED NODE   READINESS GATES
mongodb   1/1     Running   0          52s   10.100.0.3   gke-kubernetes-storage-d-default-pool-378cb5bc-5tq5   <none>           <none>
Finally, we see that we still have our data:
$ kubectl exec -it mongodb -- mongo
> use mydb
switched to db mydb
> db.customers.find()
{ "_id" : ObjectId("600361e62ba042906cb98af1"), "name" : "John Doe" }
AccessModes
There are three ways a pod can access a volume. Not all volumes support all of these access modes: block-based volumes generally support ReadWriteOnce and ReadOnlyMany, while file-based volumes can also support the ReadWriteMany access mode. GCE persistent disks, for example, do not support the ReadWriteMany access mode. A single volume can only be opened in one mode at a time.
ReadWriteOnce (RWO) - the volume can be mounted as read-write by a single node in the cluster.
ReadOnlyMany (ROX) - many nodes can mount it, but only in read-only mode.
ReadWriteMany (RWX) - many nodes can mount it in read-write mode.
It is worth noting that all replicas included in a Deployment share the same PVC. For a volume type that does not support ReadWriteMany, the only way to support multiple replicas is to mount the volume with the ReadOnlyMany access mode. Even a single-replica Deployment with a ReadWriteOnce volume can be problematic, depending on the rollout strategy used to update the deployment: the pods of the updated ReplicaSet will not be able to mount the volume in read-write mode while the previous ReplicaSet's pod still exists.
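For a single-replica Deployment, one way to avoid this conflict is the Recreate rollout strategy, which removes the old pod before starting the new one, so the ReadWriteOnce volume is never needed by two pods at the same time. Here is a minimal sketch reusing the mongodb PVC from above; the Deployment name and labels are just for illustration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongodb
spec:
  replicas: 1
  strategy:
    type: Recreate              # old pod is deleted before the new one mounts the RWO volume
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
      - name: mongo
        image: mongo
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: mongodb-persistent-storage
          mountPath: /data/db
      volumes:
      - name: mongodb-persistent-storage
        persistentVolumeClaim:
          claimName: mongodb-pvc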
Reclaim Policy
A reclaim policy specifies what happens to a volume when its claim is released. It can be Delete (the default for dynamically provisioned volumes) or Retain (the default for manually created PVs). With Delete, the underlying storage is removed when the claim is released; with Retain, the volume and its content are kept when the claim is released, or even when the pod fails or is deleted.
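The reclaim policy is set on the PV itself via the persistentVolumeReclaimPolicy field. As a sketch, our earlier mysql-pv could declare it explicitly (it already defaulted to Retain, as the kubectl get pv output showed):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv
spec:
  persistentVolumeReclaimPolicy: Retain   # keep the volume and its data after the claim is released
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  storageClassName: localdisk
  hostPath:
    path: /mnt/data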
StorageClasses
In the previous example we first had to manually create the PersistentDisk and then the PV in order to reference it in our PVC. Luckily, storage classes make this much more dynamic.
StorageClasses enable dynamic provisioning of volumes. Like everything in Kubernetes, a storage class is an API resource defined in a YAML file:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
The provisioner describes which volume plugin is used for provisioning PVs. In the above example the storage class will provision a zonal SSD disk with the ext4 filesystem type.
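A storage class can also carry the reclaim policy, volume expansion and binding mode settings for the PVs it provisions. The following is a hedged sketch of such a variant; the fast-retain name is made up for illustration.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-retain                        # hypothetical name, for illustration only
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
reclaimPolicy: Retain                      # provisioned PVs keep their data when the claim is deleted
allowVolumeExpansion: true                 # PVCs using this class can be resized later
volumeBindingMode: WaitForFirstConsumer    # delay provisioning until a pod actually uses the claim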
Our GKE cluster already has a default storage class:
$ kubectl get sc
NAME                 PROVISIONER            AGE
standard (default)   kubernetes.io/gce-pd   91s
which uses a non-SSD (pd-standard) disk:
$ kubectl describe sc standard
Name: standard
IsDefaultClass: Yes
Annotations: storageclass.kubernetes.io/is-default-class=true
Provisioner: kubernetes.io/gce-pd
Parameters: type=pd-standard
AllowVolumeExpansion: True
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events: <none>
After creating the fast StorageClass in our GKE cluster, let's use it in a PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast
  resources:
    requests:
      storage: 1Gi
After creating the mongodb-pvc we can see that it is immediately Bound (no PV ever shows up as Available; one is provisioned and bound on demand) and that it is using our fast storage class.
$ kubectl get pvc
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongodb-pvc   Bound    pvc-c0719ec2-9fbd-4d56-8650-7bba6890fabc   1Gi        RWO            fast           3s
Behind the scenes we can see that both the PersistentVolume and the underlying GCP persistent disk were created:
$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                 STORAGECLASS   REASON   AGE
pvc-c0719ec2-9fbd-4d56-8650-7bba6890fabc   1Gi        RWO            Delete           Bound    default/mongodb-pvc   fast                    4m30s
$ gcloud compute disks list
NAME                                                             LOCATION        LOCATION_SCOPE  SIZE_GB  TYPE    STATUS
gke-kubernetes-storage-pvc-c0719ec2-9fbd-4d56-8650-7bba6890fabc  europe-west6-a  zone            1        pd-ssd  READY
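When we are done experimenting, we can clean up (assuming nothing else uses these resources). Because of the Delete reclaim policy, removing the PVC also removes the dynamically provisioned PV and the underlying disk; finally we delete the GKE cluster:
$ kubectl delete pod mongodb
$ kubectl delete pvc mongodb-pvc
$ gcloud container clusters delete kubernetes-storage-demo-cluster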
Storage can be a pretty dry topic, but hopefully with these examples (which you can also try yourself) it is easier to understand. All the examples used in this blog post can be found in my GitHub repo.