Since the release of the vSphere CSI driver in vSphere 6.7U3, I have had a number of requests about how we plan to migrate applications between Kubernetes clusters that are using the original in-tree vSphere Cloud Provider (VCP) and Kubernetes clusters that are built with the new vSphere CSI driver. All I can say at this point in time is that we are looking at ways to seamlessly achieve this at some point in the future, and that the Kubernetes community has a migration design in the works to move from in-tree providers to the new CSI driver as well.
However, I had also seen some demonstrations from the Velero team on how to use Velero for application mobility. I wanted to see if Velero could also provide us with an interim solution to move applications with persistent storage between a K8s cluster running on vSphere using the in-tree VCP and a native K8s cluster that uses the vSphere CSI driver.
Note that this method requires downtime to move the application between clusters, so the application will be offline for part of this exercise.
It should also be noted that the Cassandra application I used for demonstration purposes was idle at the time of backup (no active I/O), so that should also be taken into account.
tl;dr Yes, we can use Velero for such a scenario, on the understanding that (a) you will need resources to set up the new CSI cluster and (b) there is no seamless migration: the application will need to be shut down on the VCP cluster and restarted on the CSI cluster. Here are the detailed steps.
External S3 object store
The first step is to set up an external S3 object store that can be reached by both clusters. Velero stores both metadata and (in the case of vSphere backups using restic) data in the S3 object store. In my example, I am using MinIO as I have had the most experience with that product. I have a post on how to set this up on vSphere if you want to learn more. In my lab, my VCP K8s cluster is on VLAN 50, and my CSI K8s cluster is on VLAN 51. So, for the CSI cluster to access the MinIO S3 object store, and with it the backup taken from the VCP cluster, I will need to re-IP my MinIO VMs to make the backup visible to the CSI cluster. More detail on that later.
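Velero also needs credentials to access this object store. The credentials-velero file referenced in the install commands below uses the standard AWS-style credentials format; here is a minimal sketch, with placeholder values standing in for my MinIO access and secret keys:

[default]
aws_access_key_id = <minio-access-key>
aws_secret_access_key = <minio-secret-key>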
VCP StorageClass
Before going any further, it is probably of interest to see how the VCP driver is currently being used. The reference to the provider/driver is placed in the StorageClass. Here is the StorageClass being used by the Cassandra application in the VCP cluster, which we will shortly be backing up and moving to a new cluster.
$ kubectl get sc
NAME      PROVISIONER                    AGE
cass-sc   kubernetes.io/vsphere-volume   64d

$ cat cassandra-sc-vcp.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cass-sc
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin
  storagePolicyName: raid-1
  datastore: vsanDatastore
Deploy Velero on VCP K8s cluster
At this point, the S3 object store is available on IP 192.50.0.20. It is reachable via port 9000. Thus when I deploy Velero, I have to specify this address:port combination in the s3Url and publicUrl as follows:
$ velero install --provider aws \
--bucket velero \
--secret-file ./credentials-velero \
--use-volume-snapshots=false \
--use-restic \
--backup-location-config region=minio,s3ForcePathStyle="true",\
s3Url=http://192.50.0.20:9000,publicUrl=http://192.50.0.20:9000
Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.
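Before going any further, it is worth checking that the Velero pods (the velero deployment, plus a restic pod per worker node) have come up, assuming the default velero namespace was used:

$ kubectl get pods -n velero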
Prepping the Stateful Application
For the purposes of this test, I deployed a Cassandra stateful set with 3 replicas on my VCP cluster. I also populated it with some data so that we can verify that it gets successfully restored on the CSI cluster.
$ kubectl exec -it cassandra-0 -n cassandra -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.16.5.2  104.38 KiB  32      66.6%             8fd5fda2-d236-4f8f-85c4-2c57eab06417  Rack1-K8Demo
UN  172.16.5.3  100.05 KiB  32      65.9%             6ebf73bb-0541-4381-b232-7f277186e2d3  Rack1-K8Demo
UN  172.16.5.4  75.93 KiB   32      67.5%             0f5387f9-149c-416d-b1b6-42b71059c2fa  Rack1-K8Demo

$ kubectl exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> CREATE KEYSPACE demodb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
cqlsh> use demodb;
cqlsh:demodb> CREATE TABLE emp(emp_id int PRIMARY KEY, emp_name text, emp_city text, emp_sal varint, emp_phone varint);
cqlsh:demodb> INSERT INTO emp (emp_id, emp_name, emp_city, emp_phone, emp_sal) VALUES (100, 'Cormac', 'Cork', 999, 1000000);
cqlsh:demodb> select * from emp;

 emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 1000000

(1 rows)
cqlsh:demodb>
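For reference, the piece of the StatefulSet that ties the application to the cass-sc StorageClass is the volumeClaimTemplates section. Below is a minimal sketch rather than my exact manifest; the volume name cassandra-data matches the name used in the restic annotations later, while the access mode and size are placeholders:

  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: cass-sc
      resources:
        requests:
          storage: 1Gi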
Backup Cassandra using Velero
We are now ready to back up Cassandra. The first step is to annotate the Pods so that restic knows it needs to copy the contents of the cassandra-data volumes as part of the backup process.
$ kubectl -n cassandra annotate pod/cassandra-2 \
backup.velero.io/backup-volumes=cassandra-data
pod/cassandra-2 annotated

$ kubectl -n cassandra annotate pod/cassandra-1 \
backup.velero.io/backup-volumes=cassandra-data
pod/cassandra-1 annotated

$ kubectl -n cassandra annotate pod/cassandra-0 \
backup.velero.io/backup-volumes=cassandra-data
pod/cassandra-0 annotated

$ velero backup create cassandra-pks-1010 --include-namespaces cassandra
Backup request "cassandra-pks-1010" submitted successfully.
Run `velero backup describe cassandra-pks-1010` or `velero backup logs cassandra-pks-1010` for more details.
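Once the backup has completed, the Restic Backups section of the detailed describe output should show each of the annotated volumes as Completed:

$ velero backup describe cassandra-pks-1010 --details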
Restic Backups:
  Completed:
    cassandra/cassandra-0: cassandra-data
    cassandra/cassandra-1: cassandra-data
    cassandra/cassandra-2: cassandra-data
$ velero backup get
NAME                 STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
cassandra-pks-1010   Completed   2019-10-10 10:38:32 +0100 IST   29d       default            <none>
nginx-backup         Completed   2019-10-01 15:12:20 +0100 IST   21d       default            app=nginx
At this point, you may want to double-check that the application was backed up successfully by attempting to restore it to the same VCP cluster that it was backed up from. I am going to skip such a step here and move straight onto the restore part of the process.
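If you do want to test a restore on the source cluster without disturbing the running application, one option (a sketch only, not something I ran as part of this exercise) is to restore the backup into a different namespace using Velero's namespace mappings, e.g.:

$ velero restore create cassandra-test --from-backup cassandra-pks-1010 \
--namespace-mappings cassandra:cassandra-verify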
Switch contexts to the Kubernetes CSI cluster
$ kubectl config get-contexts
CURRENT   NAME                CLUSTER             AUTHINFO                               NAMESPACE
*         cork8s-cluster-01   cork8s-cluster-01   d8ab6b15-f7d7-4d20-aefe-5dfe3ecbf63b
          cork8s-csi-01       kubernetes          kubernetes-admin

$ kubectl get nodes -o wide
NAME                                   STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP     OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
140ab5aa-0159-4612-b68c-df39dbea2245   Ready    <none>   68d   v1.13.5   192.168.192.5   192.168.192.5   Ubuntu 16.04.6 LTS   4.15.0-46-generic   docker://18.6.3
ebbb4c31-375b-4b17-840d-db0586dd948b   Ready    <none>   68d   v1.13.5   192.168.192.4   192.168.192.4   Ubuntu 16.04.6 LTS   4.15.0-46-generic   docker://18.6.3
fd8f9036-189f-447c-bbac-71a9fea519c0   Ready    <none>   68d   v1.13.5   192.168.192.3   192.168.192.3   Ubuntu 16.04.6 LTS   4.15.0-46-generic   docker://18.6.3

$ kubectl config use-context cork8s-csi-01
Switched to context "cork8s-csi-01".

$ kubectl get nodes -o wide
NAME          STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8s-master    Ready    master   51d   v1.14.2   10.27.51.39   10.27.51.39   Ubuntu 18.04.3 LTS   4.15.0-58-generic   docker://18.6.0
k8s-worker1   Ready    <none>   51d   v1.14.2   10.27.51.40   10.27.51.40   Ubuntu 18.04.3 LTS   4.15.0-58-generic   docker://18.6.0
k8s-worker2   Ready    <none>   51d   v1.14.2   10.27.51.41   10.27.51.41   Ubuntu 18.04.3 LTS   4.15.0-58-generic   docker://18.6.0
OK, at this point I am now working with my CSI cluster. I now need to re-IP my MinIO S3 object store so that it is visible on the same VLAN as my CSI cluster. Once the object store is reachable on the new VLAN, I can install Velero on the CSI cluster and point it at the external S3 object store. The install command is identical to the one used on the VCP cluster, apart from the s3Url and publicUrl entries.
$ velero install --provider aws \
--bucket velero \
--secret-file ./credentials-velero \
--use-volume-snapshots=false \
--use-restic \
--backup-location-config region=minio,s3ForcePathStyle="true",\
s3Url=http://10.27.51.49:9000,publicUrl=http://10.27.51.49:9000
Again, as before, a successful install should result in the following output:
Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.
Assuming the MinIO object store is setup correctly and is accessible to the CSI cluster, a velero backup get should show the backups taken on the VCP cluster, including our Cassandra backup.
$ velero backup get
NAME                 STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
cassandra-pks-1010   Completed   2019-10-10 10:38:32 +0100 IST   29d       default            <none>
nginx-backup         Completed   2019-10-01 15:12:20 +0100 IST   21d       default            app=nginx
You can also run the velero backup describe and velero backup describe --details commands that we saw earlier to make sure that all the necessary components of the Cassandra application have been captured and are available for restore.
Restore Stateful App on the Kubernetes CSI Cluster
The first step is to make sure that there is a StorageClass with the same name (cass-sc) as the StorageClass used on the VCP cluster. However, in this CSI cluster, the StorageClass needs to reference the CSI driver rather than the VCP driver that we saw earlier.
$ kubectl get sc
NAME   PROVISIONER   AGE

$ cat cassandra-sc-csi.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cass-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "Space-Efficient"

$ kubectl apply -f cassandra-sc-csi.yaml
storageclass.storage.k8s.io/cass-sc created

$ kubectl get sc
NAME      PROVISIONER              AGE
cass-sc   csi.vsphere.vmware.com   3s
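This assumes, of course, that the vSphere CSI driver itself is already deployed on the cluster. A quick sanity check might look like the following, bearing in mind that where the driver pods run depends on how the driver was deployed (kube-system in my case):

$ kubectl get csidrivers
$ kubectl get pods -n kube-system | grep vsphere-csi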
The restore command is quite simple; the only notable point is to specify which backup to restore. Here is the command used to restore the cassandra-pks-1010 backup on the Kubernetes CSI cluster:
$ velero create restore cassandra --from-backup cassandra-pks-1010
Restore request "cassandra" submitted successfully.
Run `velero restore describe cassandra` or `velero restore logs cassandra` for more details.
$ velero restore describe cassandra --details
Name:         cassandra
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  Completed

Backup:  cassandra-pks-1010

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Restic Restores:
  Completed:
    cassandra/cassandra-0: cassandra-data
    cassandra/cassandra-1: cassandra-data
    cassandra/cassandra-2: cassandra-data
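Before looking at the data, it is also worth a quick check that the Kubernetes objects themselves have been recreated in the cassandra namespace, for example:

$ kubectl -n cassandra get statefulset,pods,pvc,svc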
New cluster cannot access the image repository
This step may not be necessary, but you may have a situation where the new CSI cluster is unable to access the image repository used by the original VCP cluster. This might happen if the original K8s cluster’s image repository (e.g. Harbor) is on a network that cannot be reached by the new CSI cluster. If that is the case, the Cassandra application objects will restore, but the Pods will never come online due to ‘image pull’ errors. To resolve this issue, you can use kubectl to edit the Pods and change the location from which the Cassandra image is pulled.
For example, let’s say that my original VCP cluster had access to an internal Harbor repository for the images, but my new CSI cluster does not, although it does have access to the outside world. In that case, I may want to change the Pod image location from the internal Harbor repository to an external repo, e.g. from:
image: harbor.rainpole.com/library/cassandra:v11
to:
image: gcr.io/google-samples/cassandra:v11
To achieve this, just edit each of the Pods as follows, and make the changes to the image location:
$ kubectl edit pod cassandra-0 -n cassandra
pod/cassandra-0 edited

$ kubectl edit pod cassandra-1 -n cassandra
pod/cassandra-1 edited

$ kubectl edit pod cassandra-2 -n cassandra
pod/cassandra-2 edited
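One caveat with editing the Pods directly: the restored StatefulSet template will still point at the original image location, so any Pod that gets recreated will hit the same image pull error. Assuming the container in the StatefulSet is named cassandra, the template can be updated in a similar fashion, e.g.:

$ kubectl -n cassandra set image statefulset/cassandra \
cassandra=gcr.io/google-samples/cassandra:v11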
Verify that the restore is successful
Apart from verifying that the Pods, PVCs, PVs, Service and StatefulSet have been restored, we should now go ahead and check the contents of the Cassandra database once it has been stood up on the CSI cluster. Let’s look at the node status first, and note that the Cassandra nodes have a new range of IP addresses.
$ kubectl exec -it cassandra-0 -n cassandra -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.244.2.61  192.73 KiB  32      100.0%            79820874-edec-4254-b048-eaceac0ec6c8  Rack1-K8Demo
UN  10.244.2.62  157.17 KiB  32      100.0%            ea0e8ef2-aad2-47ee-ab68-14a3094da5be  Rack1-K8Demo
UN  10.244.1.58  139.97 KiB  32      100.0%            110d3212-526b-4a58-8005-ecff802d7c20  Rack1-K8Demo
$ kubectl exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> use demodb;
cqlsh:demodb> select * from emp;

 emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 1000000

(1 rows)
cqlsh:demodb>
Nice. It looks like we have successfully moved the stateful application (Cassandra) from a K8s cluster using the original VCP driver to a K8s cluster that is using the new vSphere CSI driver. One last point in case you were wondering: yes, it also works in the other direction, so you can also move stateful applications from a Kubernetes cluster using the vSphere CSI driver to a cluster using the VCP.