
Project Hatchway hitting the mainstream – persistent storage for containers


Regular readers will be aware that I “dabble” from time to time in the world of Cloud Native Apps. For me, a lot of this dabbling is trying to figure out how I can go about providing persistent storage to container-based applications. Typically this is in the shape of container volumes that are carved out of the underlying storage infrastructure, whether that is VMFS, NFS, vSAN or even Virtual Volumes. VMware Project Hatchway has enabled me to do this on multiple occasions. Project Hatchway was officially announced at VMworld 2017, but I’ve been working with this team since the early days of the “docker volume driver for vSphere”. Project Hatchway now includes the vSphere Docker Volume Service (vDVS) plugin AND a vSphere Cloud Provider (vCP) to provide container volumes for Kubernetes. With Kubecon17 in Austin this week, there has been some exciting news on the Project Hatchway front that I want to share with you in this post.

The first piece of news is that Project Hatchway is now included in the Container Storage Interface (CSI), an initiative from the {code} team over at DELL Tech. This means that using CSI, you will not have to understand the intricacies of each of the different underlying storage systems in order to provision persistent container storage. CSI sits between the storage provider and the orchestrator and, using a common set of commands, allows you to provision persistent container storage in a uniform manner every time, no matter the underlying storage infrastructure. There is a good write-up on CSI, as well as some details about the Project Hatchway (vSphere) inclusion.

The next piece of news comes from the Pivotal Container Service (PKS) initiative. This was also announced at VMworld 2017; using Pivotal’s BOSH, it gives us a very simple mechanism for deploying Kubernetes on vSphere. Project Hatchway’s VCP is now part of PKS, and its role is once again to provide persistent storage to containers that are instantiated and managed by Kubernetes. This integration includes the ability to select and set policies for container volumes that are deployed on storage such as VMware vSAN. Kubernetes supports a range of storage-related constructs, such as Persistent Volumes, Storage Classes and Stateful Sets. I’ve not delved too deeply into PKS yet, so I’m not 100% sure if it supports all of these. I don’t see why not, but it is something I need to research a bit more.

I mentioned already that Project Hatchway provides persistent storage for Kubernetes containers using VCP, the vSphere Cloud Provider component. I also previously said that this is now being used by PKS for container volumes. However, it is important to understand that you can still use VCP outside of PKS too, for example if you are running Kubernetes directly on top of vSphere. One of the main feature requests, according to Tushar Thole (engineering manager for Project Hatchway) is the ability to support Kubernetes clusters that span across multiple vCenters and datacenters. VCP now supports this.
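For reference, consuming VCP from Kubernetes typically just means defining a StorageClass that points at the vSphere dynamic provisioner. A minimal sketch is shown below; this assumes your Kubernetes nodes were brought up with the vSphere cloud provider configured, and the class name and the “gold” policy name are simply examples:

cat <<EOF | kubectl create -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: vsan-gold
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin
  storagePolicyName: gold
EOF

Any Persistent Volume Claim that requests this StorageClass should then get a VMDK dynamically provisioned on vSphere with that storage policy applied.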

The vSphere Docker Volume Service has been the stalwart of Project Hatchway. It has been around for at least 18 months now, with constant improvements. One of those enhancements was changing vDVS to a docker plugin, and it also became one of the first docker plugins to support Windows as a Container Host/OS. I’ve used vDVS regularly to provide persistent storage for my containers, such as my “Project Harbor” registry and most recently for my testing of S3 object stores on top of vSAN. However, Tushar informed me recently that vDVS can now also be used with Virtual Volumes (VVols), which is very interesting and something I need to take a closer look at soon.
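If you have not used vDVS before, creating a volume is a one-liner from the docker CLI once the plugin (and the ESXi-side components) are installed. A quick sketch, where the volume name is just an example and size/vsan-policy-name are two of the supported -o options:

docker volume create --driver=vsphere --name=MyRegistryVol -o size=10gb
docker volume inspect MyRegistryVol

The volume can then be presented to a container in the usual way with docker run -v MyRegistryVol:/data <image>.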

So as you can see, there is a lot of stuff going on around Project Hatchway. However, it would be remiss of me not to mention vFile. vFile provides a way of doing shared file storage for cloud-native applications. This is still in its experimental phase, but it is getting a lot of interest from those folks who have been putting it through its paces.

It’s great to see the traction that Project Hatchway is now getting. If this is something you are interested in getting started with, check it out on github here. The engineering team is always looking for feedback, so feel free to leave it either here on this post or on github.



Building a Docker Swarm with Photon OS


I’ve decided to take a look at our new vFile docker volume plugin. If you haven’t heard of it, the vFile volume plugin for Docker provides simultaneous persistent volume access between hosts in the same Docker Swarm cluster, on top of a base volume plugin service such as vDVS (the vSphere Docker Volume Service), with zero configuration effort, along with high availability, scalability and load balancing support. As you can see, this has a requirement on Docker Swarm. Since I hadn’t set this up in a while, I decided to set it up on a recent release of Photon OS, but ran into a small issue.

I’m using the Photon OS build photon-custom-hw11-2.0-31bb961.ova. If I check the /etc/os-release file, I see the following:

root@photon-machine [ ~ ]# cat /etc/os-release
NAME="VMware Photon OS"
VERSION="2.0"
ID=photon
VERSION_ID=2.0
PRETTY_NAME="VMware Photon OS/Linux"
ANSI_COLOR="1;34"
HOME_URL="https://vmware.github.io/photon/"
BUG_REPORT_URL="https://github.com/vmware/photon/issues"

I am also using quite a recent version of docker:

root@photon-machine [ ~ ]# docker version
Client:
 Version: 17.06.0-ce
 API version: 1.30
 Go version: go1.8.1
 Git commit: 02c1d87
 Built: Fri Sep 29 05:57:21 2017
 OS/Arch: linux/amd64

Server:
 Version: 17.06.0-ce
 API version: 1.30 (minimum version 1.12)
 Go version: go1.8.1
 Git commit: 02c1d87
 Built: Fri Sep 29 05:58:18 2017
 OS/Arch: linux/amd64
 Experimental: false
root@photon-machine [ ~ ]#

To create a Docker Swarm, I need to first initialize one node as my master and join other nodes as workers. The command to create a master is as follows:

root@photon-machine [ ~ ]# docker swarm init
Swarm initialized: current node (1nmqf02m5mkv4yh3ecjqsjjs6) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join --token SWMTKN-1-1dg2jdht61fxtehb906xyhdh1rubl7n46ffbyh1b5uj8t24kfv-2veb1hbc5v8l097jbi3ufle4a 10.27.51.47:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

root@photon-machine [ ~ ]#

That seems pretty straightforward. Now, if I log in to my worker VM, I should be able to join it as a worker using the command above.

root@photon-worker [ ~ ]# docker swarm join --token SWMTKN-1-1dg2jdht61fxtehb906xyhdh1rubl7n46ffbyh1b5uj8t24kfv-2veb1hbc5v8l097jbi3ufle4a 10.27.51.47:2377
Error response from daemon: Timeout was reached before node was joined. The attempt to join the swarm will continue in the background. Use the "docker info" command to see the current swarm status of your node.

I eventually traced this to a firewall port issue. I simply needed to open port 2377 on the master to allow the worker to connect.

root@photon-machine [ ~ ]# iptables -A INPUT -p tcp --dport 2377 -j ACCEPT
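One caveat worth mentioning: a rule added with iptables -A like this only lasts until the next reboot. On the Photon OS builds I am using, the rules restored at boot appear to come from /etc/systemd/scripts/ip4save, so something along these lines should make the change persistent (do verify the path on your own build):

iptables-save > /etc/systemd/scripts/ip4save   # persist the current ruleset (Photon OS restores from this file at boot - assumption)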

Now I can successfully join the worker to the master:

root@photon-worker [ ~ ]# docker swarm join --token SWMTKN-1-4hyqxyt8z15lhdoyc51jqb2i4ctnv0u76m7sqw8msmgi04816b-7kurar4w68v7p4zym73ew8rp0 10.27.51.47:2377
This node joined a swarm as a worker.

We can run a docker info command to check the status of the swarm (this output from worker):

root@photon-worker [ ~ ]# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 13
Server Version: 17.06.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: rikcngnbtuerovom8z13ghlk0
 Is Manager: false
 Node Address: 10.27.51.17
 Manager Addresses:
 10.27.51.47:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 seccomp
 Profile: default
Kernel Version: 4.9.60-1.ph2-esx
Operating System: VMware Photon OS/Linux
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.792GiB
Name: photon-machine
ID: R7DL:MSZ4:MCAE:SKFS:2HN3:ZZOV:2TJC:T757:H5DM:DRWV:QC6P:YE2R
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
No Proxy: 10.27.51.47
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

To prove that Docker is working in Swarm mode, we can launch a service with just one replica instance.

root@photon-machine [ ~ ]# docker service create --replicas 1 --name helloworld alpine ping docker.com

To check on the service, use:

root@photon-machine [ ~ ]# docker service ls
 ID           NAME       MODE       REPLICAS IMAGE        PORTS
 pnmztlolpl2u helloworld replicated 1/1      alpine:latest
 root@photon-machine [ ~ ]#

The container that provides the service can be scheduled on either the master or the worker. Check with docker ps on each node:

root@photon-worker [ ~ ]# docker ps
CONTAINER ID IMAGE         COMMAND           CREATED       STATUS                PORTS NAMES
38b72a221125 alpine:latest "ping docker.com" 5 seconds ago Up Less than a second       helloworld.1.uoth5a14e4tacx7l8pxr6jaax
root@photon-worker [ ~ ]#
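Rather than running docker ps on every node to find the container, docker service ps (run from the manager) shows where swarm has placed each task, and the service can be scaled out across the nodes with docker service scale. For example:

docker service ps helloworld        # shows which node each task is scheduled on
docker service scale helloworld=2   # run a second replica somewhere in the swarm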

Great – that is my Docker Swarm up and running. Now to take a closer look at vFile. Watch this space.


Validating overlay network when docker swarm running on Centos VMs on vSphere


I got a chance to revisit my docker swarm deployment this week after a bit of a break. I was a little curious about my setup because when I spoke to some of our ‘Project Hatchway’ engineers, I was told that I should be able to launch a single instance of Nginx in Docker Swarm (“docker service create --replicas 1 -p 8080:80 --name web nginx”) and I should be able to access the web service using the following command from any swarm node – “curl 127.0.0.1:8080”. This was not what I was seeing. When I launched the Nginx service, the curl command was successful on the container host where the service was running, but on every other host/node in the swarm cluster, I got a “Failed connect/connection refused”. So why wasn’t it working?

Eventually I traced it to yet another firewall issue on the container hosts/swarm nodes (using Centos 7). It seems that the overlay network needed some ports opened to work as well. These are the ports that I figured out needed to be opened on the firewall of my swarm nodes:

  • 7946/tcp – port for “control plane” discovery communication
  • 7946/udp – port for “control plane”  discovery communication
  • 4789/udp – port for “data plane” overlay network traffic

I used the following command on Centos 7 to modify the firewall:

[root@centos-swarm-master ~]# firewall-cmd --zone=public --add-port=7946/tcp --permanent
[root@centos-swarm-master ~]# firewall-cmd --zone=public --add-port=7946/udp --permanent
[root@centos-swarm-master ~]# firewall-cmd --zone=public --add-port=4789/udp --permanent
[root@centos-swarm-master ~]# firewall-cmd --reload

To verify that the changes took place, I used the following command:

[root@centos-swarm-master ~]# firewall-cmd --list-all
public (active)
 target: default
 icmp-block-inversion: no
 interfaces: ens192
 sources:
 services: dhcpv6-client ssh
 ports: 2379/tcp 4789/udp 2377/tcp 7946/udp 7946/tcp 2380/tcp
 protocols:
 masquerade: no
 forward-ports:
 sourceports:
 icmp-blocks:
 rich rules:

The other ports listed relate to Swarm (2377/tcp, which is discussed here) and to ETCD, which is used by vFile (which I haven’t yet blogged about – watch this space). With these ports opened, we have allowed our docker overlay network to communicate between Swarm nodes. Now, if I launch a single replica for the Nginx web service and retry the curl test on different nodes, let’s see what happens:

[root@centos-swarm-master ~]# docker service ls
 ID           NAME                 MODE       REPLICAS IMAGE                PORTS
 rxspku5i98cc vFileServerSharedVol replicated 1/1      luomiao/samba-debian *:30000->445/tcp

[root@centos-swarm-master ~]# docker service create --replicas 1 -p 8080:80 --name web nginx
 xvtzr79sb0fdut85yssxd7z1n
 overall progress: 1 out of 1 tasks
 1/1: running [==================================================>]
 verify: Service converged

[root@centos-swarm-master ~]# docker service ls
 ID           NAME                 MODE       REPLICAS IMAGE                PORTS
 rxspku5i98cc vFileServerSharedVol replicated 1/1      luomiao/samba-debian *:30000->445/tcp
 xvtzr79sb0fd web                  replicated 1/1      nginx:latest         *:8080->80/tcp

[root@centos-swarm-master ~]# curl 127.0.0.1:8080
 <!DOCTYPE html>
 <html>
 <head>
 <title>Welcome to nginx!</title>
 <style>
  body {
  width: 35em;
  margin: 0 auto;
  font-family: Tahoma, Verdana, Arial, sans-serif;
  }
 </style>
 </head>
 <body>
 <h1>Welcome to nginx!</h1>
 <p>If you see this page, the nginx web server is successfully installed and
 working. Further configuration is required.</p>

<p>For online documentation and support please refer to
 <a href="http://nginx.org/">nginx.org</a>.<br/>
 Commercial support is available at
 <a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
 </body>
 </html>
 [root@centos-swarm-master ~]#

Let’s switch to a worker node, and retry the same test.

[root@centos-swarm-w1 ~]# curl 127.0.0.1:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
 body {
 width: 35em;
 margin: 0 auto;
 font-family: Tahoma, Verdana, Arial, sans-serif;
 }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
[root@centos-swarm-w1 ~]#

Success! Now that my overlay network is working, I can reach a single instance of a service running on docker swarm from any of the nodes in the cluster.
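As a closing note, the swarm routing mesh publishes the service port on every node's own IP address as well, so the same test should also work from outside the cluster by pointing at any node, assuming port 8080 is open in that node's firewall. For example (substitute one of your own node addresses):

curl http://<swarm-node-ip>:8080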


A first look at vFile – Sharing a persistent volume between containers


Regular readers will have noticed that I have been doing a bit of work recently with docker swarm, and what you need to do to get it to work on VMs running on vSphere. The reason why I had taken such an interest is because I wanted to look at a new product that our Project Hatchway team have been cooking up, namely vFile. In a nutshell, vFile provides simultaneous, persistent volume access between nodes in the same Docker Swarm cluster. In some ways, it can be thought of as an extension to vDVS, the vSphere Docker Volume Service (from the same team) that provides persistent storage for containers. vFile allows these persistent volumes to be shared between containers, even when the container hosts are on completely different ESXi hosts. Let’s take a closer look.

Swarm, Overlay and ETCD networking Requirements

This is probably the area that trips up most people when they get started with vFile (it certainly took me a while). There are a number of networking prerequisites, i.e. firewall ports that must be opened. First of all, there is a requirement to open a port to allow the docker swarm nodes to talk. Then there is the communication needed for the docker networking overlay. Please take a look at these posts, which talk about which firewall ports need to be opened for docker swarm and the overlay network respectively. Unless these are working correctly, you won’t get far.

Secondly, there is a requirement to use ETCD. ETCD is used by vFile for cluster coordination and state management, and it runs on the swarm manager node(s). You need to make sure that the ETCD ports (2379, 2380) are opened on your swarm manager VM(s). 2379 is for ETCD client requests, and 2380 is for peer communication. You can easily identify this issue in the /var/log/vfile.log – if it is not working, it will contain a line similar to:

2017-12-21 09:56:41.996637765 +0000 UTC [ERROR] Failed to create ETCD client according to manager info Swarm ID=ovn08qobd7tjs3qrr4pwetxa5 IP Addr=....

What you need to be able to see in the log file is ETCD working like this (but you won’t see this until the vFile plugin is deployed):

2017-12-21 11:53:36.941373995 +0000 UTC [INFO] vFile plugin started version="vFile Volume Driver v0.2"
2017-12-21 11:53:36.941477095 +0000 UTC [INFO] Going into ServeUnix - Listening on Unix socket address="/run/docker/plugins/vfile.sock"
2017-12-21 11:53:36.941884271 +0000 UTC [INFO] Started loading file server image
2017-12-21 11:53:36.974808787 +0000 UTC [INFO] getEtcdPorts: clientPort=:2379 peerPort=:2380
2017-12-21 11:53:36.974836665 +0000 UTC [INFO] Swarm node role: worker. Return from NewKvStore nodeID=z6oep8ngrpaqf25oqnsyomjfk

So, to recap, ports that need to be opened are:

  • Swarm – 2377/tcp
  • Swarm overlay network – 7946/tcp, 7946/udp, 4789/udp
  • ETCD – 2379/tcp, 2380/tcp
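On my CentOS 7 swarm manager, opening the ETCD ports followed exactly the same firewalld pattern used earlier for the swarm and overlay ports. For example:

firewall-cmd --zone=public --add-port=2379/tcp --permanent
firewall-cmd --zone=public --add-port=2380/tcp --permanent
firewall-cmd --reload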

 

Deploy vFile docker plugin

vFile, just like vDVS, is a docker plugin – the installation steps can be found by clicking here, but it boils down to a simple “docker plugin install”.

[root@centos-swarm-w1 ~]# docker plugin install --grant-all-permissions --alias vfile vmware/vfile:latest VFILE_TIMEOUT_IN_SECOND=90
latest: Pulling from vmware/vfile
cb8aba6f1749: Download complete
Digest: sha256:7ab7abc795e60c443583639325011f878e79ce5f085c56a525fc098b02fce343
Status: Downloaded newer image for vmware/vfile:latest
Installed plugin vmware/vfile:latest
[root@centos-swarm-w1 ~]#

This plugin must be deployed on all swarm nodes/VMs.

[root@centos-swarm-master ~]# docker plugin ls
ID           NAME           DESCRIPTION                         ENABLED
e84b390e832d vsphere:latest VMWare vSphere Docker Volume plugin true
fccf36e22ecc vfile:latest   VMWare vFile Docker Volume plugin   true
[root@centos-swarm-master ~]#

As I mentioned, vDVS (vSphere Docker Volume Service) is also required. There are a number of posts on this site on how to get started with vDVS. Alternatively, you can checkout the official docs here.

Create a shared volume

Now, we can go ahead and create a small test volume using the vFile plugin. After it is created, we can examine it in more detail. Note that there will be two volumes listed after this command completes, one volume for vFile and another for the vSphere Docker Volume Service (but it is the same volume). This is normal. Also note that I am specifying a policy for this volume called R5 (RAID-5). This is a policy I created for vSAN, since this Container Host VM is on vSAN, and so the container volume that I create will also be on vSAN.

[root@centos-swarm-master ~]#  docker volume create --driver=vfile --name=SharedVol -o size=10gb -o vsan-policy-name=R5
SharedVol


[root@centos-swarm-master ~]# docker volume ls
DRIVER VOLUME NAME
vfile:latest SharedVol
vsphere:latest _vF_SharedVol@vsanDatastore


[root@centos-swarm-master ~]# docker volume inspect SharedVol
[
 {
 "CreatedAt": "0001-01-01T00:00:00Z",
 "Driver": "vfile:latest",
 "Labels": {},
 "Mountpoint": "/mnt/vfile/SharedVol/",
 "Name": "SharedVol",
 "Options": {
 "size": "10gb",
 "vsan-policy-name": "R5"
 },
 "Scope": "global",
 "Status": {
 "Clients": null,
 "File server Port": 0,
 "Global Refcount": 0,
 "Service name": "",
 "Volume Status": "Ready"
 }
 }
]
[root@centos-swarm-master ~]#

Excellent. Let’s see now if we can share this volume to containers launched from the other nodes in the cluster.

 

Access it from multiple containers

In this example, we shall use a simple busybox image, and present this volume to it. Let’s begin from a worker node. First, verify that the volume is visible, and then start a busybox container with the shared volume mounted (on /mnt/myvol). Once inside the busybox shell, make some files and directories.

[root@centos-swarm-w1 ~]# docker volume ls
DRIVER         VOLUME NAME
vfile:latest   SharedVol
vsphere:latest _vF_SharedVol@vsanDatastore

[root@centos-swarm-w1 ~]# docker run --rm -it -v SharedVol:/mnt/myvol --name busybox-on-worker busybox
/ # cd /mnt/myvol
/mnt/myvol # ls
lost+found
/mnt/myvol # mkdir cormac
/mnt/myvol # cd cormac/
/mnt/myvol/cormac # touch xxx
/mnt/myvol/cormac # touch yyy
/mnt/myvol/cormac # touch zzz
/mnt/myvol/cormac # ls
xxx yyy zzz
/mnt/myvol/cormac #

 

Let’s now verify that we can see the same data by mounting this volume into another container, this time on another node (normally this would be another worker, but in this example I am using my master):

[root@centos-swarm-master ~]# docker run --rm -it -v SharedVol:/mnt/myvol --name busybox-on-master busybox
/ # cd /mnt/myvol/
/mnt/myvol # ls
cormac lost+found
/mnt/myvol # cd cormac
/mnt/myvol/cormac # ls
xxx yyy zzz
/mnt/myvol/cormac #

 

Success! A persistent container volume shared between multiple containers and container hosts without the need for something like NFS, Ceph or Gluster. How cool is that?

Let’s take one final peek at our volume:

[root@centos-swarm-master ~]# docker volume ls
DRIVER         VOLUME NAME
vfile:latest   SharedVol
vsphere:latest _vF_SharedVol@vsanDatastore

[root@centos-swarm-master ~]# docker volume inspect SharedVol
[
 {
 "CreatedAt": "0001-01-01T00:00:00Z",
 "Driver": "vfile:latest",
 "Labels": {},
 "Mountpoint": "/mnt/vfile/SharedVol/",
 "Name": "SharedVol",
 "Options": {
 "size": "10gb",
 "vsan-policy-name": "R5"
 },
 "Scope": "global",
 "Status": {
 "Clients": [
 "10.27.51.146",
 "10.27.51.147"
 ],
 "File server Port": 30000,
 "Global Refcount": 2,
 "Service name": "vFileServerSharedVol",
 "Volume Status": "Mounted"
 }
 }
]

The status has changed from Ready to Mounted. Looks good.
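Because the volume is shared, it also lends itself to swarm services rather than just standalone containers. Here is a hedged sketch using the standard docker service create --mount syntax; the service name and replica count are just examples:

docker service create --name shared-writer --replicas 2 \
  --mount type=volume,source=SharedVol,target=/mnt/myvol,volume-driver=vfile \
  busybox sleep 1000000

Each replica, regardless of which swarm node it lands on, should see the same /mnt/myvol contents.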

If you plan to do some testing with vFile, the Hatchway team would love to get your feedback. They can be easily contacted on github by clicking here.


A simple Pivotal Container Service (PKS) deployment


This post will walk you through a simplified PKS (Pivotal Container Service) deployment in my lab. The reason why I say this is simplified is because all of the components will be deployed on a single flat network. PKS has a number of network dependencies. These include the bosh agents deployed on the Kubernetes (K8s) VMs being able to reach the BOSH Director, as well as the vCenter server. Let’s not get too deep into the components just yet – these will be explained over the course of the post. So rather than trying to set up routing between multiple different networks, I have deployed everything on a single flat network. Again, like some of my previous posts, this is more of a learning exercise than a best practice. What it will show you are the various moving parts that need to be considered when deploying your own PKS. If you are looking for a more complex network deployment, have a read of the excellent series of posts that my good pal William Lam put together. He highlights all of the steps needed to get PKS deployed with NSX-T. I will be referencing William’s blogs from time to time as they contain some very useful snippets of information. Note that this is going to be quite a long post as there are a lot of pieces to deploy to get to the point where a K8s cluster can be rolled out.

If you want to learn more about what PKS is about, have a read of this blog post where I discuss the PKS announcement from VMworld 2017.

1. Networking overview

Let’s begin by having a quick look at the various components and how they are connected. I put together this diagram to assist with understanding the network dependencies.

To get started, the Pivotal Ops Manager is deployed. This needs to be able to communicate with your vSphere environment. This is responsible for deploying the BOSH Director component and then the Pivotal Container Service (PKS). Once these are deployed, a PKS client VM is set up which contains the UAA (User Account and Authentication) and PKS command line tools. These are used to get the correct privileges to roll out the Kubernetes cluster(s). As I mentioned in the opening paragraph, the BOSH agents on the K8s nodes need to reach back to the BOSH Director to pull down appropriate packages, as well as reach back to the vSphere environment to request the creation of persistent disks for the master and workers. While this makes the integration with NSX-T really useful, I wanted to just look at the steps involved without having NSX-T in the mix. And because I am restricted in my own lab environment, I went with a single flat network as follows:

Now that we know what the networking looks like, lets go ahead and check on what components we need to deploy it.

 

2. PKS Components

We already mentioned that we will need the Pivotal Ops Manager. This is an OVF which, once deployed, can be used to deploy the BOSH Director and PKS, the Pivotal Container Service.

We will then need to deploy the necessary components on a VM which we can refer to as the PKS Client. The tools we need to install in this VM are the UAA CLI (for user authentication), the PKS CLI (for creating K8s clusters) and then finally the Kubectl CLI (which provides a CLI interface to manage our K8s cluster).

This is all we need, and with this infrastructure in place, we will be able to deploy K8s clusters via PKS.

 

3. Deploying Pivotal Ops Manager

This is a straight-forward OVF deploy. You can download the Pivotal Cloud Foundry Ops Manager for vSphere by clicking here. The preference is to use a static IP address and/or FQDN for the Ops Manager. Once deployed, open a browser to the Ops Manager, and you will be presented with an Authentication System to select as follows:

I used “Internal Authentication”. You will then need to populate password fields and agree to the terms and conditions. Once the details are submitted, you will be presented with the login page where you can now log in to Ops Manager. The landing page will look something like this, where we see the BOSH Director tile for vSphere. BOSH is basically a deployment mechanism for deploying Pivotal software on various platforms. In this example, we are deploying on VMware vSphere, so now we need to populate a manifest file with all of our vSphere details. To proceed, we simply click on the tile below. Orange means that it is not yet populated; green means that it has been populated.

 

4. Configuring BOSH Director for vSphere

The first set of details that we need to populate are related to vCenter. As well as credentials, you also need to provide the name of your data center, a datastore and whether you are using standard vSphere network (which is what I am using) or NSX. Here is an example taken from my environment.

The next screen is the Director configuration. There are only a handful of items to add here. The first is an NTP server. The others are check-boxes, shown below. The interesting one is the “Enable Post Deploy Scripts”. If this is enabled, BOSH will automatically deploy K8s applications on your Kubernetes cluster immediately after the cluster is deployed. This includes the K8s dashboard. If this checkbox is not checked, then you will not have any applications deployed and the cluster will be idle after it is deployed.

This brings us to Availability Zones (AZ). This is where you can define multiple pools of vSphere resources that are associated with a network. When a K8s cluster is deployed, one can select a particular network for the K8s cluster, and in turn the resources associated with the AZ. Since I am going with one large flat network, I will also create a single AZ which is the whole of my cluster.

Now we come to the creation of a network and the assigning of my AZ to that network. I am just going to create two networks, one for my BOSH Director and PKS and another for K8s. But these will be on the same segment since everything is going to sit on one flat network. And this is the same network that my vCenter server resides on. As William also points out in his blog, the Reserved IP Ranges need some explaining. This entry defines which IP addresses are already in use in the network. So basically, you are blocking out IP ranges from BOSH and K8s, and anything that is not defined can be consumed by BOSH and K8s. In effect, for BOSH, we will require 2 IP addresses to be free – the first for the BOSH Director and the second for PKS, which we will deploy shortly. So we will need to block all IP addresses except for the two we want to use. For K8s, we will require 4 IP addresses for each cluster we deploy, one for the master and three for the workers. So block all IP addresses except for the 4 you want to use. This is what my setup looks like, with my 2 networks:

Finally, we associate an Availability Zone and a Network with the BOSH Director. I chose the network with 2 free IP addresses created previously, and my AZ is basically the whole of my vSphere cluster, also created previously.

I can now return to my installation dashboard (top left hand corner in the above screenshot), and see that my BOSH Director tile has now turned green, meaning it has been configured. I can see on the right hand side that there are “Pending Changes” which is to install the BOSH Director. If I click on Apply changes, this will start the roll out of my BOSH Director VM in vSphere.

You can track the progress of the deployment by clicking on the Verbose Output link in the top right hand corner:

And if everything is successful, you will hopefully see a deployment message to highlight that the changes were successfully deployed.

You should now be able to see your BOSH Director VM in vSphere. The custom attributes of the VM should reveal that it is a BOSH director. That now completes the deployment of the BOSH director. We can now turn our attention to PKS.

 

5. Adding the PKS tile and necessary stem cell

Now we install the Pivotal Container Service. This will create a new tile once imported. We then add a “stemcell” to this tile. A “stemcell” is a customized operating system image containing the filesystem for BOSH-managed virtual machines. For all intents and purposes, vSphere admins can think of this as a template.  PKS can be downloaded from this location. On the same download page, you will notice that there are a list of available stemcells. The stemcell version needs to match the PKS version. The required version will be obvious when PKS is deployed.

To begin the deploy, click on “Import a Product” on the left hand side of the Ops Manager UI. Browse to the PKS download, and if the import is successful, you will see the PKS product available on the left hand side as well, along with the version.

Click on the + sign to add the PKS tile, and it will appear in an orange un-configured state, as shown below.

You will also notice that this tile requires a stemcell, as highlighted in red in the PKS tile. Click on this to import the stemcell.

Here you can also see the required version of stemcell (3468.21) so this is the one that you need to download from Pivotal. This will be different depending on the version of PKS that you choose. From the PKS download page, if we click on the 3468.21 release, we get taken to this download page where we can pick up the Ubuntu Trusty Stemcell for vSphere 3468.21. Once the stemcell is imported and applied, we can return to the Installation Dashboard and start to populate the configuration information needed for PKS by clicking on the tile.

 

6. Configure PKS

The first configuration item is Availability Zones and Networks. As I have only a single AZ and a single flat network, that is easy. For the Network selection, the first network placement is for the PKS VM and the second is for the K8s VMs. I will place PKS on the same network as my BOSH Director, and the Service Network will be assigned the network with the 4 free IP addresses (for the master and 3 workers). Now in my flat network setup, these are all on the same segment and VLAN, but in production environments, these will most likely be on separate segments/VLANs, as shown in my first network diagram. If that is the case, then you will have to make sure that the Service Network has a route back to the BOSH Director and PKS VMs, as well as your vCenter server and ESXi hosts. In a future blog, I’ll talk about network dependencies and the sort of deployment issues you will see if these are not correct.

The next step is to configure access to the PKS API. This will be used later when we setup a PKS client VM with various CLI components. This certificate will be generated based on a domain, which in my case is rainpole.com. Populate the domain, then click generate:

And once generated, click on Save.

Next step is to populate the Plans. These decide the resources assigned to the VMs which are deployed when your Kubernetes clusters are created. You will see later how to select a particular plan when we create a K8s cluster. These can be left at the default settings; the only step in each of the plans is to select the Availability Zone. Once that is done, save the plans. Plan 1 is small, Plan 2 is medium. Plan 3 (large) can be left inactive.

This brings us to the K8s Cloud Provider. Regular readers of this blog might remember posts regarding Project Hatchway, which is a VMware initiative to provide persistent storage for containers. PKS is leveraging this technology to provide “volumes” to cloud native applications running in K8s. This has to be able to communicate with vSphere, so this is where these details are added. You will also need to provide a datastore name as well as a VM folder. I matched these exactly with the settings in the vCenter configuration for the BOSH Director. I’m not sure if this is necessary (probably not) but I didn’t experience any issues reusing them here for PKS.

Networking can be left at the default of Flannel rather than selecting NSX.

The final configuration step in the PKS tile is the UAA setting. This is the User Account and Authentication part, and this is how we will manage the PKS environment; basically it defines who can manage and deploy K8s clusters. This takes a DNS entry which, once PKS is deployed, will need to point to the PKS VM. I used uaa.rainpole.com.

If we now return to the installation dashboard, we should once again see a set of pending changes, including changes to BOSH Director and the installation of Pivotal Container Service (PKS). Click on Apply changes as before. The changes can be tracked via the verbose output link as highlighted previously.

If the changes are successful, the deployment dashboard should now look something like this:

So far so good. The next step is to set up a PKS client with the appropriate CLI tools so that we can now go ahead and roll out K8s clusters.

 

7. Configuring a PKS Client

I’m not going to spend too much time on the details here. William already does a great job on how to deploy the various components (uaac, pks and kubectl) on an Ubuntu VM in his blog post here. If you’d rather not use Ubuntu, we already saw where the CLI components can be downloaded previously in this post. The CLI components are in the same download location as PKS. When the components are installed, it would now be a good time to do your first DNS update. You will need to add uaa.rainpole.com to your DNS to match the same IP as the PKS VM (or add it to the PKS client /etc/hosts file).

 

8. Deploy your first K8s cluster

The first step is to retrieve the secret token for your UAA admin. Select the Pivotal Container Service tile in Pivotal Ops Manager, then select the credentials tab and then click on the Link to Credential. Here you will find the secret needed to allow us to create an admin user that can then be used to create K8s clusters via PKS. In the final command, we include the role “pks.clusters.admin” which will give us full admin rights to all PKS clusters.

  • uaac target https://uaa.rainpole.com:8443 --skip-ssl-validation
  • uaac token client get admin -s
  • uaac user add admin --emails admin@rainpole.com -p
  • uaac member add pks.clusters.admin admin

 

Before we create our K8s cluster, there is another very useful set of bosh CLI commands. In order to run these commands however, we need to authenticate against our Pivotal Operations Manager. Here are the two “om” (short for ops manager) commands to do that (you will replace the pivotal-ops-mgr.rainpole.com with your own ops manager):

root@pks-cli:~# om --target https://pivotal-ops-mgr.rainpole.com -u admin -p -k curl -p /api/v0/certificate_authorities | jq -r '.certificate_authorities | select(map(.active == true))[0] | .cert_pem' > /root/opsmanager.pem
Status: 200 OK
Cache-Control: no-cache, no-store
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Mon, 16 Apr 2018 10:17:31 GMT
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Pragma: no-cache
Server: nginx/1.4.6 (Ubuntu)
Strict-Transport-Security: max-age=15552000
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Request-Id: 8dea4ce9-a1d5-4b73-b673-fadd439d4689
X-Runtime: 0.034777
X-Xss-Protection: 1; mode=block
root@pks-cli:~# om --target https://pivotal-ops-mgr.rainpole.com -u admin -p -k curl -p /api/v0/deployed/director/credentials/bosh2_commandline_credentials -s | jq -r '.credential'
BOSH_CLIENT=ops_manager 
BOSH_CLIENT_SECRET=DiNXMj11uyC9alp3KJGOMO5xyATCg—F 
BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate 
BOSH_ENVIRONMENT=10.27.51.177 

 

Ignore the BOSH_CA_CERT output, but take the rest of the output from the command and update your ~/.bash_profile. Add the following entries:

  • export BOSH_CLIENT=ops_manager
  • export BOSH_CLIENT_SECRET=<whatever-the command-returned-to-you>
  • export BOSH_CA_CERT=/root/opsmanager.pem
  • export BOSH_ENVIRONMENT=<whatever-the-BOSH-ip-returned-to-you>

 

Source your ~/.bash_profile so that the entries take effect. Now we can run some PKS CLI commands to login to the UAA endpoint configured during the PKS configuration phase.

  • pks login -a uaa.rainpole.com -u admin -p -k
  • pks create-cluster k8s-cluster-01 --external-hostname pks-cluster-01 --plan small --num-nodes 3

Note that there are two cluster references in the final command. The first, k8s-cluster-01, is how PKS identifies the cluster. The second, pks-cluster-01, is simply a K8s thing – essentially the expectation that there is some external load-balancer front-end sitting in front of the K8s cluster. So once again, we will need to edit our DNS and add this entry to coincide with the IP address of the K8s master node, once it is deployed. The plan entry relates to the plans that we set up in PKS earlier. One of the plans was labeled “small”, which is what we have chosen here. Lastly, the number of nodes refers to the number of K8s worker nodes. In this example, there will be 3 workers, alongside the master.

Here is the output of the create command:

root@pks-cli:~# pks create-cluster k8s-cluster-01 --external-hostname pks-cluster-01 --plan small --num-nodes 3
Name: k8s-cluster-01
Plan Name: small
UUID: 72540a79-82b5-4aad-8e7a-0de6f6b058c0
Last Action: CREATE
Last Action State: in progress
Last Action Description: Creating cluster Kubernetes Master
Host: pks-cluster-01 Kubernetes Master
Port: 8443
Worker Instances: 3
Kubernetes Master IP(s): In Progress

 

And while that is running, we can run the following BOSH commands (assuming you have successfully run the om commands above):

root@pks-cli:~# bosh task
Using environment ‘10.27.51.181’ as client ‘ops_manager’

Task 404

Task 404 | 10:53:36 | Preparing deployment: Preparing deployment (00:00:06)
Task 404 | 10:53:54 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 404 | 10:53:54 | Creating missing vms: master/a975a9f9-3f74-4f63-ae82-61daddbc78df (0)
Task 404 | 10:53:54 | Creating missing vms: worker/64a55231-cdb6-4ce5-b62c-83cc3b4b233d (0)
Task 404 | 10:53:54 | Creating missing vms: worker/c3c9b49a-f89a-41e6-a1af-36ea0416f3c3 (1)
Task 404 | 10:53:54 | Creating missing vms: worker/e6fd330e-4e95-4d46-a937-676a05a32e5e (2) (00:00:59)
Task 404 | 10:54:54 | Creating missing vms: worker/c3c9b49a-f89a-41e6-a1af-36ea0416f3c3 (1) (00:01:00)
Task 404 | 10:54:58 | Creating missing vms: worker/64a55231-cdb6-4ce5-b62c-83cc3b4b233d (0) (00:01:04)
Task 404 | 10:55:00 | Creating missing vms: master/a975a9f9-3f74-4f63-ae82-61daddbc78df (0) (00:01:06)
Task 404 | 10:55:00 | Updating instance master: master/a975a9f9-3f74-4f63-ae82-61daddbc78df (0) (canary) (00:02:03)
Task 404 | 10:57:03 | Updating instance worker: worker/64a55231-cdb6-4ce5-b62c-83cc3b4b233d (0) (canary) (00:01:25)
Task 404 | 10:58:28 | Updating instance worker: worker/c3c9b49a-f89a-41e6-a1af-36ea0416f3c3 (1) (00:01:28)
Task 404 | 10:59:56 | Updating instance worker: worker/e6fd330e-4e95-4d46-a937-676a05a32e5e (2) (00:01:28)

Task 404 Started Tue Apr 24 10:53:36 UTC 2018
Task 404 Finished Tue Apr 24 11:01:24 UTC 2018
Task 404 Duration 00:07:48
Task 404 done

Succeeded

 

root@pks-cli:~# bosh vms
Using environment ‘10.27.51.181’ as client ‘ops_manager’

Task 409
Task 410
Task 409 done

Task 410 done

Deployment ‘pivotal-container-service-e7febad16f1bf59db116’

Instance Process State AZ IPs VM CID VM Type Active
pivotal-container-service/d4a0fd19-e9ce-47a8-a7df-afa100a612fa running CH-AZ 10.27.51.182 vm-54d92a19-8f98-48a8-bd2e-c0ac53f6ad70 micro false

1 vms

Instance Process State AZ IPs VM CID VM Type Active
master/a975a9f9-3f74-4f63-ae82-61daddbc78df running CH-AZ 10.27.51.185 vm-1e239504-c1d5-46c0-85fc-f5c02bbfddb1 medium false
worker/64a55231-cdb6-4ce5-b62c-83cc3b4b233d running CH-AZ 10.27.51.186 vm-a0452089-b2cc-426e-983c-08a442d15f46 medium false
worker/c3c9b49a-f89a-41e6-a1af-36ea0416f3c3 running CH-AZ 10.27.51.187 vm-54e3ff52-a9a0-450a-86f4-e176afdb47ff medium false
worker/e6fd330e-4e95-4d46-a937-676a05a32e5e running CH-AZ 10.27.51.188 vm-fadafe3f-331a-4844-a947-2390c71a6296 medium false

4 vms

Succeeded
root@pks-cli:~#

 

root@pks-cli:~# pks clusters

Name Plan Name UUID Status Action
k8s-cluster-01 small 72540a79-82b5-4aad-8e7a-0de6f6b058c0 succeeded CREATE

 

root@pks-cli:~# pks cluster k8s-cluster-01

Name: k8s-cluster-01
Plan Name: small
UUID: 72540a79-82b5-4aad-8e7a-0de6f6b058c0
Last Action: CREATE
Last Action State: succeeded
Last Action Description: Instance provisioning completed
Kubernetes Master Host: pks-cluster-01
Kubernetes Master Port: 8443
Worker Instances: 3
Kubernetes Master IP(s): 10.27.51.185

cormac@pks-cli:~$

 

The canary steps are interesting. This is where it creates a test node with the new components/software and, if that all works, the new node is used in place of the old node rather than impacting the running environment. You will see one for the master, and one for the workers. If the worker one is successful, then we know it will work for all workers, so there is no need to repeat it for every worker.

The very last command has returned the IP address of the master. We can now add the DNS entry for the pks-cluster-01 with this IP address.

 

9. Using kubectl

Excellent – now you have your Kubernetes cluster deployed. We also have a K8s CLI utility called kubectl, so let’s run a few commands and examine our cluster. First, we will need to authenticate. We can do that with the following command:

root@pks-cli:~# pks get-credentials k8s-cluster-01

Fetching credentials for cluster k8s-cluster-01.
Context set for cluster k8s-cluster-01.

You can now switch between clusters by using:
$kubectl config use-context <cluster-name>

 

root@pks-cli:~# kubectl config use-context k8s-cluster-01
Switched to context “k8s-cluster-01”.
root@pks-cli:~#

 

You can now start using kubectl commands to examine the state of your cluster.

root@pks-cli:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
2ca99275-244c-4e21-952a-a2fb3586963e Ready <none> 15m v1.9.2
7bdaac2e-272e-45ae-808c-ada0eafbb967 Ready <none> 18m v1.9.2
a3d76bbe-31da-4531-9e5d-72bdbebb9b96 Ready <none> 16m v1.9.2

root@pks-cli:~# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
2ca99275-244c-4e21-952a-a2fb3586963e Ready <none> 15m v1.9.2 10.27.51.188 Ubuntu 14.04.5 LTS 4.4.0-116-generic docker://1.13.1
7bdaac2e-272e-45ae-808c-ada0eafbb967 Ready <none> 18m v1.9.2 10.27.51.186 Ubuntu 14.04.5 LTS 4.4.0-116-generic docker://1.13.1
a3d76bbe-31da-4531-9e5d-72bdbebb9b96 Ready <none> 17m v1.9.2 10.27.51.187 Ubuntu 14.04.5 LTS 4.4.0-116-generic docker://1.13.1

root@pks-cli:~# kubectl get pods
No resources found.

 

The reason I have no pods is that in this deployment, I omitted the “Enable Post Deploy Scripts” option when setting up the Director initially. If I had checked this, I would have the K8s dashboard running automatically. No big deal – I can deploy it manually.

root@pks-cli:~# kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
secret “kubernetes-dashboard-certs” created
serviceaccount “kubernetes-dashboard” created
role “kubernetes-dashboard-minimal” created
rolebinding “kubernetes-dashboard-minimal” created
deployment “kubernetes-dashboard” created
service “kubernetes-dashboard” created

root@pks-cli:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kubernetes-dashboard-5bd6f767c7-z2pql 1/1 Running 0 2m

root@pks-cli:~# kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system kubernetes-dashboard-5bd6f767c7-z2pql 1/1 Running 0 3m 10.200.22.2 a3d76bbe-31da-4531-9e5d-72bdbebb9b96
root@pks-cli:~#

Let’s run another application as well – a simple hello-world app.

root@pks-cli:~# kubectl run hello-node --image gcr.io/google-samples/node-hello:1.0
deployment “hello-node” created

root@pks-cli:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default hello-node-6c59d566d6-85m5s 0/1 ContainerCreating 0 2s
kube-system kubernetes-dashboard-5bd6f767c7-z2pql 1/1 Running 0 30m

root@pks-cli:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default hello-node-6c59d566d6-85m5s 1/1 Running 0 3m
kube-system kubernetes-dashboard-5bd6f767c7-z2pql 1/1 Running 0 30m
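If you want to actually reach the hello-world app (there is no load balancer integration in this flat-network setup), one hedged option is to expose the deployment as a NodePort service and then curl one of the worker node IPs on the allocated port; the google-samples node-hello image listens on port 8080:

kubectl expose deployment hello-node --type=NodePort --port=8080
kubectl get svc hello-node
# then: curl http://<worker-node-ip>:<nodeport>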

 

And now we have the K8s dashboard running. Now, we are not able to point directly at our master node to access this dashboard due to authorization restrictions. However, William once again saves the day with his steps on how to access the K8s dashboard via a tunnel and the kubectl proxy. Once you have connected to the dashboard and uploaded your K8s config file for the PKS client, you should now be able to access the K8s dashboard and see any apps that you have deployed (in my case, the simple hello app).
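For reference, the kubectl proxy part of that approach is a one-liner run from the PKS client (or through the tunnel), and the dashboard is then reachable via the proxied service URL; this assumes the dashboard landed in the kube-system namespace as shown above:

kubectl proxy
# then browse to:
# http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/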

There you have it. Now you have the infrastructure in place to allow you to very simply deploy K8s clusters for your developers. I’ll follow up with a post on some of the challenges I met, especially with the networking, as this should help anyone looking to roll this out in production. But for now, I think this post is already long enough. Thanks for reading to the end. And kudos once more to William Lam and his great blog which provided a lot of guidance on how to successfully deploy PKS and K8s.


PKS – Networking Setup Tips and Tricks


In my previous post, I showed how to deploy Pivotal Container Services (PKS) on a simplified flat network. In this post, I will highlight some of the issues one might encounter if you wish to deploy PKS on a more complex network topology. For example, you may have vCenter Server on a vSphere management network alongside the PKS management components (PKS CLI client, Pivotal Ops Manager). You may then want to have another “intermediate network” for the deployment of the BOSH and PKS VMs. And then you may have yet another network on which the Kubernetes (K8s) VMs (master, workers) are deployed. These components need to communicate with each other across the different networks, e.g. the bosh agent on the K8s master and worker VMs needs to be able to reach the vSphere infrastructure. What I want to highlight in this post are some of the issues and error messages that you might encounter when rolling out PKS on such a configuration, and what you can do to fix them. Think of this as lessons learnt by me trying to do something similar.

A picture is worth a thousand words, so a final PKS deployment may look something similar to this layout here:

Let’s now look at what happens when certain components in this deployment cannot communicate/route to other components.

ISSUE #1: This is the error I observed when trying to deploy the PKS VM via the Pivotal Ops Manager on a network which could not route to my vSphere network. Note this is the Pivotal Container Service PKS VM (purple above) and not the PKS client with the CLI tools (orange above).

===== 2018-04-19 10:45:02 UTC Finished “/usr/local/bin/bosh --no-color --non-interactive --tty --environment=192.168.191.10 update-config runtime --name=pivotal-container-service-9b9223d27659ed342925-enable-pks-helpers /tmp/pivotal-container-service-9b9223d27659ed342925-enable-pks-helpers.yml20180419-1433-1sp8xkh”; Duration: 0s; Exit Status: 0
===== 2018-04-19 10:45:02 UTC Running “/usr/local/bin/bosh --no-color --non-interactive --tty --environment=192.168.191.10 upload-stemcell /var/tempest/stemcells/bosh-stemcell-3468.28-vsphere-esxi-ubuntu-trusty-go_agent.tgz”
Using environment ‘192.168.191.10’ as client ‘ops_manager’
0.00%    0.54% 11.16 MB/s 36s
Task 5
Task 5 | 10:45:35 | Update stemcell: Extracting stemcell archive (00:00:04)
Task 5 | 10:45:39 | Update stemcell: Verifying stemcell manifest (00:00:00)
Task 5 | 10:52:12 | Error: Unknown CPI error ‘Unknown’ with message ‘Please make sure the CPI has proper network access to vSphere. (HTTPClient::ConnectTimeoutError: execution expired)’ in ‘info’ CPI method
 Task 5 Started  Thu Apr 19 10:45:35 UTC 2018
Task 5 Finished Thu Apr 19 10:52:12 UTC 2018
Task 5 Duration 00:06:37
Task 5 error
Uploading stemcell file:
Expected task ‘5’ to succeed but state is ‘error’
Exit code 1

RESOLUTION #1: The clue is in the message – “proper network access to vSphere”. CPI is short for Cloud Provider Interface, and is basically how BOSH/PKS communicates with different deployment platforms, in this case vSphere. To avoid this issue, you need to make sure that BOSH and PKS can communicate with your vCenter server/vSphere management network.
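A quick sanity check that would have caught this up front is simply to confirm, from an SSH session on the Ops Manager VM, that vCenter is reachable on port 443. A hedged sketch, substituting your own vCenter FQDN or IP (curl was available on my Ops Manager VM):

curl -kv https://<vcenter-fqdn-or-ip>/ -o /dev/null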

 

ISSUE #2: This next issue was to do with not being able to resolve fully qualified domain names. If DNS has not been configured correctly when you are setting up the network section of the manifests in Pivotal Ops Manager, then PKS will not be able to resolve ESXi hostnames in your vSphere environment. I’m guessing that the upload-stemcell command which is getting an error here is where it is trying to upload the customized operating system image for the PKS VM to vSphere, but it is unable to resolve the ESXi host’s FQDN to an IP address.

===== 2018-04-19 11:23:06 UTC Finished “/usr/local/bin/bosh --no-color --non-interactive --tty --environment=192.168.191.10 upload-stemcell /var/tempest/stemcells/bosh-stemcell-3468.28-vsphere-esxi-ubuntu-trusty-go_agent.tgz”; Duration: 198s; Exit Status: 1
===== 2018-04-19 11:23:06 UTC Running “/usr/local/bin/bosh --no-color --non-interactive --tty --environment=192.168.191.10 upload-stemcell /var/tempest/stemcells/bosh-stemcell-3468.28-vsphere-esxi-ubuntu-trusty-go_agent.tgz”
Using environment ‘192.168.191.10’ as client ‘ops_manager’
0.00%    0.51% 10.62 MB/s 38s
Task 14
Task 14 | 11:23:39 | Update stemcell: Extracting stemcell archive (00:00:04)
Task 14 | 11:23:43 | Update stemcell: Verifying stemcell manifest (00:00:00)
Task 14 | 11:23:45 | Update stemcell: Checking if this stemcell already exists (00:00:00)
Task 14 | 11:23:45 | Update stemcell: Uploading stemcell bosh-vsphere-esxi-ubuntu-trusty-go_agent/3468.28 to the cloud (00:00:28)
L Error: Unknown CPI error ‘Unknown’ with message ‘getaddrinfo: Name or service not known (esxi-dell-g.rainpole.com:443)’ in ‘create_stemcell’ CPI method
Task 14 | 11:24:13 | Error: Unknown CPI error ‘Unknown’ with message ‘getaddrinfo: Name or service not known (esxi-dell-g.rainpole.com:443)’ in ‘create_stemcell’ CPI method

Task 14 Started  Thu Apr 19 11:23:39 UTC 2018
Task 14 Finished Thu Apr 19 11:24:13 UTC 2018
Task 14 Duration 00:00:34
Task 14 error

Uploading stemcell file:
Expected task ’14’ to succeed but state is ‘error’
Exit code 1

RESOLUTION #2: Ensure that the DNS server entries in the network sections of the manifests in Ops Manager are correct so that both BOSH and PKS can resolve vCenter and ESXi host FQDNs.
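A simple way to verify this before kicking off another apply is to check that the vCenter and ESXi FQDNs resolve from the Ops Manager VM, assuming nslookup (or host) is available there. Using the host name from the error above as an example:

nslookup esxi-dell-g.rainpole.com
nslookup <your-vcenter-fqdn>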

 

ISSUE #3: The K8s master and worker VMs deploy but never enter a running state – they get left in a state of “unresponsive agent”. I used two very useful commands here on the PKS Client to troubleshoot. This PKS Client (orange above) is not the PKS VM (purple above), but the VM where I have my CLI tools deployed (see previous post for more info). One command is bosh vms and the other is bosh task. The first, bosh vms, shows me the current state of deployed VMs, including the K8s VMs, and the bosh task command tracks the K8s cluster deploy tasks. As you can see, the deployment gives up after 10 minutes/600 seconds.

root@pks-cli:~# bosh vms
Using environment ‘10.27.51.181’ as client ‘ops_manager’

Task 53
Task 54. Done
Task 53 done

Deployment ‘pivotal-container-service-e7febad16f1bf59db116’

Instance                                                        Process State  AZ     IPs           VM CID                                   VM Type  Active
pivotal-container-service/d4a0fd19-e9ce-47a8-a7df-afa100a612fa  running        CH-AZ  10.27.51.182  vm-68aadcae-ba47-41e8-843a-fb3764670861  micro    false

1 vms

Deployment ‘service-instance_2214fcfa-c02f-498f-b37b-3a1b9cf89b27’

Instance                                     Process State       AZ     IPs           VM CID                                   VM Type  Active
master/4ed5f285-5c89-4740-b8bf-32682137cab6  unresponsive agent  CH-AZ  192.50.0.140  vm-75109d0d-5581-4cf9-9dcc-d873e9602b9b  –        false
worker/9d9ad944-dac1-4d3b-9838-1f7c61ffb5b1  unresponsive agent  CH-AZ  192.50.0.141  vm-5e882aff-f709-4b7b-ab47-8d6be80cb7dd  –        false
worker/dd673f29-9bc8-4921-b231-ea35f2cc66b1  unresponsive agent  CH-AZ  192.50.0.142  vm-e0b6dca3-cd92-4e0a-8429-9a2fe2a2dc56  –        false
worker/e36af3d7-e0cd-4c23-88e6-adde3f554300  unresponsive agent  CH-AZ  192.50.0.143  vm-cfd0c81e-9811-49cb-9c87-e23063f83a6b  –        false

4 vms

Succeeded
root@pks-cli:~#

 

root@pks-cli:~# bosh task
Using environment ‘10.27.51.181’ as client ‘ops_manager’

Task 48
Task 48 | 15:52:13 | Preparing deployment: Preparing deployment (00:00:05)
Task 48 | 15:52:30 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 48 | 15:52:30 | Creating missing vms: master/4ed5f285-5c89-4740-b8bf-32682137cab6 (0)
Task 48 | 15:52:30 | Creating missing vms: worker/e36af3d7-e0cd-4c23-88e6-adde3f554300 (1)
Task 48 | 15:52:30 | Creating missing vms: worker/9d9ad944-dac1-4d3b-9838-1f7c61ffb5b1 (0)
Task 48 | 15:52:30 | Creating missing vms: worker/dd673f29-9bc8-4921-b231-ea35f2cc66b1 (2)
Task 48 | 16:02:53 | Creating missing vms: worker/9d9ad944-dac1-4d3b-9838-1f7c61ffb5b1 (0) (00:10:23)
L Error: Timed out pinging to 8876d9df-290f-41b9-8455-1c8efe5fc05d after 600 seconds
Task 48 | 16:02:58 | Creating missing vms: worker/dd673f29-9bc8-4921-b231-ea35f2cc66b1 (2) (00:10:28)
L Error: Timed out pinging to 893ccb7a-11d8-4055-b486-f435f922954c after 600 seconds
Task 48 | 16:02:58 | Creating missing vms: master/4ed5f285-5c89-4740-b8bf-32682137cab6 (0) (00:10:28)
L Error: Timed out pinging to 4741eb79-ca75-4352-ba8e-d70474c7beb8 after 600 seconds
Task 48 | 16:03:00 | Creating missing vms: worker/e36af3d7-e0cd-4c23-88e6-adde3f554300 (1) (00:10:30)
L Error: Timed out pinging to 0315851b-fdd3-48b5-9415-75d2bf52c945 after 600 seconds
Task 48 | 16:03:00 | Error: Timed out pinging to 8876d9df-290f-41b9-8455-1c8efe5fc05d after 600 seconds

Task 48 Started  Fri Apr 20 15:52:13 UTC 2018
Task 48 Finished Fri Apr 20 16:03:00 UTC 2018
Task 48 Duration 00:10:47
Task 48 error

Capturing task ’48’ output:
Expected task ’48’ to succeed but state is ‘error’
Exit code 1
root@pks-cli:~#

RESOLUTION #3: The BOSH agents in the Kubernetes VMs need to be able to communicate back to the BOSH VM, so there needs to be a route between the Kubernetes VMs deployed on the "Service Network" – the network that is configured in the BOSH manifest and consumed in the PKS manifest in Ops Manager – and the "intermediate network" on which the BOSH and PKS VMs are deployed. If there is no route between the networks, this is what you will observe.

 

ISSUE #4: In this final issue, the K8s cluster does not deploy successfully. The master and worker VMs are running but the first worker VM from K8s never restarts after stopping for the canary step. The canary step is where a duplicate master or worker node is updated with any necessary configuration/components/software, and if the update is successful, it replaces the current master or worker node. In this example, we are looking at the task after the failure, again using bosh task. If you give the task number to bosh task, it will list the task steps, as shown below:

root@pks-cli:~# bosh task 31

Using environment ‘192.168.191.10’ as client ‘ops_manager’

Task 31

Task 31 | 12:00:21 | Preparing deployment: Preparing deployment (00:00:06)

Task 31 | 12:00:40 | Preparing package compilation: Finding packages to compile (00:00:00)

Task 31 | 12:00:40 | Creating missing vms: master/3544e363-4a12-488b-a2ea-8fb76a480575 (0)

Task 31 | 12:00:40 | Creating missing vms: worker/dff1daa3-9bf0-4e6a-90a3-4dde6286d972 (0)

Task 31 | 12:00:40 | Creating missing vms: worker/363b9529-d7f7-4d64-a389-84a9a13fcc91 (2)

Task 31 | 12:00:40 | Creating missing vms: worker/8908ebd2-d28f-4a9c-b184-c5379fa35824 (1) (00:01:10)

Task 31 | 12:02:01 | Creating missing vms: worker/dff1daa3-9bf0-4e6a-90a3-4dde6286d972 (0) (00:01:21)

Task 31 | 12:02:03 | Creating missing vms: worker/363b9529-d7f7-4d64-a389-84a9a13fcc91 (2) (00:01:23)

Task 31 | 12:02:04 | Creating missing vms: master/3544e363-4a12-488b-a2ea-8fb76a480575 (0) (00:01:24)

Task 31 | 12:02:11 | Updating instance master: master/3544e363-4a12-488b-a2ea-8fb76a480575 (0) (canary) (00:02:02)

Task 31 | 12:04:13 | Updating instance worker: worker/dff1daa3-9bf0-4e6a-90a3-4dde6286d972 (0) (canary) (00:02:29)

L Error: Action Failed get_task: Task 7af7fdd9-fa53-4dc7-5b2a-6c9de2e7df3c result: 1 of 4 pre-start scripts failed. Failed Jobs: kubelet. Successful Jobs: bosh-dns-enable, bosh-dns, syslog_forwarder.

Task 31 | 12:06:42 | Error: Action Failed get_task: Task 7af7fdd9-fa53-4dc7-5b2a-6c9de2e7df3c result: 1 of 4 pre-start scripts failed. Failed Jobs: kubelet. Successful Jobs: bosh-dns-enable, bosh-dns, syslog_forwarder.

Task 31 Started  Thu Apr 19 12:00:21 UTC 2018

Task 31 Finished Thu Apr 19 12:06:42 UTC 2018

Task 31 Duration 00:06:21

Task 31 error

Capturing task ’31’ output:

Expected task ’31’ to succeed but state is ‘error’

Exit code 1

root@pks-cli:~#

In this case, because the K8s VMs are running, we can actually log onto the K8s VM and see if we can figure out why it failed by looking at the logs. There are 3 steps to this. First we use a new bosh command, bosh deployments.

Step 4.1 – get list of deployments via BOSH CLI and locate the service instance

root@pks-cli:~# bosh deployments

Using environment ‘192.168.191.10’ as client ‘ops_manager’

Name                                                   Release(s)                          Stemcell(s)                                       

Team(s)                                         Cloud Config

pivotal-container-service-9b9223d27659ed342925         bosh-dns/1.3.0                      bosh-vsphere-esxi-ubuntu-trusty-go_agent/3468.28  

–                                               latest

cf-mysql/36.10.0

docker/30.1.4

kubo/0.13.0

kubo-etcd/8

kubo-service-adapter/1.0.0-build.3

on-demand-service-broker/0.19.0

pks-api/1.0.0-build.3

pks-helpers/19.0.0

pks-nsx-t/0.1.6

syslog-migration/10

uaa/54

service-instance_20474001-494e-43b1-aca4-ab8f788078b6  bosh-dns/1.3.0                      bosh-vsphere-esxi-ubuntu-trusty-go_agent/3468.28  pivotal-container-service-9b9223d27659ed342925  latest

docker/30.1.4

kubo/0.13.0

kubo-etcd/8

pks-helpers/19.0.0

pks-nsx-t/0.1.6

syslog-migration/10

2 deployments

Succeeded

 

Step 4.2 – Open an SSH session to the first worker on your K8s cluster, worker/0

Once the service instance is located, we can specify that deployment in the bosh command, and request SSH access to one of the VMs in the K8s cluster, in this case the first worker which is identified as worker/0.

root@pks-cli:~# bosh -d service-instance_20474001-494e-43b1-aca4-ab8f788078b6 ssh worker/0

Using environment ‘192.168.191.10’ as client ‘ops_manager’

Using deployment ‘service-instance_20474001-494e-43b1-aca4-ab8f788078b6’

Task 130. Done

Unauthorized use is strictly prohibited. All access and activity

is subject to logging and monitoring.

Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 4.4.0-116-generic x86_64)

* Documentation:  https://help.ubuntu.com/

The programs included with the Ubuntu system are free software;

the exact distribution terms for each program are described in the

individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by

applicable law.

Last login: Thu Apr 19 15:03:05 2018 from 192.168.192.131

To run a command as administrator (user “root”), use “sudo <command>”.

See “man sudo_root” for details.

worker/bcfbd60c-667f-45a8-9791-0b5d0a7a9565:~$

 

Step 4.3 – Examine the log files of the worker

The log files we are interested in are /var/vcap/sys/log/kubelet/*.log, as it is the kubelet component which failed during the previous canary step. You will need superuser privileges to view these files, so simply sudo su - to get that. I've truncated the log file here, fyi.

worker/bcfbd60c-667f-45a8-9791-0b5d0a7a9565:/var/vcap/sys/log$ sudo su -

worker/bcfbd60c-667f-45a8-9791-0b5d0a7a9565:~# pwd

/root

worker/bcfbd60c-667f-45a8-9791-0b5d0a7a9565:~# cd /var/vcap/sys/log/kubelet/

worker/bcfbd60c-667f-45a8-9791-0b5d0a7a9565:/var/vcap/sys/log/kubelet# ls -ltr

total 8

-rw-r----- 1 root root   21 Apr 19 12:24 pre-start.stdout.log

-rw-r----- 1 root root 2716 Apr 19 12:25 pre-start.stderr.log

worker/bcfbd60c-667f-45a8-9791-0b5d0a7a9565:/var/vcap/sys/log/kubelet# cat pre-start.stdout.log

rpcbind stop/waiting

worker/bcfbd60c-667f-45a8-9791-0b5d0a7a9565:/var/vcap/sys/log/kubelet# cat pre-start.stderr.log

+ CONF_DIR=/var/vcap/jobs/kubelet/config

+ PKG_DIR=/var/vcap/packages/kubernetes

+ source /var/vcap/packages/kubo-common/utils.sh

+ main

+ detect_cloud_config

<—snip —>

+ export GOVC_DATACENTER=CH-Datacenter

+ GOVC_DATACENTER=CH-Datacenter

++ cat /sys/class/dmi/id/product_serial

++ sed -e ‘s/^VMware-//’ -e ‘s/-/ /’

++ awk ‘{ print tolower($1$2$3$4 “-” $5$6 “-” $7$8 “-” $9$10 “-” $11$12$13$14$15$16) }’

+ local vm_uuid=423c6dcf-d47b-53a3-5a1e-2251d6bdc4b7

+ /var/vcap/packages/govc/bin/govc vm.change -e disk.enableUUID=1 -vm.uuid=423c6dcf-d47b-53a3-5a1e-2251d6bdc4b7

/var/vcap/packages/govc/bin/govc: Post https://10.27.51.106:443/sdk: dial tcp 10.27.51.106:443: i/o timeout

worker/bcfbd60c-667f-45a8-9791-0b5d0a7a9565:/var/vcap/sys/log/kubelet#

RESOLUTION #4: In this example, we see the K8s worker node getting an I/O timeout while trying to communicate with my vCenter server (that is my VC IP, which I added to the PKS manifest in Pivotal Operations Manager in the Kubernetes Cloud Provider section). This access is required by the K8s VMs to manage/create/delete persistent volumes as VMDKs for the application containers that will run on K8s. In this case, the K8s cluster was deployed on a network segment that allowed it to communicate with the BOSH/PKS VMs, but not with the vCenter Server/vSphere environment.

 

Other useful things to know – how to log into BOSH and PKS VMs

We have seen how we can access our K8s VMs if we need to troubleshoot, but what about the BOSH and PKS VMs? This is quite straightforward. Log in to the Pivotal Operations Manager, click on the tile of the VM that you wish to log in to, select Credentials, and from there you can get a login to a shell on each of the VMs. Log in as user vcap, supply the password retrieved from Ops Manager, and then sudo if you need superuser privileges.

Here is where to get them for BOSH Director:

Here is where to get them for PKS/Pivotal Container Service:

The post PKS – Networking Setup Tips and Tricks appeared first on CormacHogan.com.

My highlights from KubeCon and CloudNativeCon, Europe 2018


This week I attended KubeCon and CloudNativeCon 2018 in Copenhagen. I had two primary goals during this visit: (a) find out what was happening with storage in the world of Kubernetes (K8s), and (b) look at how people were doing day 2 operations, monitoring, logging, etc, as well as the challenges one might encounter running K8s in production.

Let's start with what is happening in storage. The first storage related session I went to was on Rook. This was a presentation by Jared Watts. According to Jared, Rook is trying to solve the issues of storage vendor lock-in and portability. If I understood correctly, Rook is about deploying, then provisioning, distributed network storage such as CEPH and making it available to applications running on your K8s cluster. However, Rook only does the provisioning and management of storage for K8s – it is not in the data path itself.

It seems that one of the key features of Rook is the fact that it is implemented with K8s operators. This was part of the keynote on day #2, entitled Stateful Application Operators. What this basically means is that kubectl (the command line interface for running commands against K8s clusters) can be extended through bespoke CRDs – Custom Resource Definitions – to perform application-specific operations. So, through Rook, when kubectl is asked to create a cluster/pool/storage object, the Rook operator is watching for these sorts of events. On receipt of such an event, Rook communicates with the storage layer to instantiate the necessary storage components, as well as talking to kubelet (the agent that runs on each K8s node) to make sure the necessary persistent volume is created/accessible/mounted via the Rook Volume Plugin. If the underlying storage is CEPH, then the idea is to use kubectl to create any file stores or block stores and then be able to consume them within K8s. If I caught the drift of Jared's session, once the operator is up and running in the K8s cluster, we can even get it to create the CEPH cluster in the first place.
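
As a generic illustration of the CRD mechanism itself (this is not Rook's actual schema; the group, kind and field names below are invented for the example), registering a CRD teaches the API server about a new object type, and an operator then watches for instances of that type:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: pools.example.storage.io
spec:
  group: example.storage.io
  version: v1alpha1
  scope: Namespaced
  names:
    plural: pools
    singular: pool
    kind: Pool
---
# With the CRD registered, kubectl can create objects of the new kind;
# an operator watching these objects then provisions the backing storage.
apiVersion: example.storage.io/v1alpha1
kind: Pool
metadata:
  name: replicated-pool
spec:
  replicas: 3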

Currently Rook supports CEPH, but Jared also mentioned that integrations with CockroachDB, Minio and Nexenta are in the works. There is a framework for other storage providers who want to integrate into K8s. Rook is currently in alpha state and is an inception level project at the Cloud Native Computing Foundation (CNCF). Find out more about Rook here – https://rook.io/

My second storage related session was the Storage SIG or Special Interest Group, to give it its full title. This was presented by Saad Ali @ Google. This session primarily focused on the Container Storage Interface (CSI) effort. Part of this project is focused on taking the 3rd party volume plugins out of the Kubernetes tree and creating a separate volume plugin system instead. There are a number of reasons for this. Many 3rd parties in the storage space do not want to release their code as open source, nor do they want to be tied to K8s release cycles. However, Saad said that they will not deprecate the current volume plugins, but I guess this is something you will need to keep in mind as you move towards later versions of Kubernetes this year, and if you are already using one of the volume plugins. A number of other storage projects were discussed, such as the ability to migrate and share data between K8s clusters, data gravity (don't move the data to the pods but place pods on the same node/host as the data), as well as how to do volume snapshots and the ability to convert these snapshots to stand-alone volumes later on. Some of these projects are planned for later this year. A question was asked about the DELL-EMC initiative called REX-Ray, and how this compares to the CSI initiative. REX-Ray has now pivoted, according to Saad, to being a framework where storage vendors can develop their own CSI plugins with minimal code. If you'd like to be involved in the K8s Storage SIG, you can find the details here.

To finish on the storage aspect of the conference, we had a walk around the solutions exchange to see which storage vendors had a presence. We met the guys from Portworx. I also met them at DockerCon ’17 in Austin and wrote about them here. On asking what is new, they now have the ability to snapshot volumes belonging to an application that is across multiple containers. It seems that they can also encrypt and replicate at a container volume level. So, some nice enhancements since we last spoke. What I omitted to ask is whether they will need to change anything to align with the new CSI approach.

We also caught up with the StorageOS guys. They have both a CSI and in-tree drivers for storage. They are following along with the CSI designs. One thing they are waiting on is the outcome of how CSI will decide on how to do snapshots, and once that is understood, they plan to implement it. Good conversations all round.

Now it was the turn of monitoring. Basically, Prometheus is king of all things metric related in the world of Kubernetes. There were a bunch of different sessions dedicated to it. It seems that all applications (at least that is how it appeared to me) export their metrics in a format that Prometheus can understand. There was even a session by Matt Layher @ Digital Ocean who explained how to export metrics from your app in a way that Prometheus could consume them. More on Prometheus here: https://prometheus.io/.

We met a number of companies in the solutions exchange who were focused on monitoring K8s. We had some good conversations with both LightStep and DataDog, and Google themselves had their own session talking about OpenCensus, which, if I understood correctly, is a single set of libraries that allows metrics and traces to be captured from any application. When you are trying to track a request across multiple systems and/or across multiple micro-services, this becomes quite important. Morgan Mclean of Google stated that they are working on integration with different exporters to export these metrics and traces to, such as Zipkin, Jaeger, SignalFX and of course, Prometheus.

One interesting session that I attended was by Eduardo Silva @ Treasure Data. He talked us through how docker containers and Kubernetes both generate separate log streams, which you really need to unify to get the full picture of what is happening in your cluster. Eduardo introduced us to the fluentd data collector, which is run as a daemon set on the cluster (a daemon set ensures a copy of a pod runs on every node in the cluster). It pulls in the container logs (available from the file system / journald) and the K8s logs from the master node. Although we were caught for time, we were also introduced to fluentbit, a less memory intensive version of fluentd which also does log processing and forwarding. It has various application parsers, can exclude certain pods from logging, has enterprise connectors to the likes of Splunk and Kafka, and can redirect its output to, you guessed it, Prometheus. More on fluentd here: https://www.fluentd.org/. More on fluentbit here: https://fluentbit.io/.
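
To give a flavour of how such a collector gets onto every node, here is a minimal DaemonSet sketch. It is not the official fluentd manifest; the image tag and the single hostPath mount are assumptions on my part:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd:v1.2        # hypothetical tag
        volumeMounts:
        - name: varlog
          mountPath: /var/log             # node and container logs
      volumes:
      - name: varlog
        hostPath:
          path: /var/log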

Having seen what people were doing in the metrics, tracing and monitoring space, it was also good to see some real life examples highlighting why this is so important. There were a number of sessions describing what could happen when things went wrong with K8s. During the closing keynote on day #1, Oliver Beattie of Monzo Bank in the UK described how a single API change between K8s 1.6 and 1.7 to handle a null reference for replicas led to an outage of over an hour at the bank. It was interesting to hear about the domino effect one minor change could have. On day #2, we heard from the guys at Oath, the digital content division of Verizon, including Yahoo and AOL. They discussed various issues they have had with K8s in production. I guess you could summarize this session along the lines of K8s has a lot of moving parts, and being slightly out of versions with different components can lead to some serious problems. Bugs are also an issue, as is human error. And of course, they shared how they were preventing these issues from happening again through the various guard-rails they were putting in place.

Among the other notable announcements, gVisor was one that caught my attention. This was announced by Aparna Sinha of Google. Aparna mentioned that one of the problems with containers is that they do not contain very well. To address this, they have developed gVisor. This is a very lightweight kernel that runs in user space of an OS. This will allow you to have sandbox’ed containers isolated by gVisor and still get the benefit of containers (resource sharing, quick start). The idea is that this will provide some sort of isolation which will prevent a container impacting the underlying node/host. More details can be found here: https://github.com/google/gvisor

Something else that caught my eye was Kata containers. It was tagged as the "speed of containers with the security of a VM". This is essentially containers running as lightweight virtual machines on KVM. Although the project is managed by the OpenStack Foundation, they made it clear that it was just managed there and, other than that, there was no other connection to OpenStack. To me, Kata containers did appear to have some similarities to vSphere Integrated Containers from VMware. You can learn more about Kata here – https://katacontainers.io/ .

Both of these features (gVisor and Kata containers) suggest that there are still many benefits to be gained from running containers in VMs, which provide advantages such as security and sandbox'ing over a bare-metal approach.

Lastly, it would be remiss of me if I didn’t mention the VMware/Kubernetes SIG (Special Interest Group). This is led by Steve Wong and Fabio Rapposelli. This is our forum for discussing items like best practices for running K8s on VMware. It is also where we outline our feature roadmap on where we plan to integrate. It is also where we look for input and feedback. It was emphasized very strongly that this was not just for vSphere. If you are running K8s on Fusion or Workstation, you are also most welcome to join. Check it out here.

Lots happening in this space, and lots of things to think about for sure as Kubernetes gains popularity. Thanks for reading.

The post My highlights from KubeCon and CloudNativeCon, Europe 2018 appeared first on CormacHogan.com.

Integrating NSX-T and Pivotal Container Services (PKS)


If you’ve been following along my recent blog posts, you’ll have seen that I have been spending some time ramping up on NSX-T and Pivotal Container Services (PKS). My long term goal was to see how these two products integrate together and to figure out the various moving parts. As I was very unfamiliar with both products, I took a piece-meal approach to both. First, I tried to get some familiarity with NSX-T. You can find my previous posts on NSX-T here:

 

During this time, I also tried to familiarize myself with PKS by initially deploying it out on a simple flat network, and deploying my first Kubernetes cluster. You can read about how I did that here:

 

So now it is time to see if I can get them both working together, by deploying PKS and a Kubernetes cluster, and have NSX-T provide the necessary networking pieces.

I’m not going to go through everything from scratch. Based on my previous configurations, what I am describing below are the additional steps needed when you wish to integrate PKS with NSX-T. If everything works out well, my PKS + NSX-T deployment should look something like the following:

As we go through the configuration steps, I will explain the pieces that need to be pre-configured, and then you will see the components that are automatically instantiated in NSX-T by the PKS integration. Suffice to say that the NSX Controllers, Manager and Edge need to be deployed in advance, as well as the BOSH/PKS CLI and the Pivotal Ops Manager. The Ops Manager will then be used to deploy the BOSH and PKS VMs.

 

1. NSX-T Additional Requirements

If you need a starting point, use my previous NSX-T posts to get you going. The following are the list of additional items that you will need to configure in NSX-T to use it with PKS.

1.1. First, you will need an IP address management (IPAM) IP Block that will be used by the Kubernetes namespaces on demand. In the NSX-T Manager, navigate to IPAM, then do a +ADD to create your IP Block. I used 172.16.0.0/16 to provide plenty of /24 address ranges for my Kubernetes PODs. Here is what my IP Block looks like.

1.2. Now we need an IP Pool for Load Balancers. These LBs are created by NSX-T and provide access to the K8s namespaces. There will be one created for every K8s namespace. This is done automatically as well. Navigate to Inventory > Groups > IP Pools and add yours. These IP addresses need to have a route out to speak to the rest of the infrastructure. In my case, I have used a range of IPs that is on the same network as the range of IPs that will be used for the Kubernetes Cluster in part 2 (the Service Network). Be careful to keep unique ranges for both the IP Pool and Service Network, and don't have any overlap of IPs if you choose this approach.

1.3. Next, change the T1 Logical Router Route Advertisement. Previously, I was only advertising NSX connected routes. Now I need to add All NAT Routes and All LB VIP Routes. To modify this configuration, select your T1 Logical Router, then Routing, then Route Advertisement. This is what my configuration now looks like:

That completes all of the settings needed in NSX-T. I don't have to do anything else with T0 Static Routes or Route Redistribution. Let's now see what changes we need to make in BOSH and PKS.

 

2. BOSH and PKS additional requirements

2.1 Again, I'm not going to describe everything here. Check out my previous PKS post to get you started. I am going to start with the necessary BOSH deployment changes in the Pivotal Ops Manager to integrate it with NSX-T. When I did my original PKS deployment on a flat network, I put the Management and Service networks on the same VLAN/flat network. Now I want my service network to use NSX-T, so I will need to change the network configuration in BOSH so that my second network is now an NSX-T network. I will basically use the 191 network described in my Routing and BGP post. This network is essentially an NSX-T logical switch which is routed externally, and which is visible as a port group in vSphere. After making the changes, this is what my network configuration looks like from BOSH.

One thing to note is the reserved IP range. As you just read in part 1, I also used this network for the Load Balancing IP pool in NSX-T. Make sure you do not overlap these ranges if you are using the same segment for both purposes. Reserved here means that BOSH won't use them.

2.2 Now let's turn our attention to the PKS configuration. First, select Assign AZs and Networks, and tell PKS to use the Logical Switch "191" network as its service network, i.e. the network on which the K8s master and workers are to be deployed.

2.3. We also need to change the Networking configuration. When you are populating this form, you will need to copy and paste a number of IDs from NSX-T, namely the T0 Logical Router ID, the ID of the load-balancer IP Pool and the ID of the IPAM IP Block. We shall see shortly, when we deploy our first K8s cluster, how these all get tagged within NSX-T by PKS. I've also chosen to disable SSL certificate verification.

2.4 The final step in PKS is to turn on NSX-T Validation errand under Errands on PKS. By default, this is off. If you do not turn this on, you will not see the necessary NSX-T components such as load-balancers and virtual servers, T1 logical routers or tags being created. So make sure you turn this on, as there is no check to make sure you did it, and K8s cluster deployments will simply fail.

That completes all the necessary steps in Pivotal Ops Manager. We are now ready to deploy our K8s cluster with NSX-T networking. Login to your PKS CLI VM, run the necessary "uaa" and "pks" commands, and create your first K8s cluster. Again, refer back to my previous posts on how to set up and use the correct CLI commands if you haven't already done so. Here I am using the "bosh task" command to track the deployment. The first time around, there is a lot of activity as many of the required NSX-T components need to be compiled. So you will see VMs getting created to take care of this task, and then deleted, before the K8s master and worker VMs are created. Finally, I add a local /etc/hosts entry for the external K8s cluster hostname, mapping it to the master IP. This will allow us to run "kubectl" commands later. Alternatively, you could have added this to your DNS.

root@pks-cli:~# pks create-cluster k8s-cluster-01  --external-hostname pks-cluster-01  --plan small --num-nodes 3

Name:                     k8s-cluster-01
Plan Name:                small
UUID:                     2ff760bc-bcf7-4ed6-8fc9-7fcadc4797c9
Last Action:              CREATE
Last Action State:        in progress
Last Action Description:  Creating cluster
Kubernetes Master Host:   pks-cluster-01
Kubernetes Master Port:   8443
Worker Instances:         3
Kubernetes Master IP(s):  In Progress

root@pks-cli:~# bosh task

Using environment '192.50.0.60' as client 'ops_manager'
Task 22
Task 22 | 12:42:36 | Preparing deployment: Preparing deployment (00:00:05)
Task 22 | 12:42:52 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 22 | 12:42:52 | Compiling packages: nsx-ncp/2e97940ecfef6248e47df6d33f33401201951c39
Task 22 | 12:42:52 | Compiling packages: nsx-cni/02e4a54d92110d142484280ea0155aa6a62d66c6
Task 22 | 12:42:52 | Compiling packages: python_nsx/1dc4a2a093da236d60bc50c4dc00c2465c91f40c
Task 22 | 12:43:41 | Compiling packages: nsx-cni/02e4a54d92110d142484280ea0155aa6a62d66c6 (00:00:49)
Task 22 | 12:44:21 | Compiling packages: nsx-ncp/2e97940ecfef6248e47df6d33f33401201951c39 (00:01:29)
Task 22 | 12:46:03 | Compiling packages: python_nsx/1dc4a2a093da236d60bc50c4dc00c2465c91f40c (00:03:11)
Task 22 | 12:46:03 | Compiling packages: openvswitch/7fdc416abb4b5c40051d365181b4f187ee5c6c6b (00:03:30)
Task 22 | 12:49:56 | Creating missing vms: worker/b4b10717-3ef6-4d37-b30f-de41740e8e0d (0)
Task 22 | 12:49:56 | Creating missing vms: master/7558c290-a7e9-484a-970b-19fd31b68f04 (0)
Task 22 | 12:49:56 | Creating missing vms: worker/3c2de0bc-4714-4027-8351-4b3a0a87f59a (2)
Task 22 | 12:49:56 | Creating missing vms: worker/92464c03-74f6-43c2-9170-9c86675c7ccb (1) (00:00:59)
Task 22 | 12:50:56 | Creating missing vms: master/7558c290-a7e9-484a-970b-19fd31b68f04 (0) (00:01:00)
Task 22 | 12:51:00 | Creating missing vms: worker/3c2de0bc-4714-4027-8351-4b3a0a87f59a (2) (00:01:04)
Task 22 | 12:51:07 | Creating missing vms: worker/b4b10717-3ef6-4d37-b30f-de41740e8e0d (0) (00:01:11)
Task 22 | 12:51:07 | Updating instance master: master/7558c290-a7e9-484a-970b-19fd31b68f04 (0) (canary) (00:01:14)
Task 22 | 12:52:21 | Updating instance worker: worker/b4b10717-3ef6-4d37-b30f-de41740e8e0d (0) (canary) (00:02:09)
Task 22 | 12:54:30 | Updating instance worker: worker/3c2de0bc-4714-4027-8351-4b3a0a87f59a (2) (00:01:57)
Task 22 | 12:56:27 | Updating instance worker: worker/92464c03-74f6-43c2-9170-9c86675c7ccb (1) (00:01:54)
Task 22 Started  Mon May 14 12:42:36 UTC 2018
Task 22 Finished Mon May 14 12:58:21 UTC 2018
Task 22 Duration 00:15:45
Task 22 done
Succeeded

root@pks-cli:~# pks cluster k8s-cluster-01

Name:                     k8s-cluster-01
Plan Name:                small
UUID:                     2ff760bc-bcf7-4ed6-8fc9-7fcadc4797c9
Last Action:              CREATE
Last Action State:        in progress
Last Action Description:  Instance provisioning in progress
Kubernetes Master Host:   pks-cluster-01
Kubernetes Master Port:   8443
Worker Instances:         3
Kubernetes Master IP(s):  In Progress

root@pks-cli:~# pks cluster k8s-cluster-01
Name:                     k8s-cluster-01
Plan Name:                small
UUID:                     2ff760bc-bcf7-4ed6-8fc9-7fcadc4797c9
Last Action:              CREATE
Last Action State:        succeeded
Last Action Description:  Instance provisioning completed
Kubernetes Master Host:   pks-cluster-01
Kubernetes Master Port:   8443
Worker Instances:         3
Kubernetes Master IP(s):  192.168.191.201

root@pks-cli:~# vi /etc/hosts
root@pks-cli:~# grep 192.168.191.201 /etc/hosts
192.168.191.201 pks-cluster-01

root@pks-cli:~# pks get-credentials k8s-cluster-01
Fetching credentials for cluster k8s-cluster-01.
Context set for cluster k8s-cluster-01.
You can now switch between clusters by using:
$kubectl config use-context <cluster-name>

root@pks-cli:~# kubectl config use-context k8s-cluster-01
Switched to context "k8s-cluster-01".

root@pks-cli:~# kubectl get nodes
NAME                                   STATUS    ROLES     AGE       VERSION
6d8f1eaf-90cd-4569-aeff-f0e522c177a5   Ready     <none>    10m       v1.9.6
a33c52be-23dd-4bc9-bcf0-119a82cef84b   Ready     <none>    8m        v1.9.6
ae69009f-9a40-42dc-9815-2fe453373aee   Ready     <none>    12m       v1.9.6

Success! Our K8s cluster has deployed. Now let's take a look at what was built inside NSX-T to accommodate this.

 

3. NSX-T Tagged Components

When we filled in the NSX-T information in the PKS network section in part 2.3 above, we added IDs for the T0 Logical Router, the IP Block and the load balancer IP pool. These all get tagged now by PKS in NSX-T. Let’s look at the tags first of all.

3.1 The T0-Logical Router has a single tag associated with it. The tag is ncp/shared_resource. NCP is the NSX-T Container Plug-in for Kubernetes.

3.2 The next item that has a tag is the IPAM IP Block. The tag is the same as the T0 Logical Router, ncp/shared_resource.

3.3 The final component that is tagged is the load-balancer IP pool. This has two tags, ncp/shared_resource and ncp/external.

What is important to remember is that if you decide to reinstall PKS, or change the IP Block and the LB pool, you will need to manually remove these tags from the existing components in NSX-T, or things may get very confused as there may be multiple components tagged and it will not know which one to use.

 

4. NSX-T Automatically Instantiated Components

Now we will take a look at the set of components that are automatically instantiated when PKS is integrated with NSX-T. We've already mentioned a number of these, so let's take a closer look.

4.1 First of all, there is an NSX-T Load Balancer and associated Virtual Servers. You will find this under Load Balancing > Load Balancers in the NSX-T Manager:

And this Load Balancer is backed by two Virtual Servers, one for http (port 80) and the other for https (port 443), which can be seen when you select the Virtual Servers link. This is what mine looks like. Note that this also needs an IP address from the load balancer IP Pool created in step 1.2.

4.2 The next thing we observe is a set of logical switches created for each of the Kubernetes namespaces. We see one for a load balancer, and the other 4 are for the 4 K8s namespaces (default, kube-public, kube-system and pks-infrastructure).

Just FYI, to compare this to the list of all the namespaces, you can use the following kubectl command. Note that some namespaces (default, kube-public) don't have any PODs.

root@pks-cli:~# kubectl get ns
NAME                 STATUS    AGE
default              Active    20h
kube-public          Active    20h
kube-system          Active    20h
pks-infrastructure   Active    20h

root@pks-cli:~# kubectl get pods --all-namespaces
NAMESPACE            NAME                                    READY     STATUS    RESTARTS   AGE
kube-system          heapster-586c6bcbff-dq4q7               1/1       Running   0          3h
kube-system          kube-dns-5c996f55c8-l9fkw               3/3       Running   0          3h
kube-system          kubernetes-dashboard-55d97799b5-8fmd4   1/1       Running   0          3h
kube-system          monitoring-influxdb-744b677649-kkgmq    1/1       Running   0          3h
pks-infrastructure   nsx-ncp-79bbd9fc44-v99fp                1/1       Running   0          3h
pks-infrastructure   nsx-node-agent-8frbb                    2/2       Running   0          3h
pks-infrastructure   nsx-node-agent-96p28                    2/2       Running   1          3h
pks-infrastructure   nsx-node-agent-hsfdk                    2/2       Running   0          3h
root@pks-cli:~#

4.3 All of the logical switches are connected to the T0 Logical Router by a set of T1 Logical Routers.

4.4 And of course, for these to reach the outside, they are linked to the T0 Logical Router via a set of router ports.

4.5 Last but not least, remember that the PODs have been assigned various addresses from the IPAM IP Block, and these are in the 172.16.0.0/16 range. These are not given direct access to the outside, but instead are SNAT’ed to our Load Balancer IP Pool. This is implemented on the T0 Logical Router. If we look at the NAT Rules on the T0 Logical Router, we see that this is also taken care of:

There we can see the different namespace/POD ranges each having a SNAT rule to map their internal IPs to the external load balancer assigned range. And this is all done automatically for you. Pretty neat!

 

Summary

Hopefully this has helped show you the power of integrating NSX-T with PKS. While there is a lot of initial setup to get right, rolling out multiple Kubernetes clusters, each with its own unique networking, is greatly simplified by NSX-T.

Again this wouldn’t have been possible without guidance from a number of folks. Kudos again to Keith Lee of DELL-EMC (follow Keith on twitter for some upcoming blogs on this), and also Francis Guillier (Technical Product Manager) and Gaetano Borgione (PKS Architect) from our VMware Cloud Native Apps team. I’d also recommend checking out William Lam’s blog series on this, as well as Sam McGeown’s NSX-T 2.0 deployment series of blogs, both of which I relied on heavily. Thanks all!

The post Integrating NSX-T and Pivotal Container Services (PKS) appeared first on CormacHogan.com.


PKS Revisited – Project Hatchway / K8s vSphere Cloud Provider review


As I am going to be doing some talks around next-gen applications at this year’s VMworld event, I took the opportunity to revisit Pivotal Container Services (PKS) to take a closer look at how we can set persistent volumes on container based applications. Not only that, but I also wanted to leverage the vSphere Cloud Provider feature which is part of our Project Hatchway initiative. I’ve written about Project Hatchway a few times now, but in a nutshell this allows us to create persistent container volumes on vSphere storage, and at the same time set a storage policy on the volume. For example, when deploying on vSAN, you could select to protect the container volume using RAID-1, RAID-5 or RAID-6. OK, let’s get started. The following steps will explain how to dynamically provision a volume with a specific storage policy.

Obviously you will need to have a PKS environment, and there are some steps on how to do this in other posts on this site. PKS provisions Kubernetes clusters that can then be used for deploying container based applications. The container based application that I am going to use is a simple Nginx web server application, and I will create a persistent container volume (PV) that can be associated with the application to create some persistent content. There are two parts to the creation of a persistent volume. The first is the StorageClass, which can be used to define the intended policy for a PV which will be dynamically provisioned. Here is the sample storage class manifest (yaml) file that I created:

root@pks-cli:~/nginx# cat nginx-storageclass.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nginx-storageclass
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin
  hostFailuresToTolerate: "0"
  datastore: vsanDatastore
root@pks-cli:~/nginx#

As you can see, I have selected the vsanDatastore and my policy is NumberOfFailuresToTolerate = 0. I could of course have added other policy settings, but I just wanted to see it working, so I kept it simple. You will need to note the name of the StorageClass, as you will need to use that in the claim next. Here is what my persistent volume claim manifest (yaml) file looks like:

root@pks-cli:~/nginx# cat nginx-pvc-claim.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nginx-pvc-claim
  annotations:
   volume.beta.kubernetes.io/storage-class: nginx-storageclass
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
root@pks-cli:~/nginx#

Note that the storage class created previously is referenced here. Now it is time to create the manifest/yaml file for our Nginx application. You will see in that application’s manifest file where the StorageClass and PV are referenced.

root@pks-cli:~/nginx# cat nginx-harbor-lb-pv.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
  namespace: default
spec:
  type: LoadBalancer
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
  namespace: default
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
      namespace: default
    spec:
      containers:
      - name: webserver
        image: harbor.rainpole.com/library/nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nginx-storageclass
          mountPath: /test
      volumes:
      - name: nginx-storageclass
        persistentVolumeClaim:
          claimName: nginx-pvc-claim
root@pks-cli:~/nginx#

A few things to point out with this manifest. I configured a LoadBalancer so that my application is easily accessible from the outside world. Also, I am using Harbor for my images, so I am not downloading them from an external source. Now we are ready to start deploying our application with its persistent volume. Note that kubectl is the client interface to the Kubernetes cluster. It reads the manifest/yaml file and talks to the API server on the master node, which then deploys the application on the cluster.

root@pks-cli:~/nginx# kubectl create -f nginx-storageclass.yaml
storageclass "nginx-storageclass" created


root@pks-cli:~/nginx# kubectl create -f nginx-pvc-claim.yaml
persistentvolumeclaim "nginx-pvc-claim" created


root@pks-cli:~/nginx# kubectl get sc
NAME                 PROVISIONER                    AGE
nginx-storageclass   kubernetes.io/vsphere-volume   10s


root@pks-cli:~/nginx# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                     STORAGECLASS         REASON    AGE
pvc-99724675-8c10-11e8-939b-005056826ff1   2Gi        RWO            Delete           Bound     default/nginx-pvc-claim   nginx-storageclass             15s
root@pks-cli:~/nginx#

The PV is now set up. Let's deploy our Nginx application, and get the PV mounted on /test in the container. We can also look at the kubectl describe command to see the events associated with the mount operation.

root@pks-cli:~/nginx# kubectl create -f nginx-harbor-lb-pv.yaml
service "nginx" created
deployment "nginx" created

root@pks-cli:~/nginx# kubectl describe pods
Name: nginx-65b4fcccd4-vqrxc
Namespace: default
Node: 86bfa120-77f7-4eb2-9783-87c9247da886/192.168.191.203
Start Time: Fri, 20 Jul 2018 12:50:27 +0100
Labels: app=nginx
        pod-template-hash=2160977780
Annotations: <none>
Status: Running
IP: 172.16.9.2
Controlled By: ReplicaSet/nginx-65b4fcccd4
Containers:
 webserver:
  Container ID: docker://57a3b709b1f18d60f9b4e7472c7f4c4b8657d8e233eedc25a2118740af83000b
  Image: harbor.rainpole.com/library/nginx
  Image ID: docker-pullable://harbor.rainpole.com/library/nginx@sha256:edad5e71815c79108ddbd1d42123ee13ba2d8050ad27cfa72c531986d03ee4e7
  Port: 80/TCP
  State: Running
   Started: Fri, 20 Jul 2018 12:50:31 +0100
  Ready: True
  Restart Count: 0
  Environment: <none>
  Mounts:
    /test from nginx-storageclass (rw)
    /var/run/secrets/kubernetes.io/serviceaccount from default-token-b62vf (ro)
Conditions:
  Type Status
  Initialized True
  Ready True
  PodScheduled True
Volumes:
  nginx-storageclass:
    Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName: nginx-pvc-claim
    ReadOnly: false
  default-token-b62vf:
    Type: Secret (a volume populated by a Secret)
    SecretName: default-token-b62vf
    Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Normal Scheduled <invalid> default-scheduler Successfully assigned nginx-65b4fcccd4-vqrxc to 86bfa120-77f7-4eb2-9783-87c9247da886
  Normal SuccessfulMountVolume <invalid> kubelet, 86bfa120-77f7-4eb2-9783-87c9247da886 MountVolume.SetUp succeeded for volume "default-token-b62vf"
  Normal SuccessfulMountVolume <invalid> kubelet, 86bfa120-77f7-4eb2-9783-87c9247da886 MountVolume.SetUp succeeded for volume "pvc-99724675-8c10-11e8-939b-005056826ff1"
  Normal Pulling <invalid> kubelet, 86bfa120-77f7-4eb2-9783-87c9247da886 pulling image "harbor.rainpole.com/library/nginx"
  Normal Pulled <invalid> kubelet, 86bfa120-77f7-4eb2-9783-87c9247da886 Successfully pulled image "harbor.rainpole.com/library/nginx"
  Normal Created <invalid> kubelet, 86bfa120-77f7-4eb2-9783-87c9247da886 Created container
  Normal Started <invalid> kubelet, 86bfa120-77f7-4eb2-9783-87c9247da886 Started container
root@pks-cli:~/nginx#

Excellent. It does look like the volume has mounted (slide the window from left to right to see the full event output). We can now open a shell session to the container and verify.

root@pks-cli:~/nginx# kubectl get pods
NAME                   READY STATUS  RESTARTS AGE
nginx-65b4fcccd4-vqrxc 1/1   Running 0        2m

root@pks-cli:~/nginx# kubectl exec -it nginx-65b4fcccd4-vqrxc -- /bin/bash

root@nginx-65b4fcccd4-vqrxc:/# mount | grep /test
/dev/sdd on /test type ext4 (rw,relatime,data=ordered)
root@nginx-65b4fcccd4-vqrxc:/#

You will also notice a number of events taking place on vSphere at this stage. The creation of the PV requires the creation of a temporary VM, so you will notice events of this nature happening:

Once the volume is created, the PV should be visible in the kubevols folder. Since I specified the vsanDatastore in the StorageClass manifest, that is where it will appear.

Now let’s try to do something interesting to show that the data is persisted in my PV. With the shell prompt to the container application, we will do the following. We will navigate to /usr/share/nginx where the default nginx landing page is found, we will copy it to /test where our PV is mounted. We will make changes to the index.html file and save it off. Next we will stop the application, modify its manifest file so that our PV is now mounted on /usr/share/nginx/ and when the application is accessed, it should show us the modified landing page.

root@nginx-68558fb67c-mdlh5:/# cd /usr/share/nginx
root@nginx-68558fb67c-mdlh5:/usr/share/nginx# ls
html
root@nginx-68558fb67c-mdlh5:/usr/share/nginx# cp -r html/ /test/
root@nginx-68558fb67c-mdlh5:/usr/share/nginx# cd /test/html
root@nginx-68558fb67c-mdlh5:/test/html# ls
50x.html index.html

root@nginx-68558fb67c-mdlh5:/test/html# mv index.html oldindex.html
root@nginx-68558fb67c-mdlh5:/test/html# sed -e 's/nginx/cormac nginx/g' oldindex.html >> index.html
root@nginx-68558fb67c-mdlh5:/test/html# exit

root@pks-cli:~/nginx# kubectl delete -f nginx-harbor-lb-pv.yaml
service "nginx" deleted
deployment "nginx" deleted
root@pks-cli:~/nginx#

Now change the manifest file as follows.

from:

volumeMounts:
- name: nginx-storageclass
  mountPath: /test

to:

volumeMounts:
- name: nginx-storageclass
  mountPath: /usr/share/nginx

 

Launch the application once more:

root@pks-cli:~/nginx# kubectl create -f nginx-harbor-lb-pv.yaml
service "nginx" created
deployment "nginx" created

 

And now the Nginx landing page should have your changes persisted, proving that it is storing data even when the application goes away.

Now, when I deployed this application, I requested that the policy should be FTT=0. How do I confirm that? First, let's figure out which worker VM our application is running on. The kubectl describe pod command has the IP address in the Node field. Now we just need to check the IP addresses of our K8s worker VMs which were deployed by PKS. In my case, it is the VM with the name beginning with vm-617f.

For me, the easiest way now is to use RVC (old habits die hard). If I log in and launch the Ruby vSphere Console on my vCenter server, then navigate to the VMs, I can list all of the storage objects associated with a particular VM, in this case my K8s worker VM. As I am not doing anything else with PVs, I expect that there will be only one. And from there I can verify the policy.

/vcsa-06/CH-Datacenter/vms/pcf_vms> ls
0 7b70b6a2-4ae5-42b1-83f9-5dc189881c99/
1 vm-25370103-3c8b-4393-ba4c-e46873515da3: poweredOn
2 vm-5e0fe3c7-5761-41b6-8ecb-1079e853378c: poweredOn
3 vm-b8bb2ff6-6e6c-4830-98d8-f6f8b62d5522: poweredOn
4 vm-6b3c049f-710d-44dd-badb-bbdef4c81ceb: poweredOn
5 vm-205d59bd-5646-4300-8a51-78fe1663b899: poweredOn
6 vm-617f8efd-b3f5-490d-8f8d-9dc9c58cea0f: poweredOn

/vcsa-06/CH-Datacenter/vms/pcf_vms> vsan.vm_object_info 6
VM vm-617f8efd-b3f5-490d-8f8d-9dc9c58cea0f:
VM has a non-vSAN datastore
Namespace directory
Couldn't find info about DOM object 'vm-617f8efd-b3f5-490d-8f8d-9dc9c58cea0f'
Disk backing: [vsanDatastore] fef94d5b-fa8b-491f-bf0a-246e962f4850/kubernetes-dynamic-pvc-99724675-8c10-11e8-939b-005056826ff1.vmdk
DOM Object: 6ec8515b-8520-fc51-dc46-246e962c2408 (v6, owner: esxi-dell-e.rainpole.com, proxy owner: None, policy: hostFailuresToTolerate = 0, CSN = 7)
Component: 6ec8515b-281b-7552-4835-246e962c2408 (state: ACTIVE (5), host: esxi-dell-g.rainpole.com, capacity: naa.500a07510f86d693, cache: naa.5001e82002675164,
votes: 1, usage: 0.0 GB, proxy component: false)
/vcsa-06/CH-Datacenter/vms/pcf_vms>

That looks good to me (again, scroll left to right to see the full output). I hope that helps you get started with our vSphere Cloud Provider for PVs on PKS (and of course native Kubernetes).

 

Caveat

During my testing, I had issues creating the PV. I kept hitting the following issue (scroll to see the full error).

root@pks-cli:~#kubectl describe pvc
Name:          cormac-slave-claim
Namespace:     default
StorageClass:  thin-disk
Status:        Pending
Volume:
Labels:        <none>
Annotations:   kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"PersistentVolumeClaim",\
               "metadata":{"annotations":{"volume.beta.kubernetes.io/storage-class":"thin-disk"},"name":"cormac-slav...
               volume.beta.kubernetes.io/storage-class=thin-disk
               volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/vsphere-volume
Finalizers:    []
Capacity:
Access Modes:
Events:
  Type     Reason              Age                     From                         Message
  ----     ------              ----                    ----                         -------
  Warning  ProvisioningFailed  <invalid> (x2 over 6s)  persistentvolume-controller  Failed to \
  provision volume with StorageClass "thin-disk": folder '/CH-Datacenter/vm/pcf_vms/7b70b6a2-4ae5-42b1-83f9-5dc189881c99' \
  not found
root@pks-cli:~# 

I'm unsure if this was something that I failed to do during setup, or if it is actually an issue. We're investigating currently. However, once I manually created the folder with the ID shown in the warning under the pcf_vms folder, everything worked absolutely fine.

 

Shameless Plug

If you are planning to attend VMworld 2018, I’ll be running a whole breakout session on next-gen applications running on vSphere/vSAN, and will be paying particular attention to applications that require persistent storage. I’ll be hosting this session with our Storage and Availability CTO and VMware Fellow, Christos Karamanolis. The session is HCI1338BU and is titled HCI: The ideal operating environment for Cloud Native Applications. Hope you can make it. Along the same lines, my colleagues Frank Denneman and Michael Gasch are presenting CNA1553BU Deep Dive: The value of Running Kubernetes on vSphere. This should be another great session if Kubernetes and/or next-gen applications are your thing. Frank has done a great write-up here on these sessions and what you can expect.

The post PKS Revisited – Project Hatchway / K8s vSphere Cloud Provider review appeared first on CormacHogan.com.

Kubernetes on vSphere – Virtually Speaking Podcast Episode #86


I was delighted to be asked along to the latest Virtually Speaking podcast last week. I was invited to attend alongside the very smart Frank Denneman. Also joining us was Myles "vOdgeball sports scholarship" Gray (you'll need to listen to the podcast to get that joke). And of course, we were joined by talented podcast hosts, Pete and John. Pete was in top form after his 1 month vac^H^H^H working in Italy (touché my friend :-D). The podcast was really a conversation about why vSphere and vSAN are ideal platforms for next-gen apps and containerized applications, particularly when they are deployed on a Kubernetes orchestration/cluster framework. We gave our reasons why we feel running K8s on vSphere and vSAN is far superior to a bare-metal approach. This really was a precursor to the sessions that Frank, Myles and I have at VMworld 2018, where we plan to go into detail on the advantages of using vSphere and vSAN for Kubernetes and containerized applications. Hope you enjoy the podcast, and if you are coming to VMworld later this month, hope you can make it to our sessions. There is a link to all 3 of our sessions here. You can listen to the podcast by clicking on the image below.

Kubernetes on VMware vSphere

 

The post Kubernetes on vSphere – Virtually Speaking Podcast Episode #86 appeared first on CormacHogan.com.

Kubernetes, Hadoop, Persistent Volumes and vSAN


At VMworld 2018, one of the sessions I presented on was running Kubernetes on vSphere, and specifically using vSAN for persistent storage. In that presentation (which you can find here), I used Hadoop as a specific example, primarily because there are a number of moving parts to Hadoop. For example, there is the concept of Namenode and a Datanode. Put simply, a Namenode provides the lookup for blocks, whereas Datanodes store the actual blocks of data. Namenodes can be configured in a HA pair with a standby Namenode, but this requires a lot more configuration and resources, and introduces additional components such as journals and zookeeper to provide high availability. There is also the option of a secondary Namenode, but this does not provide high availability. On the other hand, datanodes have their own built-in replication. The presentation showed how we could use vSAN to provide additional resilience to a namenode, but use less capacity and resources if a component like the Datanode has its own built-in protection.

A number of people have asked me how they could go about setting up such a configuration for themselves. They were especially interested in how to consume different policies for the different parts of the application.

In this article I will take a very simple namenode and Datanode configuration that will use persistent volumes with different policies on our vSAN datastore.

We will also show how through the use of a Statefulset, we can very easily scale the number of Datanodes (and their persistent volumes) from 1 to 3.

Helm Charts

To begin with, I tried to use the Helm charts to deploy Hadoop. You can find it here. While this was somewhat successful, I was unable to figure out a way to scale the deployment’s persistent storage. Each attempt to scale the statefulset successfully created new Pods, but all of the Pods tried to share the same persistent volume. And although you can change the persistent volume from ReadWriteOnce (single Pod access) to ReadWriteMany (multiple Pod access), this is not allowed on vSAN even though K8s does provide a config option for multiple Pods to share the same PV. On vSAN, Pods cannot share PVs at the time of writing.

However, if you simply want to deploy a single Namenode and a single Datanode with a persistent volume, this stable Hadoop Helm chart will work just fine for you. It's also quite possible that scaling the persistent storage is achievable with the Helm chart; I'm afraid my limited knowledge of Helm meant that, in order to create unique PVs for each Pod as I scaled my Datanodes, I had to look for an alternate method. This led me to a Hadoop Cluster on Kubernetes using flokkr docker images that was already available on GitHub.

Hadoop Cluster on Kubernetes

Let's talk about the flokkr Hadoop cluster. In this solution, there were only two YAML files; the first was the config.yaml, which passed a bunch of Hadoop configuration (core-site.xml, yarn-site.xml, etc.) to our Hadoop deployment via a configMap (more on this shortly). The second held details about the services and Statefulsets for the Namenode and Datanode. I will modify the Statefulsets so that the /data directory on the nodes will be placed on persistent volumes rather than using a local filesystem within the container (which is not persistent).

Use of configMap

I'll be honest – this was the first time I had used the configMap construct, but it looks pretty powerful. In the config.yaml, there are entries for 5 configuration files required by Hadoop when the environment is bootstrapped – the core-site.xml, the hdfs-site.xml, the log4j.properties file, the mapred-site.xml and finally the yarn-site.xml. These 5 different pieces of data can be seen when we query the configmap.

root@srvr:~/hdfs# kubectl get configmaps
NAME         DATA   AGE
hadoopconf   5      88m
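
To make this a little more concrete, here is an abbreviated sketch of what such a ConfigMap looks like. The real flokkr config.yaml carries the full contents of all five files, and the property value below is just a placeholder:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hadoopconf
data:
  core-site.xml: |
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode-0:9820</value>   <!-- placeholder value -->
      </property>
    </configuration>
  hdfs-site.xml: |
    <configuration/>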

When we look at the hdfs.yaml, we will see how this configMap is referenced, and how these configuration files are made available in a specific folder/directory in the application's containers when the application is launched. First, let's look at the StatefulSet.spec.template.spec.containers.volumeMounts entry:

volumeMounts:
            - name: config
              mountPath: "/opt/hadoop/etc/hadoop"

This is where the files referenced in config.yaml entries are going to be placed when the container/application is launched. If we look at that volume in more detail in StatefulSet.spec.template.spec.containers.volumes we see the following:

      volumes:
        - name: config
          configMap:
            name: hadoopconf

So the configMap in config.yaml, which is named hadoopconf, will place these 5 configuration files in "/opt/hadoop/etc/hadoop" when the application launches. The application contains an init/bootstrap script which will deploy Hadoop using the configuration in these files. A little bit complicated, but sort of neat. You do not need to make any changes here. Instead, we want to change the /data folder to use Persistent Volumes. Let's see how to do that next.

Persistent Volumes – changes to hdfs.yaml

Now there are a few changes needed to the hdfs.yaml file to get it to use persistent volumes. We will need to make some changes to the StatefulSets for the Datanode and the Namenode. First, we will need to add a new mount point for the PV, which for Hadoop will of course be /data. This will appear in StatefulSet.spec.template.spec.containers.volumeMounts and look like the following:

          volumeMounts:
            - name: data
              mountPath: "/data"
              readOnly: false

Next, we need to describe the volume that we are going to use. For the PV, we are not using a StatefulSet.spec.template.spec.volumes entry as used by the configMap. Instead, we will use StatefulSet.spec.volumeClaimTemplates. This is what that looks like; first the Datanode entry, and then the Namenode entry. The only differences are the storage class entries and of course the volume sizes. This is how we will apply different storage policies on vSAN for the Datanode and the Namenode.

volumeClaimTemplates:
  - metadata:
      name: data
      annotations:
        volume.beta.kubernetes.io/storage-class: hdfs-sc-dn
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
       requests:
         storage: 200Gi

volumeClaimTemplates:
  - metadata:
      name: data
      annotations:
        volume.beta.kubernetes.io/storage-class: hdfs-sc-nn
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
       requests:
         storage: 50Gi

Storage Classes

We saw in the hdfs.yaml that the Datanode and Namenode referenced two different storage classes. Let’s look at those next. First is the Namenode.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: hdfs-sc-nn
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: thin
    storagePolicyName: gold
    datastore: vsanDatastore

As you can see, the Namenode storageClass uses a gold policy on the vSAN datastore. If we compare it to the Datanode storageClass, you will see that this is the only difference (other than the name). The provisioner is the VMware vSphere Cloud Provider (VCP) which is currently included in K8s distributions but will soon be decoupled along with other built-in drivers as part of the CSI initiative.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: hdfs-sc-dn
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: thin
    storagePolicyName: silver
    datastore: vsanDatastore

vSAN Policies – storagePolicyName

Now you may be wondering why we are using two different policies. This is the beauty of vSAN. As we shall see shortly, HDFS Datanodes have their own built-in replication mechanism (3 copies of each block are stored by default). Thus, it is possible to deploy the Datanode volumes on vSAN without any underlying protection from vSAN (e.g. RAID-0), simply by specifying an appropriate policy (silver in my case). If a Datanode fails, there are still two copies of the data blocks.

However, the Namenode has no such built-in replication or protection feature. Therefore we can protect its underlying persistent volume using an appropriate vSAN policy (e.g. RAID-1, RAID-5). In my example, the gold policy provides this extra protection for the Namenode volumes.
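As an aside, instead of referencing a pre-created SPBM policy by name, the vSphere volume provisioner can also accept vSAN capabilities directly as StorageClass parameters, hostFailuresToTolerate being the relevant one here. A hypothetical "no protection" Datanode class might look something like the sketch below; treat it as an illustration and check the vSphere Cloud Provider documentation for the exact parameters supported by your version.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: hdfs-sc-dn-ftt0
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: thin
    # FTT=0: no vSAN-level redundancy; HDFS block replication protects the data
    hostFailuresToTolerate: "0"
    datastore: vsanDatastore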

Deployment of Hadoop

Now there are only a few steps to deploying the Hadoop application: (1) create the storage classes, (2) create the configMap from config.yaml and (3) create the services and Statefulsets from hdfs.yaml. All of these can be done with kubectl create -f "file.yaml" commands, as shown below.
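For example, assuming the two storage classes above were saved to a file called hdfs-storageclasses.yaml (config.yaml and hdfs.yaml come from the flokkr repository), the whole deployment boils down to:

kubectl create -f hdfs-storageclasses.yaml
kubectl create -f config.yaml
kubectl create -f hdfs.yaml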

Post Deployment Checks

The following are a bunch of commands that can be used to validate the state of the constituent components of the application after deployment.

root@srvr:~/cormac-hadoop# kubectl get nodes
NAME                STATUS                     ROLES    AGE   VERSION
kubernetes-master   Ready,SchedulingDisabled   <none>   46h   v1.12.0
kubernetes-node1    Ready                      <none>   46h   v1.12.0
kubernetes-node2    Ready                      <none>   46h   v1.12.0
kubernetes-node3    Ready                      <none>   46h   v1.12.0

root@srvr:~/cormac-hadoop# kubectl get configmaps 
NAME         DATA   AGE 
hadoopconf   5      150m

root@srvr:~/cormac-hadoop# kubectl get sc
NAME         PROVISIONER                    AGE
hdfs-sc-dn   kubernetes.io/vsphere-volume   40m
hdfs-sc-nn   kubernetes.io/vsphere-volume   40m

root@srvr:~/cormac-hadoop# kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
datanode     ClusterIP   None         <none>        80/TCP      15m
kubernetes   ClusterIP   10.0.0.1     <none>        443/TCP     46h
namenode     ClusterIP   None         <none>        50070/TCP   15m

root@srvr:~/cormac-hadoop# kubectl get statefulsets
NAME       DESIRED   CURRENT   AGE
datanode   1         1         16m
namenode   1         1         16m

root@srvr:~/cormac-hadoop# kubectl get pvc
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
dn-data-datanode-0   Bound    pvc-5c5f5c5a-f88e-11e8-9e4a-005056970672   200Gi      RWO            hdfs-sc-dn     10m
nn-data-namenode-0   Bound    pvc-5c68ed9b-f88e-11e8-9e4a-005056970672   50Gi       RWO            hdfs-sc-nn     10m

root@srvr:~/cormac-hadoop# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS   REASON   AGE
pvc-5c5f5c5a-f88e-11e8-9e4a-005056970672   200Gi      RWO            Delete           Bound    default/dn-data-datanode-0   hdfs-sc-dn              10m
pvc-5c68ed9b-f88e-11e8-9e4a-005056970672   50Gi       RWO            Delete           Bound    default/nn-data-namenode-0   hdfs-sc-nn              10m

root@srvr:~/cormac-hadoop# kubectl get statefulsets
NAME       DESIRED   CURRENT   AGE
datanode   1         1         21m
namenode   1         1         21m

Post deploy Hadoop check

Since this is Hadoop, we can very quickly use some Hadoop utilities to check the state of our Hadoop Cluster running on Kubernetes. Here is the output of one such command which generates a report on the HDFS filesystem and also reports on the Datanodes. Note the capacity at the beginning, as we will return to this after scale out.

root@srvr:~/cormac-hadoop# kubectl exec -n default -it namenode-0 \
-- /opt/hadoop/bin/hdfs dfsadmin -report
Configured Capacity: 210304475136 (195.86 GB)
Present Capacity: 210224738304 (195.79 GB)
DFS Remaining: 210224709632 (195.79 GB)
DFS Used: 28672 (28 KB)
DFS Used%: 0.00%
Replicated Blocks:
     Under replicated blocks: 0
     Blocks with corrupt replicas: 0
     Missing blocks: 0
     Missing blocks (with replication factor 1): 0
     Pending deletion blocks: 0
Erasure Coded Block Groups: 
     Low redundancy block groups: 0
     Block groups with corrupt internal blocks: 0
     Missing block groups: 0
     Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (1):

Name: 172.16.96.5:9866 (172.16.96.5)
Hostname: datanode-0.datanode.default.svc.cluster.local
Decommission Status : Normal
Configured Capacity: 210304475136 (195.86 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 62959616 (60.04 MB)
DFS Remaining: 210224709632 (195.79 GB)
DFS Used%: 0.00%
DFS Remaining%: 99.96%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Dec 05 13:17:29 GMT 2018
Last Block Report: Wed Dec 05 13:05:56 GMT 2018

root@srvr:~/cormac-hadoop#

Post deploy check on config files

We mentioned that the purpose of the configMap in config.yaml was to put in place a set of configuration files that are used to bootstrap Hadoop. Here is how to verify that this step has indeed occurred (should you need to troubleshoot at any point). First we open a bash shell on the Namenode, and then we navigate to the mount point highlighted in hdfs.yaml to verify that the files exist, which indeed they do in this case.

root@srvr:~/cormac-hadoop# kubectl exec -n default -it namenode-0 \
-- /bin/bash

bash-4.3$ cd /opt/hadoop/etc/hadoop                                                         
bash-4.3$ ls
core-site.xml     hdfs-site.xml     log4j.properties  mapred-site.xml   yarn-site.xml
bash-4.3$ cat core-site.xml
<configuration>
<property><name>fs.defaultFS</name><value>hdfs://namenode-0:9000</value></property>
</configuration>bash-4.3$

Simply type exit to return from the container shell.

Scale out the Datanode statefulSet

We are going to start with the current configuration of 1 Datanode statefulset and 1 Namenode statefulset and we will scale the Datanode statefulset to 3. This should create additional Pods as well as additional persistent volumes and persistent volume claims. Let’s see.

root@srvr:~/cormac-hadoop# kubectl get statefulsets
NAME       DESIRED   CURRENT   AGE
datanode   1         1         21m
namenode   1         1         21m

root@srvr:~/cormac-hadoop# kubectl scale --replicas=3 statefulsets/datanode
statefulset.apps/datanode scaled

root@srvr:~/cormac-hadoop# kubectl get pvc
NAME                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
dn-data-datanode-0   Bound     pvc-5c5f5c5a-f88e-11e8-9e4a-005056970672   200Gi      RWO            hdfs-sc-dn     16m
dn-data-datanode-1   Pending                                                                        hdfs-sc-dn     7s
nn-data-namenode-0   Bound     pvc-5c68ed9b-f88e-11e8-9e4a-005056970672   50Gi       RWO            hdfs-sc-nn     16m

root@srvr:~/cormac-hadoop# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS   REASON   AGE
pvc-5c5f5c5a-f88e-11e8-9e4a-005056970672   200Gi      RWO            Delete           Bound    default/dn-data-datanode-0   hdfs-sc-dn              16m
pvc-5c68ed9b-f88e-11e8-9e4a-005056970672   50Gi       RWO            Delete           Bound    default/nn-data-namenode-0   hdfs-sc-nn              16m
pvc-9cd769f7-f890-11e8-9e4a-005056970672   200Gi      RWO            Delete           Bound    default/dn-data-datanode-1   hdfs-sc-dn              6s

root@srvr:~/cormac-hadoop# kubectl get pods
NAME         READY   STATUS              RESTARTS   AGE
datanode-0   1/1     Running             0          16m
datanode-1   0/1     ContainerCreating   0          24s
namenode-0   1/1     Running             0          16m

root@srvr:~/cormac-hadoop# kubectl get statefulsets
NAME       DESIRED   CURRENT   AGE
datanode   3         2         22m
namenode   1         1         22m

So changes are occurring, but they will take a little time. Let’s retry those commands again now that some time has passed.

root@srvr:~/cormac-hadoop# kubectl get statefulsets
NAME       DESIRED   CURRENT   AGE
datanode   3         3         23m
namenode   1         1         23m

root@srvr:~/cormac-hadoop# kubectl get pvc
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
dn-data-datanode-0   Bound    pvc-5c5f5c5a-f88e-11e8-9e4a-005056970672   200Gi      RWO            hdfs-sc-dn     18m
dn-data-datanode-1   Bound    pvc-9cd769f7-f890-11e8-9e4a-005056970672   200Gi      RWO            hdfs-sc-dn     2m12s
dn-data-datanode-2   Bound    pvc-b7ce55f9-f890-11e8-9e4a-005056970672   200Gi      RWO            hdfs-sc-dn     87s
nn-data-namenode-0   Bound    pvc-5c68ed9b-f88e-11e8-9e4a-005056970672   50Gi       RWO            hdfs-sc-nn     18m

root@srvr:~/cormac-hadoop# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS   REASON   AGE
pvc-5c5f5c5a-f88e-11e8-9e4a-005056970672   200Gi      RWO            Delete           Bound    default/dn-data-datanode-0   hdfs-sc-dn              18m
pvc-5c68ed9b-f88e-11e8-9e4a-005056970672   50Gi       RWO            Delete           Bound    default/nn-data-namenode-0   hdfs-sc-nn              18m
pvc-9cd769f7-f890-11e8-9e4a-005056970672   200Gi      RWO            Delete           Bound    default/dn-data-datanode-1   hdfs-sc-dn              2m3s
pvc-b7ce55f9-f890-11e8-9e4a-005056970672   200Gi      RWO            Delete           Bound    default/dn-data-datanode-2   hdfs-sc-dn              86s

root@srvr:~/cormac-hadoop# kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
datanode-0   1/1     Running   0          18m
datanode-1   1/1     Running   0          2m19s
datanode-2   1/1     Running   0          94s
namenode-0   1/1     Running   0          18m

Now we can see how the Datanode has scaled out with additional Pods and storage.

Post scale-out application check

And now for the final step – let's check whether HDFS has indeed scaled out with those new Pods and PVs. We will run the same command as before and get an updated report from the application. Note the updated capacity figure and the additional Datanodes.

root@srvr:~/cormac-hadoop# kubectl exec -n default -it namenode-0 \
-- /opt/hadoop/bin/hdfs dfsadmin -report
Configured Capacity: 630913425408 (587.58 GB)
Present Capacity: 630674206720 (587.36 GB)
DFS Remaining: 630674128896 (587.36 GB)
DFS Used: 77824 (76 KB)
DFS Used%: 0.00%
Replicated Blocks:
     Under replicated blocks: 0
     Blocks with corrupt replicas: 0
     Missing blocks: 0
     Missing blocks (with replication factor 1): 0
     Pending deletion blocks: 0
Erasure Coded Block Groups:
     Low redundancy block groups: 0
     Block groups with corrupt internal blocks: 0
     Missing block groups: 0
     Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 172.16.86.5:9866 (172.16.86.5)
Hostname: datanode-2.datanode.default.svc.cluster.local
Decommission Status : Normal
Configured Capacity: 210304475136 (195.86 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 62963712 (60.05 MB)
DFS Remaining: 210224709632 (195.79 GB)
DFS Used%: 0.00%
DFS Remaining%: 99.96%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Dec 05 13:24:37 GMT 2018
Last Block Report: Wed Dec 05 13:22:34 GMT 2018

Name: 172.16.88.2:9866 (172.16.88.2)
Hostname: datanode-1.datanode.default.svc.cluster.local
Decommission Status : Normal
Configured Capacity: 210304475136 (195.86 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 62963712 (60.05 MB)
DFS Remaining: 210224709632 (195.79 GB)
DFS Used%: 0.00%
DFS Remaining%: 99.96%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Dec 05 13:24:39 GMT 2018
Last Block Report: Wed Dec 05 13:21:57 GMT 2018

Name: 172.16.96.5:9866 (172.16.96.5)
Hostname: datanode-0.datanode.default.svc.cluster.local
Decommission Status : Normal
Configured Capacity: 210304475136 (195.86 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 62959616 (60.04 MB)
DFS Remaining: 210224709632 (195.79 GB)
DFS Used%: 0.00%
DFS Remaining%: 99.96%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Dec 05 13:24:38 GMT 2018
Last Block Report: Wed Dec 05 13:05:56 GMT 2018

Checking the Replication Factor of HDFS

The last thing we want to check is that the Datanodes are indeed replicating. There are a few ways to do this. The following commands create a simple file, and then validate its replication factor. In both cases, the commands return 3, which is the default replication factor for HDFS.

root@srvr:~/cormac-hadoop# kubectl exec -n default -it namenode-0 \
-- /opt/hadoop/bin/hdfs dfs -touchz /out.txt

root@srvr:~/cormac-hadoop# kubectl exec -n default -it namenode-0 \
-- /opt/hadoop/bin/hdfs dfs -stat %r /out.txt
3

root@srvr:~/cormac-hadoop# kubectl exec -n default -it namenode-0 \
-- /opt/hadoop/bin/hdfs dfs -ls /out.txt
-rw-r--r--   3 hdfs admin          0 2018-12-05 15:55 /out.txt

Conclusion

That looks to have scaled out just fine. Now there are a few things to keep in mind when dealing with StatefulSets, as per the guidance found here. Deleting and/or scaling a StatefulSet down will not delete the volumes associated with the StatefulSet. This is done to ensure data safety, which is generally more valuable than an automatic purge of all related StatefulSet resources.
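So if you scale down or delete the StatefulSet and genuinely want the space back, the PVCs (and, given the Delete reclaim policy shown earlier, their underlying PVs) have to be removed by hand, for example:

kubectl delete pvc dn-data-datanode-2
kubectl delete pvc dn-data-datanode-1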

With that in mind, I hope this has added some clarity to how we can provide different vSAN policies to different parts of a cloud native application, providing additional protection when the application needs it, but not consuming any additional HCI (hyperconverged infrastructure) resources when the application is able to protect itself through built-in mechanisms.

The post Kubernetes, Hadoop, Persistent Volumes and vSAN appeared first on CormacHogan.com.

Pivotal and Harbor – x509 certificate issues

After deploying and configuring the Harbor tile in Pivotal Ops Manager, I ran into a couple of issues with certificates. The first was encountered when I was trying to log in to Harbor from an Ubuntu VM where I was running all of my PKS and BOSH commands. It was also the VM where I pulled my container images, and the VM from which I now wanted to push them into Harbor. Harbor is our registry server for storing container images. Here is what I got on trying to log in:

 

cormac@pks-cli:~$ sudo docker login -u admin harbor.rainpole.com
Password:
Error response from daemon: Get https://harbor.rainpole.com/v1/users/: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "Pivotal")
cormac@pks-cli:~$

 

To resolve this first issue, I had to log in to the Harbor UI as the admin user. From there, I navigated to Administration > Configuration > System Settings, and then I clicked on the Download link associated with the Registry Root Cert, as shown below.
On my Ubuntu VM, the certificate needed to be placed in a particular directory, /etc/docker/certs.d/harbor.rainpole.com, where harbor.rainpole.com is obviously the name of the registry that I am trying to log in to. With the cert in place, I can now log in to my registry, as shown below (the exact copy commands follow the listing).
cormac@pks-cli:/etc/docker/certs.d/harbor.rainpole.com$ uname -a
Linux pks-cli.rainpole.com 4.13.0-46-generic #51-Ubuntu SMP Tue Jun 12 12:36:29 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
cormac@pks-cli:/etc/docker/certs.d/harbor.rainpole.com$ ls
ca.crt
cormac@pks-cli:/etc/docker/certs.d/harbor.rainpole.com$ sudo docker login -u admin harbor.rainpole.com
Password:
Login Succeeded
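For reference, getting the certificate into place involved nothing more than the following (assuming the Registry Root Cert downloaded from the Harbor UI was saved as ca.crt in the current directory):

sudo mkdir -p /etc/docker/certs.d/harbor.rainpole.com
sudo cp ./ca.crt /etc/docker/certs.d/harbor.rainpole.com/ca.crt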

 

Cool. At this point, I thought I had solved the certificate issue. I was able to log in to Harbor, tag images and push/pull to/from the registry. My next step was to deploy a Couchbase app on my Kubernetes cluster, the image of which I had pushed to my registry. However, I hit the following issue during the application creation:
root@pks-cli:~/cns-demo# kubectl get pods
NAME          READY   STATUS         RESTARTS   AGE
couchbase-0   0/1     ErrImagePull   0          12s
root@pks-cli:~/cns-demo# kubectl describe pods
Name:               couchbase-0
Namespace:          default
Priority:           0

.

.
.
Events:
  Type     Reason                  Age        From                                           Message
  ----     ------                  ----       ----                                           -------
  Normal   Scheduled               3s         default-scheduler                              Successfully assigned default/couchbase-0 to 2e2478da-5a3f-4941-90b1-9410f2cebab2
  Normal   SuccessfulAttachVolume  2s         attachdetach-controller                        AttachVolume.Attach succeeded for volume "pvc-b5eb9ff9-2f2b-11e9-805e-00505682e96b"
  Normal   Pulling                 <invalid>  kubelet, 2e2478da-5a3f-4941-90b1-9410f2cebab2  pulling image "harbor.rainpole.com/library/saturnism/couchbase:k8s-petset"
  Warning  Failed                  <invalid>  kubelet, 2e2478da-5a3f-4941-90b1-9410f2cebab2  Failed to pull image "harbor.rainpole.com/library/saturnism/couchbase:k8s-petset": rpc error: code = Unknown desc = Error response from daemon: Get https://harbor.rainpole.com/v2/: x509: certificate signed by unknown authority
  Warning  Failed                  <invalid>  kubelet, 2e2478da-5a3f-4941-90b1-9410f2cebab2  Error: ErrImagePull
  Normal   BackOff                 <invalid>  kubelet, 2e2478da-5a3f-4941-90b1-9410f2cebab2  Back-off pulling image "harbor.rainpole.com/library/saturnism/couchbase:k8s-petset"
  Warning  Failed                  <invalid>  kubelet, 2e2478da-5a3f-4941-90b1-9410f2cebab2  Error: ImagePullBackOff
root@pks-cli:~/cns-demo#

 

After some investigation, I found that I had missed a step when integrating Harbor with PKS. In a nutshell, I should have copied the contents of my Harbor Registry CA certificate (the same certificate I downloaded to my VM) and added it to BOSH's list of Trusted Certificates under Security in the BOSH tile in Pivotal Ops Manager. Once I had added it and applied the changes, I was able to deploy my application successfully.

 

root@pks-cli:~/cns-demo# kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
couchbase-0   1/1     Running   0          50s
root@pks-cli:~/cns-demo# kubectl describe pods
Name:               couchbase-0
Namespace:          default
Priority:           0

.

.
.
Events:
  Type     Reason                  Age                From                                           Message
  ----     ------                  ----               ----                                           -------
  Warning  FailedScheduling        30s (x6 over 37s)  default-scheduler                              pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
  Normal   Scheduled               30s                default-scheduler                              Successfully assigned default/couchbase-0 to e47914d4-efa3-4087-87f1-f7feb665b324
  Normal   SuccessfulAttachVolume  28s                attachdetach-controller                        AttachVolume.Attach succeeded for volume "pvc-8f84d30d-2f8b-11e9-a131-005056821e38"
  Normal   Pulling                 20s                kubelet, e47914d4-efa3-4087-87f1-f7feb665b324  pulling image "harbor.rainpole.com/library/saturnism/couchbase:k8s-petset"
  Normal   Pulled                  7s                 kubelet, e47914d4-efa3-4087-87f1-f7feb665b324  Successfully pulled image "harbor.rainpole.com/library/saturnism/couchbase:k8s-petset"
  Normal   Created                 7s                 kubelet, e47914d4-efa3-4087-87f1-f7feb665b324  Created container
  Normal   Started                 7s                 kubelet, e47914d4-efa3-4087-87f1-f7feb665b324  Started container
root@pks-cli:~/cns-demo#

The post Pivotal and Harbor – x509 certificate issues appeared first on CormacHogan.com.

A first look at Heptio Velero (previously known as Ark)


Those of you who work in the cloud native space will probably be aware of VMware's acquisition of Heptio back in December 2018. Heptio brings much expertise and a number of products to the table, one of which I was very eager to try. This is the Heptio Velero product, previously known as Heptio Ark. Heptio Velero provides a means to back up and restore cloud native applications. Interestingly enough, it appears to capture all of the deployment details, so it is able to back up the pods (compute), persistent volumes (storage) and services (networking), as well as any other related objects, e.g. statefulsets. Velero can back up or restore all objects in your Kubernetes cluster, or you can filter objects by type, namespace, and/or label. More details about how Velero works can be found at heptio.github.io.

Velero can be used with both cloud and on-prem deployments of K8s. The cloud providers (Amazon, Google, etc.) have snapshot providers available for creating snapshots of the application's Persistent Volumes (PVs). On-prem is a little different. There is a provider available for Portworx, but for everybody else there is the open source "restic" utility, which can be used for snapshotting PVs on other platforms, including vSphere.

Velero also ships with the ability to create/deploy a container-based Minio object store for storing backups. Note that out of the box, this is very small, and won't let you back up very much. I had to add a large PV to the Minio deployment to make it useful (I'll show you how I did that shortly). The other issue I encountered with Minio is that the Velero backup/restore client (typically residing in a VM or on your desktop) will not be able to communicate directly with the Minio object store. This means you can't look at the logs stored in the object store from the velero client. A workaround is to change the Minio service from a ClusterIP to a NodePort, which provides external access to the Minio object store. From that point, you can update one of the Minio configuration YAML files to tell the client about the external access to the Minio store, and then you can display the logs, etc. I'll show you how to do that as well in this post.

My K8s cluster was deployed via PKS. This also has a nuance (not a major one) which we will see shortly.

1. Creating a larger Minio object store

After doing some very small backups, I quickly ran out of space in the Minio object store. The symptoms were that my backup failed very quickly, and when I queried the podVolumeBackups objects, I saw the following events:

cormac@pks-cli:~$ kubectl get podVolumeBackups -n velero
NAME                     AGE
couchbase-backup-c6bbl   4m
couchbase-backup-lsss6   3m
couchbase-backup-p9j4b   2m


cormac@pks-cli:~$ kubectl describe podVolumeBackups couchbase-backup-c6bbl -n velero
Name:         couchbase-backup-c6bbl
Namespace:    velero
.
.
Status:
  Message:  error running restic backup, stderr=Save(<lock/f476380dcd>) returned error, retrying after 679.652419ms: client.PutObject: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.
Save(<lock/f476380dcd>) returned error, retrying after 508.836067ms: client.PutObject: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.
Save(<lock/f476380dcd>) returned error, retrying after 1.640319969s: client.PutObject: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.
Save(<lock/f476380dcd>) returned error, retrying after 1.337024499s: client.PutObject: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.
Save(<lock/f476380dcd>) returned error, retrying after 1.620713255s: client.PutObject: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.
Save(<lock/f476380dcd>) returned error, retrying after 4.662012875s: client.PutObject: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.
Save(<lock/f476380dcd>) returned error, retrying after 7.092309877s: client.PutObject: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.
Save(<lock/f476380dcd>) returned error, retrying after 6.33450427s: client.PutObject: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.
Save(<lock/f476380dcd>) returned error, retrying after 13.103711682s: client.PutObject: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.
Save(<lock/f476380dcd>) returned error, retrying after 27.477605106s: client.PutObject: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.
Fatal: unable to create lock in backend: client.PutObject: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.
: exit status 1
  Path:
  Phase:        Failed
  Snapshot ID:
Events:         <none>
cormac@pks-cli:~$

Therefore, to do anything meaningful, I modified the Minio deployment so that it had a 10GB PV for storing backups. To do this, I created a new StorageClass, a new PersistentVolumeClaim, and then modified the configuration in config/minio/00-minio-deployment.yaml to dynamically provision a PV rather than simply use a directory in the container. Since my deployment is on vSAN and using the vSphere Cloud Provider (VCP), I can also specify a storage policy for the PV. Note the namespace entry for the PVC. It has to be in velero, along with everything else in the configuration for Velero. The full deployment instructions can be found at heptio.github.io.

1.1. New StorageClass

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: minio-sc
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: thin
    storagePolicyName: gold
    datastore: vsanDatastore

1.2. New PersistentVolumeClaim

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-pv-claim-1
  namespace: velero
  annotations:
    volume.beta.kubernetes.io/storage-class: minio-sc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

1.3. config/minio/00-minio-deployment.yaml changes

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  namespace: velero
  name: minio
  labels:
    component: minio
spec:
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        component: minio
    spec:
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: minio-pv-claim-1
      - name: config
        emptyDir: {}

The change here is to switch the "storage" volume from an emptyDir to a Persistent Volume, which is dynamically provisioned via the PersistentVolumeClaim entry. The StorageClass and PVC YAML can be added to the Minio configuration YAML to make any additional redeploys easier. Once the new configuration is in place and you redeploy it, Minio will have a 10GB PV available for backups, which allows you to do something meaningful.
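With the edits made, redeploying is simply a matter of re-applying the Minio YAML and checking that the new claim binds (this assumes the StorageClass and PVC definitions above were appended to the same 00-minio-deployment.yaml file):

kubectl apply -f config/minio/00-minio-deployment.yaml
kubectl get pvc -n velero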

2. Accessing Logs

Any time I tried to display backup logs, the command would error and complain about not being able to find Minio, with something like "dial tcp: lookup minio.velero.svc on 127.0.0.1:53: no such host". This is because Minio is using a ClusterIP service and is not available externally. To resolve this, you can edit the service and change the type from ClusterIP to NodePort (a non-interactive alternative is shown after the listing below).

cormac@pks-cli:~/Velero$ kubectl edit svc minio -n velero
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"component":"minio"},\
"name":"minio","namespace":"velero"},"spec":{"ports":[{"port":9000,"protocol":"TCP","targetPort":9000}],\
"selector":{"component":"minio"},"type":"ClusterIP"}}
  creationTimestamp: 2019-03-06T11:16:55Z
  labels:
    component: minio
  name: minio
  namespace: velero
  resourceVersion: "1799322"
  selfLink: /api/v1/namespaces/velero/services/minio
  uid: 59c59e95-4001-11e9-a7c8-005056a27540
spec:
  clusterIP: 10.100.200.169
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30971
    port: 9000
    protocol: TCP
    targetPort: 9000
  selector:
    component: minio
  sessionAffinity: None
  type: NodePort   
status:
  loadBalancer: {}
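As an aside, the same change can be made non-interactively with a patch rather than an interactive edit, something along these lines:

kubectl -n velero patch svc minio -p '{"spec": {"type": "NodePort"}}'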

After making the change, you can now see which port that Minio is available on externally.

cormac@pks-cli:~/Velero$ kubectl get svc -n velero
NAME    TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
minio   NodePort   10.100.200.169   <none>        9000:32166/TCP   59m

Now the Minio S3 object store is accessible externally. Point a browser at the IP address of a Kubernetes worker node, tag on the port shown above (in this case 32166), and you should get access to the Minio object store interface. Next, we need to change one of the configuration files for Velero/Minio so that the velero client can access the store for displaying logs, etc. This file is config/minio/05-backupstoragelocation.yaml. Edit this file, uncomment the last line for the publicUrl, and set it appropriately.

apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: velero
  config:
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://minio.velero.svc:9000
    # Uncomment the following line and provide the value of an externally
    # available URL for downloading logs, running Velero describe, and more.
    publicUrl: http://10.27.51.187:32166

Reapply the configuration using the command kubectl apply -f config/minio/05-backupstoragelocation.yaml. Now you should be able to access the logs using commands such as velero backup logs backup-name.

3. PKS interop – Privileged Containers

This was another issue that stumped me for a while. With PKS, through Pivotal Operations Manager, you create plans for the types of K8s clusters that you deploy. One of the options in a plan is to ‘Enable Privileged Containers’. Since this always stated ‘use with caution’, I always left it disabled. However, with Velero, and specifically with the restic portion (snapshots), I hit a problem when this was disabled. On trying to create a backup, I encountered these events on the restic daemon set:

cormac@pks-cli:~/Velero/config/minio$ kubectl describe ds restic -n velero
Name:           restic
Selector:       name=restic
.
.
Events:
  Type     Reason        Age   From                  Message
  ----     ------        ----  ----                  -------
  Warning  FailedCreate  49m   daemonset-controller  Error creating: pods "restic-v567s" is forbidden: pod.Spec.SecurityContext.RunAsUser is forbidden
  Warning  FailedCreate  32m   daemonset-controller  Error creating: pods "restic-xv5v7" is forbidden: pod.Spec.SecurityContext.RunAsUser is forbidden
  Warning  FailedCreate  15m   daemonset-controller  Error creating: pods "restic-vx2cb" is forbidden: pod.Spec.SecurityContext.RunAsUser is forbidden

So I went back to the Pivotal Ops Manager, went to the PKS tile, edited the plan, and enabled the checkbox for both ‘Enable Privileged Containers’ and ‘Disable DenyEscalatingExec’. I had to re-apply the PKS configuration, but once this completed (and I didn’t have to do anything with the K8s cluster btw, it is all taken care of by PKS), the restic pods were created successfully. Here are the buttons as they appear in the PKS tile > Plan.

OK – we are now ready to take a backup.

 

4. First Backup (CouchBase StatefulSet)

For my first backup, I wanted to take a StatefulSet – in my case, a CouchBase deployment that had been scaled out to 3 replicas. This means 3 pods and 3 PVs. Velero uses annotations to identify components for backup. Here I added a special annotation to each pod, naming its volume so that it can be identified by the restic utility for snapshotting. Note that this application is in its own namespace called "couchbase".

cormac@pks-cli:~/Velero$ kubectl -n couchbase annotate pod/couchbase-0 backup.velero.io/backup-volumes=couchbase-data
pod/couchbase-0 annotated
cormac@pks-cli:~/Velero$ kubectl -n couchbase annotate pod/couchbase-1 backup.velero.io/backup-volumes=couchbase-data
pod/couchbase-1 annotated
cormac@pks-cli:~/Velero$ kubectl -n couchbase annotate pod/couchbase-2 backup.velero.io/backup-volumes=couchbase-data
pod/couchbase-2 annotated

Now I can run the backup:

cormac@pks-cli:~/Velero$ velero backup create couchbase
Backup request "couchbase" submitted successfully.
Run `velero backup describe couchbase` or `velero backup logs couchbase` for more details.

cormac@pks-cli:~/Velero$ velero backup describe couchbase --details
Name:         couchbase
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>
Phase:  InProgress
Namespaces:
  Included:  *
  Excluded:  <none>
Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto
Label selector:  <none>
Storage Location:  default
Snapshot PVs:  auto
TTL:  720h0m0s
Hooks:  <none>
Backup Format Version:  1
Started:    <n/a>
Completed:  <n/a>
Expiration:  2019-04-05 13:25:25 +0100 IST
Validation errors:  <none>
Persistent Volumes: <none included>
Restic Backups:
  New:
    couchbase/couchbase-0: couchbase-data

cormac@pks-cli:~/Velero/$ velero backup get 
NAME        STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR 
couchbase   Completed   2019-03-06 13:24:27 +0000 GMT   29d       default            <none> 

At this point, it is also interesting to log in to the Minio portal and see what has been backed up. Here is an overview from the top-most level – velero/restic/couchbase.

5. First Restore (CouchBase StatefulSet)

To do a real restore, let’s delete the couchbase namespace which has all of my application and data.

cormac@pks-cli:~/Velero/tests$ kubectl delete ns couchbase
namespace "couchbase" deleted
cormac@pks-cli:~/Velero/tests$ kubectl get ns
NAME          STATUS   AGE
default       Active   15d
kube-public   Active   15d
kube-system   Active   15d
pks-system    Active   15d
velero        Active   16h

Let's now try to restore the deleted namespace and all of its contents (pods, PVs, etc.). Some of the more observant among you will notice that the backup name I am using is "cb2", and not "couchbase". That is only because I did a few different backup tests, and cb2 is the one I am restoring. You would specify your own backup name here.

cormac@pks-cli:~/Velero/tests$ velero restore create --from-backup cb2
Restore request "cb2-20190307090808" submitted successfully.
Run `velero restore describe cb2-20190307090808` or `velero restore logs cb2-20190307090808` for more details.
cormac@pks-cli:~/Velero/tests$ velero restore describe cb2-20190307090808
Name:         cb2-20190307090808
Namespace:    velero
Labels:       <none>
Annotations:  <none>
Backup:  cb2
Namespaces:
  Included:  *
  Excluded:  <none>
Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.ark.heptio.com, backups.velero.io, restores.ark.heptio.com, restores.velero.io
  Cluster-scoped:  auto
Namespace mappings:  <none>
Label selector:  <none>
Restore PVs:  auto
Phase:  InProgress
Validation errors:  <none>
Warnings:  <none>
Errors:    <none>

After a few moments, the restore completes. I can also see that all 3 of my PVs are referenced. The next step is to verify that the application has indeed recovered and is usable.

cormac@pks-cli:~/Velero/tests$ velero restore describe cb2-20190307090808 --details
Name:         cb2-20190307090808
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Backup:  cb2

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.ark.heptio.com, backups.velero.io, restores.ark.heptio.com, restores.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Phase:  Completed

Validation errors:  <none>

Warnings:  <none>
Errors:    <none>

Restic Restores:
  Completed:
    couchbase/couchbase-0: couchbase-data
    couchbase/couchbase-1: couchbase-data
    couchbase/couchbase-2: couchbase-data

Everything appears to have come back successfully.

cormac@pks-cli:~/Velero/tests$ kubectl get ns
NAME          STATUS   AGE
couchbase     Active   70s
default       Active   15d
kube-public   Active   15d
kube-system   Active   15d
pks-system    Active   15d
velero        Active   16h

cormac@pks-cli:~/Velero/tests$ kubectl get pods -n couchbase
NAME          READY   STATUS    RESTARTS   AGE
couchbase-0   1/1     Running   0          90s
couchbase-1   1/1     Running   0          90s
couchbase-2   1/1     Running   0          90s

cormac@pks-cli:~/Velero/tests$ kubectl get pv -n couchbase
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                  STORAGECLASS   REASON   AGE
pvc-8d676074-4032-11e9-842d-005056a27540   10Gi       RWO            Delete           Bound    velero/minio-pv-claim-1                minio-sc                16h
pvc-9d9c537b-40b8-11e9-842d-005056a27540   1Gi        RWO            Delete           Bound    couchbase/couchbase-data-couchbase-0   couchbasesc             95s
pvc-9d9d7846-40b8-11e9-842d-005056a27540   1Gi        RWO            Delete           Bound    couchbase/couchbase-data-couchbase-1   couchbasesc             95s
pvc-9d9f60b9-40b8-11e9-842d-005056a27540   1Gi        RWO            Delete           Bound    couchbase/couchbase-data-couchbase-2   couchbasesc             91s

cormac@pks-cli:~/Velero/tests$ kubectl get svc -n couchbase
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
couchbase      ClusterIP      None            <none>        8091/TCP         117s
couchbase-ui   NodePort      10.100.200.80   <pending>     8091:31365/TCP    117s

I was also able to successfully connect to my CouchBase UI and examine the contents – all looked good to me. To close, I'd like to give a big shout-out to my colleague Myles Gray, who fielded a lot of my K8s related questions. Cheers Myles!

[Update] OK, so I am not sure why it worked for me when I did my initial blog, but further attempts to restore this configuration resulted in CouchBase not starting. The reason for this seems to be that the IP addresses allocated to the CouchBase pods are hard-coded in either /opt/couchbase/var/lib/couchbase/ip or /opt/couchbase/var/lib/couchbase/ip_start on the pods. To get CouchBase back up and running, I had to replace the old entries (which were the IP addresses of the nodes which were backed up) with the new IP addresses of the restored pods. I opened a shell onto the pods to do this. You can get the new IP address from /etc/hosts on each pod. Once the new entries were successfully updated, CouchBase restarted. However, it did not retain the original configuration, and seems to have reset to a default deployment. I'll continue to research what else may need to be changed to bring the original config back.
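If you hit the same problem, the relevant files can be inspected (and a shell opened for editing) from outside the pods with kubectl exec, along these lines, repeating for couchbase-1 and couchbase-2:

kubectl -n couchbase exec -it couchbase-0 -- cat /opt/couchbase/var/lib/couchbase/ip
kubectl -n couchbase exec -it couchbase-0 -- cat /etc/hosts
# open a shell to edit ip / ip_start with the address found in /etc/hosts
kubectl -n couchbase exec -it couchbase-0 -- /bin/bash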

The post A first look at Heptio Velero (previously known as Ark) appeared first on CormacHogan.com.

More Velero – Cassandra backup and restore


In my previous exercise with Heptio Velero, I looked at backing up and restoring a Couchbase deployment. This time I turned my attention to another popular containerized application, Cassandra. Cassandra is a NoSQL database, similar in some respects to Couchbase. Once again, I will be deploying Cassandra as a set of containers and persistent volumes from Kubernetes running on top of PKS, the Pivotal Container Service. And again, just like my last exercise, I will be instantiating the Persistent Volumes as virtual disks on top of vSAN. I’ll show you how to get Cassandra up and running quickly by sharing my YAML files, then we will destroy the namespace where Cassandra is deployed. Of course, this is after we have taken a backup with Heptio Velero (formerly Ark). We will then restore the Cassandra deployment from our Velero backup and verify that our data is still intact.

Since I went through all of the initial setup steps in my previous post, I will get straight to the Cassandra deployment, followed by the backup, restore with Velero and then verification of the data.

In my deployment, I went with 3 distinct YAML files: the service, the storage class and the statefulset. The first one shown here is the service YAML for my headless Cassandra deployment. Nothing much to say here, except that this is a headless service: we won't be load-balancing traffic to the pods, so we don't need a cluster IP.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
  namespace: cassandra
spec:
# headless does not need a cluster IP
  clusterIP: None
  ports:
  - port: 9042
  selector:
    app: cassandra

Next up is the storage class. Regular readers will be familiar with this concept by now. In a nutshell, it allows us to do dynamic provisioning of volumes for our application. This storage class uses the K8s vSphere Volume Driver, consumes an SPBM policy called gold and creates virtual disks for persistent volumes on the vSAN datastore of this vSphere cluster.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: vsan
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: thin
    storagePolicyName: gold
    datastore: vsanDatastore

Lastly, we come to the StatefulSet itself, which allows Pods and PVs to be scaled together. There are a number of things to highlight here. The first is the Cassandra container image version. These images can be retrieved from gcr.io/google-samples. I went all the way back to v11 because this image includes the cqlsh tool for working on the database. There are other options available if you choose to use a later version of the image, such as deploying a separate container with cqlsh, but I found it easier just to log on to the Cassandra containers and run my cqlsh commands from there. I've actually pulled down the Cassandra image and pushed it up to my own local Harbor registry, which is where I am retrieving it from. One other thing is the DNS name of the Cassandra SEED node. Since I am deploying to a separate namespace called cassandra, I need to ensure that the DNS name reflects that below; this SEED node is what allows the cluster to form. Last but not least is the volume section, which references the storage class and allows dynamic PVs to be created for each Pod in the Cassandra deployment, scaling in and out as needed.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  namespace: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 1800
      containers:
      - name: cassandra
# My image is on my harbor registry
        image: harbor.rainpole.com/library/cassandra:v11
        imagePullPolicy: Always
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 7001
          name: tls-intra-node
        - containerPort: 7199
          name: jmx
        - containerPort: 9042
          name: cql
        resources:
          limits:
            cpu: "500m"
            memory: 1Gi
          requests:
            cpu: "500m"
            memory: 1Gi
        securityContext:
          capabilities:
            add:
              - IPC_LOCK
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - nodetool drain
        env:
          - name: MAX_HEAP_SIZE
            value: 512M
          - name: HEAP_NEWSIZE
            value: 100M
# Make sure the DNS name matches the nameserver
          - name: CASSANDRA_SEEDS
            value: "cassandra-0.cassandra.cassandra.svc.cluster.local"
          - name: CASSANDRA_CLUSTER_NAME
            value: "K8Demo"
          - name: CASSANDRA_DC
            value: "DC1-K8Demo"
          - name: CASSANDRA_RACK
            value: "Rack1-K8Demo"
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - /ready-probe.sh
          initialDelaySeconds: 15
          timeoutSeconds: 5
        volumeMounts:
        - name: cassandra-data
          mountPath: /cassandra_data
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
# Match the annotation to the storage class name defined previously
      annotations:
        volume.beta.kubernetes.io/storage-class: vsan
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

Let’s take a look at the configuration after Cassandra has been deployed. Note that the statefulset requested 3 replicas.

cormac@pks-cli:~/Cassandra$ kubectl get sts -n cassandra
NAME        DESIRED   CURRENT   AGE
cassandra   3         3         54m

cormac@pks-cli:~/Cassandra$ kubectl get po -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          54m
cassandra-1   1/1     Running   3          54m
cassandra-2   1/1     Running   2          54m

cormac@pks-cli:~/Cassandra$ kubectl get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-c61a6e97-4be8-11e9-be9b-005056a24d92   1Gi        RWO            vsan           54m
cassandra-data-cassandra-1   Bound    pvc-c61ba5d2-4be8-11e9-be9b-005056a24d92   1Gi        RWO            vsan           54m
cassandra-data-cassandra-2   Bound    pvc-c61cadc6-4be8-11e9-be9b-005056a24d92   1Gi        RWO            vsan           54m

cormac@pks-cli:~/Cassandra$ kubectl get svc -n cassandra
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
cassandra   ClusterIP   None         <none>        9042/TCP   55m

It all looks OK from a K8s perspective. We can use the nodetool CLI to check the state of the Cassandra cluster and verify that all 3 nodes have joined.

cormac@pks-cli:~/Cassandra$ kubectl exec -it cassandra-0 -n cassandra -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.200.30.203  133.7  KiB  32           54.4%             a0baa626-ac99-45cc-a2f0-d45ac2f9892c  Rack1-K8Demo
UN  10.200.57.61   231.56 KiB  32           67.9%             95b1fdb8-2138-4b5d-901e-82b9b8c4b6c6  Rack1-K8Demo
UN  10.200.99.101  223.25 KiB  32           77.7%             3477bb48-ad60-4716-ac5e-9bf1f7da3f42  Rack1-K8Demo

Now we can use the cqlsh command mentioned earlier to create a dummy table and some contents (Like most of this setup, I simply picked these up from a quick google – I’m sure you can be far more elaborate should you wish).

cormac@pks-cli:~/Cassandra$ kubectl exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.

cqlsh> CREATE KEYSPACE demodb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

cqlsh> use demodb;

cqlsh:demodb> CREATE TABLE emp(emp_id int PRIMARY KEY, emp_name text, emp_city text, emp_sal varint,emp_phone varint);

cqlsh:demodb> INSERT INTO emp (emp_id, emp_name, emp_city, emp_phone, emp_sal) VALUES (100, 'Cormac', 'Cork', 999, 1000000);

cqlsh:demodb> select * from emp;
emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 100000

(1 rows)
cqlsh:demodb> exit;

Next, we can start with the backup preparations, first of all annotating the pods with the volumes to back up so that Velero (via restic) knows to snapshot them.

cormac@pks-cli:~/Cassandra$ kubectl -n cassandra annotate pod/cassandra-2 backup.velero.io/backup-volumes=cassandra-data
pod/cassandra-2 annotated

cormac@pks-cli:~/Cassandra$ kubectl -n cassandra annotate pod/cassandra-1 backup.velero.io/backup-volumes=cassandra-data
pod/cassandra-1 annotated

cormac@pks-cli:~/Cassandra$ kubectl -n cassandra annotate pod/cassandra-0 backup.velero.io/backup-volumes=cassandra-data
pod/cassandra-0 annotated

Finally, we initiate the backup. This time I am going to tell Velero to skip all the other namespaces so that it only backs up the cassandra namespace. Note that there are various ways of doing this with selectors, etc., so this isn't necessarily the most optimal way to achieve it (but it works – a neater alternative is shown after the command).

cormac@pks-cli:~/Cassandra$ velero backup create cassandra --exclude-namespaces velero,default,kube-public,kube-system,pks-system,couchbase
Backup request "cassandra" submitted successfully.
Run `velero backup describe cassandra` or `velero backup logs cassandra` for more details.
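The neater alternative, rather than excluding every other namespace, would be to include just the one we care about, using the --include-namespaces flag:

velero backup create cassandra --include-namespaces cassandra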

I typically put a watch -n 5 before the 'velero backup describe' command so I can see it getting regularly updated with progress (an example is shown after the listing below). When the backup is complete, it can be listed as follows:

cormac@pks-cli:~/Cassandra$ velero backup get
NAME         STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
all          Completed   2019-03-21 10:43:43 +0000 GMT   29d       default            <none>
all-and-cb   Completed   2019-03-21 10:51:26 +0000 GMT   29d       default            <none>
all-cb-2     Completed   2019-03-21 11:11:04 +0000 GMT   29d       default            <none>
cassandra    Completed   2019-03-21 14:43:25 +0000 GMT   29d       default            <none>
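Incidentally, the watch wrapper I mentioned above is just something along these lines:

watch -n 5 velero backup describe cassandra --details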

Time to see if we can restore it. As before, we can now destroy our current data. In my case, I am just going to remove the namespace where my Cassandra objects reside (PODs, PVs, service, StatefulSet), and then recover it using Velero.

cormac@pks-cli:~/Cassandra$ kubectl delete ns cassandra
namespace "cassandra" deleted

cormac@pks-cli:~/Cassandra$ velero restore create cassandra-restore --from-backup cassandra
Restore request "cassandra-restore" submitted successfully.
Run `velero restore describe cassandra-restore` or `velero restore logs cassandra-restore` for more details. 

You can monitor this in the same way as you monitored the backup, using a watch -n 5. You can also monitor the creation of the new namespace, PVs and PODs using kubectl. Once everything is restored, we can verify that the data still exists using the same commands as before.

cormac@pks-cli:~/Cassandra$ kubectl exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> select * from demodb.emp;
emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 100000
(1 rows)
cqlsh>

So we have had a successful backup and restore, using Heptio Velero, of Cassandra running as a set of containers in K8s on PKS, with Persistent Volumes on vSAN – neat!

The post More Velero – Cassandra backup and restore appeared first on CormacHogan.com.

Fun with PKS, K8s, MongoDB Helm Charts and vSAN


I've been spending a bit of time lately looking at our Heptio Velero product, and how it works with various cloud native applications. The next application on my list is MongoDB, another NoSQL database. I looked at various deployment mechanisms for MongoDB, and it seems that using Helm charts is the most popular approach. This led me to the Bitnami MongoDB Stack Chart GitHub repo. At this point, I did spin my wheels a little trying to get MongoDB stood up, so in this post I'll talk through some of the gotchas I encountered. Once again, my environment is vSphere 6.7 and vSAN 6.7. I am using the Pivotal Container Service/PKS 1.3, which includes the vSphere Cloud Provider for Kubernetes, and I already have a 4-node Kubernetes cluster running v1.12.4. Helm is also installed and initialized. In a further post, I'll look at Velero backup/restore of this MongoDB deployment.

 

OK – so the first thing you need to do is to download the values-production.yaml file. This contains all the configuration settings that you will need when deploying the MongoDB helm chart. I spent a lot of time adding --set options to the command line of my helm install, when I should simply have been referencing this file with my own tuned values. However, I think it is worth showing the sorts of issues I encountered, and what I went through to configure the MongoDB helm chart in order to get a working MongoDB environment. Note that before doing anything, you will need a StorageClass for the Persistent Volumes that we will use for this deployment. Here is my tried and trusted StorageClass YAML file, which is referenced when dynamic Persistent Volumes are required – it places the PVs on my vsanDatastore (as VMDKs) with a vSAN policy of gold.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: mongo-sc
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: thin
    storagePolicyName: gold
    datastore: vsanDatastore

A simple ‘kubectl create‘ of my StorageClass YAML file, and we are good to proceed. One other thing I wanted to do is to place MongoDB into its own namespace. Thus a very quick ‘kubectl create ns mongodb‘ and I was now ready to proceed with the helm chart deployments. If you are new to helm and tiller, this is a good place to get started.
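For completeness, assuming the StorageClass YAML above is saved as mongo-sc.yaml (the filename is my own), those two steps would look something like this:

$ kubectl create -f mongo-sc.yaml
storageclass.storage.k8s.io/mongo-sc created

$ kubectl create ns mongodb
namespace/mongodb created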

 

Attempt 1 – no values-production.yaml file

Here was the helm command that I used. Don’t use this unless you really want to see the issues I encountered. The proper command will appear further on.

$ helm install --name mymongodb --namespace mongodb --set service.type=NodePort\
 --set persistence.storageClass=mongo-sc --set replicaSet.enabled=true\
 stable/mongodb

Helm displays a bunch of deployment status after the initial command is run. You can also get updated status by running helm status “name”, as follows:

$ helm status mymongodb
LAST DEPLOYED: Thu Mar 28 11:45:17 2019
NAMESPACE: mongodb
STATUS: DEPLOYED

RESOURCES:
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
mymongodb-arbiter-0 0/1 Running 0 94s
mymongodb-primary-0 0/1 Running 0 94s
mymongodb-secondary-0 0/1 Running 0 94s

==> v1/Secret
NAME TYPE DATA AGE
mymongodb Opaque 2 94s

==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
mymongodb NodePort 10.100.200.76 <none> 27017:30184/TCP 94s
mymongodb-headless ClusterIP None <none> 27017/TCP 94s

==> v1/StatefulSet
NAME READY AGE
mymongodb-arbiter 0/1 94s
mymongodb-primary 0/1 94s
mymongodb-secondary 0/1 94s

==> v1beta1/PodDisruptionBudget
NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
mymongodb-arbiter 1 N/A 0 94s
mymongodb-primary 1 N/A 0 94s
mymongodb-secondary 1 N/A 0 94s

In this case, my pods were never entering a ready state. I decided to run a ‘kubectl describe‘ on the pods to see if I could get any further clues. Here is a snippet of some of the describe output, which also displays pod events at the end.

Liveness: exec [mongo --eval db.adminCommand('ping')] delay=30s timeout=5s period=10s #success=1 #failure=6
Readiness: exec [mongo --eval db.adminCommand('ping')] delay=5s timeout=5s period=10s #success=1 #failure=6

    Mounts:
      /bitnami/mongodb from datadir (rw)

Warning Unhealthy 2m35s kubelet, 91920344-b3a7-4979-a100-c156db235b6d Readiness probe failed: MongoDB shell version v4.0.6
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
2019-03-28T11:45:45.346+0000 E QUERY [js] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed: SocketException: Error connecting to 127.0.0.1:27017 :: caused by :: Connection refused :
connect@src/mongo/shell/mongo.js:343:13
@(connect):1:6
exception: connect failed

So there are obviously some problems with trying to connect to the database, as well as the liveness and readiness checks failing (presumably because they too are trying to connect to the DB). Next I decided to log in to one of the pods and see if I could get any more information by running the mongo client and mongod daemon.

$ kubectl get po -n mongodb
NAME READY STATUS RESTARTS AGE
mymongodb-arbiter-0 0/1 Running 4 9m58s
mymongodb-primary-0 0/1 Running 4 9m58s
mymongodb-secondary-0 0/1 Running 4 9m58s

$ kubectl exec -it mymongodb-primary-0 -n mongodb -- bash

I have no name!@mymongodb-primary-0:/$ mongo
MongoDB shell version v4.0.6
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
2019-03-28T11:56:14.087+0000 E QUERY [js] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed: SocketException: Error connecting to 127.0.0.1:27017 :: caused by :: Connection refused :
connect@src/mongo/shell/mongo.js:343:13
@(connect):1:6
exception: connect failed

I have no name!@mymongodb-primary-0:/$ mongod
2019-03-28T11:56:16.971+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2019-03-28T11:56:16.975+0000 I CONTROL [initandlisten] MongoDB starting : pid=141 port=27017 dbpath=/data/db 64-bit host=mymongodb-primary-0
2019-03-28T11:56:16.975+0000 I CONTROL [initandlisten] db version v4.0.6
2019-03-28T11:56:16.975+0000 I CONTROL [initandlisten] git version: caa42a1f75a56c7643d0b68d3880444375ec42e3
2019-03-28T11:56:16.975+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.1.0j 20 Nov 2018
2019-03-28T11:56:16.975+0000 I CONTROL [initandlisten] allocator: tcmalloc
2019-03-28T11:56:16.975+0000 I CONTROL [initandlisten] modules: none
2019-03-28T11:56:16.975+0000 I CONTROL [initandlisten] build environment:
2019-03-28T11:56:16.975+0000 I CONTROL [initandlisten] distmod: debian92
2019-03-28T11:56:16.975+0000 I CONTROL [initandlisten] distarch: x86_64
2019-03-28T11:56:16.975+0000 I CONTROL [initandlisten] target_arch: x86_64
2019-03-28T11:56:16.975+0000 I CONTROL [initandlisten] options: {}
2019-03-28T11:56:16.979+0000 I STORAGE [initandlisten] exception in initAndListen: NonExistentPath: Data directory /data/db not found., terminating
2019-03-28T11:56:16.979+0000 I NETWORK [initandlisten] shutdown: going to close listening sockets...
2019-03-28T11:56:16.979+0000 I NETWORK [initandlisten] removing socket file: /tmp/mongodb-27017.sock
2019-03-28T11:56:16.979+0000 I CONTROL [initandlisten] now exiting
2019-03-28T11:56:16.979+0000 I CONTROL [initandlisten] shutting down with code:100
I have no name!@mymongodb-primary-0:/$

I have no name!@mymongodb-primary-0:/$ cd /opt/bitnami/mongodb/
I have no name!@mymongodb-primary-0:/opt/bitnami/mongodb$ ls
LICENSE-Community.txt MPL-2 README THIRD-PARTY-NOTICES bin conf data licenses logs tmp
I have no name!@mymongodb-primary-0:/opt/bitnami/mongodb$ ls data/
db
I have no name!@mymongodb-primary-0:/opt/bitnami/mongodb$

It would appear that the biggest issue is that the data directory /data/db was not found. Now, if I look for the PV mount point, I can see that it is not in the / (root) folder, but in /opt/bitnami/mongodb. OK – there seems to be an issue with where MongoDB is expecting to find the data/db folder. Was I missing another --set value on the command line?

 

Attempt 2 – using the values-production.yaml file

It was at this point that my colleague Myles recommended the values-production.yaml file. The appropriate changes were found in the Config Map section that had entries for the MongoDB config file. I removed the comment from the following lines:

from:

# Entries for the MongoDB config file
configmap:
#  # Where and how to store data.
#  storage:
#    dbPath: /opt/bitnami/mongodb/data/db
#    journal:
#      enabled: true
#    #engine:
#    #wiredTiger:
#  # where to write logging data.
#  systemLog:
#    destination: file
#    logAppend: true
#    path: /opt/bitnami/mongodb/logs/mongodb.log

to:

# Entries for the MongoDB config file
configmap:
  # Where and how to store data.
  storage:
    dbPath: /opt/bitnami/mongodb/data/db
    journal:
      enabled: true
    #engine:
    #wiredTiger:
  # where to write logging data.
  systemLog:
    destination: file
    logAppend: true
    path: /opt/bitnami/mongodb/logs/mongodb.log

With this file saved, I relaunched my helm install command, but this time referencing the values-production.yaml file.

$ helm install --name mymongodb -f values-production.yaml --namespace mongodb --set service.type=NodePort --set persistence.storageClass=mongo-sc --set replicaSet.enabled=true stable/mongodb
$ helm status mymongodb
LAST DEPLOYED: Thu Mar 28 12:22:24 2019
NAMESPACE: mongodb
STATUS: DEPLOYED

RESOURCES:
==> v1/ConfigMap
NAME DATA AGE
mymongodb 1 55s

==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
mymongodb-arbiter-0 0/1 Running 0 55s
mymongodb-primary-0 2/2 Running 0 55s
mymongodb-secondary-0 2/2 Running 0 55s

==> v1/Secret
NAME TYPE DATA AGE
mymongodb Opaque 2 55s

==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
mymongodb NodePort 10.100.200.199 <none> 27017:30269/TCP 55s
mymongodb-headless ClusterIP None <none> 27017/TCP,9216/TCP 55s

==> v1/StatefulSet
NAME READY AGE
mymongodb-arbiter 0/1 55s
mymongodb-primary 1/1 55s
mymongodb-secondary 1/1 55s

==> v1beta1/PodDisruptionBudget
NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
mymongodb-arbiter 1 N/A 0 55s
mymongodb-primary 1 N/A 0 55s
mymongodb-secondary 1 N/A 0 55s

Wow – this looks better already. Let’s see if I can now run the mongo client on the pods, which I could not do before.

cormac@pks-cli:~/mongodb-helm$ kubectl exec -it mymongodb-primary-0 -n mongodb -- bash
Defaulting container name to mongodb-primary.
Use 'kubectl describe pod/mymongodb-primary-0 -n mongodb' to see all of the containers in this pod.

I have no name!@mymongodb-primary-0:/$ mongo
MongoDB shell version v4.0.7
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("90b2c3b3-a6e6-4d85-ad6d-626efb008246") }
MongoDB server version: 4.0.7
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
2019-03-28T12:24:54.912+0000 I STORAGE [main] In File::open(), ::open for '//.mongorc.js' failed with Unknown error
Server has startup warnings:
2019-03-28T12:22:36.914+0000 I CONTROL [initandlisten]
2019-03-28T12:22:36.914+0000 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2019-03-28T12:22:36.914+0000 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2019-03-28T12:22:36.914+0000 I CONTROL [initandlisten]
2019-03-28T12:22:36.914+0000 I CONTROL [initandlisten] ** WARNING: This server is bound to localhost.
2019-03-28T12:22:36.914+0000 I CONTROL [initandlisten] ** Remote systems will be unable to connect to this server.
2019-03-28T12:22:36.914+0000 I CONTROL [initandlisten] ** Start the server with --bind_ip <address> to specify which IP
2019-03-28T12:22:36.914+0000 I CONTROL [initandlisten] ** addresses it should serve responses from, or with --bind_ip_all to
2019-03-28T12:22:36.914+0000 I CONTROL [initandlisten] ** bind to all interfaces. If this behavior is desired, start the
2019-03-28T12:22:36.914+0000 I CONTROL [initandlisten] ** server with --bind_ip 127.0.0.1 to disable this warning.
2019-03-28T12:22:36.914+0000 I CONTROL [initandlisten]
2019-03-28T12:22:36.914+0000 I CONTROL [initandlisten]
2019-03-28T12:22:36.914+0000 I CONTROL [initandlisten] ** WARNING: soft rlimits too low. rlimits set to 15664 processes, 65536 files. Number of processes should be at least 32768 : 0.5 times number of files.
---
Enable MongoDB's free cloud-based monitoring service, which will then receive and display
metrics about your deployment (disk utilization, CPU, operation statistics, etc).

The monitoring data will be available on a MongoDB website with a unique URL accessible to you
and anyone you share the URL with. MongoDB may use this information to make product
improvements and to suggest MongoDB products and deployment options to you.

To enable free monitoring, run the following command: db.enableFreeMonitoring()
To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
---

> show dbs
admin 0.000GB
config 0.000GB
local 0.000GB
>

There are a couple of interesting outputs from the mongo client, even though it appears to be working. The first is related to access control. I’m not too concerned about that, as this is my own private lab. However, I do want to be able to use an application like Compass to manage the database from my desktop, so I do not want the server to be bound to localhost. That means another change is needed in the values-production.yaml file. First, delete the current deployment.

$ helm delete --purge mymongodb
release "mymongodb" deleted

Next, make the following changes to the values-production.yaml file, from:

#  # network interfaces
#  net:
#    port: 27017
#    bindIp: 0.0.0.0
#    unixDomainSocket:
#      enabled: true
#      pathPrefix: /opt/bitnami/mongodb/tmp

to:

  # network interfaces
  net:
    port: 27017
    bindIp: 0.0.0.0
    unixDomainSocket:
      enabled: true
      pathPrefix: /opt/bitnami/mongodb/tmp

And reinstall the helm chart as before. Now if you log in to the pod and run the mongo client command, the message about being bound to localhost should be gone.

Finally, we can see if we can manage the database from Compass. In my case, I do not have a load balancer front-end, so I simply connect to the IP address of one of my K8s nodes, along with the port on which my MongoDB is accessible externally. You can get the port in a number of ways, but the simplest is to look at your services. In the output below, the external port is 30133.

$ kubectl get svc -n mongodb
NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)              AGE
mymongodb            NodePort    10.100.200.3   <none>        27017:30133/TCP      2m17s
mymongodb-headless   ClusterIP   None           <none>        27017/TCP,9216/TCP   2m17s

With details about the K8s node and MongoDB port, we can now connect to it via Compass. Since I have not bothered with credentials, none are needed.
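For reference, the connection details entered into Compass boil down to a standard MongoDB URI of the form below; the node IP is a placeholder here, and with no credentials configured there is no username/password portion:

mongodb://<k8s-node-ip>:30133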

 

Excellent – we are up and running. Now I can drop a number of the additional --set options (NodePort, storage class, etc.) and make the command line for installing the MongoDB helm chart a lot simpler – see the sketch below. You can play with this yourselves. Come back soon, as I plan to populate the DB with some useful information, and then go through the process of backing it up and restoring it with Heptio Velero.
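As a footnote, assuming service.type, persistence.storageClass and replicaSet.enabled were also edited into values-production.yaml rather than passed on the command line, the simplified install would look something along these lines:

$ helm install --name mymongodb -f values-production.yaml --namespace mongodb stable/mongodb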

The post Fun with PKS, K8s, MongoDB Helm Charts and vSAN appeared first on CormacHogan.com.


Velero and Portworx – Container Volume Backup and Restores


If you’ve been following my posts for the last week or so, you’ll have noticed my write-ups on Velero backups and restores using the new release candidate (RC). I also did a recent write-up on Portworx volumes and snapshots. In this post, I’ll bring them both together, and show you how Velero and Portworx are integrated to allow backups and restores of container applications using Portworx volumes. However, first, let’s take a step back. As was highlighted to me recently, all of this is very new to a lot of people, so let’s spend a little time setting the context.

Near the end of last year, VMware acquired a company called Heptio. These guys are some of the leading lights in the Kubernetes community, and bring a wealth of expertise around Kubernetes and Cloud Native Applications to VMware. One of the open source products in their portfolio was a Kubernetes backup/restore/mobility product called Ark. After the acquisition, the product was rebranded to Velero (the name Ark was already in use). So, in a nutshell, Velero allows you to take backups and do restores (and also migrations) of applications running in containers on top of Kubernetes. So why am I looking at it? Well, as part of VMware’s Storage and Availability BU, one of the things we are closely looking at is how we make vSphere/vSAN the best platform for running cloud native applications (including K8s). This also involves how we implement day 2 type operations for these (newer) applications. Backup and restore fall squarely into this segment.

And finally, just by way of closing off this brief introduction, Portworx have been a significant player in the cloud native storage space for a while now. They have already worked with Velero (Ark) in the past, and have a snapshot plugin that enables Velero to do backups and restores of container applications deployed on Portworx-backed volumes. Portworx has also kindly provided early access to an RC version of their plugin to work with the RC version of Velero.

OK then – let’s take a look at how these two products work together.

To begin with, we had tried to do this before with v0.11 of Velero but due to a known issue with additional spaces in the snapshot name, we could never get this working. With the release candidate announcement for Velero, we reached out to Portworx to see if we could get early access to their new plugin. They kindly agreed, and I was finally able to do some test backups and restores of Cassandra using Portworx volumes. Here are the version numbers that I am using for this test.

$ velero version
Client:
       Version: v1.0.0-rc.1
       Git commit: d05f8e53d8ecbdb939d5d3a3d24da7868619ec3d
Server:
        Version: v1.0.0-rc.1

$ /opt/pwx/bin/pxctl -v
pxctl version 2.0.3.4-0c0bbe4

I’m not going to go through the details of deploying Velero here. Suffice to say that there is a new velero install command in the RC release that should make things easier than before. You can still set it up using the older YAML file method by copying files from the previous v0.11 to the RC distro as per my earlier blog post.
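I won’t cover it here, but for a Minio-backed setup like mine, the new install command would look something along the lines of the sketch below; the bucket name, credentials file and in-cluster Minio URL are illustrative assumptions rather than my actual values:

$ velero install \
    --provider aws \
    --bucket velero \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000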

In this write-up, since the assumption is that Portworx has already been deployed, I will also use Portworx for the volume to back my Minio S3 object store. The Portworx team have some great write-ups on how to deploy Portworx on-premises if you need guidance. Here is an example of the StorageClass and PVC for my Minio deployment to do just that. Note that I have selected to have my Minio S3 replicated by a factor of 3 as per the parameters in the StorageClass.

---
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: minio-sc
provisioner: kubernetes.io/portworx-volume
parameters:
   repl: "3"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-pv-claim-1
  namespace: velero
  annotations:
    volume.beta.kubernetes.io/storage-class: minio-sc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---

The next step is to add the Portworx plugin to Velero. For RC, the Portworx plugin is portworx/velero-plugin:1.0.0-rc1. To add the plugin, simply run velero plugin add portworx/velero-plugin:1.0.0-rc1. This will only work with the RC version of Velero. As there is currently no velero plugin list command, the only way to check if the plugin was successfully added is to describe the Velero POD, and examine the Init Containers (see below). This should list all of the plugins that have been added to Velero. I have filed a GitHub feature request to get a velero plugin list command.

Also note that I have aliased my kubectl command to simply ‘k’, as you can see below.

$ velero plugin add portworx/velero-plugin:1.0.0-rc1

$ k get pod -n velero
NAME                     READY   STATUS      RESTARTS   AGE
minio-74995c888c-b9d2m   1/1     Running     0          25h
minio-setup-l5sfl        0/1     Completed   0          25h
velero-c7c95547b-wd867   1/1     Running     0          25h

$ k describe pod velero-c7c95547b-wd867 -n velero
Name:               velero-c7c95547b-wd867
Namespace:          velero
Priority:           0
PriorityClassName:  <none>
Node:               k8s-2/10.27.51.66
Start Time:         Mon, 13 May 2019 09:55:14 +0100
Labels:             component=velero
                    pod-template-hash=c7c95547b
Annotations:        prometheus.io/path: /metrics
                    prometheus.io/port: 8085
                    prometheus.io/scrape: true
Status:             Running
IP:                 10.244.2.49
Controlled By:      ReplicaSet/velero-c7c95547b
Init Containers:
  velero-plugin:
    Container ID:   docker://0ae4789051de7db74745b3f34b289893f4641b3222a147bf82de73482573f7e7
    Image:          portworx/velero-plugin:1.0.0-rc1
    Image ID:       docker-pullable://portworx/velero-plugin@sha256:e11f24cc18396e5a4542ea71e789598f6e3149d178ec6c4f70781e9c3059a8ea
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 13 May 2019 09:55:19 +0100
      Finished:     Mon, 13 May 2019 09:55:19 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /target from plugins (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from velero-token-lp4gq (ro)

Now we need to tell Velero about the Snapshot Provider, and Snapshot location. This is done by creating a YAML file for the VolumeSnapshotLocation kind. One other thing to note here, which is different from the current Portworx documentation, is that the provider needs to use a fully qualified name for the plugin. This is new in Velero 1.0.0. This now makes the Portworx provider portworx.io/portworx. The full YAML file looks something like this.

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: portworx-local
  namespace: velero
spec:
  provider: portworx.io/portworx
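
Assuming the YAML above is saved to a file such as px-volumesnapshotlocation.yaml (the filename is my own), it is created with a simple kubectl create:

$ kubectl create -f px-volumesnapshotlocation.yaml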

Now we should have both the BackupStorageLocation and the VolumeSnapshotLocation ready to go. Let’s double check that.

$ k get BackupStorageLocation -n velero
NAME      AGE
default   25h

$ k get VolumeSnapshotLocation -n velero
NAME             AGE
portworx-local   20

At this point, everything is in place to begin our backup and restore test. Once again, I will use my trusty Cassandra instance, which has been pre-populated with some sample data. Let’s first examine the application from a K8s perspective.

$ k get sts -n cassandra
NAME        READY   AGE
cassandra   3/3     73m

$ k get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          73m
cassandra-1   1/1     Running   3          73m
cassandra-2   1/1     Running   3          73m

$ k get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-a9e89927-7589-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        74m
cassandra-data-cassandra-1   Bound    pvc-5f7d3306-758b-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        74m
cassandra-data-cassandra-2   Bound    pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        74m

$ k get pv | grep cassandra
pvc-5f7d3306-758b-11e9-ac93-005056b82121   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-1   cass-sc                 74m
pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-2   cass-sc                 74m
pvc-a9e89927-7589-11e9-ac93-005056b82121   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-0   cass-sc                 74m

$ k exec -it cassandra-0 -n cassandra -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.244.2.53  189.17 KiB  32           100.0%            b7d527a6-d465-472a-82e1-8184a924045e  Rack1-K8Demo
UN  10.244.4.23  234.35 KiB  32           100.0%            4b678dd5-92af-4003-b978-559266e07d65  Rack1-K8Demo
UN  10.244.3.28  143.58 KiB  32           100.0%            74f0a3a6-5b34-4c97-b8ea-f71589b3fbca  Rack1-K8Demo

$ k exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> use demodb;
cqlsh:demodb> select * from emp;
emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 1000000

(1 rows)
cqlsh:demodb> exit
$

Let’s now list the current volumes and snapshots from a Portworx perspective. There is a volume for each of the 3 Cassandra replicas (1GiB) and there is an additional one for my on-prem Minio volume (10GiB). Currently, there should not be any snapshots as we have not initiated any backups.

$ /opt/pwx/bin/pxctl volume list
ID                      NAME                                            SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS                          SNAP-ENABLED
1051863986075634800     pvc-5f7d3306-758b-11e9-ac93-005056b82121        1 GiB   3       no      no              LOW             up - attached on 10.27.51.66    no
567192692874972784      pvc-9d9be9f2-758b-11e9-ac93-005056b82121        1 GiB   3       no      no              LOW             up - attached on 10.27.51.64    no
116101267951461079      pvc-a9e89927-7589-11e9-ac93-005056b82121        1 GiB   3       no      no              LOW             up - attached on 10.27.51.27    no
972553017890078199      pvc-e4710be5-755a-11e9-ac93-005056b82121        10 GiB  3       no      no              LOW             up - attached on 10.27.51.66    no

$ /opt/pwx/bin/pxctl volume list --snapshot
ID      NAME    SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS  SNAP-ENABLED

We are now ready to take our first Velero backup. In the command below, I am only going to back up the Cassandra namespace and associated objects; I am going to exclude all other K8s namespaces. There are a number of commands available to check the status of the backup. By including a --details option to the velero backup describe command, further details about the snapshots are displayed. You could also use velero backup logs to show the call-outs to the Portworx snapshot provider when the backup is initiating a snapshot of the Cassandra PVs.

$ velero backup create cassandra --include-namespaces cassandra
Backup request "cassandra" submitted successfully.
Run `velero backup describe cassandra` or `velero backup logs cassandra` for more details.


$ velero backup describe cassandra
Name: cassandra
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: <none>

Phase: Completed

Namespaces:
Included: cassandra
Excluded: <none>

Resources:
Included: *
Excluded: <none>
Cluster-scoped: auto

Label selector: <none>

Storage Location: default

Snapshot PVs: auto

TTL: 720h0m0s

Hooks: <none>

Backup Format Version: 1

Started: 2019-05-14 11:21:50 +0100 IST
Completed: 2019-05-14 11:21:55 +0100 IST

Expiration: 2019-06-13 11:21:50 +0100 IST

Persistent Volumes: 3 of 3 snapshots completed successfully (specify --details for more information)


$ velero backup describe cassandra --details
Name: cassandra
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: <none>

Phase: Completed

Namespaces:
Included: cassandra
Excluded: <none>

Resources:
Included: *
Excluded: <none>
Cluster-scoped: auto

Label selector: <none>

Storage Location: default

Snapshot PVs: auto

TTL: 720h0m0s

Hooks: <none>

Backup Format Version: 1

Started: 2019-05-14 11:21:50 +0100 IST
Completed: 2019-05-14 11:21:55 +0100 IST

Expiration: 2019-06-13 11:21:50 +0100 IST

Persistent Volumes:
  pvc-a9e89927-7589-11e9-ac93-005056b82121:
    Snapshot ID: 849199168767327835
    Type: portworx-snapshot
    Availability Zone:
    IOPS: <N/A>
  pvc-5f7d3306-758b-11e9-ac93-005056b82121:
    Snapshot ID: 1019215085859062674
    Type: portworx-snapshot
    Availability Zone:
    IOPS: <N/A>
  pvc-9d9be9f2-758b-11e9-ac93-005056b82121:
    Snapshot ID: 841348593482616373
    Type: portworx-snapshot
    Availability Zone:
    IOPS: <N/A>

And to finish off the backup part of this post, let’s run another set of Portworx commands to see some information about the volumes and snapshots. At this point, we would expect to see a snapshot for each Cassandra PV. Indeed we do, and we also see that the snapshot names include the name of the cassandra application. I’m not sure at this point where this is retrieved from, possibly an application label.

$ /opt/pwx/bin/pxctl volume list
ID                      NAME                                                    SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS                          SNAP-ENABLED
1019215085859062674     cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
841348593482616373      cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
849199168767327835      cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
1051863986075634800     pvc-5f7d3306-758b-11e9-ac93-005056b82121                1 GiB   3       no      no              LOW             up - attached on 10.27.51.66    no
567192692874972784      pvc-9d9be9f2-758b-11e9-ac93-005056b82121                1 GiB   3       no      no              LOW             up - attached on 10.27.51.64    no
116101267951461079      pvc-a9e89927-7589-11e9-ac93-005056b82121                1 GiB   3       no      no              LOW             up - attached on 10.27.51.27    no
972553017890078199      pvc-e4710be5-755a-11e9-ac93-005056b82121                10 GiB  3       no      no              LOW             up - attached on 10.27.51.66    no

$ /opt/pwx/bin/pxctl volume list --snapshot
ID                      NAME                                                    SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS          SNAP-ENABLED
1019215085859062674     cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no
841348593482616373      cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no
849199168767327835      cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no

Let’s now go ahead and do something drastic. Let’s delete the Cassandra namespace (which will remove the StatefulSet, PODs, PVCs, PVs, etc). We will then use Velero to restore it, and hopefully observe that our Cassandra instance comes back with our sample data.

After first deleting the Cassandra namespace, we see that this also removes the persistent volumes, which can also be seen from Portworx. However, the snapshots are still intact. Apart from my Minio volume, these snapshots are in fact the only volumes now listed by Portworx, and note that they are not attached to any K8s worker nodes.

$ k delete ns cassandra
namespace "cassandra" deleted

$ /opt/pwx/bin/pxctl volume list
ID                      NAME                                                    SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS                          SNAP-ENABLED
1019215085859062674     cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
841348593482616373      cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
849199168767327835      cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
972553017890078199      pvc-e4710be5-755a-11e9-ac93-005056b82121                10 GiB  3       no      no              LOW             up - attached on 10.27.51.66    no

$ /opt/pwx/bin/pxctl volume list --snapshot
ID                      NAME                                                    SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS          SNAP-ENABLED
1019215085859062674     cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no
841348593482616373      cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no
849199168767327835      cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no

Now it is time to restore my Cassandra application using Velero. Using the --details option to the velero restore describe command does not appear to provide any additional details. However, the logs can again be used to check whether the PVs were successfully restored from Portworx snapshots, using the velero restore logs command (an example follows after the restore output below).

$ velero backup get
NAME        STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
cassandra   Completed   2019-05-14 11:21:50 +0100 IST   29d       default            <none>

$ velero restore create cassandra-restore --from-backup cassandra
Restore request "cassandra-restore" submitted successfully.
Run `velero restore describe cassandra-restore` or `velero restore logs cassandra-restore` for more details.

$ velero restore describe cassandra-restore
Name:         cassandra-restore
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  Completed

Backup:  cassandra

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto
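
As mentioned above, the restore logs can be checked for the Portworx snapshot call-outs; something along these lines (output omitted here):

$ velero restore logs cassandra-restore | grep -i snapshot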

Let’s take a look at this from a Portworx perspective after the restore. The PVs are restored and attached to K8s worker nodes.

$ /opt/pwx/bin/pxctl volume list
ID                      NAME                                                    SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS                          SNAP-ENABLED
1019215085859062674     cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
841348593482616373      cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
849199168767327835      cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
960922544915702517      pvc-5f7d3306-758b-11e9-ac93-005056b82121                1 GiB   3       no      no              LOW             up - attached on 10.27.51.64    no
180406970719671498      pvc-9d9be9f2-758b-11e9-ac93-005056b82121                1 GiB   3       no      no              LOW             up - attached on 10.27.51.27    no
305662590414763324      pvc-a9e89927-7589-11e9-ac93-005056b82121                1 GiB   3       no      no              LOW             up - attached on 10.27.51.66    no
972553017890078199      pvc-e4710be5-755a-11e9-ac93-005056b82121                10 GiB  3       no      no              LOW             up - attached on 10.27.51.66    no

$ /opt/pwx/bin/pxctl volume list --snapshot
ID                      NAME                                                    SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS          SNAP-ENABLED
1019215085859062674     cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no
841348593482616373      cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no
849199168767327835      cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no

And last but not least, let’s verify the Cassandra application is fully functional with all 3 nodes rejoined and up, and table data has been restored.

$ k get sts -n cassandra
NAME        READY   AGE
cassandra   3/3     5m22s


$ k get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          5m28s
cassandra-1   1/1     Running   2          5m28s
cassandra-2   1/1     Running   2          5m28s


$ k get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-a9e89927-7589-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        5m41s
cassandra-data-cassandra-1   Bound    pvc-5f7d3306-758b-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        5m41s
cassandra-data-cassandra-2   Bound    pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        5m41s


$ k get svc -n cassandra
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
cassandra   ClusterIP   None         <none>        9042/TCP   5m46s


$ k get pv | grep cassandra
pvc-5f7d3306-758b-11e9-ac93-005056b82121   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-1   cass-sc                 5m58s
pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-2   cass-sc                 5m56s
pvc-a9e89927-7589-11e9-ac93-005056b82121   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-0   cass-sc                 5m55s


$ k exec -it cassandra-0 -n cassandra -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.244.2.54  142.61 KiB  32           100.0%            74f0a3a6-5b34-4c97-b8ea-f71589b3fbca  Rack1-K8Demo
UN  10.244.3.29  275.59 KiB  32           100.0%            4b678dd5-92af-4003-b978-559266e07d65  Rack1-K8Demo
UN  10.244.4.24  230.94 KiB  32           100.0%            b7d527a6-d465-472a-82e1-8184a924045e  Rack1-K8Demo


$ k exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> use demodb;
cqlsh:demodb> select * from emp;

emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 1000000

(1 rows)

cqlsh:demodb>

Everything looks good. Velero, with the Portworx plugin for snapshots, has been able to backup and restore a Cassandra instance.

Now, since these are both release candidates, you are not expected to use them for production purposes. However, if you have an opportunity to test these products in your own lab environments, I am sure both the Velero team and the Portworx plugin team would love to get your feedback.

The post Velero and Portworx – Container Volume Backup and Restores appeared first on CormacHogan.com.

Kubernetes Storage on vSphere 101 – Failure Scenarios


We have looked at quite a few scenarios where Kubernetes is running on vSphere, and what that means for storage. We looked at PVs, PVCs, PODs, Storage Classes, Deployments and ReplicaSets, and most recently we looked at StatefulSets. In a few of the posts we looked at some controlled failures, for example, when we deleted a Pod from a Deployment or from a StatefulSet. In this post, I wanted to look a bit closer at an uncontrolled failure, say when a node crashes. However, before getting into that in too much detail, it is worth highlighting a few of the core components of Kubernetes.

Core Components

In a typical K8s deployment, the majority of the components that I will be describing appear in the namespace kube-system. Here is a recent deployment of K8s 1.14.3 on my vSphere environment. Let’s get a list of the pods first, so we can see what we are talking about.

$ kubectl get pods -n kube-system
NAME                                  READY   STATUS    RESTARTS   AGE
coredns-fb8b8dccf-7s4vg               1/1     Running   0          2d1h
coredns-fb8b8dccf-cwbxs               1/1     Running   0          2d1h
etcd-cor-k8s-m01                      1/1     Running   0          2d1h
kube-apiserver-cor-k8s-m01            1/1     Running   0          2d1h
kube-controller-manager-cor-k8s-m01   1/1     Running   0          2d1h
kube-flannel-ds-amd64-hgrww           1/1     Running   0          2d1h
kube-flannel-ds-amd64-j9nck           1/1     Running   0          2d1h
kube-flannel-ds-amd64-n5d28           1/1     Running   0          2d1h
kube-proxy-2gqlj                      1/1     Running   0          2d1h
kube-proxy-hc494                      1/1     Running   0          2d1h
kube-proxy-hsvdz                      1/1     Running   0          2d1h
kube-scheduler-cor-k8s-m01            1/1     Running   1          2d1h

 

Let’s begin with the Pods that run on the master, node cor-k8s-m01 in this example. To see where the Pods are running, you can use the kubectl get pods -o wide -n kube-system command. The first component to highlight on the master is etcd, which is a key-value store that keeps track of all of the objects in your K8s cluster.

Next we have the kube-apiserver. In a nutshell, the API server takes care of all of the interactions that take place between core components and etcd. Other core components are constantly watching the API server for any changes related to the state of the system. If there is a change to an application definition, everyone watching gets a notification of the change.

This brings us to the kube-scheduler. The scheduler is responsible for finding the best node on which to run a newly created Pod.

Now we come to the kube-controller-manager. This core component is responsible for running the control loops (reconcilers). Since Kubernetes is a declarative system, the purpose of these control loops is to watch the actual state of the system and, if it differs from the desired/declared state, initiate operations to rectify the situation and make the actual state the same as the desired state. An example could be something as simple as attaching a Persistent Volume to a Pod. When an application is deployed in K8s, the application definition (including the declarative state of its deployment) is persisted on the master server. The API server maintains both an in-memory cache of desired state (what the state of the system should be) and another in-memory cache of the actual state (the real-world observed state). When these caches differ, the controller is responsible for initiating tasks to rectify the difference.

To close off on the master, CoreDNS, as its name might imply, is the DNS service that resolves the fully qualified domain names of services to the virtual IP addresses assigned to those services. You may also see this implemented as KubeDNS.

Let’s now talk about the workers or nodes. Kube-proxy is the component that configures node networking to route network requests from the virtual IP address of a service to the endpoints implementing the service, anywhere in the cluster. Kube-proxy is implemented as a DaemonSet, which means that an instance runs on every node. Also, in this example we are using flannel as the network overlay. This is also implemented as a DaemonSet, which means there is an instance on every node.
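
You can confirm the DaemonSet implementation of both with a quick query along these lines:

$ kubectl get daemonsets -n kube-system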

The main component on the nodes is the kubelet. This provides a number of functions. Some of its primary functions are to check with the API server to find which Pods should be running on its node, as well as to report the state of its running Pods back to the API server. One other interesting item to note is that the kubelet is responsible for running in-tree cloud providers, such as VMware’s vSphere Cloud Provider (VCP).

Let’s say we deploy a new application. The scheduler will notice that a new Pod has not yet been scheduled. Therefore it runs its algorithm to find the best node on which to run the Pod. The scheduler then updates the Pod via the API server to say where it should be scheduled, i.e. on which node. The kubelet on that node is monitoring the API server, and it now sees that the new Pod has been scheduled on it, but that this Pod is not running. The kubelet will now start the Pod (container). The kubelet continuously monitors the Pod to make sure it is running, and will restart it if it crashes. It also runs any reconciliation control loops to bring Pods to the declared state if their actual state does not match. This is how it will remain until you delete it. When you delete the application, all of the components watching it will receive the deletion event. The kubelet will notice that the Pod no longer exists on the API server, so it will then go ahead and delete it from its node.

For a more detailed description of K8s internals, I strongly recommend reading this excellent blog on “Events, the DNA of Kubernetes” from my colleague, Michael Gasch.

Note that PKS (Pivotal Container Service) users will see a lot of differences when they display the Pods in the kube-system namespace. Many of the Pods discussed here are implemented as processes on the PKS master. However, for the purposes of failure testing, the behaviour is the same.

With all of that background, let’s now go ahead and examine what happens when a node fails, and what happens to the Pods running on that node. For the purposes of these exercises, I am using a combination of “power off” and the kubectl delete node command. The impact of these two operations on the behaviour of Kubernetes failure handling will be seen clearly shortly.

Standalone Pod behaviour on Node power off

First of all, let’s talk about single, standalone Pod deployments. These are Pods that are not part of any Deployment or StatefulSet, so there are no objects ensuring their availability. I think there might be a misconception that a standalone Pod like this will be restarted automatically elsewhere in the K8s cluster when the node on which it resides is powered off. I know this is what I initially expected to happen before carrying out these tests. However, that is not the case. When the node on which the Pod is running is powered off, the node enters a NotReady state after approximately 40 seconds of missed heartbeats (which are sent every 10 seconds). Around 5 minutes later, the Pod is marked for eviction/deletion and enters a Terminating state, but it remains in that state indefinitely. This is because the kubelet daemon on the node cannot be reached by the API server, so the decision to delete the Pod cannot be communicated. If the node is powered on again, the API server can once again communicate with the kubelet, and it is at that point that the decision to delete the Pod is communicated, and so the Pod is now deleted. Note however that the Pod is not restarted automatically (unless it is part of a Deployment or StatefulSet); the application has to be restarted manually. So yes, even though the node has recovered and rejoined the cluster, any Pods on that node which were not protected are deleted.
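
For reference, those timings are not arbitrary; they correspond to kube-controller-manager defaults which could in principle be tuned (defaults shown below, and I am not suggesting you change them):

--node-monitor-grace-period=40s   (how long a node can miss heartbeats before being marked NotReady)
--pod-eviction-timeout=5m0s       (how long before Pods on a NotReady node are marked for eviction)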

In versions of K8s prior to v1.5, these Pods used to be ‘force deleted’ even when the API server and kubelet could not talk. In v1.5 and later, K8s no longer deletes the Pods until it is confirmed that they have stopped running. To remove these Pods when a node is powered off or partitioned in some way, you may need to delete the node using the kubectl delete node command. This deletes all the Pods running on the node from the API server. You can read more about this behavior here.
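
For reference, deleting the node, or alternatively force deleting an individual Pod (an option that comes up again later in this post), would look something like this; the names are placeholders:

$ kubectl delete node <node-name>
$ kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force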

Standalone Pod behaviour on Node delete

To remove a node from the K8s cluster, you can issue a kubectl delete node command (this is without powering off the node). In this case, the standalone Pods running on the node are immediately deleted and the node is removed from the cluster (with the big caveat that there are no Persistent Volumes attached to the Pod being evicted from the node). If the Pod has PVs attached, we will see what happens when we talk about StatefulSets shortly. If you are doing a controlled removal of a node, but wish to keep the Pods running, you should evacuate the Pods first. K8s provides a kubectl drain command for such scenarios. This will evict the Pods so that they can be restarted on another node in the cluster, again assuming that the Pod is not consuming any PVs. If it is, we will need to take some additional steps to detach the PVs from the node that is being drained. Let’s look at those scenarios next.
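
A drain of a node would look something like the following; the flags shown here allow for the DaemonSet-managed Pods (kube-proxy, flannel) and any local emptyDir data, and the node name is a placeholder:

$ kubectl drain <node-name> --ignore-daemonsets --delete-local-data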

StatefulSet behaviour on Node delete

In this section, we will look at the impact of a node delete on a StatefulSet, where each Pod has its own unique Persistent Volume (which on vSphere means that it is backed by a VMDK created using the VCP – vSphere Cloud Provider driver). For the purposes of this example, I will use a Cassandra application, but I will only deploy it with a single replica. However, the StatefulSet will still be responsible for keeping that single Pod running. In this first example, I will delete the node on which the Pod is running using the kubectl delete node command, and observe the events. What we will see is that the API server can still communicate with the kubelet, and the Pod is deleted. Then, since it is a StatefulSet, we will see the Pod restarted on another node, and the PV will attempt to attach to the new node. However, this is not possible since the original node still has the PV attached. Once we remove the virtual machine disk (VMDK) backing the PV from the original node using the vSphere Client (or power off the original node), the reconciler for the attacherDetacher controller can detach the PV, and successfully re-attach it to the new node where the Pod has been scheduled. This is identical to the standalone Pod situation mentioned in the previous scenario – manual intervention is required.

Let’s look at the actual steps involved; we will start with a look at the current configuration. It’s a StatefulSet with one Pod and one PVC/PV. We can also see which node the Pod is running on using the kubectl describe pod command.

cormac@pks-cli:~/cassandra$ kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          5m19s

cormac@pks-cli:~/cassandra$ kubectl describe pod cassandra-0 | grep Node:
Node:               a2cfcf41-922d-487b-bdf6-b453d7fb3105/10.27.51.187
cormac@pks-cli:~/cassandra$

cormac@pks-cli:~/cassandra$ kubectl get pvc
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-91600a74-8d16-11e9-b070-005056a2a261   1Gi        RWO            cass-sc        5m31s

cormac@pks-cli:~/cassandra$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                  STORAGECLASS   REASON   AGE
pvc-91600a74-8d16-11e9-b070-005056a2a261   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-0   cass-sc                 5m26s

cormac@pks-cli:~/cassandra$ kubectl describe pv pvc-91600a74-8d16-11e9-b070-005056a2a261
Name:            pvc-91600a74-8d16-11e9-b070-005056a2a261
Labels:          <none>
Annotations:     kubernetes.io/createdby: vsphere-volume-dynamic-provisioner
                 pv.kubernetes.io/bound-by-controller: yes
                 pv.kubernetes.io/provisioned-by: kubernetes.io/vsphere-volume
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    cass-sc
Status:          Bound
Claim:           cassandra/cassandra-data-cassandra-0
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:
Source:
    Type:               vSphereVolume (a Persistent Disk resource in vSphere)
    VolumePath:         [vsanDatastore] fef94d5b-fa8b-491f-bf0a-246e962f4850/kubernetes-dynamic-pvc-91600a74-8d16-11e9-b070-005056a2a261.vmdk
    FSType:             ext4
    StoragePolicyName:  raid-1
    VolumeID:
Events:                 <none>

Next, let’s take a look at the node VM in the vSphere Client UI, and we should observe that the attached VMDK matches the Persistent Volume.
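
If you prefer a CLI to the vSphere Client for this kind of check, something like govc (from the govmomi project) can list the devices attached to the node VM; the VM name below is a placeholder:

$ govc device.ls -vm <node-vm-name> | grep -i disk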

Let’s also try to put something useful into the Cassandra DB, so we can check that it is still intact after our various testing.

$ kubectl exec -it cassandra-0 -n cassandra nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns    Host ID                               Rack
UN  10.200.100.6  172.85 KiB  32           ?       2726920f-38d4-4c41-bb16-e28bc6a2a1fb  Rack1-K8Demo
$ kubectl exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh:demodb> CREATE KEYSPACE mydemo WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh:demodb> use mydemo;
cqlsh:mydemo> CREATE TABLE emp(emp_id int PRIMARY KEY, emp_name text, emp_city text, emp_sal varint,emp_phone varint);
cqlsh:mydemo> INSERT INTO emp (emp_id, emp_name, emp_city, emp_phone, emp_sal) VALUES (100, 'Cormac', 'Cork', 999, 1000000);
cqlsh:mydemo> exit

Great. Now I will delete this node from the cluster using kubectl delete node. Within a very short space of time, I see the following events appearing for the Pod.

Normal    Started                  Pod    Started container
Warning   Unhealthy                Pod    Readiness probe failed:
Normal    Scheduled                Pod    Successfully assigned cassandra/cassandra-0 to a2cfcf41-922d-487b-bdf6-b453d7fb3105
Warning   FailedAttachVolume       Pod    Multi-Attach error for volume "pvc-91600a74-8d16-11e9-b070-005056a2a261" \
Volume is already exclusively attached to one node and can't be attached to another
Normal    Killing                  Pod    Killing container with id docker://cassandra:Need to kill Pod

The Pod is restarted on a new node, but the volume cannot be attached to the new node, since it is still attached to the original node. Then, 2 minutes later, we also see a problem with the mount attempt, which obviously cannot succeed since the volume is not attached:

Warning   FailedMount              Pod    Unable to mount volumes for pod "cassandra-0_cassandra(5a2dc01e-8d19-11e9-b070-005056a2a261)": \
timeout expired waiting for volumes to attach or mount for pod "cassandra"/"cassandra-0". list of unmounted volumes=[cassandra-data]. \
list of unattached volumes=[cassandra-data default-token-6fj8b]

Now, the reconciler for the attacherDetacher controller has a 6 minute timer, and once that timer has expired, it will attempt to forcibly detach the volume from the current (now deleted) node, and attach it to the new node. However, because the PV is still attached to the original node VM, even a force detach does not work.

Warning   FailedAttachVolume   Pod    AttachVolume.Attach failed for volume "pvc-91600a74-8d16-11e9-b070-005056a2a261" : \
Failed to add disk 'scsi1:0'.

And if we examine the tasks in the vSphere client UI, we observe the following errors, which match the events seen in the Pod:

OK – now we need some manual intervention to resolve the situation. I have a few choices here. One option is to shut down the node VM that I just deleted from the cluster, which still has the VMDK/PV attached. The other option is to simply remove the VMDK/PV manually from the original node VM. In this case, I will just remove the VMDK/PV from the VM via the vSphere Client UI, but make sure that the “Delete files from datastore” checkbox is not selected. If it is, then we would end up deleting the VMDK from disk, which is not what we want.

After a little time, we see the following in the logs, indicating that the PV could be successfully attached and the Pod can be started:

Normal    SuccessfulAttachVolume   Pod    AttachVolume.Attach succeeded for volume "pvc-91600a74-8d16-11e9-b070-005056a2a261"
Normal    Pulling                  Pod    pulling image "harbor.rainpole.com/library/cassandra:v11"
Normal    Pulled                   Pod    Successfully pulled image "harbor.rainpole.com/library/cassandra:v11"
Normal    Created                  Pod    Created container
Normal    Started                  Pod    Started container

Everything has returned to the normal, desired state of operation at this point. Let’s now try a different sequence.

StatefulSet behaviour on Node power off

In this next test, we are going to take a slightly different approach. This time we will first power off the node on which the Pod of our StatefulSet is running. Then we will see how K8s handles this, and whether any further intervention is needed. Let's check out the Cassandra DB and its contents before going any further.

$ kubectl exec -it cassandra-0 -n cassandra nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns    Host ID                               Rack
UN  10.200.82.12  173.43 KiB  32           ?       2726920f-38d4-4c41-bb16-e28bc6a2a1fb  Rack1-K8Demo
$ kubectl exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> select * from mydemo.emp;

emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 1000000
(1 rows)
cqlsh>

All looks good. The next step is to determine where the Pod is running (on which node VM) and then power that node down.

$ kubectl describe pod cassandra-0 | grep Node:
Node:               62fc3f4f-a992-4716-b972-3958bda8b231/10.27.51.186
$ kubectl get nodes -o wide
NAME                                   STATUS     ROLES    AGE     VERSION   INTERNAL-IP    EXTERNAL-IP    OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
52bf9358-7a6e-4552-87b1-6591dc88634c   Ready      <none>   6h20m   v1.12.4   10.27.51.185   10.27.51.185   Ubuntu 16.04.5 LTS   4.15.0-43-generic   docker://18.6.1
62fc3f4f-a992-4716-b972-3958bda8b231   NotReady   <none>   31m     v1.12.4   10.27.51.186   10.27.51.186   Ubuntu 16.04.5 LTS   4.15.0-43-generic   docker://18.6.1
a2cfcf41-922d-487b-bdf6-b453d7fb3105   Ready      <none>   7m12s   v1.12.4   10.27.51.187   10.27.51.187   Ubuntu 16.04.5 LTS   4.15.0-43-generic   docker://18.6.1
c1b53999-ac8b-4187-aae6-940bf61b4e2b   Ready      <none>   6h13m   v1.12.4   10.27.51.189   10.27.51.189   Ubuntu 16.04.5 LTS   4.15.0-43-generic   docker://18.6.1

After 40 seconds, the node enters a NotReady state. After a further 5 minutes, the Pod is marked for deletion and enters a Terminating state, as shown in the describe output below.

$ kubectl describe pod cassandra-0
Name:                      cassandra-0
Namespace:                 cassandra
Priority:                  0
PriorityClassName:         <none>
Node:                      62fc3f4f-a992-4716-b972-3958bda8b231/10.27.51.186
Start Time:                Wed, 12 Jun 2019 16:16:25 +0100
Labels:                    app=cassandra
                           controller-revision-hash=cassandra-589c765486
                           statefulset.kubernetes.io/pod-name=cassandra-0
Annotations:               <none>
Status:                    Terminating(lasts <invalid>)
Termination Grace Period:  180s
Reason:                    NodeLost
Message:                   Node 62fc3f4f-a992-4716-b972-3958bda8b231 which was running pod cassandra-0 is unresponsive
IP:                        10.200.82.12
Controlled By:             StatefulSet/cassandra

Now the interesting thing is that the system remains in this state indefinitely. The Pod is left Terminating, it isn't scheduled on a new node, and thus there is no attempt to detach the PV from the current node and re-attach it to the new node. This is expected, and is as currently designed: since no communication is possible between the kubelet and the API server, Kubernetes does not know whether the node has failed or is simply network partitioned. The node could come back at any time, so it just waits. At this point, if the node is never coming back, you could either force delete the Pod, or delete the node. We will do the latter, so we will now need to issue a kubectl delete node command against the node that was powered off. Once that command is issued, things begin to happen immediately. First of all, we see the Pod get deleted, and because it is part of a StatefulSet, it is automatically re-scheduled on a new node. You can also see the Status change from Terminating to Pending. It is waiting for the PV to be attached and mounted before it can start.
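
For reference, here are the two options in command form. The node name is the powered-off node from my environment, and the force delete of the Pod is the alternative that I did not use in this test:

# Option 1: force delete the Terminating Pod
$ kubectl delete pod cassandra-0 -n cassandra --grace-period=0 --force
# Option 2 (used here): delete the node that was powered off
$ kubectl delete node 62fc3f4f-a992-4716-b972-3958bda8b231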

$ kubectl describe pod cassandra-0
Name:               cassandra-0
Namespace:          cassandra
Priority:           0
PriorityClassName:  <none>
Node:               a2cfcf41-922d-487b-bdf6-b453d7fb3105/10.27.51.187
Start Time:         Wed, 12 Jun 2019 16:51:06 +0100
Labels:             app=cassandra
                    controller-revision-hash=cassandra-589c765486
                    statefulset.kubernetes.io/pod-name=cassandra-0
Annotations:        <none>
Status:             Pending
IP:
Controlled By:      StatefulSet/cassandra

Now, initially we see some errors in the Pod’s events. Once again, the volume cannot be attached to the new node as it is still attached to the original node.

Normal    Scheduled                Pod    Successfully assigned cassandra/cassandra-0 to a2cfcf41-922d-487b-bdf6-b453d7fb3105
Warning   FailedAttachVolume       Pod    Multi-Attach error for volume "pvc-91600a74-8d16-11e9-b070-005056a2a261" \
                                          Volume is already exclusively attached to one node and can't be attached to another
Warning   FailedMount              Pod    Unable to mount volumes for pod "cassandra-0_cassandra(e3b15fb4-8d29-11e9-b070-005056a2a261)":\
                                          timeout expired waiting for volumes to attach or mount for pod "cassandra"/"cassandra-0". \
                                          list of unmounted volumes=[cassandra-data]. \
                                          list of unattached volumes=[cassandra-data default-token-6fj8b]

At this point, the 6 minute force detach timer kicks off in the reconciler of the attacherDetacher controller. Once the 6 minutes have expired, we see some new events on the Pod indicating that the attach of the PV succeeded, and at this point the Pod is up and running once more.

Normal    SuccessfulAttachVolume   Pod    AttachVolume.Attach succeeded for volume "pvc-91600a74-8d16-11e9-b070-005056a2a261"
Normal    Pulling                  Pod    pulling image "harbor.rainpole.com/library/cassandra:v11"
Normal    Pulled                   Pod    Successfully pulled image "harbor.rainpole.com/library/cassandra:v11"
Normal    Created                  Pod    Created container
Normal    Started                  Pod    Started container
$ kubectl describe pod cassandra-0
Name:               cassandra-0
Namespace:          cassandra
Priority:           0
PriorityClassName:  <none>
Node:               a2cfcf41-922d-487b-bdf6-b453d7fb3105/10.27.51.187
Start Time:         Wed, 12 Jun 2019 16:51:06 +0100
Labels:             app=cassandra
                    controller-revision-hash=cassandra-589c765486
                    statefulset.kubernetes.io/pod-name=cassandra-0
Annotations:        <none>
Status:             Running
IP:                 10.200.100.10
Controlled By:      StatefulSet/cassandra

Let's finish off our test by verifying that the contents of the Cassandra DB are still intact.

$ kubectl exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> use mydemo;
cqlsh:mydemo> select * from emp;
emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 1000000
(1 rows)
cqlsh:mydemo>

LGTM. So that is some interesting behaviour when a node is powered off, and it took quite a bit of testing and re-testing to verify that this is indeed the correct behaviour. The first thing to highlight is the fact that I was running with a single Replica. In a StatefulSet, you would most likely be running with multiple Replicas and some load-balancer front end. So even in the event of a node powering off like this, leaving one Pod Terminating indefinitely, the application would still be available.

So to recap, in my observations, when a node is powered off, after ~40 seconds of no heartbeats from the node, it is marked as NotReady. After a further ~5 minutes, the Pod enters the Terminating state, and remains there indefinitely. Once the kubectl delete node command is issued, the Pod is deleted and, since it is part of a StatefulSet, it is re-scheduled on a new node. 6 minutes later the reconciler for the attacherDetacher controller force detaches the PV from the original node, and it can then be attached to the new node where the Pod has been scheduled, and mounted by the Pod.

If you are interested in the attacherDetacher events, you can use the command kubectl logs -f kube-controller-manager-xxx -n kube-system on native K8s systems. If you are using PKS, you will need to SSH to the master VM, and then change directory to /var/vcap/sys/log/kube-controller-manager. From here you can do a tail -f kube-controller-manager.stderr.log and see the various attacher-detacher events.
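
For convenience, here are those commands in one place (replace xxx with the suffix of the controller manager Pod as shown by kubectl get pods -n kube-system):

# Native K8s
$ kubectl logs -f kube-controller-manager-xxx -n kube-system
# PKS - after SSHing to the master VM
$ cd /var/vcap/sys/log/kube-controller-manager
$ tail -f kube-controller-manager.stderr.log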

Recovery after shutdown/delete

Let's now assume that whatever issue caused the node to be shut down has been resolved, and that we now want to bring the node back into the cluster. Well, remember that if this node was powered off, then when it is powered on again it will still think it has the original PV/VMDK attached. But at this point, we have forcibly detached that volume to allow it to be attached to another node so that the Pod could be restarted/rescheduled. So we are going to have to address this. Let's see what happens when you attempt to power on the original node. Note that you will see the same issue if you try to power on the node from the vSphere Client UI, but I wanted to show you a useful PKS command to do a similar task from the command line. I am going to use the cloud-check (cck) BOSH command to request a reboot of the powered-off node VM.

$ bosh -d service-instance_3aa87d05-87a3-462a-8f1d-5f07cd5d7bda cck
Using environment '10.27.51.141' as client 'ops_manager'
Using deployment 'service-instance_3aa87d05-87a3-462a-8f1d-5f07cd5d7bda'
Task 1677
Task 1677 | 13:05:57 | Scanning 5 VMs: Checking VM states (00:00:21)
Task 1677 | 13:06:18 | Scanning 5 VMs: 4 OK, 1 unresponsive, 0 missing, 0 unbound (00:00:00)
Task 1677 | 13:06:18 | Scanning 5 persistent disks: Looking for inactive disks (00:00:27)
Task 1677 | 13:06:45 | Scanning 5 persistent disks: 5 OK, 0 missing, 0 inactive, 0 mount-info mismatch (00:00:00)
Task 1677 Started  Thu Jun 13 13:05:57 UTC 2019
Task 1677 Finished Thu Jun 13 13:06:45 UTC 2019
Task 1677 Duration 00:00:48
Task 1677 done

#  Type                Description
2  unresponsive_agent  VM for 'worker/580b6ee0-970e-44cb-896c-9520da134b30 (1)' with cloud ID \
'vm-33849704-0c77-450b-939c-dbea525e6df7' is not responding.
1 problems

1: Skip for now
2: Reboot VM
3: Recreate VM without waiting for processes to start
4: Recreate VM and wait for processes to start
5: Delete VM
6: Delete VM reference (forceful; may need to manually delete VM from the Cloud to avoid IP conflicts)

VM for 'worker/580b6ee0-970e-44cb-896c-9520da134b30 (1)' with cloud ID \
'vm-33849704-0c77-450b-939c-dbea525e6df7' is not responding. (1): 2
Continue? [yN]: y

Task 1678
Task 1678 | 13:07:10 | Applying problem resolutions: VM for 'worker/580b6ee0-970e-44cb-896c-9520da134b30 (1)'\
 with cloud ID 'vm-33849704-0c77-450b-939c-dbea525e6df7' is not responding. (unresponsive_agent 80): Reboot VM (00:04:04)
                    L Error: Unknown CPI error 'Unknown' with message 'File system specific implementation of \
                      OpenFile[file] failed' in 'reboot_vm' CPI method (CPI request ID: 'cpi-735808')
Task 1678 | 13:11:14 | Error: Error resolving problem '2': Unknown CPI error 'Unknown' with message 'File system \
                       specific implementation of OpenFile[file] failed' in 'reboot_vm' CPI method (CPI request ID: \
                       'cpi-735808')
Task 1678 Started  Thu Jun 13 13:07:10 UTC 2019
Task 1678 Finished Thu Jun 13 13:11:14 UTC 2019
Task 1678 Duration 00:04:04
Task 1678 error

Resolving problems for deployment 'service-instance_3aa87d05-87a3-462a-8f1d-5f07cd5d7bda':
  Expected task '1678' to succeed but state is 'error'
Exit code 1

As we can see, this request failed. This is because the node VM is still under the impression that the PV/VMDK is attached, but of course it has been forcibly removed. The same error also appears in the vSphere Client UI.

We will need to manually remove the VMDK from the node VM. We saw how to do that earlier through the vSphere Client UI, ensuring that we do not delete the disk from the datastore. With the VMDK removed from the node VM, let's try the "cck" once more.

$ bosh -d service-instance_3aa87d05-87a3-462a-8f1d-5f07cd5d7bda cck
Using environment '10.27.51.141' as client 'ops_manager'
Using deployment 'service-instance_3aa87d05-87a3-462a-8f1d-5f07cd5d7bda'
Task 1679
Task 1679 | 13:15:13 | Scanning 5 VMs: Checking VM states (00:00:20)
Task 1679 | 13:15:33 | Scanning 5 VMs: 4 OK, 1 unresponsive, 0 missing, 0 unbound (00:00:00)
Task 1679 | 13:15:33 | Scanning 5 persistent disks: Looking for inactive disks (00:00:27)
Task 1679 | 13:16:00 | Scanning 5 persistent disks: 5 OK, 0 missing, 0 inactive, 0 mount-info mismatch (00:00:00)
Task 1679 Started  Thu Jun 13 13:15:13 UTC 2019
Task 1679 Finished Thu Jun 13 13:16:00 UTC 2019
Task 1679 Duration 00:00:47
Task 1679 done

#  Type                Description
3  unresponsive_agent  VM for 'worker/580b6ee0-970e-44cb-896c-9520da134b30 (1)' with cloud ID \
'vm-33849704-0c77-450b-939c-dbea525e6df7' is not responding.
1 problems

1: Skip for now
2: Reboot VM
3: Recreate VM without waiting for processes to start
4: Recreate VM and wait for processes to start
5: Delete VM
6: Delete VM reference (forceful; may need to manually delete VM from the Cloud to avoid IP conflicts)

VM for 'worker/580b6ee0-970e-44cb-896c-9520da134b30 (1)' with cloud ID \
'vm-33849704-0c77-450b-939c-dbea525e6df7' is not responding. (1): 2

Continue? [yN]: y

Task 1680
Task 1680 | 13:16:10 | Applying problem resolutions: VM for 'worker/580b6ee0-970e-44cb-896c-9520da134b30 (1)' \
with cloud ID 'vm-33849704-0c77-450b-939c-dbea525e6df7' is not responding. (unresponsive_agent 80): Reboot VM (00:01:21)
Task 1680 Started  Thu Jun 13 13:16:10 UTC 2019
Task 1680 Finished Thu Jun 13 13:17:31 UTC 2019
Task 1680 Duration 00:01:21
Task 1680 done

Succeeded

Success! The VM powers back on, and should automatically rejoin the cluster.

$ kubectl get nodes
NAME                                   STATUS   ROLES    AGE   VERSION
52bf9358-7a6e-4552-87b1-6591dc88634c   Ready    <none>   29h   v1.12.4
62fc3f4f-a992-4716-b972-3958bda8b231   Ready    <none>   68m   v1.12.4
a2cfcf41-922d-487b-bdf6-b453d7fb3105   Ready    <none>   22h   v1.12.4
c1b53999-ac8b-4187-aae6-940bf61b4e2b   Ready    <none>   29h   v1.12.4

That completes our overview of failure handling, focusing on node failures/shutdowns. Hope it helps to clarify some of the expected behaviors.

Summary

If you've read this far, you're probably asking yourself why this is not handled automatically. For example, why isn't there a force detach when the node isn't accessible for a period of time? That is a great question, and it is the same question a lot of people are asking in the Kubernetes community. I want to point out that this is not behavior specific to vSphere; it is seen on lots of other platforms too. I know there are a few discussions going on in the community about whether or not this is the desired behaviour, and whether something could be done to prevent Pods remaining in a Terminating state indefinitely. These are still being debated, but at the moment, this is how Kubernetes behaves.

Manifests used in this demo can be found on my vsphere-storage-101 github repo.

The post Kubernetes Storage on vSphere 101 – Failure Scenarios appeared first on CormacHogan.com.

Kubernetes Storage on vSphere 101 – NFS revisited


In my most recent 101 post on ReadWriteMany volumes, I shared an example whereby we created an NFS server in a Pod which automatically exported a File Share. We then mounted the File Share to multiple NFS client Pods deployed in the same namespace. We saw how multiple Pods were able to write to the same ReadWriteMany volume, which was the purpose of the exercise. I received a few questions on the back of that post relating to the use of Services. In particular, could an external NFS client, even one outside of the K8s cluster, access a volume from an NFS Server running in a Pod?

Therefore, in this post, we will look at how to do just that. We will be creating a Service that can be used by external clients to mount a File Share from an NFS Server running in a K8s Pod. To achieve this, we will be looking at a new Service type that we haven't seen before, the Load Balancer type. By using this Service type, our NFS Server will be associated with an external IP address. This should allow our clients to access the NFS exports. If you have a K8s distribution that already has a Container Network Interface (CNI) deployed that can provide these external IP addresses, e.g. NSX-T, then great. If not, I will introduce you to the MetalLB Load Balancer later in the post, which will provide external IP addresses for your Load Balancer Services.

Please note that one would typically use a Load Balancer Service to load balance requests across multiple Pods that are part of a StatefulSet or ReplicaSet. However, we're not going to delve into that functionality here; we are just using it for external access. As I said in previous posts, I may look at doing a post about Services in more detail at some point.

To begin this demonstration, let's create the NFS Server Pod. Let's take a look at the manifest now.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nfs-server-ext
  namespace: nfs
  labels:
    app: nfs-server-ext
spec:
  serviceName: nfs-service-svc-ext
  replicas: 1
  selector:
    matchLabels:
      app: nfs-server-ext
  template:
    metadata:
      labels:
        app: nfs-server-ext
    spec:
      containers:
      - name: nfs-server-ext
        image: gcr.io/google_containers/volume-nfs:0.8
        ports:
          - name: nfs
            containerPort: 2049
          - name: mountd
            containerPort: 20048
          - name: rpcbind
            containerPort: 111
        securityContext:
          privileged: true
        volumeMounts:
        - name: nfs-export
          mountPath: /exports
  volumeClaimTemplates:
  - metadata:
      name: nfs-export
      annotations:
        volume.beta.kubernetes.io/storage-class: nfs-sc
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 5Gi

I am deploying my NFS Server as a StatefulSet, but with only a single replica. This simply means that should the Pod fail, the StatefulSet will take care of restarting it, etc. You can review StatefulSets here. What this manifest YAML does is instruct K8s to create a 5GB volume from the storage defined in my StorageClass nfs-sc and present it to the NFS server container as /exports. The NFS Server container (running in the Pod) is configured to automatically export the directory /exports as a File Share. Basically, whatever volume we add to the manifest under the name nfs-export, regardless of size, is automatically exported. I have opened 3 container ports for the NFS server: 2049 for nfs, 20048 for mountd and 111 for the portmapper/rpcbind. These are required for NFS to work. Let's look at the StorageClass next:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nfs-sc
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: thin
    storagePolicyName: raid-1
    datastore: vsanDatastore

This StorageClass references the VCP (vSphere Cloud Provider) storage driver called vsphere-volume. It states that the volumes should be instantiated on the vSAN datastore, and that the policy used should be the Storage Policy called raid-1. Nothing really new here – we have seen this many times in previous 101 posts. If you need to review StorageClasses, you can do that here.

Now we come to the interesting bit – the service. It includes a Load Balancer reference so that it can be assigned an external IP address. I am fortunate in that I have a lab setup with NSX-T, meaning that NSX-T is configured with a Floating IP Pool to provide me with external IP addresses when I need them for Load Balancer service types. Access to NSX-T is not always possible, so in those cases, I have used MetalLB in the past. To deploy a MetalLB load balancer, you can use the following command to create the necessary Kubernetes objects and privileges:

$ kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.7.3/manifests/metallb.yaml 
namespace/metallb-system created 
serviceaccount/controller created 
serviceaccount/speaker created 
clusterrole.rbac.authorization.k8s.io/metallb-system:controller created 
clusterrole.rbac.authorization.k8s.io/metallb-system:speaker created 
role.rbac.authorization.k8s.io/config-watcher created 
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:controller created 
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:speaker created 
rolebinding.rbac.authorization.k8s.io/config-watcher created 
daemonset.apps/speaker created 
deployment.apps/controller created

Once the MetalLB Load Balancer is deployed, it is simply a matter of creating a ConfigMap with a pool of IP addresses for it to use for any Load Balancer service types that require external IP addresses. Here is a sample ConfigMap YAML that I have used in the past, with a range of external IP addresses configured. Modify the IP address range appropriately for your environment. Remember that these IP addresses must be on a network that can be accessed by your NFS clients, so they can mount the exported filesystems from the NFS Server Pod:

$ cat layer2-config.yaml
apiVersion: v1 
kind: ConfigMap 
metadata:  
  namespace: metallb-system  
  name: config
data:  
  config: |    
    address-pools:    
    - name: my-ip-space      
      protocol: layer2      
      addresses:      
      - 10.27.51.172-10.27.51.178
$ kubectl apply -f layer2-config.yaml 
configmap/config created

Now when your service starts with type LoadBalancer, it will be allocated one of the IP addresses from the pool defined in the ConfigMap.

Since my Container Network Interface (CNI) is NSX-T, I don’t need to worry about that. As soon as I specify type: LoadBalancer in my Service manifest file, NSX-T will retrieve an available address from the preconfigured pool of Floating IP addresses, and allocate it to my service. Here is my manifest for the NFS server service, which opens the same network ports as our NFS Server container running in the Pod:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: nfs-server-svc-ext
  name: nfs-server-svc-ext
  namespace: nfs
spec:
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
  selector:
    app: nfs-server-ext
  type: LoadBalancer

Note the selector, and how it matches the label on the Pod. This is how the backing Pods are discovered and connected to the Service.

The next step is to go ahead and deploy the StorageClass and StatefulSet for the NFS Server. We will verify that the PVC, PV and Pod get created accordingly.

$ kubectl create -f nfs-sc.yaml
storageclass.storage.k8s.io/nfs-sc created

$ kubectl create -f nfs-server-sts-ext.yaml 
statefulset.apps/nfs-server-ext created 

$ kubectl get pv 
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                             STORAGECLASS   REASON   AGE 
pvc-daa8629c-9746-11e9-8893-005056a27deb   5Gi        RWO            Delete           Bound    nfs/nfs-export-nfs-server-ext-0   nfs-sc                  111m 

$ kubectl get pvc 
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE 
nfs-export-nfs-server-ext-0   Bound    pvc-daa8629c-9746-11e9-8893-005056a27deb   5Gi        RWO            nfs-sc         112m 

$ kubectl get pod 
NAME               READY   STATUS    RESTARTS   AGE 
nfs-server-ext-0   1/1     Running   0          93s

The Pod is up and running. Now let's deploy our Load Balancer Service and check that the Endpoints get created correctly.

$ kubectl create -f nfs-server-svc-ext-lb.yaml
service/nfs-server-svc-ext created

$ kubectl get svc
NAME                 TYPE           CLUSTER-IP       EXTERNAL-IP                 PORT(S)                                        AGE
nfs-server-svc-ext   LoadBalancer   10.100.200.235   100.64.0.1,192.168.191.67   2049:32126/TCP,20048:31504/TCP,111:30186/TCP   5s

$ kubectl get endpoints
NAME                 ENDPOINTS                                         AGE
nfs-server-svc-ext   172.16.5.2:20048,172.16.5.2:111,172.16.5.2:2049   9s

$ kubectl describe endpoints nfs-server-svc-ext
Name:         nfs-server-svc-ext
Namespace:    nfs
Labels:       app=nfs-server-svc-ext
Annotations:  <none>
Subsets:
  Addresses:          172.16.5.2
  NotReadyAddresses:  <none>
  Ports:
    Name     Port   Protocol
    ----     ----   --------
    mountd   20048  TCP
    rpcbind  111    TCP
    nfs      2049   TCP

Events:  <none>

All looks good. The external IP in the output above is 192.168.191.67. This address, allocated by NSX-T from my floating IP address pool, can be reached from other apps running in my environment, so long as they can route to that IP address. Not only do I get an IP address, but I also get a DNS entry for my service (which is the same name as the service). Requests to the IP address are also load balanced across all of the back-end Pods that implement the service (although I only have 1 back-end Pod, so that is not really relevant here).

It is also useful to query the endpoints since these will only populate once the Pod has mapped/bound successfully to the service. It ensures that your labeling and selector are working correctly between Pod and Service. It also displays the IP address of the NFS Server Pod, and any configured ports.

With the Load Balancer service now created, it should mean that a pod, virtual machine or bare-metal server running NFS client software should be able to mount the share exported from my NFS server Pod.

Now, if you had used the MetalLB Load Balancer, then you would expect the external IP address allocated to the Load Balancer Service to be one of the range of IP addresses placed in the ConfigMap for the MetalLB.

You might ask why I don't just scale out the StatefulSet to 3 replicas or so and allow the requests to load balance? The thing to keep in mind is that this NFS Server has no built-in replication, and each Pod is using its own unique Persistent Volume (PV). So let's say my client keeps connecting to Pod-0 and writes lots of data to Pod-0's PV. Now Pod-0 fails, so I am redirected/proxied to Pod-1. Well, Pod-1's PV will have none of the data that I wrote to Pod-0's PV – it will be empty, since there is no replication built into the NFS Server Pod. Note that Kubernetes does not do any replication of data in a ReplicaSet or a StatefulSet – it is up to the application running in the Pods to do this.

Note: I usually use a showmount -e command pointed at the NFS server to see what shares/directories it is exporting. From my Ubuntu client VM below, you can see that it is not working for the NFS server Pod, but if I point it at another NFS server IP address (in a VM) it works. I'm unsure why it is not working for my NFS Server Pod.

# showmount -e 192.50.0.4
Export list for 192.50.0.4:
/nfs *

# showmount -e 192.168.191.67
clnt_create: RPC: Port mapper failure - Unable to receive: errno 111 (Connection refused)
#

However, the rpcinfo -p command works just fine when pointed at the NFS Server Pod.

# rpcinfo -p 192.168.191.67
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    3   udp  20048  mountd
    100005    3   tcp  20048  mountd
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    3   tcp   2049  nfs_acl
    100003    3   udp   2049  nfs
    100227    3   udp   2049  nfs_acl
    100021    1   udp  59128  nlockmgr
    100021    3   udp  59128  nlockmgr
    100021    4   udp  59128  nlockmgr
    100021    1   tcp  43089  nlockmgr
    100021    3   tcp  43089  nlockmgr
    100021    4   tcp  43089  nlockmgr
    100024    1   udp  35175  status
    100024    1   tcp  46455  status

And for the final test, can we actually mount the exported share from the NFS server Pod to my client VM sitting outside the cluster?

# mount -t nfs 192.168.191.67:/exports /demo
# cd /demo
# touch my-new-file
# ls
index.html  lost+found  my-new-file
#

LGTM. So what is happening here? In my example, my Load Balancer (provided by the NSX-T CNI) has provided an IP address for my Service. As the Service receives NFS client requests on what could be termed a virtual IP address of 192.168.191.67, these requests are being redirected or proxied to our back-end NFS Server Pod on 172.16.5.2. This is all handled by the kube-proxy daemon which we discussed briefly in the failure scenarios post. It takes care of configuring the network on its K8s node so that network requests to the virtual/external IP address are redirected to the back-end Pod(s). In this way, we have managed to expose an internal Kubernetes Pod based application to the outside world. So not only can clients within the cluster access these resources, but so can clients outside of Kubernetes.

Manifests used in this demo (as well as previous 101 blog posts) can be found on my vsphere-storage-101 github repo.

The post Kubernetes Storage on vSphere 101 – NFS revisited appeared first on CormacHogan.com.

Kubernetes on vSphere 101 – Services


This will be the last article in the 101 series, as I think I have covered most of the introductory storage-related items at this point. One object that came up time and again during the series was services. While not specifically a storage item, it is a fundamental building block of Kubernetes applications. In the 101 series, we came across a "headless" service with the Cassandra StatefulSet demo. This was where the service type ClusterIP was set to None. When we started to look at ReadWriteMany volumes, we used NFS to demonstrate these volumes in action. In the first NFS example, we came across a blank ClusterIP entry. This was the service type entry when NFS client Pods were mounting file shares from an NFS server Pod. We then looked at a Load Balancer type service, which we used to allow external NFS clients outside of the K8s cluster to mount a file share from an NFS Server Pod.

When a service is created, it typically gets (1) a virtual IP address, (2) a DNS entry and (3) networking rules that 'proxy' or redirect the network traffic to the Pod/Endpoint that actually provides the service. When that virtual IP address receives traffic, kube-proxy is responsible for redirecting the traffic to the correct back-end Pod/Endpoint. You might ask what the point of a service is. Well, services address the issue that Pods can come and go, and each time Pods are restarted, they most likely get new IP addresses. This makes it difficult to maintain connectivity/communication with them, especially for clients. Through services, K8s provides a mechanism to maintain a unique IP address for the lifespan of the service. Clients can then be configured to talk to the service, and traffic to the service will be load balanced across all the Pods that are connected to it.
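
For reference, the DNS entry created for a service follows a predictable pattern, which we will see resolved with nslookup later in this post:

<service-name>.<namespace>.svc.cluster.local
e.g. nginx-svc.svc-demo.svc.cluster.local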

At this point, let's revisit some of the internal K8s components that we have already come across. It will be useful to appreciate the purpose of each in the context of services.

kubeDNS / coreDNS revisited

In the failure scenarios post, we talked a little about some of the internal components of a K8s cluster. When a service is created, it is assigned a virtual IP address. This IP address is added to DNS to make service discovery easier. The DNS name-service is implemented by either coreDNS or kubeDNS; which one depends on your distribution. PKS uses coreDNS whilst upstream K8s distros use kubeDNS.
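
If you want to check which implementation your cluster is running, one approach (assuming the DNS Pods carry the common k8s-app=kube-dns label, which both implementations typically use) is to list the images of the DNS Pods in the kube-system namespace:

$ kubectl get pods -n kube-system -l k8s-app=kube-dns \
    -o jsonpath='{.items[*].spec.containers[*].image}'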

kube-proxy revisited

Another one of the internal components we touched on was kube-proxy. I described kube-proxy as the component that configures K8s node networking. It routes network requests from the virtual IP address of a service to the endpoint (Pod) implementing the service, anywhere in the cluster. Thus, a front-end Pod on one K8s node is able to seamlessly communicate with a back-end Pod on a completely different K8s node in the same cluster.

Why don’t we go ahead and tease out these services in some further detail, and look at some of the possible service types that you may come across. I am going to use the following manifest files for my testing. The first is an nginx web-server deployment which has 2 replicas, thus there will be two back-end (behind a service) Pods deployed.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

I am also using this manifest for another simple busybox Pod to allow me to do Pod to Pod testing in the cluster.

apiVersion: v1
kind: Pod
metadata:
  name: demo-nginx-pod
spec:
  containers:
  - name: busybox
    image: "k8s.gcr.io/busybox”

Now we need to create a service manifest, but we will be modifying this YAML file with each test. Let’s start our testing with ClusterIP.

1. ClusterIP

ClusterIP can have a number of different values when it comes to services. Let’s look at the most common.

1.1 clusterIP set to “” blank

This is the default service type in Kubernetes. With ClusterIP set to “” or blank, the service is accessible within the cluster only – no external access is allowed from outside of the cluster. There is also no direct access to the back-end Pods via DNS as the Pods are not added to DNS. Instead, there is a single DNS name for the group of back-end Pods (of course the Pods are still accessible via IP Address).

Let's assume that this service has been assigned to a group of Pods running some application. Access to the application is available via the virtual IP address of the service, or via the DNS name assigned to the service. When a client accesses the service via the virtual IP address or DNS name, the first request is proxied (by kube-proxy) to the first Pod, the second request goes to the second Pod, and so on. Requests are load balanced across all Pods.

Let’s now deploy our Pods and service (using the below manifest), and look at the behaviour in more detail. The service manifest has the clusterIP set to “” blank. Note also that the port matches the container port in the deployment (80), and that the selector for the service is the same as the label of the deployment (nginx).

apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx-svc
spec:
  clusterIP: ""
  ports:
    - name: http
      port: 80
  selector:
    app: nginx

Now we shall deploy the ‘deployment’ and the ‘service’ manifests. Once these have been deployed, we will look at the service, and its endpoints. The endpoints should be the two Pods which are part of the deployment. We will also see that there is no external IP associated with the service. It is internal only.

$ kubectl get deploy
NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   2         2         2            2           103m

$ kubectl get pods -o wide
NAME                               READY   STATUS    RESTARTS   AGE    IP           NODE                                   NOMINATED NODE
nginx-deployment-8b8f7ccb4-qct48   1/1     Running   0          104m   172.16.6.3   6ac7f51f-af3f-4b55-8f47-6449a8a7c365   <none>
nginx-deployment-8b8f7ccb4-xpk8p   1/1     Running   0          104m   172.16.6.2   2164e6f0-1b8a-4edd-b268-caaf26792dd4   <none>

$ kubectl get svc
NAME        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
nginx-svc   ClusterIP   10.100.200.20   <none>        80/TCP    3m31s

$ kubectl get endpoints
NAME        ENDPOINTS                     AGE
nginx-svc   172.16.6.2:80,172.16.6.3:80   3m36s

$ kubectl describe endpoints
Name:         nginx-svc
Namespace:    svc-demo
Labels:       app=nginx
Annotations:  <none>
Subsets:
  Addresses:          172.16.6.2,172.16.6.3
  NotReadyAddresses:  <none>
  Ports:
    Name  Port  Protocol
    ----  ----  --------
    http  80    TCP

Events:  <none>

We will now deploy the simple busybox Pod, and after connecting to the Pod, we will try to reach the service. This should be possible, using both the service name nginx-svc, and also the assigned ClusterIP address of 10.100.200.20. Note also how the service name resolves to the IP address.  I will use the wget command to verify that I can pull down the nginx landing page (which is a simple welcome message) from the back-end Pods.

$ kubectl get pods -o wide
NAME                               READY   STATUS    RESTARTS   AGE    IP           NODE                                   NOMINATED NODE
demo-nginx-pod                     1/1     Running   0           3m    172.16.6.4   2164e6f0-1b8a-4edd-b268-caaf26792dd4   <none>
nginx-deployment-8b8f7ccb4-qct48   1/1     Running   0          104m   172.16.6.3   6ac7f51f-af3f-4b55-8f47-6449a8a7c365   <none>
nginx-deployment-8b8f7ccb4-xpk8p   1/1     Running   0          104m   172.16.6.2   2164e6f0-1b8a-4edd-b268-caaf26792dd4   <none>

$ kubectl exec -it demo-nginx-pod -- /bin/sh
/ # cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.16.6.4      demo-nginx-pod

/ # nslookup nginx-svc
Server:    10.100.200.10
Address 1: 10.100.200.10 kube-dns.kube-system.svc.cluster.local

Name:      nginx-svc
Address 1: 10.100.200.20 nginx-svc.svc-demo.svc.cluster.local

/ # wget -O - nginx-svc
Connecting to nginx-svc (10.100.200.20:80)
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
-                    100% |***************************************************************************************************************************|   612   0:00:00 ETA

/ #

This all looks very good. I can reach the nginx server running on the deployment Pods via the service. Now the final point to make is that because this is ClusterIP set to "" (blank), the Pods that are the endpoints for the service are not added to DNS. However, the demo Pod demo-nginx-pod, which we are using to test the service, has been added to DNS. We can verify this once again from the busybox Pod, where we can see it resolve itself, but not the deployment Pods.

/ # nslookup demo-nginx-pod 
Server:    10.100.200.10 
Address 1: 10.100.200.10 kube-dns.kube-system.svc.cluster.local 

Name:      demo-nginx-pod 
Address 1: 172.16.6.4 demo-nginx-pod

/ # nslookup nginx-deployment-8b8f7ccb4-qct48
Server:    10.100.200.10
Address 1: 10.100.200.10 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'nginx-deployment-8b8f7ccb4-qct48'

/ # nslookup 172.16.6.3
Server:    10.100.200.10
Address 1: 10.100.200.10 kube-dns.kube-system.svc.cluster.local

Name:      172.16.6.3
Address 1: 172.16.6.3
/ #

That completes our first look at the clusterIP service. Let’s now look at the subtle differences with a headless service.

1.2 clusterIP set to “None” (aka headless)

With ClusterIP explicitly set to "None", the service is once again accessible within the cluster only. However, the difference between this setting and the last one is that the DNS name of the service resolves to the IP addresses of the individual Pods, rather than to its own virtual IP address. This type of service is typically used when you want to control which specific Pod or Pods you communicate with. Let's look at this in more detail now.

We will use the same setup as before. The only difference this time is that the service manifest has a single change: clusterIP is now set to "None".

apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx-svc
spec:
  clusterIP: "None"
  ports:
    - name: http
      port: 80
  selector:
    app: nginx

We will now do the same set of tests as before, and note the differences. One major difference is that there is no internal Cluster IP address associated with the service; it now appears as None.

$ kubectl create -f nginx-service.yaml
service/nginx-svc created

$ kubectl get svc
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
nginx-svc   ClusterIP   None         <none>        80/TCP   5s

$ kubectl get endpoints
NAME        ENDPOINTS                         AGE
nginx-svc   172.16.6.2:80,172.16.6.3:80       10s

$ kubectl describe endpoints
Name:         nginx-svc
Namespace:    svc-demo
Labels:       app=nginx
Annotations:  <none>
Subsets:
  Addresses:          172.16.6.2,172.16.6.3
  NotReadyAddresses:  <none>
  Ports:
    Name  Port  Protocol
    ----  ----  --------
    http  80    TCP

Events:  <none>

$ kubectl exec -it demo-nginx-pod -- /bin/sh
/ # cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.16.6.4      demo-nginx-pod

/ # nslookup demo-nginx-pod
Server:    10.100.200.10
Address 1: 10.100.200.10 kube-dns.kube-system.svc.cluster.local

Name:      demo-nginx-pod
Address 1: 172.16.6.4 demo-nginx-pod

Now we get to the interesting part of the headless service (clusterIP set to "None"). When I resolve the service name from the busybox Pod, I get back the list of IP addresses for the back-end Pods, rather than a unique IP for the service itself. You can also see the requests going to the different Pods on a round-robin basis. (Note: I had read that a headless service always sends requests to the first Pod, but that does not seem to be the case in my testing – perhaps this behaviour changed in a later version of K8s.)

/ # nslookup nginx-svc
Server:    10.100.200.10
Address 1: 10.100.200.10 kube-dns.kube-system.svc.cluster.local

Name:      nginx-svc
Address 1: 172.16.6.2
Address 2: 172.16.6.3

/ # ping nginx-svc
PING nginx-svc (172.16.6.3): 56 data bytes
64 bytes from 172.16.6.3: seq=0 ttl=64 time=1.792 ms
64 bytes from 172.16.6.3: seq=1 ttl=64 time=0.284 ms
64 bytes from 172.16.6.3: seq=2 ttl=64 time=0.332 ms
64 bytes from 172.16.6.3: seq=3 ttl=64 time=0.384 ms
^C
--- nginx-svc ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 0.284/0.698/1.792 ms

/ # ping nginx-svc
PING nginx-svc (172.16.6.2): 56 data bytes
64 bytes from 172.16.6.2: seq=0 ttl=64 time=1.022 ms
64 bytes from 172.16.6.2: seq=1 ttl=64 time=0.218 ms
64 bytes from 172.16.6.2: seq=2 ttl=64 time=0.231 ms
64 bytes from 172.16.6.2: seq=3 ttl=64 time=0.217 ms
^C
--- nginx-svc ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 0.217/0.422/1.022 ms

/ # wget -O - nginx-svc
Connecting to nginx-svc (172.16.6.3:80)
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
-                    100% |***************************************************************************************************************************|   612   0:00:00 ETA
/ #

One final point to note is that the deployment Pods are once again not added to DNS when using this service type.

/ # nslookup demo-nginx-pod
Server: 10.100.200.10
Address 1: 10.100.200.10 kube-dns.kube-system.svc.cluster.local

Name: demo-nginx-pod
Address 1: 172.16.6.4 demo-nginx-pod
/ # nslookup nginx-deployment-8b8f7ccb4-xpk8p
Server: 10.100.200.10
Address 1: 10.100.200.10 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'nginx-deployment-8b8f7ccb4-xpk8p'
/ #

1.3 clusterIP set to “X.X.X.X” IP Address

Let's briefly look at one last setting. Another option available for clusterIP is to set your own IP address. While I have never needed to use this, according to the official documentation, it could be useful if you have to reuse an existing DNS entry, or if you have legacy systems tied to a specific IP address that you cannot reconfigure.
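
For illustration, here is a minimal sketch of what that would look like with our nginx service. The address 10.100.200.50 is purely hypothetical – it would need to be a free address inside your cluster's service CIDR:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx-svc
spec:
  clusterIP: "10.100.200.50"   # must be unused and within the service CIDR
  ports:
    - name: http
      port: 80
  selector:
    app: nginx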

2. LoadBalancer

As we have seen, ClusterIP services are only accessible from within the cluster. A LoadBalancer service exposes the service externally. Kubernetes provides functionality that is similar to clusterIP set to "", and any incoming requests will be load-balanced across all back-end Pods. However, the external load balancer functionality is provided by a third-party cloud load balancer provider; in my case, this is NSX-T. As soon as I specify type: LoadBalancer in my Service manifest file, NSX-T will retrieve an available address from the preconfigured pool of Floating IP addresses, and allocate it to my service. As the Service receives client requests on the external IP address, the load balancer (which has been updated with entries for the Kubernetes Pods) redirects or proxies these requests to the back-end Pods.

Let’s begin with a look at the modified service manifest, which now includes a load balancer reference.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx-svc
spec:
  ports:
    - name: http
      port: 80
  selector:
    app: nginx
  type: LoadBalancer

After deploying the service, we see the external IP address populated in the output below. Now, I'm not 100% sure why two IP addresses show up. One of them (100.64.0.1) is related to the NSX-T Logical Router, whilst the other (192.168.191.69) is the IP address allocated from the floating IP address pool configured in NSX-T. I'm assuming this is a nuance of the NSX-T implementation.

$ kubectl get svc
NAME        TYPE           CLUSTER-IP       EXTERNAL-IP                 PORT(S)        AGE
nginx-svc   LoadBalancer   10.100.200.178   100.64.0.1,192.168.191.69   80:31467/TCP   8m27s

$ kubectl get endpoints
NAME        ENDPOINTS                     AGE
nginx-svc   172.16.6.2:80,172.16.6.3:80   8m31s

$ kubectl describe endpoints
Name:         nginx-svc
Namespace:    svc-demo
Labels:       app=nginx
Annotations:  <none>
Subsets:
  Addresses:          172.16.6.2,172.16.6.3
  NotReadyAddresses:  <none>
  Ports:
    Name  Port  Protocol
    ----  ----  --------
    http  80    TCP

Events:  <none>

And now for the external test. Can I reach the nginx server on the Pods from outside the cluster? Let's try a wget from my desktop:
$ wget -O - 192.168.191.69
--2019-07-02 10:48:15-- http://192.168.191.69/
Connecting to 192.168.191.69:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 612 [text/html]
Saving to: ‘STDOUT’

- 0%[ ] 0 --.-KB/s <!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
- 100%[========================================================================================>] 612 --.-KB/s in 0s

2019-07-02 10:48:15 (123 MB/s) - written to stdout [612/612]

Looks like it is working. You could of course also open a browser, and point it at the external IP address. You should see the 'Welcome to nginx!' page rendered. One last point – if you query this service from inside the cluster, you will continue to see the internal cluster IP address, as follows:

$ kubectl exec -it demo-nginx-pod -- /bin/sh
/ # nslookup nginx-svc
Server: 10.100.200.10
Address 1: 10.100.200.10 kube-dns.kube-system.svc.cluster.local

Name: nginx-svc
Address 1: 10.100.200.178 nginx-svc.svc-demo.svc.cluster.local
/ #

3. NodePort

The last service type I want to discuss is NodePort, something I have used often in the past when I have not had an external load balancer available to my cluster. This is another method of exposing a service outside of the cluster, but rather than using a dedicated virtual IP address, it exposes a port on every K8s node in the cluster. Access to the service is then made via a reference to the node IP address plus the exposed port. Let's look at an example of that next. First, let's look at the manifest file, which now sets the type to NodePort.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx-svc
spec:
  ports:
    - name: http
      port: 80
  selector:
    app: nginx
  type: NodePort

When this service is deployed, you will notice that it still gets allocated a cluster IP address, but now the type is NodePort. There is also no external IP address. The PORT(S) field below tells us that the nginx server/Pod port 80 is accessible via node port 31027 (in this example).
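
Incidentally, if you want a predictable port rather than the randomly assigned one shown in the output below, you can request a specific nodePort in the manifest. Here is a minimal sketch – 31027 just happens to be the value my cluster picked, but any free port in the node port range (30000-32767 by default) would do:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx-svc
spec:
  ports:
    - name: http
      port: 80
      nodePort: 31027   # must be free and within the node port range
  selector:
    app: nginx
  type: NodePort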

$ kubectl get svc
NAME        TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
nginx-svc   NodePort   10.100.200.119   <none>        80:31027/TCP   3m29s

$ kubectl get endpoints nginx-svc
NAME        ENDPOINTS                     AGE
nginx-svc   172.16.6.2:80,172.16.6.3:80   3m50s

$ kubectl describe endpoints nginx-svc
Name:         nginx-svc
Namespace:    svc-demo
Labels:       app=nginx
Annotations:  <none>
Subsets:
  Addresses:          172.16.6.2,172.16.6.3
  NotReadyAddresses:  <none>
  Ports:
    Name  Port  Protocol
    ----  ----  --------
    http  80    TCP

Events:  <none>

Now, in order to access the nginx server/Pod, we need to know the port on which it is running, and the node. The port information is available above. We can get the node information using a combination of kubectl Pod and node commands. First, I can choose a pod, and note which node it is running on. Then I can get the IP address of the node, and use that (along with the port number) to run a wget command against the nginx server/Pod.

$ kubectl get pods -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP           NODE                                   NOMINATED NODE
nginx-deployment-8b8f7ccb4-qct48   1/1     Running   0          19h   172.16.6.3   6ac7f51f-af3f-4b55-8f47-6449a8a7c365   <none>
nginx-deployment-8b8f7ccb4-xpk8p   1/1     Running   0          19h   172.16.6.2   2164e6f0-1b8a-4edd-b268-caaf26792dd4   <none>

$ kubectl get nodes -o wide
NAME                                   STATUS   ROLES    AGE     VERSION   INTERNAL-IP     EXTERNAL-IP     OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
2164e6f0-1b8a-4edd-b268-caaf26792dd4   Ready    <none>   6d23h   v1.12.4   192.168.192.4   192.168.192.4   Ubuntu 16.04.5 LTS   4.15.0-43-generic   docker://18.6.1
6ac7f51f-af3f-4b55-8f47-6449a8a7c365   Ready    <none>   6d23h   v1.12.4   192.168.192.5   192.168.192.5   Ubuntu 16.04.5 LTS   4.15.0-43-generic   docker://18.6.1
aaab83d2-b2c4-4c09-a0f4-14c3c234aa7b   Ready    <none>   6d22h   v1.12.4   192.168.192.6   192.168.192.6   Ubuntu 16.04.5 LTS   4.15.0-43-generic   docker://18.6.1
cba0db9c-eb9e-41e3-ba5a-916017af1c98   Ready    <none>   6d23h   v1.12.4   192.168.192.3   192.168.192.3   Ubuntu 16.04.5 LTS   4.15.0-43-generic   docker://18.6.1

$ wget -O - 192.168.192.5:31027
--2019-07-02 11:30:08--  http://192.168.192.5:31027/
Connecting to 192.168.192.5:31027... connected.
HTTP request sent, awaiting response... 200 OK
Length: 612 [text/html]
Saving to: ‘STDOUT’
-                                            0%[                                                                                         ]       0  --.-KB/s               <!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
-                                          100%[========================================================================================>]     612  --.-KB/s    in 0s

2019-07-02 11:30:08 (48.3 MB/s) - written to stdout [612/612]

And in fact, you can connect to that port on any of the nodes. Even if you target a node that is not running an nginx Pod, every node in the cluster proxies that port to the service, so the request will be routed/redirected to a back-end Pod. Therefore I could run the wget against any of the nodes in the cluster, and as long as I was using the correct port number, my wget request would succeed.

There are some other service types and options, such as ExternalName and externalIPs, that I have never used, but I will add a short note for completeness. My understanding is that ExternalName is used when you want to map your service to an external DNS name. externalIPs is used when your nodes have external IP addresses, and you want to use those to reach your service. From what I read, there is no proxying done in Kubernetes for these service types. You can read more about them in the official documentation.
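
For completeness, here is a minimal sketch of an ExternalName service. It simply returns a CNAME record for the external DNS name – the database host name below is hypothetical:

apiVersion: v1
kind: Service
metadata:
  name: my-external-db
spec:
  type: ExternalName
  externalName: db.example.com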

That completes my overview of services in Kubernetes. These are the ones that I have come across most often when working with Kubernetes on vSphere.

Manifests used in this demo can be found on my vsphere-storage-101 github repo.

The post Kubernetes on vSphere 101 – Services appeared first on CormacHogan.com.

Kubernetes on vSphere 101 – Ingress


As I was researching content for the 101 series, I came across the concept of an Ingress. As I hadn't come across it before, I wanted to do a little more research on what it actually does. It seems that in some ways, an Ingress achieves the same function as a Load Balancer, in so far as it provides a means of allowing external traffic into your cluster. But it is significantly different in how it does this. If we take the Load Balancer service type first, then for every service that is exposed via a Load Balancer, a unique external IP address needs to be assigned to that service. Ingress, on the other hand, is not a service. It behaves as a sort of entry point to your cluster, using a single IP address, sitting in front of multiple services. The request can be 'routed' to the appropriate service, based on how the request is made. The most common example of where ingress is used is with web servers. For example, I may run an online store, where different services are offered, e.g. search for an item, add an item to the basket, display basket contents, etc. Depending on the URL, I can redirect that request to a different service at the back-end, all from the same web site/URL. So cormachogan.com/add-to-basket could be directed to the 'add-to-basket' service backed by one set of Pods, whilst cormachogan.com/search could be redirected to a different service backed by a different set of Pods.

To summarize, the difference is this: a Load Balancer distributes requests across back-end Pods of the same type offering a single service, and consumes a unique external IP address per service, whereas an Ingress routes requests to one of several different back-end services (based on the URL path, for example). As mentioned, you typically come across ingress when multiple services are exposed via the same IP address and all of those services use the same layer 7 protocol, which more often than not is HTTP.

Be aware that an Ingress object does nothing by itself; it requires an ingress controller to operate. Having said that, the other thing that my research introduced me to was Contour. Contour is an Ingress Controller which VMware acquired as part of the Heptio acquisition. It works by deploying Envoy, an open source edge and service proxy. What is neat about Contour is that it supports dynamic configuration updates. I thought it might be interesting to use Contour and Envoy to create my own ingress for something home-grown in the lab, so as to demonstrate Ingress.

Deploy Contour

The roll-out of Contour is very straight-forward. The team have created a single manifest/YAML file with all of the necessary object definitions included (see the first command below for the path). This creates a new heptio-contour namespace, the service and service account, the Custom Resource Definitions, and everything else that is required. Contour is rolled out as a Deployment with 2 replicas, and each replica Pod contains both an Envoy container and a Contour container. Let’s roll it out and take a look. FYI, I am deploying this on my PKS 1.3 environment, which has NSX-T for the CNI and Harbor for its image repository. There is a lot of output here, but it should give you an idea as to how Contour and Envoy hang together.

$ kubectl apply -f https://j.hept.io/contour-deployment-rbac
namespace/heptio-contour created
serviceaccount/contour created
customresourcedefinition.apiextensions.k8s.io/ingressroutes.contour.heptio.com created
customresourcedefinition.apiextensions.k8s.io/tlscertificatedelegations.contour.heptio.com created
deployment.apps/contour created
clusterrolebinding.rbac.authorization.k8s.io/contour created
clusterrole.rbac.authorization.k8s.io/contour created
service/contour created

$ kubectl change-ns heptio-contour
namespace changed to "heptio-contour"

$ kubectl get svc
NAME      TYPE           CLUSTER-IP       EXTERNAL-IP                 PORT(S)                      AGE
contour   LoadBalancer   10.100.200.184   100.64.0.1,192.168.191.70   80:31042/TCP,443:31497/TCP   35s

$ kubectl get crd
NAME                                           CREATED AT
clustersinks.apps.pivotal.io                   2019-06-25T11:35:17Z
ingressroutes.contour.heptio.com               2019-07-10T08:46:15Z
sinks.apps.pivotal.io                          2019-06-25T11:35:17Z
tlscertificatedelegations.contour.heptio.com   2019-07-10T08:46:15Z

$ kubectl get deploy
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
contour   2         2         2            0           56s

$ kubectl get clusterrole | grep contour
contour

$ kubectl describe clusterrole contour
Name:         contour
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"rbac.authorization.k8s.io/v1beta1","kind":"ClusterRole","metadata":{"annotations":{},"name":"contour"},"rules":[{"apiGroups...
PolicyRule:
  Resources                                     Non-Resource URLs  Resource Names  Verbs
  ---------                                     -----------------  --------------  -----
  ingressroutes.contour.heptio.com              []                 []              [get list watch put post patch]
  tlscertificatedelegations.contour.heptio.com  []                 []              [get list watch put post patch]
  services                                      []                 []              [get list watch]
  ingresses.extensions                          []                 []              [get list watch]
  nodes                                         []                 []              [list watch get]
  configmaps                                    []                 []              [list watch]
  endpoints                                     []                 []              [list watch]
  pods                                          []                 []              [list watch]
  secrets                                       []                 []              [list watch]

$ kubectl describe clusterrolebinding contour
Name:         contour
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"rbac.authorization.k8s.io/v1beta1","kind":"ClusterRoleBinding","metadata":{"annotations":{},"name":"contour"},"roleRef":{"a...
Role:
  Kind:  ClusterRole
  Name:  contour
Subjects:
  Kind            Name     Namespace
  ----            ----     ---------
  ServiceAccount  contour  heptio-contour

$ kubectl get replicasets
NAME                 DESIRED   CURRENT   READY   AGE
contour-5cd6986479   2         2         0       4m48s

$ kubectl get pods
NAME                       READY   STATUS    RESTARTS   AGE
contour-864d797fc6-t8tb9   2/2     Running   0          32s
contour-864d797fc6-z8x4m   2/2     Running   0          32s

$ kubectl describe pod contour-864d797fc6-t8tb9
Name:               contour-864d797fc6-t8tb9
Namespace:          heptio-contour
Priority:           0
PriorityClassName:  <none>
Node:               6ac7f51f-af3f-4b55-8f47-6449a8a7c365/192.168.192.5
Start Time:         Wed, 10 Jul 2019 10:02:40 +0100
Labels:             app=contour
                    pod-template-hash=864d797fc6
Annotations:        prometheus.io/path: /stats/prometheus
                    prometheus.io/port: 8002
                    prometheus.io/scrape: true
Status:             Running
IP:                 172.16.7.2
Controlled By:      ReplicaSet/contour-864d797fc6
Init Containers:
  envoy-initconfig:
    Container ID:  docker://ecc58b57d4ae0329368729d5ae5ae76ac143809090c659256ceceb74c192d2e9
    Image:         harbor.rainpole.com/library/contour:master
    Image ID:      docker-pullable://harbor.rainpole.com/library/contour@sha256:b3c8a2028b9224ad1e418fd6dd70a68ffa62ab98f67b8d0754f12686a9253e2a
    Port:          <none>
    Host Port:     <none>
    Command:
      contour
    Args:
      bootstrap
      /config/contour.json
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 10 Jul 2019 10:02:43 +0100
      Finished:     Wed, 10 Jul 2019 10:02:44 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      CONTOUR_NAMESPACE:  heptio-contour (v1:metadata.namespace)
    Mounts:
      /config from contour-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from contour-token-jctk2 (ro)
Containers:
  contour:
    Container ID:  docker://b2c3764379b26221670ce5953b3cd2e11c90eb80bdd04d31d422f98ed3d4486d
    Image:         harbor.rainpole.com/library/contour:master
    Image ID:      docker-pullable://harbor.rainpole.com/library/contour@sha256:b3c8a2028b9224ad1e418fd6dd70a68ffa62ab98f67b8d0754f12686a9253e2a
    Port:          <none>
    Host Port:     <none>
    Command:
      contour
    Args:
      serve
      --incluster
      --envoy-service-http-port
      8080
      --envoy-service-https-port
      8443
    State:          Running
      Started:      Wed, 10 Jul 2019 10:02:45 +0100
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:8000/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:8000/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from contour-token-jctk2 (ro)
  envoy:
    Container ID:  docker://3a7859992c88d29ba4b9a347f817919a302b50e5352e988ca934caad3e0ea934
    Image:         harbor.rainpole.com/library/envoy:v1.10.0
    Image ID:      docker-pullable://harbor.rainpole.com/library/envoy@sha256:bf7970f469c3d2cd54a472536342bd50df0ddf099ebd51024b7f13016c4ee3c4
    Ports:         8080/TCP, 8443/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      envoy
    Args:
      --config-path /config/contour.json
      --service-cluster cluster0
      --service-node node0
      --log-level info
    State:          Running
      Started:      Wed, 10 Jul 2019 10:02:52 +0100
    Ready:          True
    Restart Count:  0
    Readiness:      http-get http://:8002/healthz delay=3s timeout=1s period=3s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /config from contour-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from contour-token-jctk2 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  contour-config:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  contour-token-jctk2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  contour-token-jctk2
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From                                           Message
  ----    ------     ----  ----                                           -------
  Normal  Scheduled  43s   default-scheduler                              Successfully assigned heptio-contour/contour-864d797fc6-t8tb9 to 6ac7f51f-af3f-4b55-8f47-6449a8a7c365
  Normal  Pulling    42s   kubelet, 6ac7f51f-af3f-4b55-8f47-6449a8a7c365  pulling image "harbor.rainpole.com/library/contour:master"
  Normal  Pulled     41s   kubelet, 6ac7f51f-af3f-4b55-8f47-6449a8a7c365  Successfully pulled image "harbor.rainpole.com/library/contour:master"
  Normal  Created    41s   kubelet, 6ac7f51f-af3f-4b55-8f47-6449a8a7c365  Created container
  Normal  Started    41s   kubelet, 6ac7f51f-af3f-4b55-8f47-6449a8a7c365  Started container
  Normal  Pulling    40s   kubelet, 6ac7f51f-af3f-4b55-8f47-6449a8a7c365  pulling image "harbor.rainpole.com/library/contour:master"
  Normal  Created    40s   kubelet, 6ac7f51f-af3f-4b55-8f47-6449a8a7c365  Created container
  Normal  Pulled     40s   kubelet, 6ac7f51f-af3f-4b55-8f47-6449a8a7c365  Successfully pulled image "harbor.rainpole.com/library/contour:master"
  Normal  Started    39s   kubelet, 6ac7f51f-af3f-4b55-8f47-6449a8a7c365  Started container
  Normal  Pulling    39s   kubelet, 6ac7f51f-af3f-4b55-8f47-6449a8a7c365  pulling image "harbor.rainpole.com/library/envoy:v1.10.0"
  Normal  Pulled     33s   kubelet, 6ac7f51f-af3f-4b55-8f47-6449a8a7c365  Successfully pulled image "harbor.rainpole.com/library/envoy:v1.10.0"
  Normal  Created    32s   kubelet, 6ac7f51f-af3f-4b55-8f47-6449a8a7c365  Created container
  Normal  Started    32s   kubelet, 6ac7f51f-af3f-4b55-8f47-6449a8a7c365  Started container

The output from the other Pod is pretty much identical to this one. Now we need to figure out an application (or applications) that can sit behind this ingress. I’m thinking of using some simple Nginx web server deployments, whereby a request for /index-a.html is redirected to the ‘a’ service and hits the index page on Pod ‘a’, and similarly, a request for /index-b.html is redirected to the ‘b’ service and hits the index page on Pod ‘b’. For this, I am going to build some new docker images for my Pod containers, so we can easily tell which service/Pod we are landing on, a or b.

Create some bespoke Nginx images

I mentioned already that I am using Harbor for my registry. What I will show in this section is how to pull down an Nginx image, modify it, then commit those changes and push the updated image to Harbor. The end goal here is that when I connect to a particular path on the web server, I want to see which server I am landing on, either A or B. First, I will change the index.html contents to something that identifies it as A or B, and then rename it to something that we can reference in the ingress manifest later on, either index-a.html or index-b.html, depending on which Pod it is deployed to. If you are not using Harbor, you can simply keep the modified images locally and reference them from there in your manifests.

$ sudo docker images
REPOSITORY                                TAG                 IMAGE ID            CREATED             SIZE
harbor.rainpole.com/library/contour       master              adc3f7fbe3b4        30 hours ago        41.9MB
nginx                                     latest              f68d6e55e065        9 days ago          109MB
harbor.rainpole.com/library/mysql         5.7                 a1aa4f76fab9        4 weeks ago         373MB
harbor.rainpole.com/library/mysql         latest              c7109f74d339        4 weeks ago         443MB
harbor.rainpole.com/library/envoy         v1.10.0             20b550751ccf        3 months ago        164MB
harbor.rainpole.com/library/kuard-amd64   1                   81086a8c218b        5 months ago        19.7MB
harbor.rainpole.com/library/cadvisor      v0.31.0             a38f1319a420        10 months ago       73.8MB
harbor.rainpole.com/library/cassandra     v11                 11aad67b47d9        2 years ago         384MB
harbor.rainpole.com/library/xtrabackup    1.0                 c415dbd7af07        2 years ago         265MB
harbor.rainpole.com/library/volume-nfs    0.8                 ab7049d62a53        2 years ago         247MB
harbor.rainpole.com/library/busybox       latest              e7d168d7db45        4 years ago         2.43MB

$ sudo docker create nginx
18b1a4cea9568972241bda78d711dd71cb4afe1ea8be849b826f8c232429040e

$ sudo docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
18b1a4cea956        nginx               "nginx -g 'daemon of…"   10 seconds ago      Created                                 focused_swanson

$ sudo docker start 18b1a4cea956
18b1a4cea956

$ sudo docker exec -it 18b1a4cea956 bash
root@18b1a4cea956:/# cd /usr/share/nginx/html/
root@18b1a4cea956:/usr/share/nginx/html# cp index.html orig-index.html
root@18b1a4cea956:/usr/share/nginx/html# grep Welcome index.html
<title>Welcome to nginx!</title>
<h1>Welcome to nginx!</h1>

root@18b1a4cea956:/usr/share/nginx/html# sed 's/Welcome to nginx/Welcome to nginx - redirected to A/' orig-index.html > index-a.html

root@18b1a4cea956:/usr/share/nginx/html# grep Welcome index-a.html
<title>Welcome to nginx - redirected to A!</title>
<h1>Welcome to nginx - redirected to A!</h1>

root@18b1a4cea956:/usr/share/nginx/html# rm orig-index.html
root@18b1a4cea956:/usr/share/nginx/html# exit
exit
$

Now that the changes are made, let’s commit it to a new image and push it out to Harbor.

$ sudo docker commit 18b1a4cea956 nginx-a
sha256:6c15ef2087abd7065ce79ca703c7b902ac8ca4a2235d660b58ed51688b7b0164

$ sudo docker tag nginx-a:latest harbor.rainpole.com/library/nginx-a:latest
$ sudo docker push  harbor.rainpole.com/library/nginx-a:latest
The push refers to repository [harbor.rainpole.com/library/nginx-a]
dcce4746f5e6: Pushed
d2f0b6dea592: Layer already exists
197c666de9dd: Layer already exists
cf5b3c6798f7: Layer already exists
latest: digest: sha256:a9ade6ea857b991d34713f1b6c72fd4d75ef1f53dec4eea94e8f61adb5192284 size: 1155
$

Now we need to repeat the process for the other image. We can use the same running container as before, make the new changes, and commit them to a second image.

$ sudo docker exec -it 18b1a4cea956 bash
root@18b1a4cea956:/# cd /usr/share/nginx/html/
root@18b1a4cea956:/usr/share/nginx/html# ls
50x.html  index-a.html
root@18b1a4cea956:/usr/share/nginx/html# grep Welcome index-a.html
<title>Welcome to nginx - redirected to A!</title>
<h1>Welcome to nginx - redirected to A!</h1>

root@18b1a4cea956:/usr/share/nginx/html# sed s'/redirected to A/redirected to B/' index-a.html > index-b.html

root@18b1a4cea956:/usr/share/nginx/html# grep Welcome index-b.html
<title>Welcome to nginx - redirected to B!</title>
<h1>Welcome to nginx - redirected to B!</h1>

root@18b1a4cea956:/usr/share/nginx/html# rm orig-index.html
root@18b1a4cea956:/usr/share/nginx/html# exit
exit

$ sudo docker commit 18b1a4cea956 nginx-b
sha256:29dc4c9bf09c18989781eb56efdbf15f5f44584a7010477a4bd58f5acaf468bd

$ sudo docker tag nginx-b:latest harbor.rainpole.com/library/nginx-b:latest
$ sudo docker push  harbor.rainpole.com/library/nginx-b:latest
The push refers to repository [harbor.rainpole.com/library/nginx-b]
6d67f88e5fa0: Pushed
d2f0b6dea592: Layer already exists
197c666de9dd: Layer already exists
cf5b3c6798f7: Layer already exists
latest: digest: sha256:11e4af3647abeb23358c92037a4c3287bf7b0b231fcd6f06437d75c9367d793f size: 1155
$
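
As an aside, and not something I have tested for this post, a similar result could probably be achieved without building bespoke images at all, by mounting a ConfigMap over the Nginx html directory. A rough sketch, with hypothetical object names, might look like this (note that mounting a volume over /usr/share/nginx/html hides everything else in that directory, including the default 50x.html):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-a-content            # hypothetical name
data:
  index-a.html: |
    <html><body><h1>Welcome to nginx - redirected to A!</h1></body></html>
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-a-deployment
spec:
  selector:
    matchLabels:
      app: nginx-a
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx-a
    spec:
      containers:
      - name: nginx-a
        image: nginx:latest                  # stock image, no docker commit required
        ports:
        - containerPort: 80
        volumeMounts:
        - name: content
          mountPath: /usr/share/nginx/html   # ConfigMap data appears as files here
      volumes:
      - name: content
        configMap:
          name: nginx-a-content

A second ConfigMap would cover the B case in the same way. For this post though, I am sticking with the committed images stored in Harbor.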

Deploy our ingress-based application

Excellent. At this point, we now have two different images – one for service A and the other for service B. Let’s now take a look at the manifest files for the nginx application. The deployment and the service should be very straight-forward to understand at this point; you can review the 101 Deployments post and the 101 Services post if you need more details. We will talk about the ingress manifest in more detail though. Let’s begin with the manifests for the deployments. As you can see, each is deployed initially with a single replica, but could be scaled out if needed. Note also that the image is nginx-a for deployment ‘a’ and nginx-b for deployment ‘b’.

$ cat nginx-a-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-a-deployment
spec:
  selector:
    matchLabels:
      app: nginx-a
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx-a
    spec:
      containers:
      - name: nginx-a
        image: harbor.rainpole.com/library/nginx-a:latest
        ports:
        - containerPort: 80
$ cat nginx-b-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-b-deployment
spec:
  selector:
    matchLabels:
      app: nginx-b
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx-b
    spec:
      containers:
      - name: nginx-b
        image: harbor.rainpole.com/library/nginx-b:latest
        ports:
        - containerPort: 80

Next up are the services, one for each deployment. Once again, these are quite straight-forward, but since they are of type ClusterIP, there is no external access. Each service is tied to its deployment using the label selector.

$ cat nginx-a-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx-a
  name: nginx-a
spec:
  ports:
  - port: 80
    protocol: TCP
  selector:
    app: nginx-a
  sessionAffinity: None
  type: ClusterIP
$ cat nginx-b-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx-b
  name: nginx-b
spec:
  ports:
  - port: 80
    protocol: TCP
  selector:
    app: nginx-b
  sessionAffinity: None
  type: ClusterIP

This brings us to the ingress manifest. Let’s take a look at that next.

$ cat nginx-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: nginx
spec:
  rules:
  - host: nginx.rainpole.com
    http:
      paths:
      - path: /index-a.html # This page must exist on the A server
        backend:
          serviceName: nginx-a
          servicePort: 80
      - path: /index-b.html # This page must exist on the B server
        backend:
          serviceName: nginx-b
          servicePort: 80

The rules section of the manifest looks at the path of the request and redirects it to the appropriate service. The idea here is that an end-user connects to nginx.rainpole.com (the DNS name of the IP address that will be provided for the Ingress), and depending on whether the full URL is nginx.rainpole.com/index-a.html or nginx.rainpole.com/index-b.html, the request will be routed to the appropriate service, and thus to the Pod/container/application behind it. I should then see the correct (A or B) index page that we modified in our bespoke images earlier. Any other path will not match a rule, so such a request will not reach either service.

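As an aside, the Contour roll-out earlier also created the ingressroutes.contour.heptio.com CRD, so the same routing could instead be expressed using Contour’s own IngressRoute resource. I have not used it in this post, so treat the following as an untested sketch based on the Contour documentation:

apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: nginx
spec:
  virtualhost:
    fqdn: nginx.rainpole.com        # same host as the Ingress above
  routes:
  - match: /index-a.html
    services:
    - name: nginx-a
      port: 80
  - match: /index-b.html
    services:
    - name: nginx-b
      port: 80

Either way, the end result should be the same: Contour programs Envoy to route requests for each path to the corresponding back-end service.
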
Once the application has been deployed, you should see the Pods, services and Ingress created, similar to the following (the output below includes the Contour objects as well):

$ kubectl get po,svc,ing
NAME                                      READY   STATUS    RESTARTS   AGE
pod/contour-864d797fc6-nqggp              2/2     Running   0          25h
pod/contour-864d797fc6-wk6vx              2/2     Running   0          25h
pod/nginx-a-deployment-6f6c8df5d6-k8lkz   1/1     Running   0          39m
pod/nginx-b-deployment-c66476f97-wl84p    1/1     Running   0          39m

NAME              TYPE           CLUSTER-IP       EXTERNAL-IP                 PORT(S)                      AGE
service/contour   LoadBalancer   10.100.200.227   100.64.0.1,192.168.191.71   80:31910/TCP,443:31527/TCP   29h
service/nginx-a   ClusterIP      10.100.200.30    <none>                      80/TCP                       39m
service/nginx-b   ClusterIP      10.100.200.195   <none>                      80/TCP                       39m

NAME                       HOSTS                ADDRESS                     PORTS   AGE
ingress.extensions/nginx   nginx.rainpole.com   100.64.0.1,192.168.191.60   80      39m

And now, if I try to connect to the DNS name of the application (nginx.rainpole.com), let’s see what I get. First, let’s try the main index.html. Typically we would see the default Nginx landing page, but because there is no rule for that path in the Ingress, the request does not reach either back-end service.

But if we request either /index-a.html or /index-b.html, we are routed to the different services at the back-end and see the modified A and B index pages respectively.

Very good. And that is essentially it. Hopefully you can see the benefits of using this approach over a Load Balancer, especially the reduced number of IP addresses that need to be assigned. I was able to host both services behind a single IP address and route to the correct service based on the URL path; otherwise I would have needed two IP addresses, one for each service.

Now, I have only scratched the surface of what Contour + Envoy can do. There are obviously many more features than the simple example I have used in this post. You can read more about Contour via this link to GitHub.
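To give one example, TLS termination can be handled at the ingress layer. Assuming a certificate and key had already been stored in a Kubernetes Secret (the secret name below is hypothetical, and I have not configured this in my lab), the Ingress used above could be extended along these lines:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: nginx
spec:
  tls:
  - hosts:
    - nginx.rainpole.com
    secretName: nginx-rainpole-tls   # hypothetical Secret holding the certificate and key
  rules:
  - host: nginx.rainpole.com
    http:
      paths:
      - path: /index-a.html
        backend:
          serviceName: nginx-a
          servicePort: 80
      - path: /index-b.html
        backend:
          serviceName: nginx-b
          servicePort: 80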

Manifests used in this demo can be found on my vsphere-storage-101 github repo.

The post Kubernetes on vSphere 101 – Ingress appeared first on CormacHogan.com.
