Discovering Kubernetes

This website has historically been hosted on a VPS costing me about $10 a month. It's a static blog behind an Nginx server with Let's Encrypt certificates. A VPS, which is basically a full-fledged Linux virtual machine, is a bit much for static pages, since you can get cheaper hosting for those (e.g. for free on GitHub), but I like the flexibility that it provides. Kubernetes makes even less sense, since I can already manage and update my website easily, and it doesn't need to scale.

But I wanted to try anyway, at least as a learning experience. In this post, I will explain how I migrated this classical VPS deployment to Kubernetes.

The container

Kubernetes runs containers, so we need to containerize the website first.

The following snippet is the Dockerfile. It uses two stages, one to generate the static pages, the other to serve those static pages with Nginx on port 80 (no HTTPS here, we'll let the Kubernetes cluster handle that).

# First stage: build the static pages with Lektor
FROM debian:10 AS build

RUN apt-get update \
    && apt-get install -y \
        python3-pip \
        curl

# Node.js is needed for the webpack build
RUN curl -sL https://deb.nodesource.com/setup_12.x | bash -
RUN apt-get update \
    && apt-get install -y nodejs

RUN pip3 install lektor

WORKDIR /app
COPY src src
WORKDIR src
RUN cd webpack && npm install --package-lock-only
RUN lektor build \
        --extra-flag webpack \
        --buildstate-path /tmp/lektor \
        --output-path /app/html
# List the generated files in the build log, for verification
RUN find /app/html

# Second stage: serve the generated pages with Nginx on port 80
FROM nginx

EXPOSE 80
COPY --from=build /app/html/ /usr/share/nginx/html

I use Lektor for this blog but you can apply this approach to any other static site generator. You could even run a dynamic website in such a container.

You can test the container with the following commands:

docker build --tag web .
docker run --publish 8000:80 web

You can then visit http://localhost:8000 in your browser.

That was the easy part. The next sections will tackle more complex issues.

The cluster

We expect the following from Kubernetes:

  • Run the container on a virtual machine.
  • Expose it on a public IP address on ports 80 and 443.
  • Get certificates from Let's Encrypt.

Running the container

Before Kubernetes can run the container it needs to download the container image.

I created a private container registry at DigitalOcean and uploaded my image to it:

docker tag web registry.digitalocean.com/<registry_name>/web
docker push registry.digitalocean.com/<registry_name>/web

Ideally, you would use proper tags to control what version of your image will be deployed, but we'll just rely on the implicit latest tag in this post.
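
If you do want explicit versions, a minimal sketch looks like the following (1.0.0 is just an arbitrary example tag; a Git commit hash works just as well):

docker tag web registry.digitalocean.com/<registry_name>/web:1.0.0
docker push registry.digitalocean.com/<registry_name>/web:1.0.0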

Now that you have a registry, you will need a Kubernetes cluster. I used DigitalOcean but there are other options. For the purpose of this tutorial, it is enough to configure it with just one node (e.g. $10 per month).

Once it is running, you must configure kubectl to authenticate against it and be able to run commands such as the following:

$ kubectl get nodes
NAME        STATUS   ROLES    AGE    VERSION
foo-3jr7e   Ready    <none>   5d6h   v1.16.8
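
On DigitalOcean, one way to fetch the right kubeconfig is through the doctl command-line tool (this assumes doctl is installed and authenticated; <cluster_name> is the name you gave your cluster):

doctl kubernetes cluster kubeconfig save <cluster_name>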

Then, configure the default service account to use the correct credentials for your registry:

kubectl create secret generic regcred \
    --from-file=.dockerconfigjson=<path_to_docker_config_json> \
    --type=kubernetes.io/dockerconfigjson
kubectl patch serviceaccount default \
    --patch '{"imagePullSecrets": [{"name": "regcred"}]}'

You can download the docker-config.json from DigitalOcean, preferably the read-only version since your default service account won't need to push images.

There are other ways to configure authentication to the container registry but I find this one to be practical.
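
For instance, instead of patching the default service account, you can reference the secret directly in each pod specification (we will create one in the next section). The relevant fragment of the pod template would look like this:

    spec:
      imagePullSecrets:
        - name: regcred
      containers:
        - name: main
          image: registry.digitalocean.com/<registry_name>/web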

From now on, I'll start showing you YAML files to configure the cluster. You can apply them individually with kubectl apply --filename <yaml_file> but what I usually do is put them all in a directory and run kubectl apply --recursive --filename <directory> every time I add a new one or change an existing one. It might not be the best way to deploy to Kubernetes but it sure is simple.

This first YAML file will make Kubernetes download the image and run it in a container:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: main
          image: registry.digitalocean.com/<registry_name>/web
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80

Don't worry, it is simpler than it looks:

  • kind: Deployment says that we're defining a Deployment resource. This is a practical resource for deploying pods. You could define Pod objects directly but they wouldn't survive events such as node failures. A Deployment combines Pod objects with a ReplicaSet to ensure your pods remain in the desired state.
  • metadata: ... defines a name and a label for the deployment. It will be used in other YAML files to refer to this one.
  • replicas: 1 sets the number of pods that should be running to one. If anything causes the pod to die, it will be recreated so that there is always one pod running.
  • selector: ... designates the pods we want to run.
  • template: ... specifies how the pods should be created. This includes the image the container should be instantiated from and what network port to expose (80 here).

You will find a lot more details in the Kubernetes documentation for deployments.

This is not the only way to deploy containers but it is simple enough for our test.

Once you've run kubectl apply, you should be able to see your pod in the list:

$ kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
web-58746c97cd-v96kb   1/1     Running   0          5d7h

You can try to change the number of replicas in the configuration, run kubectl apply again, and see what happens to the list of pods.
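
If you prefer not to edit the YAML file, kubectl can also scale the deployment imperatively (though the change will be overwritten the next time you apply the file):

kubectl scale deployment web --replicas=3
kubectl get pods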

The pod isn't accessible from outside the cluster but you can access it from your computer with port forwarding:

$ kubectl port-forward web-58746c97cd-v96kb 8000:80
Forwarding from 127.0.0.1:8000 -> 80
Forwarding from [::1]:8000 -> 80

Then, visit http://localhost:8000 with your browser and you should see your website.

So, that wasn't so difficult, was it? You'd better get used to all this YAML because a lot more is to come.

Note: While kubectl can do everything, I find it useful to also use the Kubernetes Dashboard or k9s for ease of use and better visibility on the pods. For instance, they make it quite easy to inspect the logs of containers when you don't know their names.
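
For instance, plain kubectl can address a pod through its deployment instead of its generated name (assuming the deployment is called web, as above):

kubectl logs deployment/web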

Exposing the container to the outside: Basics

The Service

There are many ways to expose a pod outside the cluster, but a common first step is to define a service:

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: ClusterIP
  selector:
    app: web
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80

You can see that we're reusing the app: web selector defined before. This configuration will make the website available to other pods inside the cluster under the name "web". That's nice because your pods could change names or IP addresses, and services abstract away those details. The service also acts as a load balancer internally.

You can see your services with kubectl get services.
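
To check the service from inside the cluster, you can run a throwaway pod and query it by name (curlimages/curl is just an arbitrary image that ships curl):

kubectl run --rm -it test --restart=Never --image=curlimages/curl -- curl -s http://web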

The service is internal because of the type: ClusterIP attribute. The official documentation shows two other types that will enable you to expose it to the outside:

  • NodePort exposes the service directly on the node, on a port higher than 30000 (chosen or random).
  • LoadBalancer tells the cloud provider (DigitalOcean in our case) to use a load balancer to redirect to the service. You can choose any port.

NodePort isn't suited to our use case on its own:

  • It requires the node to have a public IP address.
  • You can't easily listen on ports 80 and 443 (port range constraints and potential port conflicts).
  • The service will change IP addresses if the pod is rescheduled on a different node.

In summary, you could probably make it work for your blog but it wouldn't be easy.

The Load Balancer

The LoadBalancer option seems more promising. It costs an additional $10 per month but doesn't have the problems of the NodePort.
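
For reference, exposing the service this way is just a matter of changing its type; we won't keep this version, for reasons explained below:

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80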

Actually, we'll need a fancier Kubernetes mechanism to make this work, but before that, let me explain why DigitalOcean's load balancer alone is not enough: used like this, it merely forwards traffic on ports 80 and 443 to the service. Nothing in our setup would terminate TLS or obtain the Let's Encrypt certificates, so we still need some machinery inside the cluster, which is what the next sections add.

Before we leave this section, note that you can determine the IP address associated with your load balancer with kubectl get services, in the EXTERNAL-IP column.

Exposing the container to the outside: Advanced

The Ingress

An Ingress is a Kubernetes resource that abstracts away several reverse proxy features like load balancing and HTTP routing. This feature is still in beta but it's already quite popular. Unfortunately, this is also where things become complicated, so you are likely to encounter some bumps before you get it to work.

There are two components to configure: the Ingress resource and the Ingress Controller. In this subsection, we'll focus on the Ingress resource.

Before we add an ingress, ensure that the type of your service is ClusterIP (in case you had changed it to LoadBalancer or anything else).

Now, let's add the following resource:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ingress-web
  annotations:
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/issuer: "letsencrypt-staging"
spec:
  rules:
    - host: <domain_name>
      http:
        paths:
          - path: /
            backend:
              serviceName: web
              servicePort: 80
  tls:
    - hosts:
        - <domain_name>
      secretName: <secret_name>

There are quite a few new things here:

  • The annotation kubernetes.io/ingress.class: "nginx" tells the NGINX ingress controller to pick up this ingress resource. It does nothing for now, though, since we haven't installed a controller yet.
  • The annotation cert-manager.io/issuer: "letsencrypt-staging" does the same thing for the certificate manager, which I will cover in a later section.
  • The rule starting with host: <domain_name> describes a route from a domain name to a service, here the web service on port 80 that we created earlier. You can have several rules like that (see the sketch after this list), which is probably the main feature of ingress resources.
  • The tls section enables HTTPS. It defines the secret to use for storing the certificate (and its private key) and to what domains that secret applies.
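
As a sketch, routing a second, hypothetical hostname to another service would only require an extra rule (blog.example.org and web-admin are made-up names for the example):

  rules:
    - host: <domain_name>
      http:
        paths:
          - path: /
            backend:
              serviceName: web
              servicePort: 80
    - host: blog.example.org
      http:
        paths:
          - path: /
            backend:
              serviceName: web-admin
              servicePort: 80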

The Ingress Controller

The controller is a set of Kubernetes resources that will pick up our ingress declaration from the previous section and expose it accordingly.

I used the NGINX Ingress Controller but other ingress controllers exist.

To install it, I recommend following the installation guide. For me, it consisted of applying a DigitalOcean-specific YAML file with kubectl.

At this point, your website should be accessible via HTTPS and present a self-signed certificate.
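
To check that the controller is running, you can list what the manifest created (it installs everything in a namespace called ingress-nginx by default) and verify that its LoadBalancer service got an external IP:

kubectl get pods --namespace ingress-nginx
kubectl get services --namespace ingress-nginx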

The Certificate Manager

To make the website accessible to anyone, we need a valid certificate, i.e. one that browsers will trust. The certificate needs to be obtained from a certificate authority and then provided to NGINX. It will then need to be renewed regularly. Cert-manager can do all of that automatically.

As with the ingress controller, you'll need to install the component by downloading and applying a YAML file. See the installation instructions for more details.

Next, you'll need to configure it. Below is an Issuer resource configured to use the staging instance of Let's Encrypt:

apiVersion: cert-manager.io/v1alpha2
kind: Issuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: <user_email_address>
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - http01:
        ingress:
          class: nginx

I suggest using the staging environment here because it lets you check that the configuration is correct before switching to production, which can get you banned if you make too many requests.
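
When you are ready to switch, you can create a production issuer next to the staging one. It is identical except for its name (letsencrypt-production below is an arbitrary choice) and the ACME server URL; you then point the cert-manager.io/issuer annotation of the ingress at it:

apiVersion: cert-manager.io/v1alpha2
kind: Issuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <user_email_address>
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
    - http01:
        ingress:
          class: nginx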

There's a tweak you'll need on DigitalOcean. Update the annotations of the NGINX Ingress service in the YAML file you downloaded in the previous section so that they look like the following:

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol: 'true'
    service.beta.kubernetes.io/do-loadbalancer-hostname: '<domain>'

The do-loadbalancer-hostname annotation makes it possible for your containers to access services via the load balancer. This is needed because cert-manager performs a local test before contacting Let's Encrypt, and that test will fail if you don't configure the load balancer that way.

It turns out to be a simple fix, but it took me a long time to figure out the first time. See DigitalOcean's documentation for more information.
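
While waiting, you can follow cert-manager's progress; the Certificate resource it creates from the ingress is named after the secret in the tls section:

kubectl get certificates
kubectl describe certificate <secret_name>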

The website should soon get a valid certificate and be ready for visitors. Congratulations if you've gone this far; it was probably not easy.

Conclusion

So, what's the point of all this if we could have achieved the same with a simple VPS and a few scripts?

Well, first, it's nice to learn new technologies on a simple example.

Second, the same technology and techniques would apply to a dynamic website. You could even follow the same steps as in this article if your dynamic website is simple enough. Then, you would be able to adapt the Kubernetes configuration as your website grows in size and complexity.

This power comes at a price, however. The learning curve is steep and there is so much more to cover before your website becomes really reliable. For instance:

  • How to upgrade the Kubernetes cluster (preferably without downtime)?
  • How to upgrade NGINX Ingress and cert-manager? Maybe manage them with Helm?
  • How to debug problems when they arise? It's easy to make a mistake in the configuration and break the website.
  • How to ensure that the cluster and your pods are secure?

Many of these questions would not come up with a more traditional approach to infrastructure, simply because we are more used to it. But with Kubernetes, there are still grey areas, at least for me.

I hope this article made Kubernetes less obscure to you and that it gave you a better idea of whether you need it or can afford it.