Our work with legacy code doesn’t often put us in a position to move quickly into new or trendy tooling. And while we almost always introduce Docker very early in our projects, it is usually only for the purpose of standardizing and easing setup of developer environments. Transitioning a live environment to containers, however, can be a daunting prospect. There are a variety of reasons for that, many of which you’ve probably encountered yourself. I’ll list a few:
- The application isn’t in the cloud yet
- It’s too complicated
- Container orchestration (like Kubernetes or Swarm) is too new/buggy/insecure
- We need microservices to leverage Kubernetes
- The application is a monolith
All these might be valid reasons, but this article will focus on our experiences in that last scenario — containerizing a monolith. For a more detailed exploration of the pros and cons of containerization, check out this article.
Most, if not all, of the legacy projects we work on feature monolithic application architectures. However, that’s not necessarily a reason to forgo the benefits of containerization. Moving to a container can result in:
- Consistency from local development all the way through production — no more “it works on my machine!”
- Redundancy and, to a degree, scaling are built-in
- Implementations abound — on-premise, in the cloud, and multi/hybrid cloud
- No-touch deployments
- Better support for decomposing your monolith
Those benefits can be extremely valuable, chiefly by better supporting your ability to get modernized code safely and reliably delivered. (We’ll deal with some of the downsides later on in the post.)
It can be daunting to containerize an existing application. There are usually a host of archaisms to account for as compared to a greenfield application: one-off cron jobs, file system dependencies, non-sourced helper scripts, and many more.
For the purposes of this post, we’ll use examples from a recent project we worked on. It is a medium-sized PHP project, initially deployed on Red Hat Enterprise Linux virtual machines (not in the cloud), with a SQL Server backend. Here’s a look at the architecture before we started the project. It is fairly typical for our projects — lots of manually configured servers that require fairly involved deployment procedures.
Our objective was to run the application in Azure’s Kubernetes Service while leveraging best practices for managed services (e.g. Azure’s SQL Database offering), networking, security, and scalability.
Just like when remodeling code, we recommend an iterative approach. In the early stages, you’ll be improving your application before you ever get to Kubernetes. However, these important early steps will safely lay the groundwork for getting your workload ready for containers in production.
Step 1: Start using Docker for local development
Even if you’re not sure about containerization in a production context, containerizing your local development environments using Docker will pay huge dividends by:
- Maintaining a consistent development environment for all developers
- Making onboarding new developers a matter of checking out the code and running a simple script to build and bring up development containers
- Providing living documentation for how to run your application
There are other benefits you’ll realize down the road, but these alone should be reason enough to begin using Docker for local development. For this project, we introduced Docker a few months before beginning our Kubernetes journey. This gave us time to work out some of the kinks (SQL Server drivers running on CentOS!) in our images and assist the development team with getting set up.
Step 2: Prepare your application code by adopting Twelve-Factor App principles
For web applications, we generally guide our projects towards the adoption of the Twelve-Factor App methodology. This is a set of principles initially developed at Heroku, generally seen as a solid baseline for software-as-a-service applications delivered over the web. The principles are independent of language, tech stack, or server architecture, and address problems that many codebases face.
Whether you deploy to containers or not, implementing these principles will greatly enhance the maintainability, reliability, and security of your application. However, there are a few of the factors that will need particular focus as you prepare to containerize:
Explicitly declare and isolate dependencies via a package manager. If you’re not already using a package manager, adopting one should be your first move. Going without one won’t absolutely prevent you from leveraging Kubernetes, but using one will help you maintain and secure your application in a more automated fashion. As you proceed down the path of containerization, it will let you apply and test dependency updates far more quickly than you could by managing dependencies manually.
Store configurations in the environment. Many legacy applications store configurations in config files. This can lead to difficulty managing configurations across servers, manual steps in your deployment, and potential security vulnerabilities. Anything that is likely to vary between deploys, and certainly any sensitive configuration such as database credentials or external service hostnames, should be stored in an environment variable. Completing this in your existing infrastructure will make moving to containers much easier.
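As a minimal sketch of this idea (written in Python for brevity; the project described here was PHP, so treat it as an illustration rather than what we shipped), the settings below are read from the environment at startup. The variable names `DB_HOST`, `DB_USER`, `DB_PASSWORD`, and `DB_PORT` are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseConfig:
    host: str
    user: str
    password: str
    port: int


def load_database_config(env=None):
    """Build the database settings from environment variables.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    required = ("DB_HOST", "DB_USER", "DB_PASSWORD")
    missing = [name for name in required if name not in env]
    if missing:
        # Fail fast at startup instead of at the first query.
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return DatabaseConfig(
        host=env["DB_HOST"],
        user=env["DB_USER"],
        password=env["DB_PASSWORD"],
        port=int(env.get("DB_PORT", "1433")),  # 1433 is SQL Server's default port
    )
```

Because the function accepts any mapping, the same code can be exercised in tests with a plain dictionary and run in production against the real environment.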
Make your application stateless. Any persistent data your application depends on should be stored in a backing service. You’re almost certainly already doing this for your database; you should do the same for file storage, logs, and cache.
A couple of other Twelve-Factor principles are more implicit in Docker/Kubernetes, so you get these for free:
Disposability. Your “server” (i.e. container) should be able to be started or stopped at a moment’s notice. Kubernetes treats pods and containers as ephemeral resources that can be turned on or off without impacting the overall application.
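Disposability is mostly free, but long-running processes (the successors to those one-off cron jobs, for instance) still need to cooperate. When Kubernetes stops a pod, it sends the container's main process a SIGTERM and only kills it forcibly after a grace period. Here is a minimal sketch, in Python, of a worker loop that finishes its current item and then exits. The names `run_worker` and `process_one_item` are hypothetical.

```python
import signal
import threading

# Set once the process has been asked to stop.
shutdown_requested = threading.Event()


def request_shutdown(signum, frame):
    """Signal handler: record the request and let the loop exit cleanly."""
    shutdown_requested.set()


def install_signal_handlers():
    """Register the handler for the signals sent on pod shutdown and Ctrl-C."""
    signal.signal(signal.SIGTERM, request_shutdown)
    signal.signal(signal.SIGINT, request_shutdown)


def run_worker(process_one_item, poll_seconds=1.0):
    """Process items until shutdown is requested; return how many were handled."""
    handled = 0
    while not shutdown_requested.is_set():
        process_one_item()
        handled += 1
        # wait() returns early if the event is set during the pause.
        shutdown_requested.wait(poll_seconds)
    return handled
```

A real entry point would call `install_signal_handlers()` once and then `run_worker(...)` with the actual unit of work.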
Dev/prod parity. How many times have you come across a bug that you can only reproduce in production, perhaps because it uses a slightly different version of an open source package? With Kubernetes, your stack is the same all the way up to production.
Step 3: Build a proof of concept
Kubernetes can be complicated and is difficult to learn in one sitting. For that reason, we’d recommend starting off by building a proof of concept. This can provide valuable learnings, while also giving you time to vet the implementation you’ll be using.
Learn the concepts. The Kubernetes web site is a wealth of information and will help you get a handle on all the vocabulary (of which there is a substantial amount!).
Start with a Hello World. Before you try to fit your application into a prototype, start with an off-the-shelf tutorial. In our case, we used an existing Docker container we’d been using locally for our PHP project, and re-built it to host a single .php page with a call to phpinfo(). This helped us verify that all our dependencies and libraries were properly included, which in turn gave us the confidence to proceed with our application code and database. Doing this will also help you become familiar with some of the basic concepts:
- Building and pushing a Docker image
- Constructing Kubernetes manifests and
- Provisioning pods and services
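To give a feel for what a Hello World manifest contains, here is a hedged sketch that builds a Deployment and a Service as Python dictionaries and serializes them to JSON, which kubectl accepts alongside YAML. Most teams write these files by hand in YAML; generating them is only a convenient way to show the structure. The function names, the app name, and the image reference in the usage note are hypothetical.

```python
import json


def build_manifests(app_name, image, replicas=2, container_port=80):
    """Return a Deployment and a Service for a single-container web app."""
    labels = {"app": app_name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app_name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": app_name,
                            "image": image,
                            "ports": [{"containerPort": container_port}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": app_name},
        "spec": {
            # Route traffic to pods carrying the same labels.
            "selector": labels,
            "ports": [{"port": 80, "targetPort": container_port}],
        },
    }
    return [deployment, service]


def to_kubectl_list(manifests):
    """Wrap manifests in a v1 List so they form a single JSON document."""
    wrapper = {"apiVersion": "v1", "kind": "List", "items": manifests}
    return json.dumps(wrapper, indent=2)
```

Writing the output of `to_kubectl_list(build_manifests("hello-php", "registry.example.com/hello-php:1.0"))` to a file and passing it to `kubectl apply -f` would create both resources, assuming the image has already been pushed to a registry the cluster can reach.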
Step 4: Build a CI/CD pipeline
Thankfully, Kubernetes does a great job of managing deployments. It’ll manage retrieving your images, provisioning and de-provisioning pods in your cluster, and managing traffic all the while.
If you don’t have a CI/CD pipeline, now would be the time to begin implementing it. However, instead of building and deploying a build artifact containing your code and dependencies as in a traditional CI/CD pipeline, you’ll build your application into a Docker image and tag and push that to a container registry. You’ll then kick off your deployment using
Step 5: Iterate and add layers
As you become more comfortable with the basics and get a better feel for kubectl and how to automate your architecture, you’re ready to begin layering in more advanced components. Up until this point, we’ve introduced a substantial amount of complexity to deliver something fairly basic. However, these additional elements will be where container orchestration really starts to shine.
Ingress Controller and Load Balancing. The ingress controller manages access to your application’s components inside the cluster. Oftentimes, this will integrate with an external load balancer (e.g. Azure’s Application Gateway or AWS’s ALB).
Auto-Scaling. Within a cluster, you’ll want to configure your horizontal pod autoscaler. Outside the cluster, you’ll want a scaling solution that will provide additional capacity. This is not a part of Kubernetes proper, so you’ll need to come up to speed on your provider’s cluster scaling solution. The pod autoscaler provisions pods up to the point where the number of nodes (think servers / VMs) in your cluster is no longer sufficient. From there, the cluster autoscaler takes over and scales your compute capacity to accommodate the new pods that are needed.
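To make the pod autoscaler less of a black box, here is a simplified sketch of the calculation it performs. The Kubernetes documentation describes the core rule as the ceiling of current replicas multiplied by the ratio of the current metric value to the target value. This sketch leaves out the tolerance band, stabilization windows, and handling of pods that are not ready, and the default bounds are arbitrary.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Simplified horizontal pod autoscaler calculation.

    Core rule: ceil(current_replicas * current_metric / target_metric),
    clamped to the configured minimum and maximum.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For example, four pods averaging 90% CPU against a 60% target give `desired_replicas(4, 90, 60)`, which is 6. If those extra pods don't fit on the existing nodes, that is the point where the cluster autoscaler has to add capacity.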
Injection of Environment Variables. While Kubernetes manifest files allow for management of environment variables, I’d highly recommend using a service-based approach (e.g. Azure Key Vault) for managing and injecting environment variables. On our project, we used a tool called Azure Key Vault to Kubernetes. There are also other options that are more vendor agnostic.
Spin-off Microservices. Initially, you probably don’t want to go crazy cutting your code up into microservices (you’ll be tempted as soon as you get the hang of Kubernetes). However, there are likely a couple of easier targets you can spin off near the beginning:
- Static assets - these can be built into their own image and served via NGINX (and ideally further optimized via a CDN)
- Authentication - this can be moved into its own container
Step 6: Secure and test
You’ve come so far, but before you launch, make sure you spend some time securing and testing your new cluster. Kubernetes doesn’t provide a lot of security out of the box, so you’ll need to implement some basics:
Put your cluster in a virtual private cloud and limit access just as you would with a more traditional cloud infrastructure. Even though Kubernetes has a built-in load balancer, you should use your cloud provider’s load balancer offering (e.g. Azure’s Application Gateway) to provide access to your cluster. Place your cluster and its associated resources into a virtual private cloud and secure access to it behind a VPN. This will help limit attack surface.
Scan your cluster. Aqua Security has an open source vulnerability scanner that you can deploy as a container in your cluster.
Add active threat detection measures. Solutions like Azure’s Advanced Threat Protection can help identify and stop threats before they even reach your cluster.
More recommendations can be found in the Kubernetes article on the topic.
Finally, if you have end-to-end or browser tests, now is the time to leverage them on your shiny new Kubernetes stack.
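If you don’t yet have such a suite, even a small smoke test run against the new stack is better than nothing. The sketch below, in Python, checks that a handful of paths respond with the expected status. The function names are hypothetical, and the base URL and paths you pass in would be specific to your application.

```python
import urllib.error
import urllib.request


def check_endpoint(url, expected_status=200, timeout=5.0):
    """Return True if a GET to `url` answers with the expected status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == expected_status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is still available.
        return error.code == expected_status
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return False


def run_smoke_tests(base_url, paths):
    """Check each path under base_url and return the ones that failed."""
    base = base_url.rstrip("/")
    return [path for path in paths if not check_endpoint(base + path)]
```

An empty list from `run_smoke_tests` means every path answered as expected, which makes it easy to fail a pipeline step when the list is not empty.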
Step 7: Deploy and monitor
Now is the time to reap the rewards of all your hard work! We’d strongly advise setting up telemetry and alerting so that you can monitor the health and operations of your cluster. Your cloud vendor likely has a plug and play option.
Okay, this seems doable - but what are the downsides?
Complexity & Learning Curve. The number of concepts and the sheer amount of vocabulary are daunting. Will adding all this complexity pay off? This is an entirely reasonable question, and the answer may vary by situation. Small applications may not benefit from the investment required to fully containerize. Others may already have solid deployment and scaling solutions.
Reliability and Performance. Generally, we’ve found our Kubernetes stack to be reliable and to perform well, and (at least in our Azure cloud) its cost to be comparable. However, gaining the same reliability and performance as a battle-tested production stack can be an extended exercise.
Tooling. There are many new tools to learn. You’ll likely use kubectl and Helm the most, but you’ll also need to become familiar with your cloud provider’s CLI tooling, as it’s often used to provision the cluster and configure your cluster autoscaler.
And now for the big reveal! At the end of our monolith containerization journey, here’s a look at where we ended up. As you can see, there is a fair bit more complexity. However, it is much more robust, scalable, configurable, and deployable.
Hopefully, by now, you’ve got a better understanding of the steps involved with moving your monolith to a container architecture like Kubernetes. Although it’s no simple task, it’s also one you shouldn’t avoid just because you have a monolith. In most cases, the benefits will outweigh the downsides, particularly if your eventual goal is to decompose your application.