In this article, I will show you a solution for zero-downtime deployment in Azure Kubernetes Service. To set the context, we first go through some Deployment strategies, and then I will choose the one that fits our needs. Some of them are supported by Kubernetes natively, some are not (yet). Next, I will outline a System overview by showing you the necessary Kubernetes objects in our AKS cluster. The following part of the article presents our Azure DevOps deployment pipeline and briefly goes through the scripts and other settings that do the main thing: zero-downtime deployment. Finally, I am going to wrap things up.
Many deployment strategies can help you deploy your application to production or any other environment. They all have their usage scenarios, along with their benefits and drawbacks. As a precondition, consider that you might have multiple running instances of your application that need to be deployed.
Let us see some of them:
Recreate: First, every instance of the old version is removed, and then the instances of the new version are rolled out.
We use this technique during development, when downtime does not matter.
Ramped: The central concept of this strategy is to replace the instances of the old version with instances of the new version, one by one.
The main gain of this solution is that there is no downtime. In contrast, it has some severe cons: rollout and rollback are time-consuming, and since you have no influence over the traffic, both versions serve requests during the rollout, which can lead to version problems.
Blue/Green: The new version's instances are deployed to the destination environment while the traffic is still routed to the old instances. Then the traffic is switched to the instances running the latest version. Lastly, the old instances are deleted.
This way we have won the following three: no downtime, fast rollout/rollback, and control over the traffic, so there are no version problems. The downside of this technique is the price of having both the old and the new instances of the application running at the same time. We can use this approach in a production environment.
Not supported by Kubernetes services out of the box.
Canary: With this strategy, you will also have both the old and the new instances running side by side. However, the switch of the traffic from the old instances to the new ones is different. In this solution, only a weighted portion of the traffic is switched to the new instances. After some iterations, you send the whole traffic to the new instances, and the old versions can be terminated. As an outcome, a subset of the users is effectively testing the new release.
The pros here are fast rollback, measurable performance and failures, and more control over the traffic. The cons: slow rollout, it can be expensive, and there is no control over the traffic at the level of individual users. By using an ingress controller like NGINX, the weighted traffic can be routed much more precisely and cost-effectively. This approach can be used in a production environment as well.
Not supported by Kubernetes services out of the box.
A/B testing: Pretty much the same as the canary. The difference is that instead of using a weight for traffic switching, you can use a so-called canary cookie or header. This lets you specify, with high accuracy, the subset of users who are routed to the new instances. As you might know, A/B testing is originally a technique for making business decisions by rolling out the version that converts the most.
A remarkable benefit over the weighted canary is the complete control over the traffic. The drawbacks are still the slow rollout and the need for a Layer-7 load balancer like NGINX. The strategy can be very useful in production environments.
Not supported by Kubernetes services out of the box.
Shadow: New instances are deployed alongside the old instances. After rollout, traffic is routed to both the old and the new versions. Traffic can be mirrored, e.g., with the help of NGINX.
With this strategy, performance tests with full production traffic can be made quickly; furthermore, there is no impact on the users. On the other hand, it is expensive since we are doubling the required resources.
You can read more about these deployment strategies at https://thenewstack.io/deployment-strategies/.
Now that we have seen some exciting deployment strategies, it is time to choose the right one for our needs. We needed a deployment strategy that satisfies the following requirements:
The requirements above imply that we will need to mix the Blue/Green and the A/B testing strategies to fit our needs.
Now let me show you an overview of the relevant components of our system. The figure below shows the infrastructure requirements of the chosen deployment strategies. Each component has a Helm chart, which represents the Kubernetes objects used as code and is, in fact, the unit of installation. Note that these code samples are simplified to make the central concept easier to grasp.
To introduce the Blue/Green strategy, the system needs to handle so-called slots for blue and green versions.
In the solution above, slots are represented by Kubernetes Deployment objects. In a nutshell, the Deployment object is responsible for the pods containing a version of our component. A Helm chart template for a component Deployment object should look something like this:
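A minimal sketch of such a template, assuming the component name, slot, and image are passed in through the chart values (all names and values below are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.app }}-{{ .Values.slot }}
  labels:
    app: {{ .Values.app }}
    slot: {{ .Values.slot }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Values.app }}
      slot: {{ .Values.slot }}
  template:
    metadata:
      labels:
        app: {{ .Values.app }}
        slot: {{ .Values.slot }}
    spec:
      containers:
        - name: {{ .Values.app }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 80
```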
The app and slot labels in this template are essential, because the deployed pod objects belonging to a component and a slot can be selected unambiguously by them.
Furthermore, to satisfy the A/B testing strategy, the system also has to be able to operate two different versions of a component from another point of view. Therefore, we need to double the Kubernetes Service objects as well.
The primary role of a Service object is to expose a set of pods as a network service. In the figure above, there is a service for the component that points to the set of pods whose slot label is set to blue and which are running the current version of the application component. Additionally, a canary service for the component points to the set of pods whose slot label is set to green and which are running the next version of the application component.
The only difference between the template of the service and that of the canary service object is that the canary service template additionally contains a few canary-specific parts, shown in the sketch below.
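A minimal sketch of the two templates, assuming the selector slot values come from the chart values (the value names and ports are illustrative):

```yaml
# Service: routes traffic to the pods of the slot serving the current version.
apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.app }}
spec:
  selector:
    app: {{ .Values.app }}
    slot: {{ .Values.service.slot }}
  ports:
    - port: 80
      targetPort: 80
---
# Canary service: identical except for the canary name suffix and its own slot selector.
apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.app }}-canary
spec:
  selector:
    app: {{ .Values.app }}
    slot: {{ .Values.canaryService.slot }}
  ports:
    - port: 80
      targetPort: 80
```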
As a consequence, a Kubernetes Ingress object needs to be created for each of the services. An Ingress object exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. Traffic routing is controlled by rules defined on the Ingress object.
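The ingress and canary ingress templates could be sketched roughly as follows; the host comes from the chart values, and the canary annotations reflect the NGINX header-based routing used later (names and values are illustrative):

```yaml
# Ingress: routes the host's traffic to the service of the current slot.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Values.app }}
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: {{ .Values.host }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ .Values.app }}
                port:
                  number: 80
---
# Canary ingress: same host, but the NGINX canary annotations route only those
# requests that carry the canary header to the canary service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Values.app }}-canary
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "stage"
spec:
  rules:
    - host: {{ .Values.host }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ .Values.app }}-canary
                port:
                  number: 80
```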
The only difference between the ingress and the canary ingress object templates is that the canary ingress template additionally contains the canary-specific annotations. The service and ingress object templates are packed into the same Helm chart. The default values file content is the following:
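A possible default values file for this chart; the whitelist and firewall mode entries are assumptions based on the deployment scripts discussed later:

```yaml
app: mycomponent
host: mycomponent.example.com

# Slots the two services point to; updated by the deployment pipeline.
service:
  slot: blue
canaryService:
  slot: green

# Values substituted by the pipeline before the chart is installed or upgraded.
whitelist: ""
firewallMode: ""
```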
Obviously, we use the same host for both services; however, there are some canary-related annotations (rules) on the canary ingress object.
Two deployments of the same component are hosted in parallel only if a new version has just been rolled out. In this situation, we would like to check whether the new version of the component is fully functioning and/or do some warmup for the new version. With the help of the canary annotations, we can do this easily by browsing the same URL with an additional header, added to each request, named canary with the value stage. This can be achieved by using a header-modifying plugin in your favorite browser.
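For example, assuming the component is reachable at the illustrative host used above, a single request aimed at the canary slot could be sent like this:

```powershell
# The canary header makes the NGINX canary ingress route this request to the green slot.
Invoke-WebRequest -Uri "https://mycomponent.example.com" -Headers @{ canary = "stage" }
```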
In summary, the ingress Helm chart is responsible for installing the service and ingress objects. When installing or updating the chart, which mostly happens after rolling out a new deployment of a component, the service selector labels must be provided (app and slot).
The last but significant part of the system is an ingress controller. To satisfy an Ingress object, you must have a running ingress controller. As the clues in the ingress templates suggest, it is no surprise that we use the NGINX Ingress Controller in our solution.
Deployment of the NGINX ingress controller with Helm can be done with the following few commands:
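At the time of writing, installing the controller from the official ingress-nginx chart repository looks roughly like this (the release and namespace names are illustrative):

```powershell
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx --namespace ingress-nginx --create-namespace
```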
As a next step, we wanted to automate the rollout of our component to AKS. For this purpose, we have defined a Release Pipeline in Azure DevOps. Our release pipeline has exactly one stage called Deploy to AKS, which consists of several jobs.
There are some variables with Release scope defined:
Now let us go into the details of the steps mentioned above.
The first task of the job is the Pull charts from the ACR PowerShell task. As its name suggests, it pulls the component chart and the ingress chart from the ACR with the correct version. Besides pulling them, it also exports them to a local folder on the build agent, so they can be installed with Helm. The version is extracted from some build artifacts. It is important to set the HELM_EXPERIMENTAL_OCI environment variable to 1, as Helm requires it for the commands used.
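A rough sketch of what such a task runs; the registry, repository, chart names, and artifact paths are placeholders, and the helm chart pull/export commands correspond to the experimental OCI support of the Helm version in use at the time:

```powershell
# Helm requires this for the experimental OCI commands below.
$env:HELM_EXPERIMENTAL_OCI = "1"

# The chart version is read from a build artifact (path is illustrative).
$version = Get-Content -Path ./artifacts/version.txt

# Log in to the Azure Container Registry that stores the charts.
helm registry login myregistry.azurecr.io --username $env:ACR_USER --password $env:ACR_PASSWORD

# Pull both charts and export them to a local folder so helm can install them from disk.
helm chart pull "myregistry.azurecr.io/helm/mycomponent:$version"
helm chart export "myregistry.azurecr.io/helm/mycomponent:$version" --destination ./charts
helm chart pull "myregistry.azurecr.io/helm/ingress:$version"
helm chart export "myregistry.azurecr.io/helm/ingress:$version" --destination ./charts
```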
The second task, named Helm Login, is a Package and deploy Helm charts task, which is part of the default Azure DevOps tasks. It logs in to the Kubernetes cluster.
The last one is the Deploy chart and update ingress PowerShell task with the following script:
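The script is summarized here as a simplified sketch; the helper script parameters, chart paths, and the slot file name are assumptions:

```powershell
# Read the whitelist from the JSON artifact and convert it to the format the ingress chart expects.
$whitelist = & ./get-whitelist.ps1 -Path ./artifacts/whitelist.json

# The version of the component chart, read from a build artifact (path is illustrative).
$version = Get-Content -Path ./artifacts/version.txt

# Install the new component version into the free slot and update the ingress chart.
$slot = & ./deploy-bg.ps1 -Component mycomponent `
    -ComponentChartPath ./charts/mycomponent `
    -IngressChartPath ./charts/ingress `
    -Version $version `
    -Whitelist $whitelist `
    -FirewallMode $env:FIREWALLMODE

# Write the slot used for this deployment into a file so later jobs can pick it up.
Set-Content -Path ./slot.txt -Value $slot
```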
This task does the deployment of the component chart and the ingress chart exported to the local path before. As you might have noticed, two PowerShell scripts run in this task: get-whitelist.ps1 and deploy-bg.ps1. The first one reads the whitelist from a JSON artifact file and converts it to the proper format. The second script is the following:
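The original script is not reproduced here; the sketch below follows the logic described next, with illustrative parameter and release names, and passes the whitelist and firewall mode with --set instead of editing the values.yaml file:

```powershell
param(
    [string]$Component,
    [string]$ComponentChartPath,
    [string]$IngressChartPath,
    [string]$Version,
    [string]$Whitelist,
    [string]$FirewallMode
)

# Check the existing helm releases to see whether the blue slot is already in use.
$releases = helm list --output json | ConvertFrom-Json
$blueInUse = $releases | Where-Object { $_.name -eq "$Component-blue" }

# Use blue when nothing has been deployed yet, otherwise the opposite of the slot in use.
$slot = if ($blueInUse) { "green" } else { "blue" }

# Upgrade or install the ingress chart: the canary service now points at the new slot,
# and the whitelist and firewall mode are applied.
helm upgrade --install "$Component-ingress" $IngressChartPath `
    --set "canaryService.slot=$slot" `
    --set "whitelist=$Whitelist" `
    --set "firewallMode=$FirewallMode"

# Roll out the new version of the component into the chosen slot.
helm upgrade --install "$Component-$slot" $ComponentChartPath `
    --set "slot=$slot" `
    --set "image.tag=$Version"

# Return the slot so the calling task can persist it.
$slot
```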
First of all, this script checks the helm releases to see whether the component has already been deployed to the blue or the green slot. If there are no deployments yet, we will use the blue slot for the new version. Otherwise, we will use the opposite slot of the one already used. Secondly, we set the whitelist and the firewall mode in the values.yaml file of the ingress Helm chart. The next command will upgrade or install the ingress Helm chart with the given parameters. Lastly, we run the installation of the component chart with the proper parameters.
Now let us return to the last row of the Deploy job's Deploy chart and update ingress PowerShell task. As you can observe, we write the slot name used for the deployment into a file on the build agent machine. This is necessary because we need to pass this information to the other jobs.
This job, unlike the others, is an agentless job with only one Manual intervention task, added from the common Azure DevOps task list. At this step, the deployment process is paused while the user decides whether to reject or resume it. Reject means the task fails, whereas resume results in the task, and thus the job, succeeding.
This job runs only when a previous job has failed; this needs to be set on the job. The first task of the job is the Pull ingress chart from ACR PowerShell task, which is very similar to the first task of the Deploy job. It does almost the same, but only pulls and exports the ingress chart. The HELM_EXPERIMENTAL_OCI environment variable needs to be set as well.
The second task, named Helm Login, is also the same as in the Deploy job. It logs in to the Kubernetes cluster.
The last one is the Cleanup deployment PowerShell task with the following script:
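A simplified sketch of what this task runs; the parameters and the slot file are assumptions carried over from the Deploy job sketch:

```powershell
# Read the whitelist artifact and the slot that received the rejected deployment.
$whitelist = & ./get-whitelist.ps1 -Path ./artifacts/whitelist.json
$newSlot = Get-Content -Path ./slot.txt

# Restore the services to the old slot and remove the rejected release.
& ./delete-bg.ps1 -Component mycomponent -IngressChartPath ./charts/ingress `
    -NewSlot $newSlot -Whitelist $whitelist -FirewallMode $env:FIREWALLMODE
```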
This task does the cleanup of the current deployment and updates the ingress chart exported to a local path before. Two PowerShell scripts run in this task as well: get-whitelist.ps1 and delete-bg.ps1. You are already familiar with the first one from the Deploy job. The second script is the following:
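Again, the following is only a sketch of the logic described next, with illustrative parameter and release names, and with the whitelist and firewall mode passed via --set instead of editing values.yaml:

```powershell
param(
    [string]$Component,
    [string]$IngressChartPath,
    [string]$NewSlot,        # slot that received the rejected deployment
    [string]$Whitelist,
    [string]$FirewallMode
)

# The slot that still runs the previous, known-good version is the opposite one.
$oldSlot = if ($NewSlot -eq "blue") { "green" } else { "blue" }

# Point both the service and the canary service back at the old slot and
# re-apply the whitelist and firewall mode.
helm upgrade --install "$Component-ingress" $IngressChartPath `
    --set "service.slot=$oldSlot" `
    --set "canaryService.slot=$oldSlot" `
    --set "whitelist=$Whitelist" `
    --set "firewallMode=$FirewallMode"

# Uninstall the newly deployed release, if it exists.
$releases = helm list --output json | ConvertFrom-Json
if ($releases | Where-Object { $_.name -eq "$Component-$NewSlot" }) {
    helm uninstall "$Component-$NewSlot"
}
```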
Firstly, the script determines whether the blue or the green slot needs to be restored, both for the service and the canary service. Secondly, as in deploy-bg.ps1, we set the whitelist and the firewall mode in the values.yaml file of the ingress Helm chart. Next, we upgrade or install the ingress Helm chart with the proper parameters. The last command will uninstall the newly deployed Helm chart release, if any.
We have reached the part where the deployment process finalizes the deployment by swapping the slots and removing the old version of the component.
The first task of the job is the Pull ingress chart from ACR PowerShell task; the second is the Helm login task. We will not go into detail on these two, since they are the same as in the Cleanup deployment job.
The third is the Swap slots and remove unused slot's deployment PowerShell task with the following script:
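As before, a simplified sketch of the task script; parameters and the slot file are assumptions:

```powershell
# Read the whitelist artifact and the slot that received the new version.
$whitelist = & ./get-whitelist.ps1 -Path ./artifacts/whitelist.json
$newSlot = Get-Content -Path ./slot.txt

# Point both services at the new slot and uninstall the old release.
& ./swap-bg.ps1 -Component mycomponent -IngressChartPath ./charts/ingress `
    -NewSlot $newSlot -Whitelist $whitelist -FirewallMode $env:FIREWALLMODE
```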
Similarly, two PowerShell scripts run in this task: get-whitelist.ps1 and swap-bg.ps1. The swap-bg.ps1 script is the following:
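The sketch below mirrors the logic described next; parameter and release names are illustrative, and the whitelist and firewall mode are again passed via --set:

```powershell
param(
    [string]$Component,
    [string]$IngressChartPath,
    [string]$NewSlot,        # slot that received the new version
    [string]$Whitelist,
    [string]$FirewallMode
)

# The slot running the previous version is the one to remove after the swap.
$oldSlot = if ($NewSlot -eq "blue") { "green" } else { "blue" }

# Point both the service and the canary service at the new slot and
# re-apply the whitelist and firewall mode.
helm upgrade --install "$Component-ingress" $IngressChartPath `
    --set "service.slot=$NewSlot" `
    --set "canaryService.slot=$NewSlot" `
    --set "whitelist=$Whitelist" `
    --set "firewallMode=$FirewallMode"

# Uninstall the release of the previous version, if it exists.
$releases = helm list --output json | ConvertFrom-Json
if ($releases | Where-Object { $_.name -eq "$Component-$oldSlot" }) {
    helm uninstall "$Component-$oldSlot"
}
```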
As a first step, the script decides whether the blue or the green slot needs to be removed. Secondly, as in deploy-bg.ps1, we set the whitelist and the firewall mode in the values.yaml file of the ingress Helm chart. Next, we upgrade or install the ingress Helm chart with the proper parameters and set both the service and the canary service selector labels to the new deployment's slot. The last step uninstalls the previously deployed Helm chart release, if any.
A deployment process should look like this:
In the beginning, I showed you a solution for zero-downtime deployment with Kubernetes in the Azure cloud. It is essential to know your requirements and to think over which technique from the industry-standard Deployment strategies you might need. Remember, there is no silver bullet, only different techniques for different circumstances. You can even mix them to fit your needs, as we did.
While we were going through the System overview, we defined the main Kubernetes objects (the ingress controller, and the ingress, service, and deployment objects) we intended to use in our solution. We also prepared our custom Helm charts to encapsulate these objects, making them reusable and easier to release to the cluster.
Lastly, we created an Azure DevOps deployment pipeline, a deployment process consisting of a Deploy job, a Manual intervention job, a Cleanup deployment job that runs when the release is rejected, and a job that swaps the slots and removes the unused slot's deployment.
All the presented source code is available for download from the following link.