About the Customer
The customer is one of the largest financial analytics organizations in the world. They have lines of business in financial intelligence and analytics tools, software services, and consulting.
Key Challenge / Problem Statement
The customer wants to replace their on-premises, diverse CI/CD pipelines with cloud-native, standardized CI/CD pipeline patterns to reduce variation in tooling and allow their teams and tools to scale better as the business grows. These pipelines will eventually be available to application teams via a self-service mechanism like AWS Service Catalog, so they must be easily deliverable via such a solution. Given the opportunity to innovate as they transform their DevOps pipelines, the customer also wants their new pipelines to deploy only immutable infrastructure using a blue/green strategy, in contrast to their current deployment processes, which often update long-lived resources in place.
State of Customer’s Business Prior to Engagement
The customer’s DevOps practices have so far grown organically and now encompass several on-premises tools that have not moved to the cloud at the same rate as the applications and infrastructure they support. The customer’s DevOps pipelines support several dozen applications across various lines of business, runtimes, and architectures. Some pipelines are built with Bamboo, some with Jenkins, and some target immutable infrastructure on ECS and Lambda, while still others use Puppet configuration management to configure long-lived EC2 servers. Monitoring and notification for the pipelines are similarly fractured – different monitoring and notification tools are used for Bamboo and Jenkins, for instance – so standardizing and centralizing these components is also a priority.
Proposed Solution & Architecture
The customer’s core request was that Vertical Relevance (VR) simplify and standardize the existing pipelines, so the VR team began by surveying the current applications and pipelines and distilling these into three paradigmatic use cases targeting three distinct platforms:
- AWS ECS
- AWS Lambda
- Windows EC2s configured with Puppet
The recommendation was to build three “baseline” pipelines to cover these three major use cases, and to make them as alike as possible without sacrificing flexibility; this would help meet the customer’s need for standardization while also allowing easy adoption by teams with different needs.
The pipelines were built using AWS native services: CodePipeline for pipeline orchestration, automation, and notifications; CodeBuild to allow for multiple build runtimes and other kinds of variation between applications; and CodeDeploy to facilitate blue/green deployment for each type of target infrastructure. CloudWatch would capture logs from each step in the pipeline as well as from the infrastructure and applications.
The broad steps in each pipeline were essentially the same:
- The pipeline is triggered when a change is pushed to the “develop” branch of the GitHub repository.
- CodePipeline clones the “develop” branch of the GitHub repository, storing the clone in an S3 bucket.
- The pipeline initiates the CodeBuild project associated with the application type. The name of this project is different depending on which pipeline is being run.
- CodeBuild uses the buildspec.yml file from the application repository to build the application. Teams customize this file to suit their build (and testing) needs. This stage must publish some artifact(s), including any container image, application binaries, etc. that downstream steps, including CodeDeploy, depend on. These artifacts differ per pipeline type.
- When the CodeBuild step completes, the built application artifacts are passed back to the pipeline for deployment by the later CodeDeploy step, either via CodePipeline’s artifacts S3 bucket, or via a reference to a Lambda source code object in S3, or a reference to artifacts in the Artifactory artifact store, depending on the type of pipeline.
- A CloudFormation deployment action is initiated in CodePipeline to create the application’s infrastructure resources.
- The artifacts passed back to CodePipeline after the CodeBuild step are deployed by CodeDeploy in a blue/green style.
Infrastructure as Code (IaC) for infrastructure management and version control were both core recommendations throughout the engagement. IaC promotes reusability, auditability, and extensibility for cloud solutions. Version control enables a single source of truth for code and easy collaboration amongst different teams.
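The steps above can be sketched as an ordered list of stages with the artifacts each one produces and consumes. This is a simplified illustration, not the CloudFormation used in the engagement: the function names, the artifact names (SourceArtifact, BuildArtifact), and the CodeBuild project naming scheme are all assumptions made for the example.

```python
from typing import Dict, List

SUPPORTED_TYPES = {"ecs", "lambda", "windows-ec2"}


def build_pipeline_stages(app_type: str, repo: str, branch: str = "develop") -> List[Dict]:
    """Return a simplified, ordered description of the baseline pipeline stages.

    The stage layout mirrors the steps in the text; every name is hypothetical.
    """
    if app_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported pipeline type: {app_type}")
    return [
        # Source: a push to the branch triggers the pipeline and the clone is stored in S3.
        {"name": "Source", "provider": "GitHub", "repo": repo, "branch": branch,
         "outputs": ["SourceArtifact"]},
        # Build: the project name differs per pipeline type (naming scheme assumed).
        {"name": "Build", "provider": "CodeBuild", "project": f"{app_type}-build",
         "inputs": ["SourceArtifact"], "outputs": ["BuildArtifact"]},
        # Infrastructure: CloudFormation creates the application's resources.
        {"name": "Infrastructure", "provider": "CloudFormation",
         "inputs": ["SourceArtifact"]},
        # Deploy: CodeDeploy releases the built artifacts blue/green.
        {"name": "Deploy", "provider": "CodeDeploy", "strategy": "blue/green",
         "inputs": ["BuildArtifact"]},
    ]


def validate_artifact_flow(stages: List[Dict]) -> bool:
    """Check that every stage consumes only artifacts produced by an earlier stage."""
    produced = set()
    for stage in stages:
        missing = set(stage.get("inputs", [])) - produced
        if missing:
            raise ValueError(f"stage {stage['name']} is missing inputs: {sorted(missing)}")
        produced.update(stage.get("outputs", []))
    return True
```

Keeping the stage list identical across the three pipeline types, with only the build project and deployment target varying, is one way to express the "as alike as possible" goal in code.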
Blue/Green Deployments to Windows EC2s
The Windows EC2 pipeline seems relatively simple on the surface, not differing much from a typical blue/green EC2-targeted deployment using the CodeDeploy agent, but this engagement presented a few unique challenges due to constraints around the new (green group) EC2 bootstrapping process and the time taken to provision Windows EC2 instances.
After both the CodeBuild and CloudFormation steps were complete, a Lambda function would run to update the deployment group in the development account to set blue/green as the deployment strategy. This was necessary because CloudFormation did not directly support creation of a deployment group with blue/green as the deployment strategy for EC2 deployments. Additionally, the logging and monitoring configuration that accompanied resource configuration in the target account was set up. Each new instance had a configured Windows CloudWatch agent that sent logs to a standardized location, from which they were forwarded to a centralized security monitoring and auditing account owned by the customer.
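A minimal sketch of such a Lambda function is shown below. It uses the CodeDeploy UpdateDeploymentGroup API (update_deployment_group in boto3); the event keys, the function names, and the specific blue/green options chosen here (discovering existing green instances and keeping blue instances alive) are assumptions for illustration, not the configuration used in the engagement.

```python
from typing import Any, Dict, Optional


def blue_green_update_kwargs(application: str, deployment_group: str) -> Dict[str, Any]:
    """Build the arguments that switch a deployment group to blue/green."""
    return {
        "applicationName": application,
        "currentDeploymentGroupName": deployment_group,
        "deploymentStyle": {
            "deploymentType": "BLUE_GREEN",
            "deploymentOption": "WITH_TRAFFIC_CONTROL",
        },
        "blueGreenDeploymentConfiguration": {
            # Assumption: keep the old (blue) instances so a rollback stays possible.
            "terminateBlueInstancesOnDeploymentSuccess": {"action": "KEEP_ALIVE"},
            "deploymentReadyOption": {"actionOnTimeout": "CONTINUE_DEPLOYMENT"},
            # Assumption: green instances already exist because CloudFormation created them.
            "greenFleetProvisioningOption": {"action": "DISCOVER_EXISTING"},
        },
    }


def handler(event: Dict[str, str], context: Any, client: Optional[Any] = None) -> Dict[str, Any]:
    """Lambda entry point; the event keys used here are hypothetical."""
    if client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS access
        client = boto3.client("codedeploy")
    kwargs = blue_green_update_kwargs(event["application"], event["deployment_group"])
    client.update_deployment_group(**kwargs)
    return kwargs
```

Passing the client in as a parameter keeps the argument-building logic testable without calling AWS.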
After CloudFormation created the target resources, Windows bootstrapping began. The customer had golden Windows AMIs from which to build EC2s, but some configuration was still needed before an application team could use the AMI for their deployments. Ideally the final required AMI would be built continuously from the base AMIs, but building such a pipeline was out of scope for this engagement. The customer and VR agreed to utilize the customer’s extensive Puppet repositories to finish configuration of EC2s once they were launched from the latest base AMI by CloudFormation.
The customer’s legacy on-premises Bamboo deployments had targeted long-lived EC2 instances that were continuously configured/re-configured with Puppet, but this did not follow the immutable infrastructure pattern desired by the customer. Instead, existing Puppet code was utilized to bootstrap and automatically configure new EC2 instances, opening the door to immutable infrastructure (and thus, blue/green deployment) while reusing as much code as possible. Once this was achieved, teams could be sure that their applications would be deployed to fresh EC2 instances having no drift or lingering artifacts from prior deployments – issues that had caused the customer difficulties in the past.
Timing Puppet and CodeDeploy actions correctly was crucial; otherwise, race conditions could cause deployments to fail intermittently. If CodeDeploy finished before Puppet, the application could be missing configuration that it needed to behave properly. CodeDeploy pre-deploy (BeforeInstall) and post-deploy (AfterInstall) hooks were used to manage these temporal dependencies. The pre-deploy hook would first join Windows machines to the correct Active Directory domain using a unique hostname generated from the application name and EC2 instance metadata. It would then mount the required storage drives. Once CodeDeploy had obtained the deployment artifact from Artifactory and placed the application files in the correct locations on the server, the post-deploy hook would wait for Puppet configuration to complete and start the IIS service hosting the application.
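Two pieces of the hook logic described above lend themselves to a short sketch: deriving a unique hostname and waiting for Puppet to finish. The code below is illustrative only. The naming scheme (sanitized application name plus the tail of the instance ID, capped at the 15-character NetBIOS limit) and the polling helper are assumptions, and the actual hooks in the engagement ran on Windows and may have been written quite differently.

```python
import time
from typing import Callable

NETBIOS_MAX = 15  # Windows computer names are limited to 15 characters
SUFFIX_LEN = 6


def unique_hostname(app_name: str, instance_id: str) -> str:
    """Derive a hostname from the application name and EC2 instance ID (hypothetical scheme)."""
    suffix = "".join(ch for ch in instance_id.lower() if ch.isalnum())[-SUFFIX_LEN:]
    prefix = "".join(ch for ch in app_name.lower() if ch.isalnum())
    prefix = prefix[: NETBIOS_MAX - 1 - len(suffix)] or "app"
    return f"{prefix}-{suffix}"


def wait_until(predicate: Callable[[], bool], timeout_s: float, interval_s: float = 5.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll predicate until it returns True or the timeout elapses.

    A post-deploy hook could pass a predicate that checks whether the Puppet
    run has completed, and start IIS only when this function returns True.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Returning False on timeout, rather than waiting indefinitely, lets the hook fail the deployment explicitly instead of leaving an application running without its configuration.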
Once the load balancer determined the new instances were healthy, the existing instances would be removed from the rotation, which completed the shift from the old (blue) to new (green) application and infrastructure. Next, the number of instances in the new (green) target group was scaled out according to IIS request rates collected via CloudWatch metrics. Instances in the new (green) group were placed in a different Auto Scaling group from those in the old (blue) group in order to facilitate reasonable scaling in/out for each target group before and after the deployment process. The old (blue) group would be scaled down naturally and kept in a lukewarm state as long as it remained out of rotation, allowing a quick rollback if needed without consuming expensive additional resources to keep a production-ready number of instances running.
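The request-rate-based scaling described above can be approximated with simple arithmetic. The sketch below is not the scaling policy used in the engagement; the per-instance target and the group bounds are hypothetical numbers chosen only to show how the green group grows with traffic while a group receiving no traffic settles at its minimum size.

```python
import math


def desired_capacity(requests_per_second: float, target_per_instance: float,
                     min_size: int, max_size: int) -> int:
    """Instances needed to keep each one at or below its target request rate."""
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    needed = math.ceil(requests_per_second / target_per_instance)
    # Clamp to the Auto Scaling group's bounds.
    return max(min_size, min(max_size, needed))
```

For example, at 950 requests per second with an assumed target of 100 per instance, a green group bounded between 2 and 20 instances would grow to 10, while a blue group receiving no traffic would fall back to its minimum of 2.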
One additional challenge of this pipeline type was its initially lengthy execution time. Provisioning Windows EC2 instances proved to be the most time-consuming step, so VR recommended running the CodeBuild and CloudFormation steps in parallel to reduce pipeline execution time. One drawback of this strategy is that a failed application build and a failed infrastructure creation could each interrupt the other. This could waste work (for instance, a successful application build could be discarded due to failed infrastructure creation), but the benefits outweighed these costs. The parallel approach reduced execution time relative to the serial approach, especially when infrastructure creation failed: in the serial approach the failure would surface only after a successful application build, whereas in the parallel approach it would be reported to DevOps personnel much more quickly.
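The trade-off can be made concrete with a small worked example. The durations below are hypothetical and were not measured in the engagement; the functions simply compare total execution time and time to failure for the serial and parallel arrangements.

```python
def serial_time(build_min: float, infra_min: float) -> float:
    """Build runs first, then infrastructure creation."""
    return build_min + infra_min


def parallel_time(build_min: float, infra_min: float) -> float:
    """Build and infrastructure creation run side by side."""
    return max(build_min, infra_min)


def time_to_infra_failure(build_min: float, infra_fails_after_min: float, parallel: bool) -> float:
    """Minutes until an infrastructure failure is reported to the team."""
    if parallel:
        return infra_fails_after_min
    # In the serial case the build must succeed before infrastructure creation starts.
    return build_min + infra_fails_after_min
```

With an assumed 10-minute build and 25-minute Windows provisioning step, the serial pipeline takes 35 minutes and the parallel one 25. If infrastructure creation fails 5 minutes in, the failure surfaces after 15 minutes serially but after 5 minutes in parallel.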
Blue/Green Deployments to ECS
Implementing the Windows EC2 pipeline type presented challenges when creating new instances, but since containers are inherently immutable infrastructure and can usually be provisioned in less than a second, the blue/green step was naturally simpler and faster.
For ECS, the CloudFormation AWS::CodeDeploy::BlueGreen hook, enabled through the AWS::CodeDeployBlueGreen transform, handled the transition from the blue to the green task definition on stack updates. The CodeBuild step of the pipeline would build and push an application container image to Artifactory, and a reference (Artifactory URI) for this image would be shared with the controlling CodePipeline pipeline, which then passed this reference to the CloudFormation stack update stage. The hook then automatically handled the commissioning of new (green) resources and decommissioning of old (blue) resources, including the ELB target groups, ECS tasks, and ECS task definitions. The diagram below illustrates this pipeline flow.
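To show roughly what this looks like in a template, the sketch below builds the relevant template fragment as a Python dictionary and prints it as JSON. The property layout follows the shape described in the CloudFormation documentation for ECS blue/green deployments, but it should be verified against the current reference; the logical IDs and the all-at-once traffic routing are hypothetical choices, and this is not the customer's template.

```python
import json
from typing import Any, Dict, Sequence


def blue_green_fragment(service_id: str, task_definitions: Sequence[str],
                        task_sets: Sequence[str], listener_id: str,
                        target_groups: Sequence[str]) -> Dict[str, Any]:
    """Build the Transform and Hooks sections for an ECS blue/green stack update."""
    return {
        "Transform": ["AWS::CodeDeployBlueGreen"],
        "Hooks": {
            "CodeDeployBlueGreenHook": {  # hook logical name is arbitrary
                "Type": "AWS::CodeDeploy::BlueGreen",
                "Properties": {
                    # Assumption: shift all traffic at once.
                    "TrafficRoutingConfig": {"Type": "AllAtOnce"},
                    "Applications": [{
                        "Target": {"Type": "AWS::ECS::Service", "LogicalID": service_id},
                        "ECSAttributes": {
                            "TaskDefinitions": list(task_definitions),
                            "TaskSets": list(task_sets),
                            "TrafficRouting": {
                                "ProdTrafficRoute": {
                                    "Type": "AWS::ElasticLoadBalancingV2::Listener",
                                    "LogicalID": listener_id,
                                },
                                "TargetGroups": list(target_groups),
                            },
                        },
                    }],
                },
            }
        },
    }


if __name__ == "__main__":
    fragment = blue_green_fragment(
        "AppService", ["BlueTaskDefinition", "GreenTaskDefinition"],
        ["BlueTaskSet", "GreenTaskSet"], "ProdListener",
        ["BlueTargetGroup", "GreenTargetGroup"],
    )
    print(json.dumps(fragment, indent=2))
```

Because the hook is declared in the same template as the ECS resources, updating the image reference in the task definition is enough to trigger the blue/green transition on the next stack update.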
Blue/Green Deployments to Lambda
The Lambda pipeline leveraged CodeDeploy’s first-class support for blue/green Lambda deployments, which kept this pipeline even simpler than the ECS pipeline. One obvious benefit of this pipeline type is the ease with which pre- and post-cutover checks could be performed via built-in CodeDeploy hooks. These hooks are represented in the following diagram via the arrows labeled “Invoke before traffic shift” and “Invoke after traffic shift.”
These pre- and post-traffic hooks were expected to be stored in the application repository alongside Lambda function code. If the hooks’ handler names followed the convention outlined by the associated CodeDeploy service role, they would be detected and used to automatically judge the safety of the traffic cutover. The major advantage of this conventional approach is that development teams can define the pre- and post-conditions required for safe deployment of their application depending on their unique needs.
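A pre-traffic hook of this kind might look like the sketch below. CodeDeploy invokes the hook function with a deployment ID and a lifecycle event hook execution ID, and the function reports the outcome through the PutLifecycleEventHookExecutionStatus API. The handler name, the injected client, and the shape of the checks are assumptions for illustration; each team would supply its own checks.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def evaluate(checks: Iterable[Callable[[], bool]]) -> str:
    """Return the CodeDeploy status string for a set of safety checks."""
    try:
        return "Succeeded" if all(check() for check in checks) else "Failed"
    except Exception:
        # A check that raises is treated as a failed check.
        return "Failed"


def pre_traffic_handler(event: Dict[str, str], context: Any,
                        client: Optional[Any] = None,
                        checks: Iterable[Callable[[], bool]] = ()) -> str:
    """Run team-defined checks and report the result back to CodeDeploy."""
    if client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS access
        client = boto3.client("codedeploy")
    status = evaluate(checks)
    client.put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=status,
    )
    return status
```

A post-traffic hook would follow the same pattern with checks that exercise the newly live version. Reporting "Failed" causes CodeDeploy to stop the deployment rather than complete the cutover.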
The pipelines were all defined in CloudFormation and stored in AWS Service Catalog so that teams with use cases for the ECS, Lambda, and EC2 pipelines could consume them as needed through self-service.
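From an application team's point of view, self-service consumption could look something like the sketch below, which prepares arguments for the Service Catalog ProvisionProduct API (provision_product in boto3). The product names, version name, and parameter keys are hypothetical; the actual products and parameters published for the customer are not described here.

```python
from typing import Any, Dict

# Hypothetical mapping from pipeline type to Service Catalog product name.
PRODUCT_NAMES = {
    "ecs": "baseline-pipeline-ecs",
    "lambda": "baseline-pipeline-lambda",
    "windows-ec2": "baseline-pipeline-windows-ec2",
}


def provision_kwargs(pipeline_type: str, app_name: str, repo: str,
                     version: str = "v1") -> Dict[str, Any]:
    """Build the arguments for provisioning one baseline pipeline."""
    if pipeline_type not in PRODUCT_NAMES:
        raise ValueError(f"unsupported pipeline type: {pipeline_type}")
    return {
        "ProductName": PRODUCT_NAMES[pipeline_type],
        "ProvisioningArtifactName": version,
        "ProvisionedProductName": f"{app_name}-pipeline",
        "ProvisioningParameters": [
            # Parameter keys are assumptions; they must match the product's template.
            {"Key": "ApplicationName", "Value": app_name},
            {"Key": "GitHubRepository", "Value": repo},
        ],
    }
```

A team would pass these arguments to a Service Catalog client, for example client.provision_product(**provision_kwargs("ecs", "orders", "example-org/orders")), or provision the same product from the console.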
AWS Services Used
- AWS Infrastructure Scripting – CloudFormation
- AWS Storage Services – S3
- AWS Compute Services – EC2, ECS, Lambda
- AWS Management and Governance Services – CloudWatch, CloudTrail
- AWS Security, Identity, Compliance Services – IAM, Key Management Service
- AWS Developer Tools – CodePipeline, CodeBuild, CodeDeploy
- AWS Networking – Elastic Load Balancing
Third-party applications or solutions used
- JFrog Artifactory
- Microsoft Windows
Outcomes
- Provided standardized self-service CI/CD pipelines across the customer’s enterprise without ignoring the differing needs of their many application teams.
- The customer’s CI/CD infrastructure can now move to the AWS cloud alongside the infrastructure it supports, reducing heterogeneity in tools and required skillsets and increasing the scalability of the customer’s CI/CD practices and DevOps teams.
- Three archetypal pipelines were created to support several dozen applications.
Summary
By engaging with Vertical Relevance, the customer moved closer to providing standardized self-service CI/CD pipelines across their enterprise without ignoring the differing needs of their many application teams. Furthermore, the customer’s CI/CD infrastructure can now move to the AWS cloud alongside the infrastructure it supports, reducing heterogeneity in tools and required skillsets, which will pay off in the form of increased scalability of the customer’s CI/CD practices and DevOps teams.