SRE & Resiliency

Delivering Business Outcomes through Operational Excellence in AWS

We improve Operational Excellence for our customers by applying the principles of DevOps, integrated with Security and in deep collaboration with Application Modernization. The goal is to reduce risk to the business by decreasing the complexity of managing a Cloud environment while maintaining the highest levels of availability. Financial Services firms have some of the most rigorous requirements regarding compliance and reliability: they deal with critical data, high-volume/low-latency workloads, and demanding downtime requirements. The Resiliency & SRE practice addresses these challenges by implementing a series of synergistic frameworks: the Resiliency & SRE Maturity Model.

Our Approach: The Resiliency & SRE Maturity Model

Resiliency & SRE maturity comes in stages that build on one another to fully optimize business outcomes. It touches the other areas of cloud management, including DevOps, SecOps, and Application Modernization. The stages of fully adopting Vertical Relevance’s SRE Maturity Model are described below. While every organization we work with is unique and presents its own business challenges, most of our clients share the need for a well-defined, repeatable process for managing and improving their operations in the Cloud. We commonly hear questions like these in the Financial Services industry:

  • How do I know my application can achieve my RTO and RPO requirements through high availability / disaster recovery?  
  • How can I guarantee the proper replication of my data to prevent the impact of failure?
  • How do I ensure the performance of my application in a cloud-based environment? For example, does my application perform at peak loads with a multi-region setup?
  • Can my migrated applications handle the volume and latency demand needed for my business requirements? 
  • What insights/data do I need to record to ensure the continuity of operations for my ecosystem? 
  • How does my organization respond to failure in the cloud? 
  • How do I automate my resiliency testing? How can I include resiliency in my CI/CD pipeline?

The SRE Maturity Model

Observability is the foundation of SRE and needs to be built in an automated manner that supports complex enterprise environments across all environment tiers – network, infrastructure, data, application, etc. This includes cross-account and cross-region observability built on dashboards, alerts, and logs.

Performance and baseline testing is the next step in the Maturity Model, as it is required by the other frameworks. Performance covers generating transactional data for the system, identifying the necessary load tests, automating those tests with a load generator, and collaborating with the application and infrastructure teams to ensure the environment meets the latency, volume, and scaling requirements of the business.
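
As a rough illustration of checking load-test results against a latency requirement, the Python sketch below computes a nearest-rank percentile from collected latencies and compares it with a limit. The function names, the simulated request, and the 150 ms p99 target are assumptions made for this example; they are not part of the Vertical Relevance framework.

```python
import math
import random


def percentile(samples, pct):
    """Return the nearest-rank percentile (0 < pct <= 100) of a list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: 1-based rank of the requested percentile.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def run_load_test(send_request, total_requests):
    """Call send_request() repeatedly and collect the latency (ms) it returns."""
    return [send_request() for _ in range(total_requests)]


def meets_latency_requirement(latencies_ms, pct, limit_ms):
    """True when the chosen percentile latency is within the business limit."""
    return percentile(latencies_ms, pct) <= limit_ms


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the simulated run is repeatable

    def simulated_request():
        # Hypothetical stand-in for a real request: a simulated latency in ms.
        return rng.uniform(20, 120)

    latencies = run_load_test(simulated_request, 1000)
    print("p99 latency (ms):", round(percentile(latencies, 99), 1))
    print("meets 150 ms p99 target:", meets_latency_requirement(latencies, 99, 150))
```

In practice the simulated request would be replaced by calls issued through a load generator, and the percentile and limit would come from the business's latency requirements.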

Resiliency is a framework developed to minimize the risk and impact of failure that can occur in both infrastructure and applications by testing and exercising the system to discover the expected and unexpected behavior of workload components. This effort involves architecture reviews, identification of critical Non-Functional Requirements (RTO, RPO, etc.), implementation of the Failure Mode Effect and Analysis (FMEA) testing framework, and automated resiliency testing.
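
To make the RTO and RPO checks concrete, the Python sketch below evaluates the timestamps from a single failure-injection test against recovery targets. The `ExperimentResult` fields and the helper functions are hypothetical names chosen for this example, and the approach is a simplified assumption rather than the framework's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    """Timestamps (in seconds) recorded during one hypothetical failure test."""
    failure_injected_at: float
    service_restored_at: float
    last_replicated_write_at: float  # newest write that survived the failure


def recovery_time(result):
    """Observed downtime: from failure injection until service is restored."""
    return result.service_restored_at - result.failure_injected_at


def data_loss_window(result):
    """Observed data loss: gap between the last surviving write and the failure."""
    return max(0.0, result.failure_injected_at - result.last_replicated_write_at)


def evaluate(result, rto_seconds, rpo_seconds):
    """Compare the observed recovery against the RTO and RPO targets."""
    return {
        "rto_met": recovery_time(result) <= rto_seconds,
        "rpo_met": data_loss_window(result) <= rpo_seconds,
    }


if __name__ == "__main__":
    # Example values only: failure at t=100 s, recovery at t=340 s,
    # last replicated write at t=95 s.
    result = ExperimentResult(100.0, 340.0, 95.0)
    print(evaluate(result, rto_seconds=300, rpo_seconds=60))
```

A check like this can run at the end of each automated resiliency test so that a missed target fails the test instead of going unnoticed.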

Gameday takes the experience further than performance and resiliency. While performance and resiliency test the infrastructure and application, Gameday tests the supporting people, processes, tools, and documentation surrounding performance and resiliency events. Gameday, when executed correctly, ensures the appropriate target group 1) gets notified, 2) has the right access, 3) can find the right documentation and solution, and 4) knows when to escalate, so that any events requiring manual intervention still meet the business requirements.

Disaster Recovery (DR) is a crucial aspect of ensuring business continuity in the face of unforeseen disruptions. Once an application and its supporting teams and tools have been made resilient, disaster recovery measures can be confidently put in place to meet the business’s requirements. This effort encompasses ensuring that the architecture, runbooks, and operational capabilities are all adequately prepared and as automatable as possible.

Chaos is the last step in the Maturity Model. Once the previous pieces are implemented, the enterprise will have enough confidence in its system to enable chaos testing – unplanned events – to ensure that the system can handle actual unplanned events. These events are random in type, severity, and timing.
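
The idea of events that are random in type, severity, and timing can be sketched in Python as follows. The failure catalogue, the severity levels, and the `plan_chaos_events` function are hypothetical and exist only for this illustration; a real schedule would draw on the experiments built in the earlier stages and would drive actual fault-injection tooling.

```python
import random
from dataclasses import dataclass

# Hypothetical catalogue of failures and severities, for illustration only.
FAILURE_TYPES = ["instance_termination", "network_latency", "dependency_outage"]
SEVERITIES = ["low", "medium", "high"]


@dataclass(frozen=True)
class ChaosEvent:
    failure_type: str
    severity: str
    start_offset_s: int  # seconds after the start of the test window


def plan_chaos_events(count, window_seconds, rng=None):
    """Pick `count` events with random type, severity, and start time."""
    rng = rng or random.Random()
    events = [
        ChaosEvent(
            failure_type=rng.choice(FAILURE_TYPES),
            severity=rng.choice(SEVERITIES),
            start_offset_s=rng.randrange(window_seconds),
        )
        for _ in range(count)
    ]
    # Return the events in the order they would fire.
    return sorted(events, key=lambda event: event.start_offset_s)


if __name__ == "__main__":
    for event in plan_chaos_events(count=3, window_seconds=3600):
        print(event)
```

Passing a seeded `random.Random` instance makes a plan reproducible, which helps when a chaos run needs to be replayed during a post-incident review.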

Our Solutions

Solution Spotlight: Resiliency Foundations

Regular and robust resiliency testing provides assurance that your cloud application can weather whatever outages may occur. The Vertical Relevance Resiliency Automation Framework can help ensure that your workload prevails through disruptions and failures and avoids the damaging consequences of an outage.

Solution Spotlight: Gameday Foundations

In this solution, we will be breaking down the steps required to perform a successful gameday by testing the people, processes, and technology involved. By executing gamedays, teams can ensure they’re properly documenting system knowledge, testing their procedures, and evaluating their processes in a targeted, controlled environment.

Solution Spotlight: Performance Foundations

The Vertical Relevance Automated Performance Testing Framework lowers the barrier to entry for performance testing by providing a starting point upon which a mature solution can be built to meet the needs of your organization. By following this guidance, you can gain confidence that your production systems are going to meet the current and future demands of your organization and customers.

Solution Spotlight: Monitoring Foundations

The combination of time-to-market acceleration, continuous deployment, and microservices-based development has made the identification and resolution of system issues extremely challenging, requiring a robust monitoring framework. In this solution, learn how to implement a monitoring system to lower costs, mitigate risk, and provide an optimal end-user experience.

Key Outcomes

Continuously Improving Operational Excellence
Improve the processes, technology, infrastructure, and applications to reduce the operational burden of workloads operating in the cloud.
Gain Confidence in the Production Readiness of your Applications
Certify the Business Requirements (RTO, RPO, Latency, etc.) of your applications through testing to ensure they are ready for production, eliminating go-live failures.
Remove Negative Impact by Reducing the Risk of Failure
Avoid impact to the company brand and regulatory infractions through comprehensive testing at all levels of an application.
Gain Confidence Finding Unknown Risks
Running comprehensive failure tests uncovers many previously unknown system responses that can then be remediated, increasing confidence in the system’s ability to deal with failure.

Thought Leadership

Press Release
Vertical Relevance Achieves the AWS Resilience Competency

Vertical Relevance, an Amazon Web Services (AWS) Advanced Tier Services Partner, announced today that it has achieved the AWS Resilience Competency in the Disaster Recovery category. This specialization recognizes Vertical Relevance as an AWS Partner that provides validated solutions to help customers improve their critical systems’ availability and resilience posture using AWS Resilience Services.

Module
Experiment Broker – Rapidly deploy and execute experiment tests in your AWS environment

In this post, learn how you can provide infrastructure to implement automated resiliency experiments via code to achieve standardized resiliency testing at scale across your whole organization.

Module
Experiment Generator – A systematic, centralized solution to generate resiliency experiments

Vertical Relevance has developed the Experiment Generator solution, an automated and systematic approach to designing and generating experiments. The Experiment Generator provides a centralized, clearly defined process for experiment creation with reusability and ease of use in mind. We take a modular approach such that once an experiment is designed, it can be reused across different application teams; each application team can access the full scope of experiments in the generator.

Event
Achieve Operational Resiliency hosted by AWS and Vertical Relevance

Vertical Relevance and AWS hosted an Operational Resiliency event for executives and professionals in the financial services industry. The event was designed to provide new strategies to protect mission-critical applications, uncover potential weaknesses, and meet regulatory requirements. The event discussed the changing nature of threats and disruptions and showcased the best methods to prepare for and respond to them. It also addressed the changing expectations of regulators to help operational resilience professionals prioritize their efforts.

Case Study
Panel Discussion – Operational Resiliency Strategies to Remain Resilient in Current Market: Featuring Global Payments

AWS and Vertical Relevance hosted an Operational Resiliency event for executives and professionals in the financial services industry. In this panel discussion, hear from Krishna Sarvepalli, Global Payments, Product Engineering, on the following aspects:

– The systematic approach they have adopted to address and fortify operational resilience leveraging the capabilities of AWS.
– The significant challenges they encountered during their journey towards operational resiliency.
– The collaborative partnership they forged with Vertical Relevance and AWS to effectively align and achieve their resilience objectives.

Case Study
Panel Discussion – Operational Resiliency Strategies to Remain Resilient in Current Market: Featuring Broadridge

AWS and Vertical Relevance hosted an Operational Resiliency event for executives and professionals in the financial services industry. In this panel discussion, hear from Deepak Elias, Broadridge, VP Enterprise Architecture, on the following aspects:

– The systematic approach they have adopted to address and fortify operational resilience leveraging the capabilities of AWS.
– The significant challenges they encountered during their journey towards operational resiliency.
– The collaborative partnership they forged with Vertical Relevance and AWS to effectively align and achieve their resilience objectives.

Offer
Achieve Operational Resiliency on AWS For Financial Services

The finance sector is in a constant state of change and must confront ongoing challenges. Resilience and adherence to regulations are crucial within these institutions, as they hold considerable financial implications. Vertical Relevance and AWS Professional Services have extensive experience in helping financial services institutions achieve operational resiliency on AWS. Together, AWS Professional Services and Vertical Relevance guide their customers through this Resiliency Journey, allowing customers at different stages of resiliency maturity to join at different points.

Use Case
Use Case: Ensuring Application and Environment Resiliency through the Failure Mode Effect and Analysis Framework

In this use case, learn how a leading payment technology company leveraged a Resiliency Automation Framework to execute test cases and improve the architecture of applications being moved to the cloud. As a result, the customer can now operate at scale with full knowledge of how their system behaves in the event of a failure.

Blog
Automating Disaster Recovery on AWS for Financial Services

The term Disaster Recovery (DR) is enough to keep both engineers and executives up at night. Any event that can have a negative impact on your business continuity could be characterized as an adverse event.

Blog
Resiliency on AWS for Financial Services – Introduction to the Testing Framework

For financial services organizations looking to move their applications into AWS, not knowing the true resiliency of those applications and the infrastructure behind them presents a great risk. Businesses need a reliable testing framework in place that regularly tests the resiliency of their AWS infrastructure.

Offer
Ensure Business Continuity and Resiliency for Core Financial Services Applications

The Financial Services industry is one of the most critical and heavily regulated industries, requiring resilient applications to serve businesses and consumers across the globe. To provide assurance about the resiliency of applications and the overall workflow, Vertical Relevance’s Resiliency offering conducts comprehensive architecture reviews and testing.

Drive Financial Services Innovation

Financial Services institutions want to become more agile so they can innovate and respond to changes faster to better serve customers. Without speed, institutions begin to lose momentum, which is why Vertical Relevance has developed tools and resources to accelerate your digital-first journey.
