AWS Resilience Hub

User Guide

AWS Resilience Hub: User Guide

Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

Table of Contents

What Is AWS Resilience Hub?
Resilience Hub concepts
Resiliency
Recovery point objective (RPO)
Recovery time objective (RTO)
Application
Application component
Application compliance status
Resiliency assessment
Resiliency score
Disruption type
Fault injection experiments
SOP
How Resilience Hub works
Supported Resilience Hub resources
AppComponent grouping
Getting started
Prerequisites
Create IAM roles for an account
Add an application
Get started by adding an application
Step 1: Discover the structure
Step 2: Describe application details
Step 3: Add tags to your application
Step 4: Identify resources
Step 5: Select a resiliency policy
Step 6: Review and publish
Step 7: Run an assessment
Step 8: Review recommendations
Using Resilience Hub
Applications
Edit application resources
Viewing application summary
Publish a new application version
Deleting an application
Managing resiliency policies
Creating resiliency policies
Accessing resiliency policy details
Resiliency assessments
Running resiliency assessments
Reviewing assessments reports
Deleting resiliency assessments
Understanding resiliency scores
Calculating Resiliency scores
Calculating application component level and disruption types
Weight tables
Accessing the resiliency scores
Standard operating procedures
Building an SOP based on AWS Resilience Hub recommendations
Creating a custom SSM document
Using a custom SSM document instead of the default
Testing SOPs
Fault injection experiments
Running a fault injection experiment
Creating experiments from the assessment report
Fault injection experiment failures/status check
Managing alarms
Creating alarms
Viewing alarms
Integrating recommendations into applications
Modifying the AWS CloudFormation template
Security
Data protection
Encryption at rest
Encryption in transit
Identity and access management
Audience
Authenticating with identities
Managing access using policies
How AWS Resilience Hub works with IAM
Infrastructure security
Working with other services
AWS CloudFormation resources
Resilience Hub and AWS CloudFormation templates
Learn more about AWS CloudFormation
AWS CloudTrail
AWS Systems Manager
Document history
AWS glossary

What Is AWS Resilience Hub?

AWS Resilience Hub gives you a central place to define, validate, and track the resiliency of your AWS applications. Resilience Hub helps you protect your mission-critical applications from disruptions, reduce recovery costs to optimize business continuity, and keep an audit trail of planned and unplanned outages to help meet compliance and regulatory requirements. Use Resilience Hub to:

• Analyze your infrastructure and get recommendations to improve the resiliency of your applications.

In addition to architectural guidance for improving your applications' resiliency, the recommendations provide code for implementing tests, alarms, and standard operating procedures (SOPs) that you can deploy and run with your application in your continuous integration and continuous delivery (CI/CD) pipeline.

• Validate recovery time (RTO) and recovery point (RPO) targets under different conditions.

• Optimize business continuity while reducing recovery costs.

• Identify and resolve issues before they occur in production.

After you deploy an application into production, you can add Resilience Hub to your CI/CD pipeline to validate every build before it is released into production.

Describe

Describe your applications using AWS CloudFormation, including cross-region and cross-account stacks.

Applications can also be described using Resource Groups or chosen from applications that are already defined in the AWS Service Catalog AppRegistry.

Define

Define the resilience policies for your applications. These policies include RTO and RPO targets for applications, infrastructure, Availability Zone, and Region disruptions.

Assess

The Resilience Hub assessment uses best practices from the AWS Well-Architected Framework to analyze the components of an application and uncover potential resilience weaknesses. These weaknesses can be caused by incomplete infrastructure setup, misconfiguration, or situations where additional configuration improvements are needed.

Validate

After the application and SOPs are updated to incorporate recommendations from the resilience assessment, you can use Resilience Hub to test and verify whether your application can meet its resilience targets before you release it into production. Resilience Hub is integrated with AWS Fault Injection Simulator (FIS), a chaos engineering service, to provide fault-injection simulations of real-world failures, such as network errors or too many open connections to a database, and to validate that the application recovers within the resilience targets you defined. Resilience Hub also provides APIs that let you integrate its resilience assessment and testing into your CI/CD pipelines for ongoing resilience validation. Integrating resilience validation into CI/CD pipelines helps ensure that changes to the application’s underlying infrastructure do not compromise resilience.

View and Track

Resilience Hub provides a comprehensive view of your overall application portfolio resilience status through its dashboard. To help you track the resilience of applications, Resilience Hub aggregates and organizes resilience events (for example, an unavailable database or a failed resilience validation), alerts, and insights from services such as Amazon CloudWatch, Amazon Route 53 Application Recovery Controller, and AWS FIS. Resilience Hub also generates a resilience score, a scale that indicates the level of implementation for recommended resilience tests, alarms, and recovery SOPs. This score is used to measure resilience improvements over time.

Resilience Hub concepts

These concepts can help you better understand the Resilience Hub's approach to helping improve application resiliency and prevent application outages.

Resiliency

The ability to maintain availability and to recover from software and operational disruption in a designated time frame.

Recovery point objective (RPO)

The maximum acceptable amount of time since the last data recovery point. This determines what is considered an acceptable loss of data between the last recovery point and the interruption of service.

Recovery time objective (RTO)

The maximum acceptable delay between the interruption of service and restoration of service. This determines what is considered an acceptable time window when service is unavailable.
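The two definitions above can be made concrete with a small calculation. The following is a minimal sketch, not part of Resilience Hub: the function names and the incident timestamps are hypothetical, and it only shows how a data-loss window and a downtime window would be compared against an RPO and an RTO.

```python
from datetime import datetime, timedelta


def measure_incident(last_recovery_point, interruption, restoration):
    """Return (data_loss_window, downtime) for a single incident."""
    # Data written after the last recovery point is lost: compare with RPO.
    data_loss_window = interruption - last_recovery_point
    # Time during which the service was unavailable: compare with RTO.
    downtime = restoration - interruption
    return data_loss_window, downtime


def meets_objectives(data_loss_window, downtime, rpo, rto):
    """True when both windows are within their objectives."""
    return data_loss_window <= rpo and downtime <= rto


# Hypothetical incident: backup at 02:00, outage at 09:30, restored at 11:00.
last_backup = datetime(2022, 1, 10, 2, 0)
outage_start = datetime(2022, 1, 10, 9, 30)
service_restored = datetime(2022, 1, 10, 11, 0)

loss, downtime = measure_incident(last_backup, outage_start, service_restored)
print(loss, downtime)
print(meets_objectives(loss, downtime, rpo=timedelta(hours=24), rto=timedelta(hours=2)))
```

With these example values the data-loss window is 7.5 hours and the downtime is 1.5 hours, so a policy with a 24-hour RPO and a 2-hour RTO would be satisfied, while a 1-hour RPO would not.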

Application

An AWS Resilience Hub application is a collection of AWS resources. Resilience Hub uses it to discover and prevent application disruptions and outages, and to help systems recover automatically from disruptions.

Application component

A group of related AWS resources that work and fail as a single unit. For example, if you have a primary and replica database, then both databases belong to the same application component.

AWS Resilience Hub determines which AWS resources can belong to which type of application component. For example, a DBInstance cannot belong to AWS::ResilienceHub::ComputeAppComponent.

Application compliance status

AWS Resilience Hub reports these types of compliance status for your applications.

Policy met

The application meets the RTO and RPO objectives defined in the policy. All its components meet the defined policy objectives. For example, you selected an RTO and RPO of 24 hours for disruptions across AWS Regions, and AWS Resilience Hub can see that your backups are copied to your fallback Region. You are still expected to maintain a "recover from backup" SOP, and to test and time it. This is covered in the operational recommendations and is part of your overall resiliency score.

Policy breached

The application did not meet the RTO and RPO objectives defined in the policy. One or more of its application components do not satisfy the policy objectives. For example, you selected an RTO and RPO of 24 hours for disruptions across AWS Regions, but your database configuration does not include any cross-Region recovery method, such as global replication or backup copies.

Not assessed

The application requires an assessment, as it has not been assessed, nor is it being tracked.
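The relationship between component results and the three compliance statuses can be sketched in a few lines. This is an illustrative model only, assuming a single RTO/RPO target pair; the `ComponentResult` type and the function names are hypothetical and do not correspond to a Resilience Hub API.

```python
from enum import Enum
from typing import NamedTuple, Optional, Sequence


class Status(Enum):
    POLICY_MET = "Policy met"
    POLICY_BREACHED = "Policy breached"
    NOT_ASSESSED = "Not assessed"


class ComponentResult(NamedTuple):
    """Estimated recovery figures for one application component (hypothetical shape)."""
    name: str
    estimated_rto_secs: int
    estimated_rpo_secs: int


def component_meets(result, target_rto_secs, target_rpo_secs):
    return (result.estimated_rto_secs <= target_rto_secs
            and result.estimated_rpo_secs <= target_rpo_secs)


def application_status(results: Optional[Sequence[ComponentResult]],
                       target_rto_secs: int,
                       target_rpo_secs: int) -> Status:
    # No results at all means the application has not been assessed yet.
    if not results:
        return Status.NOT_ASSESSED
    # The policy is met only when every component meets the objectives.
    if all(component_meets(r, target_rto_secs, target_rpo_secs) for r in results):
        return Status.POLICY_MET
    # One breaching component is enough to breach the policy.
    return Status.POLICY_BREACHED


DAY = 24 * 60 * 60
results = [
    ComponentResult("database", estimated_rto_secs=3600, estimated_rpo_secs=900),
    ComponentResult("compute", estimated_rto_secs=600, estimated_rpo_secs=0),
]
print(application_status(results, DAY, DAY).value)
print(application_status(None, DAY, DAY).value)
```

The key design point is the `all(...)` check: it mirrors the statement that one or more non-compliant components put the whole application in the "Policy breached" state.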

Resiliency assessment

Resilience Hub uses a list of gaps and potential remedies to measure how effectively a selected policy lets an application recover and continue after a disaster. It evaluates the compliance status of each application component, or of the application as a whole, against the policy. The assessment report can include cost optimization recommendations and references to potential issues.

Resiliency score

Resilience Hub generates a score that indicates the level of successful implementation of the tests, alarms, and SOPs from the resiliency assessment.

Disruption type

Resilience Hub helps you assess resiliency against the following types of outages:

Application RTO and RPO

The application is healthy, but the software stack does not operate as needed, usually after deployment of new code, configuration changes, or malfunctioning of the downstream dependencies.

Cloud Infrastructure RTO and RPO

The cloud infrastructure is healthy, but a partial outage was caused by a local error in a component. In most cases, this type of outage can be resolved by rebooting the faulty component.

Cloud Infrastructure AZ outage

One or more Availability Zones are unavailable. This type of outage can be resolved by switching to a different Availability Zone.

Cloud Infrastructure Region outage

One or more Regions are unavailable. This type of outage can be resolved by switching to a different Region.

Fault injection experiments

Resilience Hub recommends tests to verify an application’s resiliency against different types of outages.

These outages include software, hardware, Availability Zones (AZ), or Region outages of application components.

These tests allow the following:

• Injection of failure.

• Verification that alarms can detect an outage.

• Verification that recovery procedures, or standard operating procedures (SOPs), work correctly to recover the application from the outage.

Tests for SOPs run your SOPs to measure RTO and RPO. You can test different application configurations and measure whether the output RTO and RPO meet the objectives defined in your policy.
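The three capabilities listed above (inject a failure, check that an alarm detects it, and check that the SOP recovers the application) can be illustrated with a toy simulation. This sketch does not use AWS FIS or any AWS API; `FakeService`, its 90-second recovery time, and `run_experiment` are all hypothetical stand-ins used only to show the order of the checks and how a measured recovery time is compared with the RTO.

```python
class FakeService:
    """In-memory stand-in for an application component (hypothetical)."""

    def __init__(self):
        self.healthy = True
        self.clock = 0  # simulated seconds

    def inject_failure(self):
        self.healthy = False

    def alarm_in_alarm_state(self):
        # A real alarm would watch a metric; here it simply mirrors health.
        return not self.healthy

    def run_sop(self):
        self.clock += 90  # pretend the recovery procedure takes 90 seconds
        self.healthy = True


def run_experiment(service, rto_secs):
    started = service.clock
    service.inject_failure()                          # 1. inject the failure
    alarm_detected = service.alarm_in_alarm_state()   # 2. verify detection
    service.run_sop()                                 # 3. run the recovery SOP
    recovery_secs = service.clock - started
    return {
        "alarm_detected": alarm_detected,
        "recovered": service.healthy,
        "recovery_secs": recovery_secs,
        "within_rto": service.healthy and recovery_secs <= rto_secs,
    }


print(run_experiment(FakeService(), rto_secs=120))
```

In a real experiment the recovery time would be measured with a clock rather than simulated, and the alarm check would query the monitoring service.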

SOP

A standard operating procedure (SOP) defines a recovery procedure that is based on the outage type and the application components in the application.

How Resilience Hub works

Resilience Hub helps you proactively prepare and protect your AWS applications from disruptions. Resilience Hub offers resiliency assessment and validation that integrate into your software development lifecycle to uncover resiliency weaknesses. Resilience Hub helps ensure that the recovery time objective (RTO) and recovery point objective (RPO) for your applications can be met, and helps resolve issues before they are released into production.

After you deploy an AWS application into production, you can use Resilience Hub to continue tracking the resiliency posture of your application. If an outage occurs, Resilience Hub sends a notification to the operator to launch the associated recovery process.

The following steps outline how Resilience Hub works at a high level.

1. Describe the existing AWS application that you want to protect from disruptions as a Resilience Hub application and then set resiliency objectives for the application.

When you describe the application, you import resources from AWS CloudFormation stacks, Resource Groups, or AppRegistry to form the structural basis of an application in Resilience Hub. You can also build on the structure of an existing application. You then attach a resiliency policy to the application.

A Resilience Hub resiliency policy contains the information and objectives that are used to assess whether your application can recover from a disruption type, such as software or hardware disruption. When you create a resiliency policy, you define RTO and RPO for the disruption types.

These objectives are used to determine whether the application meets the resiliency policy.

2. Assess the application that you describe to learn if it meets your objectives.

After you describe your application and attach a resiliency policy to it, run a resiliency assessment.

The assessment evaluates your application configuration against the resiliency policy that is attached to the application and generates a report. The report shows how your application measures against the objectives in your resiliency policy.

3. Receive recommendations to improve resiliency.

To improve resiliency, you update your application and resiliency policy according to the recommendations from the assessment report. Recommendations include configurations of components, alarms, tests, and recovery SOPs. You can then run another assessment and compare the results with the previous report to see how much resiliency improves. You repeat this process until you achieve your goals for RTO and RPO.

4. Validate objectives and disaster recovery procedures.

You run tests to measure the resiliency of your AWS resources and the amount of time it takes to recover from software, hardware, Availability Zone, and AWS Region outages. To measure resiliency, these tests simulate outages to your AWS resources. Examples of outages include network unavailable errors, failovers, stopped processes, Amazon RDS boot recovery, and problems with your Availability Zone.

When the test concludes, you can determine whether the application can recover from each outage type within the RTO defined in the resiliency policy.

5. View and track your application's resiliency over time.

After you deploy an AWS application into production, you can use Resilience Hub to continue tracking the resiliency posture of the application. If an outage occurs, the operator can view the outage in Resilience Hub and launch the associated recovery process.

6. Start recovery if there is a disruption.

If an application disruption occurs, AWS Resilience Hub helps identify the type of disruption, and alerts the operator. The operator can launch the associated SOP for recovery.

AWS Resilience Hub supported resources

Resources that affect RTO/RPO and are fully supported by AWS Resilience Hub are top-level resources such as AWS::RDS::DBInstance and AWS::RDS::DBCluster. For the full list of supported resources, see the Application Component Resources table below.

Resilience Hub ignores the following types of resources:

• Resources that do not affect RTO/RPO - Resources such as AWS::RDS::DBParameterGroup never affect RTO/RPO and are always ignored by AWS Resilience Hub.

• Non-top-level resources - AWS Resilience Hub imports only top-level resources, because it can derive other properties by querying the properties of the top-level resources. For example, for Amazon API Gateway, AWS::ApiGateway::RestApi and AWS::ApiGatewayV2::Api are supported resources, but AWS::ApiGatewayV2::Stage is a non-top-level resource and is not imported by Resilience Hub. This does not mean that Resilience Hub ignores it.

Note

Unsupported resources

These resources can affect RTO/RPO, but are not yet fully supported by AWS Resilience Hub. When resolving resources, AWS Resilience Hub makes a best effort to warn users about unsupported resources if the application is backed by a CloudFormation stack, resource group, or AppRegistry application.

AppComponent grouping

An AppComponent is a group of related AWS resources that work and fail as a single unit. For example, if you have a primary and replica database, then both databases belong to the same application component. AWS Resilience Hub has rules governing which AWS resources can belong to which type of application component. For example, a DBInstance can belong to AWS::ResilienceHub::DatabaseAppComponent but not to AWS::ResilienceHub::ComputeAppComponent.

When a CloudFormation stack, resource group, or AppRegistry application is imported into Resilience Hub, Resilience Hub makes its best effort to group related resources into the same application component, but the grouping may not always be accurate. You know the architecture of your application best, so regroup these resources if required. For example, if you have three EC2 instances in a CloudFormation stack, Resilience Hub creates a separate application component for each EC2 instance, even though all three instances might be running the same customer application software. In this case, the correct choice is to regroup the three EC2 instances under a single ComputeAppComponent.

Examples of correct groupings:

• Primary databases and their replicas should be grouped under a single application component.

• An S3 bucket and its replication should be grouped under a single application component.

• EC2 instances running the same customer application should be grouped under a single application component.

• An SQS queue and its dead-letter queue should be grouped under a single application component.

Note

The correct grouping is required in order for Resilience Hub to compute RTO/RPO correctly and also give correct recommendations.
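The EC2 regrouping case described in this section can be sketched as a small data transformation. This is an illustration only: the resource records, the `workload` label, and both function names are hypothetical, and in practice the regrouping is done in the Resilience Hub console rather than in code like this.

```python
from collections import defaultdict

# Hypothetical imported resources. "workload" is an illustrative label that
# records which customer application each instance runs; it is not a
# Resilience Hub field.
resources = [
    {"logical_id": "WebServerA", "type": "AWS::EC2::Instance", "workload": "storefront"},
    {"logical_id": "WebServerB", "type": "AWS::EC2::Instance", "workload": "storefront"},
    {"logical_id": "WebServerC", "type": "AWS::EC2::Instance", "workload": "storefront"},
]


def default_grouping(resources):
    """One component per resource, like the per-instance default described above."""
    return {r["logical_id"]: [r["logical_id"]] for r in resources}


def regroup_by_workload(resources):
    """Put resources that work and fail together into the same component."""
    groups = defaultdict(list)
    for r in resources:
        groups[r["workload"]].append(r["logical_id"])
    return dict(groups)


print(len(default_grouping(resources)))
print(regroup_by_workload(resources))
```

After regrouping, the three instances form one component, which is the unit against which RTO and RPO would be evaluated.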

Each application component can contain certain types of resources as defined in the following table:

Application Component Resources

Resource Type                         appComponentType
AWS::EC2::Volume                      AWS::ResilienceHub::StorageAppComponent
AWS::EC2::NatGateway                  AWS::ResilienceHub::NetworkingAppComponent
AWS::DynamoDB::Table                  AWS::ResilienceHub::DatabaseAppComponent
AWS::RDS::DBInstance                  AWS::ResilienceHub::DatabaseAppComponent
AWS::SQS::Queue                       AWS::ResilienceHub::QueueAppComponent
AWS::AutoScaling::AutoScalingGroup    AWS::ResilienceHub::ComputeAppComponent
AWS::ECS::Service                     AWS::ResilienceHub::ComputeAppComponent
AWS::S3::Bucket                       AWS::ResilienceHub::StorageAppComponent
AWS::ApiGatewayV2::Api                AWS::ResilienceHub::ComputeAppComponent
AWS::EFS::FileSystem                  AWS::ResilienceHub::StorageAppComponent
AWS::RDS::DBCluster                   AWS::ResilienceHub::DatabaseAppComponent
AWS::ApiGateway::RestApi              AWS::ResilienceHub::ComputeAppComponent
AWS::Lambda::Function                 AWS::ResilienceHub::ComputeAppComponent
AWS::DocDB::DBCluster                 AWS::ResilienceHub::DatabaseAppComponent
AWS::EC2::Instance                    AWS::ResilienceHub::ComputeAppComponent
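The rules in the Application Component Resources table can be encoded as a lookup, which makes it easy to check a proposed grouping before applying it. The mapping below copies the table from this guide; the function names `can_belong` and `invalid_members` are hypothetical helpers, not part of any AWS SDK.

```python
# Resource type -> allowed appComponentType, copied from the table in this guide.
ALLOWED_COMPONENT_TYPE = {
    "AWS::EC2::Volume": "AWS::ResilienceHub::StorageAppComponent",
    "AWS::EC2::NatGateway": "AWS::ResilienceHub::NetworkingAppComponent",
    "AWS::DynamoDB::Table": "AWS::ResilienceHub::DatabaseAppComponent",
    "AWS::RDS::DBInstance": "AWS::ResilienceHub::DatabaseAppComponent",
    "AWS::SQS::Queue": "AWS::ResilienceHub::QueueAppComponent",
    "AWS::AutoScaling::AutoScalingGroup": "AWS::ResilienceHub::ComputeAppComponent",
    "AWS::ECS::Service": "AWS::ResilienceHub::ComputeAppComponent",
    "AWS::S3::Bucket": "AWS::ResilienceHub::StorageAppComponent",
    "AWS::ApiGatewayV2::Api": "AWS::ResilienceHub::ComputeAppComponent",
    "AWS::EFS::FileSystem": "AWS::ResilienceHub::StorageAppComponent",
    "AWS::RDS::DBCluster": "AWS::ResilienceHub::DatabaseAppComponent",
    "AWS::ApiGateway::RestApi": "AWS::ResilienceHub::ComputeAppComponent",
    "AWS::Lambda::Function": "AWS::ResilienceHub::ComputeAppComponent",
    "AWS::DocDB::DBCluster": "AWS::ResilienceHub::DatabaseAppComponent",
    "AWS::EC2::Instance": "AWS::ResilienceHub::ComputeAppComponent",
}


def can_belong(resource_type, component_type):
    """True when the table allows this resource type in this component type."""
    return ALLOWED_COMPONENT_TYPE.get(resource_type) == component_type


def invalid_members(component_type, resource_types):
    """Return the resource types that may not be placed in the component."""
    return [t for t in resource_types if not can_belong(t, component_type)]


COMPUTE = "AWS::ResilienceHub::ComputeAppComponent"
print(invalid_members(COMPUTE, ["AWS::EC2::Instance", "AWS::RDS::DBInstance"]))
```

Resource types that are missing from the table, such as AWS::RDS::DBParameterGroup, are rejected for every component type, which matches the guide's statement that they are not imported.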

Getting started

This section describes how to start using AWS Resilience Hub. This includes creating AWS Identity and Access Management (IAM) permissions for an account.

Prerequisites

Before you can use Resilience Hub, you must set up:

• One or more AWS accounts.

• AWS Identity and Access Management (IAM) permissions.

Create IAM roles for an account

Resilience Hub integrates with AWS Identity and Access Management (IAM) so you can grant Resilience Hub users the permissions to access, test, and monitor applications to prevent disruptions and implement disaster recovery.

Permissions for Resilience Hub depend on the types of resources that make up your applications and on the specific Resilience Hub features that are used. For example, if an application includes an Amazon RDS DB instance, you need the rds:DescribeDBInstances permission.
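As an illustration of the RDS example, the following sketch builds an IAM policy document that grants the rds:DescribeDBInstances action. Only that action name comes from this guide. The helper name `allow_actions` and the use of `"Resource": "*"` are simplifying assumptions for illustration, and a real setup needs the full permission set described in the IAM section.

```python
import json


def allow_actions(actions):
    """Build a minimal identity-based IAM policy document for the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                # Assumption: all resources; scope this down for real use.
                "Resource": "*",
            }
        ],
    }


policy = allow_actions({"rds:DescribeDBInstances"})
print(json.dumps(policy, indent=2))
```

The resulting JSON can be reviewed or attached to a role with whatever tooling you already use for IAM.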

For instructions on working with IAM roles and policies, see How AWS Resilience Hub works with IAM (p. 45).

Add an application to AWS Resilience Hub

AWS Resilience Hub offers resiliency assessment and validation that integrate into your software development lifecycle. Resilience Hub helps you proactively prepare and protect your AWS applications from disruptions by:

• Uncovering resiliency weaknesses.

• Ensuring that your recovery time objective (RTO) and recovery point objective (RPO) can be met.

• Resolving issues before they are released into production.

This section guides you through adding, or describing, an application by gathering resources from an existing application, AWS CloudFormation stacks, resource groups, or AppRegistry and creating an appropriate resiliency policy. After describing an application, you can publish it in Resilience Hub, and generate an assessment report on the resiliency of your application. You can then use recommendations from the assessment to improve resiliency by running another assessment, comparing results, and then iterating until you achieve your goals for RTO and RPO.

Topics

• Get started by adding an application

• Step 1: Discover the structure and describe your Resilience Hub application

• Step 2: Describe the details of your application in Resilience Hub

• Step 3: Add tags

• Step 4: Review resources for your Resilience Hub application

• Step 5: Select a policy for your application

• Step 6: Review and publish your Resilience Hub application

• Step 7: Run an assessment of your Resilience Hub application

• Step 8: Review the application and operational recommendations for your Resilience Hub application

Get started by adding an application

Get started with AWS Resilience Hub by describing the details of your existing AWS application and running a report to assess resiliency.

To get started

• On the AWS Resilience Hub home page under Get started, choose Add application.

Next Step

Step 1: Discover the structure and describe your Resilience Hub application (p. 9)

Step 1: Discover the structure and describe your Resilience Hub application

This section discusses the following methods that you use to form the basis of your application structure:

• CloudFormation stacks

• An existing AWS Resilience Hub application

• AppRegistry

• Resource Groups

CloudFormation

You can use up to five CloudFormation stacks.

Choose the CloudFormation stacks that contain the resources you want to use in the application you're describing. The stacks can be from the AWS account that you are using to describe the application or they can be from different accounts or different Regions.

To discover the resources that form the basis of your application structure

1. Select Start with CloudFormation stacks to discover your stack-based resources.

2. Choose stacks from Select stacks that are associated with your AWS account and Region.

To use stacks that are in a different AWS account or different Region, enter the Amazon Resource Name (ARN) of the stack in the Stack ARN box, and then choose Stack ARN. For more information about ARNs, see Amazon Resource Names (ARNs) in the AWS General Reference.

3. Enter a name for every stack that you add.

4. Choose Next.
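Before entering stack ARNs in the console, it can help to check them against the constraints in this step. The following is a minimal sketch, assuming the standard CloudFormation stack ARN layout (`arn:partition:cloudformation:region:account-id:stack/name/id`). The five-stack limit comes from this guide; the function names and the example account IDs are hypothetical.

```python
MAX_STACKS = 5  # limit stated in this guide


def parse_stack_arn(arn):
    """Split a CloudFormation stack ARN into region, account, and stack name."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "cloudformation":
        raise ValueError(f"not a CloudFormation ARN: {arn}")
    resource = parts[5]
    if not resource.startswith("stack/"):
        raise ValueError(f"not a stack ARN: {arn}")
    return {
        "region": parts[3],
        "account": parts[4],
        "stack_name": resource.split("/")[1],
    }


def external_stacks(arns, home_account, home_region):
    """Validate the selection and return stacks from another account or Region."""
    if len(arns) > MAX_STACKS:
        raise ValueError(f"at most {MAX_STACKS} stacks can be used")
    parsed = [parse_stack_arn(a) for a in arns]
    return [p["stack_name"] for p in parsed
            if p["account"] != home_account or p["region"] != home_region]


selection = [
    "arn:aws:cloudformation:us-east-1:111122223333:stack/web/1a2b3c",
    "arn:aws:cloudformation:eu-west-1:444455556666:stack/db/4d5e6f",
]
print(external_stacks(selection, home_account="111122223333", home_region="us-east-1"))
```

Stacks reported by `external_stacks` are the ones that would need to be added by ARN rather than chosen from the list for the current account and Region.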

Existing application

To get started we're going to use an existing application.

To discover the resources that form the basis of your application structure

1. Select Start with an existing application to build off of an existing structure.

2. Choose Next.

AppRegistry

Choose the AppRegistry applications that contain the resources you want to use in the application that you're describing. You can add only one AppRegistry application at a time.

To discover the resources that form the basis of your application structure

1. Select Start with AppRegistry to select from a list of applications created in AppRegistry.

2. Choose applications from Select application that were created in AppRegistry.

3. Choose Next.

Resource groups

Choose the resource groups that contain the resources that you want to use in the application that you're describing. You can use up to five resource groups.

To discover the resources that form the basis of your application structure

1. Select Start with Resource Groups to discover your resource group-based resources.

2. Choose resources from Select resource groups. You can also add Resource Group ARNs.

3. Choose Next.

Next Step

Step 2: Describe the details of your application in Resilience Hub (p. 10)

Step 2: Describe the details of your application in Resilience Hub

This section shows you how to describe the details of your existing AWS application in AWS Resilience Hub.

To describe the details of your application

1. Enter a name for the application.

2. (Optional) Enter a description for the application.

3. (Optional) Choose Add new tag if you want to associate one or more tags with the application. For more information about tags, see Tagging resources in the AWS General Reference.

4. Make sure that your name and description are what you want.

Choose Next.

Next Step

Step 3: Add tags (p. 11)

Step 3: Add tags

Assign a tag or label to an AWS resource to search and filter your resources, or track your AWS costs.

To add tags to your application

• (Optional) Choose Add new tag if you want to associate one or more tags with the application. For more information about tags, see Tagging resources in the AWS General Reference.

Next Step

Step 4: Review resources for your Resilience Hub application (p. 11)

Step 4: Review resources for your Resilience Hub application

You need to identify the resources in your application to ensure that it contains the ones that you want.

You can add resources that are missing, or remove resources that you don’t need. The assessment reports, validation, and recommendations are based on the listed resources.

Resources are grouped into logical application components. You can edit the application components to better reflect the structure of your application. Editing the resources modifies only the Resilience Hub reference of your application. No changes are made to your actual resources or to the AWS CloudFormation stacks that contain the resources.

To identify application resources

1. The resources from the CloudFormation stacks, manually added application, AppRegistry, or resource groups that you chose for your application description are listed under Resources. You can identify them by the following:

Logical ID – A logical ID is a name used to identify resources in your CloudFormation stack, manually added application, AppRegistry, or resource groups.

Status – Indicates whether Resilience Hub supports the resource and includes it in the application. The status is Included, Excluded, or Not supported.

Resource Type – The resource type identifies the type of resource in your application. For example, AWS::EC2::Instance declares an EC2 instance.

Name – A name for the resource that you can add and edit.

Component name – The component name used to identify components.

Physical ID – The actual assigned identifier for that resource, such as an EC2 instance ID or an S3 bucket name.

CloudFormation stack – The CloudFormation stack that contains the resource. This column depends on the type of application structure that you selected.

2. To find a resource that is not listed under Resources, enter the logical ID for the resource in the search box.

3. To remove a resource from your application, select the resource, and then choose Exclude resource.

When prompted, choose Exclude to remove the resource from your application.

To see the list of excluded resources, choose the Exclude resources tab.

You cannot import an excluded resource until its exclusion is removed.

4. Choose Next.

Next Step

Step 5: Select a policy for your application (p. 12)

Step 5: Select a policy for your application

A Resilience Hub resiliency policy contains the information and objectives that are used to assess whether your application can recover from a disruption type, such as software or hardware disruption.

When you create a resiliency policy, you define the recovery time objective (RTO) and recovery point objective (RPO) for the disruption types. These objectives determine whether the application meets the resiliency policy.

When you create a new policy or select an existing Resilience Hub resiliency policy, consider your resiliency objectives and potential disruptions. For example, consider the business impact of the application, and the RTO and RPO objectives that you want your application to meet. Also, identify the most concerning disruptions that your application might encounter since you can set RTO and RPO goals for each disruption type.

Create a policy to get started.

To create a resiliency policy

1. Select Create a new policy.

2. Enter a name for the policy.

3. Choose one of the following from Tier:

Foundational IT core services

Mission critical

Critical

Important

Non critical

If you want to add tags, you can do that later as you continue creating your policy. For more information about tags, see Tagging resources in the AWS General Reference.

4. Under Customer Application RTO and RPO, enter a numeric value in the box and then choose the unit of time that the value represents, for both RTO and RPO.

Repeat these entries under Cloud Infrastructure RTO and RPO for Infrastructure and Availability Zone.

5. If you have a multi-Region application, you may want to define a Region RTO and RPO.

Under Region - Optional, enter a numeric value in the box and then choose the unit of time that the value represents, for both RTO and RPO.

6. Choose Create to create the policy.

7. Verify that the policy you just created is selected by default from the list of policies under Resiliency policies, and choose Next.
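The values entered in the steps above amount to an RTO and RPO per disruption type, each given as a number plus a unit of time. The sketch below normalizes such entries to seconds so that they can be compared later. The disruption labels and the tier string are taken from this guide's console text; the dictionary layout and the function names are assumptions for illustration and are not the request shape of the service API.

```python
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def to_seconds(value, unit):
    """Convert a (value, unit) entry such as (4, "hours") to seconds."""
    if value < 0:
        raise ValueError("RTO and RPO values must not be negative")
    return value * UNIT_SECONDS[unit]


def build_policy(name, tier, targets):
    """targets maps a disruption type to ((rto_value, rto_unit), (rpo_value, rpo_unit))."""
    normalized = {}
    for disruption, (rto, rpo) in targets.items():
        normalized[disruption] = {
            "rto_secs": to_seconds(*rto),
            "rpo_secs": to_seconds(*rpo),
        }
    return {"name": name, "tier": tier, "targets": normalized}


policy = build_policy(
    name="storefront-policy",  # hypothetical policy name
    tier="Mission critical",
    targets={
        "Customer Application": ((1, "hours"), (15, "minutes")),
        "Infrastructure": ((1, "hours"), (15, "minutes")),
        "Availability Zone": ((4, "hours"), (1, "hours")),
        "Region": ((1, "days"), (1, "days")),  # optional, for multi-Region applications
    },
)
print(policy["targets"]["Availability Zone"])
```

Storing every objective in one unit avoids mistakes when an estimated recovery time reported in seconds is later compared with a target that was entered in hours or days.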

Next Step

Step 6: Review and publish your Resilience Hub application (p. 13)

Step 6: Review and publish your Resilience Hub application

Now that you have set up your application, identified your resources, and selected your resiliency policy, you are ready to review and publish your AWS Resilience Hub application.

To review and publish your application

1. Review all the information that you entered in the previous steps.

If you see information that you need to change, choose Edit. After you make an edit, choose Next for each step until you get back to Step 6: Review and publish.

2. After you finish your review, choose Publish.

Next Step

Step 7: Run an assessment of your Resilience Hub application (p. 13)

Step 7: Run an assessment of your Resilience Hub application

The application that you published is listed on the Summary page.

After you publish your AWS Resilience Hub application, you are redirected to the application summary page where you can run a resiliency assessment. The assessment evaluates your application configuration against the resiliency policy that is attached to your application. An assessment report is generated that shows how your application measures against the objectives in your resiliency policy.

To run a resiliency assessment

1. On the Applications summary page, choose Assess resiliency.

2. Under Report name, enter a unique name for the report or use the generated name.

3. Choose Run.

4. After you are notified that the assessment report has been generated, choose the Assessment tab and your assessment to view the report.

5. Choose the Review tab to view your application's assessment report.
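The console steps above can also be pictured as a start-then-poll loop. The following Python sketch is written against a client object that exposes `start_app_assessment` and `describe_app_assessment`, which is how the boto3 `resiliencehub` client appears to name these operations. The function name, the `"release"` version label, and the polling defaults are assumptions for illustration.

```python
import time

def run_assessment(client, app_arn, report_name, poll_seconds=10, max_polls=60):
    """Start an assessment of the published version and wait for it to finish."""
    started = client.start_app_assessment(
        appArn=app_arn,
        appVersion="release",  # assumption: the published version is labeled "release"
        assessmentName=report_name,
    )
    assessment_arn = started["assessment"]["assessmentArn"]
    for _ in range(max_polls):
        response = client.describe_app_assessment(assessmentArn=assessment_arn)
        assessment = response["assessment"]
        # Stop polling once the assessment reaches a terminal status.
        if assessment["assessmentStatus"] in ("Success", "Failed"):
            return assessment
        time.sleep(poll_seconds)
    raise TimeoutError(f"assessment {assessment_arn} did not finish in time")
```

With boto3 installed and credentials configured, the client would typically come from `boto3.client("resiliencehub")`. Passing the client in as a parameter keeps the function easy to exercise with a stand-in object.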

Next Step

Step 8: Review the application and operational recommendations for your Resilience Hub application (p. 14)


Step 8: Review the application and operational recommendations for your Resilience Hub application

Review the resiliency and operational recommendations for the application that you published from the Review page. This page displays the application's assessment overview, RTO and RPO summary, and disruption type details, as follows:

Overview - The overview section contains information such as the application name, attached policy name, and the assessment creation date.

RTO Summary - The RTO summary displays the targeted RTO time against the estimated time assessed.

RPO Summary - The RPO summary displays the targeted RPO time against the estimated time assessed.

Details - The details section lists the disruption type, application component, and the actual RTO and RPO times tested against the attached policy configurations.

To improve resiliency, you can update your application and resiliency policy according to the recommendations from the report. Then run another assessment, compare the results, and repeat the process until you achieve your goals for recovery time objective (RTO) and recovery point objective (RPO).

To view application recommendations

1. On the Review page, choose Application recommendations.

2. On the Application recommendations tab, choose a component from the Components section to view application recommendations.

The Application recommendations tab displays recommendation information for optimizing for RTO and RPO (AZ), cost, and minimal changes.

Optimize for RTO and RPO (AZ) - This section provides recommendation information about achieving the lowest RTO and RPO times for each disruption type. This information includes the recommendation description, estimated cost to implement the recommendation, architecture type, recommended changes, and the estimated RTO and RPO time.

Optimize for cost - This section provides recommendation information about reaching your policy's RTO and RPO times at the lowest cost. This information includes the recommendation description, estimated cost to implement the recommendation, architecture type, recommended changes, and the estimated RTO and RPO time.

Optimize for minimal changes - This section provides recommendation information about reaching your policy's RTO and RPO times with the minimal infrastructure changes. This information includes recommendation description, estimated cost to implement the recommendation, architecture type, recommended changes, and the estimated RTO and RPO time.

To review operational recommendations

1. On the Review page, choose Operational recommendations.

2. On the Operational recommendations tab, choose a component from the Components section to view operational recommendations.


The Operational recommendations tab displays recommendation information for optimizing for RTO and RPO (AZ), cost, and minimal changes.

Optimize for RTO and RPO (AZ) - This section provides recommendation information about achieving the lowest RTO and RPO times for each disruption type. This information includes the recommendation description, estimated cost to implement the recommendation, architecture type, recommended changes, and the estimated RTO and RPO time.

Optimize for cost - This section provides recommendation information about reaching your policy's RTO and RPO times at the lowest cost. This information includes the recommendation description, estimated cost to implement the recommendation, architecture type, recommended changes, and the estimated RTO and RPO time.

Optimize for minimal changes - This section provides recommendation information about reaching your policy's RTO and RPO times with the minimal infrastructure changes. This information includes the recommendation description, estimated cost to implement the recommendation, architecture type, recommended changes, and the estimated RTO and RPO time.

For more information about describing applications, editing resources, creating resiliency policies, running assessments, and more, see Using Resilience Hub (p. 16).


Using Resilience Hub

AWS Resilience Hub helps you improve the resiliency of your applications on AWS and reduce the recovery time in the event of application outages.

To use Resilience Hub, you:

• Describe your AWS applications in Resilience Hub.

• Manage your AWS resources in Resilience Hub.

• Create effective resiliency policies.

• Manage assessments that indicate the resiliency of your applications.

• Manage alarms, standard operating procedures (SOPs), and tests for your applications.

Describing and managing Resilience Hub Applications

An AWS Resilience Hub application is a collection of AWS resources structured to prevent and recover from AWS application disruptions.

To describe a Resilience Hub application, you provide an application name, resources from one to five AWS CloudFormation stacks, and an appropriate resiliency policy. You can also use any existing Resilience Hub application as a template to describe your application.

After you describe a Resilience Hub application, you publish it so that you can run a resiliency assessment on it. You can then use recommendations from the assessment to improve resiliency: apply the recommendations, run another assessment, compare the results, and repeat the process until you achieve your goals for recovery time objective (RTO) and recovery point objective (RPO).
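As a rough illustration of what an application description contains when it is built from CloudFormation stacks, the following Python sketch checks the three inputs named above: a name, one to five stacks, and a resiliency policy. The class and field names are hypothetical and are not part of any Resilience Hub API.

```python
from dataclasses import dataclass, field

MAX_STACKS = 5  # the guide allows resources from up to five CloudFormation stacks

@dataclass
class AppDescription:
    """Hypothetical record of the inputs used to describe an application."""
    name: str
    stack_arns: list = field(default_factory=list)
    policy_name: str = ""

    def problems(self):
        """Return the reasons this description is incomplete, if any."""
        issues = []
        if not self.name.strip():
            issues.append("application name is required")
        if not 1 <= len(self.stack_arns) <= MAX_STACKS:
            issues.append(f"provide between 1 and {MAX_STACKS} CloudFormation stacks")
        if not self.policy_name:
            issues.append("a resiliency policy is required")
        return issues
```

An empty list from `problems()` means the description has everything this section lists. It does not model the option of using an existing application as a template.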

The following topics show the different approaches for describing a Resilience Hub application and how to manage them.

Topics

• Editing Resilience Hub application resources (p. 16)

• Viewing a Resilience Hub application summary (p. 18)

• Publishing a new Resilience Hub application version (p. 19)

• Deleting an AWS Resilience Hub application (p. 19)

Editing Resilience Hub application resources

To receive accurate and helpful resiliency assessments, it's important that your application's description is current and matches your actual AWS application and resources. Assessment reports, validation, and recommendations are based on the listed resources. If you add or remove resources from an AWS application, then you should reflect those changes exactly in AWS Resilience Hub.

You can identify and edit the resources in your application. Editing the resources modifies only the AWS Resilience Hub reference of your application. No changes are made to your actual resources or AWS CloudFormation stacks.


You can add resources that are missing, or remove resources that you don’t need. Resources are grouped into logical application components. You can edit the application components to better reflect the structure of your application.

You add to or update your Resilience Hub application resources by editing a draft version of your application. You make changes to the draft version and then publish a new version, called the release version, which is the version that is assessed when you run resiliency assessments.

To edit application resources

1. In the navigation pane, choose Applications.

2. On the Applications page, choose the name of the application that you want to edit.

3. Choose the Versions tab.

4. Under Versions, select draft, if it's not already selected.

5. The resources from the application you chose to use as a template for your application description are listed under the Resources tab. You can identify the resources by the following:

Logical ID – The name used to identify resources in your CloudFormation stack.

CloudFormation stack – The CloudFormation stack the resource is from.

Status – Whether the listed resource is Included, Excluded, or Not supported.

Component name – The name assigned to the resource by Resilience Hub.

Physical ID – The actual assigned name for that resource, such as an EC2 instance ID or an S3 bucket name.

Resource Type – The resource type identifies the type of resource in your application. For example, AWS::EC2::Instance declares an EC2 instance.

6. To find a resource that is not listed, enter the logical ID for the resource in the search box.

7. To remove a resource from your application, select the resource, and then choose Exclude from Edit.

To see the list of excluded resources, choose the Excluded resources tab.

8. To add a resource to your application, from Actions, choose Add resource.

9. To resolve resources on your application, from Actions, choose Resolve resources.

10. To update stacks on your application, from Actions, choose Update Stacks.

11. If a CloudFormation stack associated with your application changed, you can reimport the stack.

All new resources to the stack are imported, except for resources that are currently excluded from Resilience Hub.

To reimport CloudFormation stacks, choose Update stacks.

a. In Select stacks, select stacks that are associated with your AWS account and Region.

To use stacks that are in a different AWS account or Region, enter the Amazon Resource Name (ARN) of the stack in the Stack ARN box and then choose Add stack ARN. For more information about ARNs, see Amazon Resource Names (ARNs) in the AWS General Reference.

b. Choose Update.

12. To view the logical components that the resources are grouped into, choose the Components tab.

Under the Components tab, you can add new components, rename a component, or delete a component by using the Actions menu.

After you make changes to your resource list, you receive an alert that indicates changes have been made to the draft version of your application. To run an accurate resiliency assessment, you must publish a new version of your application. For information about how to publish a new version, see Publishing a new Resilience Hub application version (p. 19).
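The relationship between the draft version, the release version, and excluded resources can be sketched as a toy model. The following Python class is purely illustrative and the names are hypothetical. It is not how Resilience Hub stores versions, only a way to see why edits do not affect assessments until you publish.

```python
class AppVersions:
    """Toy model of the draft and release versions of an application."""

    def __init__(self, resources):
        self.release = set(resources)  # the version that assessments use
        self.draft = set(resources)    # the version that you edit
        self.excluded = set()

    def exclude(self, logical_id):
        """Remove a resource from the draft and remember the exclusion."""
        self.draft.discard(logical_id)
        self.excluded.add(logical_id)

    def reimport_stack(self, stack_resources):
        """Import new stack resources, except those currently excluded."""
        self.draft |= set(stack_resources) - self.excluded

    @property
    def has_unpublished_changes(self):
        return self.draft != self.release

    def publish(self):
        """Make the current draft the version that is assessed."""
        self.release = set(self.draft)

# Resource names are made up for illustration.
app = AppVersions({"WebServer", "Database"})
app.exclude("Database")
app.reimport_stack({"WebServer", "Database", "Cache"})
```

After these calls, the draft contains `WebServer` and `Cache`, the release still contains `WebServer` and `Database`, and `has_unpublished_changes` is true until `publish()` is called.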


Viewing a Resilience Hub application summary

The application summary page in the AWS Resilience Hub console provides an overview of your application information and resiliency health.

To view an application summary

1. In the navigation pane, choose Applications.

2. On the Applications page, choose the name of the application.

The application summary page has the following sections.

Topics

• Details (p. 18)

• Application resiliency (p. 18)

• Alarms (p. 19)

• Fault injection experiments (p. 19)

Details

The application summary Details section shows a summary of the selections for the application.

Resiliency policy – The name of the resiliency policy that you created when you added the application to Resilience Hub. Choose the name to open the detail page for the policy. For more information about resiliency policies, see Managing resiliency policies (p. 19).

Availability policy – The name of the availability policy that you created when you added the application to Resilience Hub.

Description – The description of the application.

Status – Specifies whether the policy is active or inactive.

Creation time – The date and time that the application was created.

Version – The current version of your application.

Last assessment time – The date and time when the application was last assessed.

Application resiliency

The metrics shown in the Application resiliency section are from the most recent resiliency assessment of the application.

Resiliency score

The resiliency score helps you quantify your readiness to handle a potential disruption. This score reflects how closely you have followed Resilience Hub’s recommended alarms, standard operating procedures (SOPs), and tests.

The maximum resiliency score that your application can achieve is 100. The score represents all recommended tests that run in a predefined period of time. It indicates that the tests are initiating the correct alarm, and that the alarm initiates the correct SOP.


For example, suppose Resilience Hub recommends one test with one alarm and one SOP. When the test runs, it triggers the alarm, the alarm initiates the associated SOP, and the SOP runs successfully. For more information about the resiliency score, see Understanding resiliency scores (p. 26).

Alarms

The application summary Alarms section lists the alarms that you set up in Amazon CloudWatch to monitor the application. For more information about alarms, see Managing alarms (p. 34).

Fault injection experiments

The application summary Fault injection experiments section shows a list of the fault injection experiments. For more information about fault injection experiments, see Fault injection experiments (p. 31).

Publishing a new Resilience Hub application version

After you make changes to your AWS Resilience Hub application resources as described in Editing Resilience Hub application resources (p. 16), you must publish a new version of your application to run an accurate resiliency assessment. Also, you might need to publish a new version of your application if you added new recommended alarms, SOPs, and tests to your application.

To publish a new version of an application

1. In the navigation pane, choose Applications.

2. On the Applications page, choose the name of the application.

3. Choose the Versions tab.

4. Choose the Resources tab.

5. Choose Publish new version. When you publish a new version of your application, this becomes the version that is assessed when you run resiliency assessments.

6. Choose Publish.

After you publish a new version of your application, we recommend that you run a new resiliency assessment to confirm that your application still meets your resiliency policy. For information about running an assessment, see Running and managing Resilience Hub resiliency assessments (p. 23).

Deleting an AWS Resilience Hub application

After you reach the limit of ten applications, you must delete one or more applications before you can add more.

To delete an application

1. In the navigation pane, choose Applications.

2. On the Applications page, select the application that you want to delete.

3. Choose Actions, and then choose Delete.

4. To confirm the deletion, enter Delete.

Managing resiliency policies

This section describes how to create resiliency policies for your applications. Setting resiliency policies correctly enables you to understand your application's resiliency posture. A resiliency policy contains information and objectives that you use to assess whether your application can recover from a disruption type, such as software, hardware, Availability Zone, or AWS Region. Resiliency policies are guidelines that measure your objectives. These policies do not change or affect an actual application. Multiple applications can have the same resiliency policy.

When you create a resiliency policy, you define the objectives: Recovery Time Objective (RTO) and Recovery Point Objective (RPO). The objectives determine whether the application meets the resiliency policy. Attach the policy to your application and run a resiliency assessment. You can create different policies for the different types of applications in your portfolio. For example, a real-time trading application would have a different resiliency policy than a monthly reporting application.

The assessment evaluates your application configuration against the attached resiliency policy. At the end of the process, AWS Resilience Hub provides an assessment of how your application measures against the objectives in your resiliency policy.

You can create resiliency policies in Applications, and also in Resiliency policies. You can access relevant details about your policies, and also modify and delete them.

AWS Resilience Hub uses your RTO and RPO objectives to measure resiliency for these potential types of disruptions:

Application – Loss of a required software service or process.

Cloud infrastructure – Loss of hardware, such as EC2 instances.

Cloud infrastructure Availability Zone (AZ) – One or more Availability Zones are unavailable.

Cloud infrastructure Region – One or more Regions are unavailable.

AWS Resilience Hub enables you to create customized resiliency policies or use our recommended, open standard resiliency policies. When you create customized policies, name and describe your policy and choose the appropriate level or tier that defines your policy. These tiers include: Foundational IT core services, Mission critical, Critical, Important, and Non-critical.

Choose the tier that is appropriate for your class of application. For example, you might classify a real-time trading system as mission critical, while you might classify a monthly reporting application as non-critical. When you use our standard policies, you can choose a resiliency policy with a preconfigured tier and values for RTO and RPO objectives by disruption type. If necessary, you can change the tier and RTO and RPO values.

You can create resiliency policies in Resiliency policies, or when you describe a new application.

Creating resiliency policies

In AWS Resilience Hub, you can create a resiliency policy. A resiliency policy contains information and objectives that you use to assess whether your application can recover from a disruption type, such as software, hardware, Availability Zone, or AWS Region. Resiliency policies are guidelines that measure your objectives. These policies do not change or affect an actual application. Multiple applications can have the same resiliency policy.

When you create a resiliency policy, you define the objectives: Recovery Time Objective (RTO) and Recovery Point Objective (RPO). The objectives determine whether the application meets the resiliency policy. Attach the policy to your application and run a resiliency assessment.

The assessment evaluates your application configuration against the attached resiliency policy. At the end of the process, AWS Resilience Hub provides an assessment of how your application measures against the objectives in your resiliency policy.

You can create resiliency policies in Applications, and also in Resiliency policies. You can access relevant details about your policies, and also modify and delete them.


To create resiliency policies in Applications

1. In the left navigation menu, choose Applications.

2. In Applications, choose Add Application. Then choose either Quick Start or Walk-through (guided instructions). Based on your selection, proceed to step 3 or step 4.

3. If you choose Quick Start, enter a name and optional description, and then choose Add. You can add resources and resiliency policies later.

4. If you choose Walk-through:

• In Describe application details, enter the name and an optional description. Choose Next.

• Specify how your application discovers resources from either existing applications, or CloudFormation stacks.

• In Resiliency policies, choose Create a new policy.

• If you know how you want to set up your resiliency policy, choose Create a policy.

• Name and describe the policy and select the tier that defines the policy.

• Enter RTO and RPO values for Hardware disruption, Software disruption, Availability Zone disruption, and Region disruption (Region is optional).

• Choose Create to complete the process.

• If you need recommendations to set up your resiliency policy, choose Select a policy from suggestions.

• Name and describe the policy.

• Choose one of the following resiliency policies:

Non-Critical Application, Important Application Tier, Critical Application Tier, Global Critical Application Tier, Mission Critical Application Tier, Global Mission Critical Application Tier, and Foundational Core Service Tier. You can get details about the policy later.

• Choose Create to complete the process.

To create resiliency policies in Resiliency policies

1. In the left navigation menu, choose Resiliency policies.

2. In Resiliency policies, choose Create a new policy.

• Name and describe the policy and select the tier that defines the policy.

• Enter RTO and RPO values for Hardware disruption, Software disruption, Availability Zone disruption, and Region disruption (optional).

• Choose Create to complete the process.

3. You can add internal tags to search, filter, and manage your AWS resources in your application.

• To add tags, choose Add new tag.

• Enter information in the Key and Value fields.

To create resiliency policies based on a suggested policy

1. In the left navigation menu, choose Resiliency policies.

2. In Resiliency policies, choose Select a policy based on a suggested policy.

• Name and describe the policy.

• Choose one of the following resiliency policies:

Non-Critical Application, Important Application Tier, Critical Application Tier, Global Critical Application Tier, Mission Critical Application Tier, Global Mission Critical Application Tier, and Foundational Core Service Tier. You can get details about the policy later.

• Input the Customer application RTO and RPO targets.

• Input the Cloud Infrastructure RTO and RPO targets.

• Choose Create to complete the process.

3. You can add internal tags to search, filter, and manage your AWS resources in your application.

• To add tags, choose Add new tag.

• Enter information in the Key and Value fields.

Accessing resiliency policy details

When you open a resiliency policy, you see important details about the policy. You can also edit or delete the resiliency policy.

Resiliency policy details consist of two major views: Summary and Tags.

Summary

Basic information

Provides the following information about the resiliency policy: Name, Description, Tier, Cost Tier, and Date Created.

RTO and RPO

Shows the RTO and RPO targets for each disruption type associated with this resiliency policy.

Tags

Use this view to manage, add, and delete tags internal to this application.

To edit resiliency policies in Resiliency policy details

1. In the left navigation menu, choose Policies.

2. In Resiliency policies, open a resiliency policy.

3. Choose Edit. Enter appropriate changes to Basic Info and RTO and RPO fields. Then choose Save changes.

To edit resiliency policies in Resiliency policy

1. In the left navigation menu, choose Policies.

2. In Resiliency policies, choose a resiliency policy.

3. Choose Actions, and then select Edit.

4. Enter appropriate changes to Basic Info and RTO and RPO fields. Then choose Save changes.

To delete resiliency policies in resiliency policy details


1. In the left navigation menu, choose Policies.

2. In Resiliency policies, open a resiliency policy.

3. Choose Delete. Confirm your deletion, and then choose Delete.

To delete resiliency policies in resiliency policy

1. In the left navigation menu, choose Policies.

2. In Resiliency policies, choose a resiliency policy.

3. Choose Actions, and then select Delete.

4. Confirm your deletion, and then choose Delete.

Running and managing Resilience Hub resiliency assessments

When your application changes, you should run a resiliency assessment. The assessment compares each application component configuration to the policy and makes alarm, SOP, and test recommendations.

These configuration recommendations can improve the speed of recovery procedures.

Alarm recommendations help you set alarms that detect outages. SOP recommendations provide scripts that manage common recovery processes, such as recovery from a backup. Test recommendations offer suggestions to verify your configurations work properly. For example, you can test whether an application recovers during automatic recovery processes, such as automatic scaling or load balancing because of network issues. You can test whether application alarms are triggered when resources reach their limits. You can also test how well SOPs work under the conditions that you indicate.

Running resiliency assessments

You can run a resiliency assessment report from your application’s Actions menu, the Assessments view, or in the Get started banner on the Application page.

To run a resiliency assessment from the Actions menu

1. In the left navigation menu, choose Applications.

2. Choose your Application.

3. From the Actions menu, choose Run resiliency assessment.

4. Enter a unique name or use the generated name.

5. Choose Run.

To review the assessment report, choose Assessments in your application. For more information, see the section called “Reviewing assessments reports” (p. 24).

To run a resiliency assessment from the Get Started banner

1. In the left navigation menu, choose Applications.

2. In Applications, open an application. In the Get Started banner in the middle of the application page, choose Resiliency assessed.

3. In Run resiliency assessment, enter a unique name or use the generated name, and choose Run.

To review the assessment report, choose Assessments in your application. For more information, see the section called “Reviewing assessments reports” (p. 24).


To run a resiliency assessment from the Assessments view

You can run a new resiliency assessment when your application or resiliency policy changes.

1. In the left navigation menu, choose Applications.

2. Choose your application.

3. Choose the Assessment tab.

4. In Assessments, choose Run resiliency assessment, and then enter a unique name or use the generated name.

5. Choose Run.

To review the assessment report, choose Assessments in your application. For more information, see the section called “Reviewing assessments reports” (p. 24).

Reviewing assessments reports

You find assessment reports in the Assessments view of your application.

To find an assessment report

1. In the left navigation menu, choose Applications.

2. In Applications, open an application.

3. In Assessments, open an assessment report in the Resiliency assessments table.

When you open the report, you can see:

• An overview of the assessment report

• Recommendations to improve resiliency

• Recommendations to set up alarms, SOPs, and tests

• How to create and manage tags to search and filter your AWS resources.

Review

This section provides an overview of the assessment report. AWS Resilience Hub lists each disruption type and the associated application component. It also lists your actual RTO and RPO policies and determines whether the application component can achieve the policy goals.

Overview

Shows the name of the application, the name of the resiliency policy, and the creation date of the report.

RTO summary

Shows a graphical representation of whether the application meets the resiliency policy's objectives. This is based on the amount of time that an application can be down without causing significant damage to the organization. The assessment is an estimate of the expected RTO.

RPO summary

Shows a graphical representation of whether the application meets the resiliency policy's objectives. This is based on the amount of time that data can be lost before significant harm to the business occurs. The assessment is an estimate of the expected RPO.


Details

Provides detailed descriptions of each disruption type (application, infrastructure, Availability Zone, and Region), and provides the following information about it:

Component

The resources that comprise the application. For example, your application might have a database or compute component.

Actual RTO (Policy RTO)

Indicates whether your policy configuration aligns with your policy requirement. We provide two values, our RTO estimate and your current RTO. For example, suppose you see this value for Actual RTO policy: 40 min (2 hours). Here we estimate an RTO of 40 minutes, while your current RTO is two hours. We base our RTO calculation on the configuration, not the policy. As a result, a multi-Availability Zone database will have the same RTO for Availability Zone failure, no matter what policy you select.

Actual RPO (Policy RPO)

Shows the actual RPO policy that AWS Resilience Hub estimates, based on the RPO policy that you set for each application component. For example, you might have set the policy RPO for Availability Zone failures to one hour. The actual result might be calculated to zero. This assumes that Aurora, where we commit every transaction, is successful in four out of six nodes, spanning multiple Availability Zones. It might be five minutes for point-in-time restore.

The only RTO and RPO that you can opt not to supply is Region. For some applications, it is useful to plan for recovery when there is a crucial dependency on an AWS service, which might become unavailable in the entire Region.

If you set RTO or RPO targets for the Region, you’ll receive an estimated recovery time and operational recommendations for such failures.
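To make the "40 min (2 hours)" notation concrete, the following Python sketch parses a value of that form and checks whether the estimate is within the policy target. The function names and the set of accepted unit spellings are assumptions for illustration. The report itself is the source of truth for compliance.

```python
import re

# Unit prefixes mapped to seconds, so "min", "mins", and "minutes" all match.
UNIT_SECONDS = {"sec": 1, "min": 60, "hour": 3600, "day": 86400}

def parse_duration(text):
    """Parse strings such as '40 min' or '2 hours' into seconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]+)\s*", text.lower())
    if not match:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    for prefix, seconds in UNIT_SECONDS.items():
        if unit.startswith(prefix):
            return float(value) * seconds
    raise ValueError(f"unrecognized unit: {unit!r}")

def meets_policy(cell):
    """Check a value such as '40 min (2 hours)'.

    The estimate comes first and the policy target is in parentheses.
    """
    match = re.fullmatch(r"(.+?)\((.+)\)\s*", cell)
    if not match:
        raise ValueError(f"expected 'estimate (policy)', got {cell!r}")
    estimate = parse_duration(match.group(1))
    target = parse_duration(match.group(2))
    return estimate <= target
```

For the example in the text, `meets_policy("40 min (2 hours)")` is true because the estimated 40 minutes is within the two-hour target.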

Reviewing resiliency recommendations

Resiliency recommendations evaluate application components and recommend how to optimize by RTO and RPO, costs, and minimal changes.

AWS Resilience Hub enables you to optimize in three categories:

Optimize for Availability Zone RTO/RPO

The lowest possible RTO and RPO during an Availability Zone disruption.

Optimize for cost

The lowest cost that you can consume and still meet your policy.

Optimize for minimal changes

Achieve your policy limit and keep implementation changes minimal.

Each optimization category breaks down as follows:

Description

The description of the suggestion.

Estimated cost


A rough estimate of how much the suggested configuration would cost, relative to current setup.

We base our estimate on list prices and usage approximation. The values weigh the impact of configuration changes on RTO and RPO versus cost. For example, adding a passive replica doubles the cost but reduces RTO and RPO to minutes. Relying on backups leaves RTO and RPO at hours, or possibly days, but the cost is extra storage and network overhead.

Architecture type

A one-word description of the architecture for hardware and Availability Zone faults, such as NoRecoveryPlan, Backup&Restore, PilotLight, WarmStandBy, or Multisite.

Changes

A list of text changes that describe the necessary tasks to switch to the suggested configuration.

Estimated RTO

The estimated RTO after changes.

Estimated RPO

The estimated RPO after changes.
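The three optimization categories can be read as three different ways of ranking the same candidate configurations. The following Python sketch illustrates that idea with made-up options. It is not how AWS Resilience Hub generates recommendations, and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Option:
    description: str
    estimated_cost: float  # hypothetical cost figure, relative units
    changes: list          # tasks needed to switch to this configuration
    estimated_rto: int     # seconds
    estimated_rpo: int     # seconds

def pick(options, policy_rto, policy_rpo):
    """Pick one option per optimization category."""
    meets = [
        o for o in options
        if o.estimated_rto <= policy_rto and o.estimated_rpo <= policy_rpo
    ]
    return {
        # Lowest RTO and RPO, regardless of cost.
        "rto_rpo": min(options, key=lambda o: (o.estimated_rto, o.estimated_rpo)),
        # Cheapest option that still meets the policy.
        "cost": min(meets, key=lambda o: o.estimated_cost, default=None),
        # Option that meets the policy with the fewest changes.
        "minimal_changes": min(meets, key=lambda o: len(o.changes), default=None),
    }

options = [
    Option("Tune backups", 10, ["a", "b", "c"], 3000, 600),
    Option("Add replica", 100, ["a"], 300, 60),
    Option("Go multi-site", 300, ["a", "b"], 60, 0),
]
choice = pick(options, policy_rto=3600, policy_rpo=900)
```

With these sample values, each category selects a different option, which shows why the report presents the categories separately.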

Reviewing operational recommendations

Operational recommendations contain recommendations to set up alarms, SOPs, and tests.

AWS Resilience Hub provides CloudFormation templates for you to download. AWS Resilience Hub manages application infrastructure as code. As a result, we supply the recommendations in CloudFormation so you can add the recommendations to the application code.

You provision the selected alarms, SOPs, and tests. To provision alarms, SOPs, and tests, select the appropriate CloudFormation template and enter a unique name. AWS Resilience Hub creates a template based on your selected recommendations. In Templates, you can access your created templates through an Amazon Simple Storage Service (Amazon S3) URL.

You can also create and manage tags. You can add tags to an application, see all the tags associated with it, and search for or remove tags.

Deleting resiliency assessments

To delete a resiliency assessment

1. In the left navigation menu, choose Applications.

2. In Applications, open an application.

3. In Assessments, choose an assessment report in the Resiliency assessments table.

4. To confirm the deletion, choose Delete.

The report no longer appears in the Resiliency assessments table.

Understanding resiliency scores

This section describes how AWS Resilience Hub quantifies application readiness for different disruption scenarios.

AWS Resilience Hub indicates readiness through a resiliency score. This score reflects how closely the application follows our recommendations for SOPs, monitors, and tests. Based on the type of resources the application uses, AWS Resilience Hub recommends a set of tests, monitors, and SOPs for each disruption type.

The top resiliency score is 1. To achieve a top score, all recommended tests must complete in a predefined time period, initiate all the correct alarms, and initiate all of the SOPs attached to them. For example, suppose AWS Resilience Hub recommends one test with one alarm and one SOP. The test runs, fires the alarm, and initiates the associated SOP. If all three perform successfully, the application receives a resiliency score of 1.

Recommendation types

AWS Resilience Hub associates recommendations with disruption types. Disruption types are software, hardware, Availability Zone, and AWS Region. Some recommendations might apply to multiple disruption types. For example, to simulate a hardware disruption for Amazon Relational Database Service (Amazon RDS), we might recommend a test that reboots the Amazon RDS instance. To improve resiliency scores, you should regularly implement and verify higher priority recommendations.

Calculating Resiliency scores

AWS Resilience Hub computes coverage scores for tests, monitors, and SOPs for each combination of application component and disruption type. It then aggregates them based on the weight of the application component and disruption type.

This table presents the formulas AWS Resilience Hub uses to determine the resiliency scores for tests, monitors, and SOPs.

Resiliency score formulas

Name | Description | Formula

Test coverage (T) | A normalized score (0-1) based on the number of tests that successfully completed out of the number of tests AWS Resilience Hub recommended. | T = Number of tests run / Total number of tests recommended

Monitors coverage (M) | A normalized score (0-1) based on the number of CloudWatch alarms that were successfully implemented out of the number of CloudWatch alarms AWS Resilience Hub recommended. | M = Number of monitors implemented / Total number of monitors recommended

SOP coverage (S) | A normalized score (0-1) based on the number of SOPs (manual or automated) that were successfully tested using AWS Resilience Hub tests out of the number of SOPs AWS Resilience Hub recommended. | S = Number of testable SOPs initiated / Total number of testable SOPs recommended
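The three coverage formulas share one shape: a count of completed items divided by a count of recommended items. The following is a minimal Python sketch of that calculation, not the service's implementation. The function and class names are hypothetical, and because this guide does not say how a zero recommendation count is handled, the sketch simply rejects that case.

```python
from dataclasses import dataclass


def coverage(done: int, recommended: int) -> float:
    """Return done / recommended as a normalized score between 0 and 1."""
    if recommended <= 0:
        # Assumption: the guide does not define this case, so reject it.
        raise ValueError("at least one recommendation is required")
    if not 0 <= done <= recommended:
        raise ValueError("done must be between 0 and recommended")
    return done / recommended


@dataclass
class CoverageScores:
    tests: float     # T
    monitors: float  # M
    sops: float      # S


def coverage_scores(tests_run: int, tests_recommended: int,
                    monitors_implemented: int, monitors_recommended: int,
                    sops_initiated: int, sops_recommended: int) -> CoverageScores:
    """Compute T, M, and S for one application component and disruption type."""
    return CoverageScores(
        tests=coverage(tests_run, tests_recommended),
        monitors=coverage(monitors_implemented, monitors_recommended),
        sops=coverage(sops_initiated, sops_recommended),
    )


# Illustrative counts only: 1 of 2 tests, 3 of 4 alarms, 1 of 1 SOP.
scores = coverage_scores(1, 2, 3, 4, 1, 1)
print(scores)
```

Each field of the result is one of the normalized scores (T, M, S) from the table.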

Calculating scores for application components and disruption types

This section explains how we aggregate the recommendation type scores for tests (T), monitors (M), and SOPs (S) to calculate the resiliency score for application components and applications.

• Resiliency Score per application component per disruption type, RSao = T * Weight(T) + M * Weight(M) + S * Weight(S), with the weights normalized to sum to 1 (or) Resiliency Score per application component per disruption type, RSao = Weighted Average (T,M,S)

• Resiliency Score per application component, RSa = SUM(RSao * Weight of corresponding disruption type)/SUM(Weight of corresponding disruption type)

• Resiliency Score per disruption type, RSo = SUM(RSao * Weight of corresponding application component)/SUM(Weight of corresponding application component)

• Resiliency Score for application, RS = SUM(RSo * Weight of corresponding disruption type)/SUM(Weight of corresponding disruption type) (or) Resiliency Score for application, RS = SUM(RSa * Weight of corresponding application component)/SUM(Weight of corresponding application component)

Weight tables

AWS Resilience Hub assigns a weight to each recommendation type for the total resiliency score.

These tables show the weight for tests, monitors, SOPs, and disruption types.

Weights for Tests/Monitors/SOPs

Recommendation type Weight

Tests 25

Monitors 25

SOPs 50

Weights for disruption types

Disruption type Weight

Region 10

AZ 20

Hardware 30

Software 40
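To make the aggregation concrete, the following Python sketch applies the weighted-average form of the formulas with the weights from the two tables above. It is an illustration under stated assumptions, not the service's implementation. The component names (database, compute), the equal component weights, and the coverage values are all hypothetical, because this guide does not list application component weights. Only two disruption types are scored to keep the example short.

```python
# Weights copied from the weight tables in this guide.
RECOMMENDATION_WEIGHTS = {"tests": 25, "monitors": 25, "sops": 50}
DISRUPTION_WEIGHTS = {"Region": 10, "AZ": 20, "Hardware": 30, "Software": 40}


def weighted_average(values, weights):
    """Return SUM(value * weight) / SUM(weight) over the keys in values."""
    total = sum(weights[key] for key in values)
    return sum(values[key] * weights[key] for key in values) / total


def score_by_component(coverage, component_weights):
    """Aggregate RSao -> RSa -> RS."""
    rs_a = {}
    for component, by_disruption in coverage.items():
        rs_ao = {d: weighted_average(tms, RECOMMENDATION_WEIGHTS)
                 for d, tms in by_disruption.items()}
        rs_a[component] = weighted_average(rs_ao, DISRUPTION_WEIGHTS)
    return weighted_average(rs_a, component_weights)


def score_by_disruption(coverage, component_weights):
    """Aggregate RSao -> RSo -> RS."""
    per_disruption = {}
    for component, by_disruption in coverage.items():
        for d, tms in by_disruption.items():
            rs_ao = weighted_average(tms, RECOMMENDATION_WEIGHTS)
            per_disruption.setdefault(d, {})[component] = rs_ao
    rs_o = {d: weighted_average(by_component, component_weights)
            for d, by_component in per_disruption.items()}
    return weighted_average(rs_o, DISRUPTION_WEIGHTS)


# Hypothetical T, M, S coverage per component and disruption type.
coverage = {
    "database": {
        "Hardware": {"tests": 1.0, "monitors": 1.0, "sops": 1.0},
        "Software": {"tests": 0.5, "monitors": 1.0, "sops": 0.0},
    },
    "compute": {
        "Hardware": {"tests": 0.0, "monitors": 0.5, "sops": 0.5},
        "Software": {"tests": 1.0, "monitors": 1.0, "sops": 1.0},
    },
}
# Assumption: every application component carries the same weight.
component_weights = {"database": 1, "compute": 1}

rs_via_components = score_by_component(coverage, component_weights)
rs_via_disruptions = score_by_disruption(coverage, component_weights)
print(round(rs_via_components, 4), round(rs_via_disruptions, 4))
```

When every component is scored for the same set of disruption types, as in this example, both aggregation routes give the same application score. Because `weighted_average` divides by the sum of the weights it uses, the result stays between 0 and 1 even though the tables list the weights as 25, 25, and 50.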

Accessing the resiliency scores

You can see resiliency scores in the dashboard or from applications.

Accessing the resiliency score from the dashboard

1. In the left navigation menu, choose Dashboard.

2. In Latest resiliency score, choose one or more applications in the Filter applications dropdown menu.

3. See the resiliency score for the application.

Accessing the resiliency score from Applications

1. In the left navigation menu, choose Applications.

2. In Applications, open an application.
