Amazon Elastic Container Service

Best Practices Guide

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be aﬃliated with, connected to, or sponsored by Amazon.

Introduction

Amazon Elastic Container Service (Amazon ECS) is a highly scalable and fast container management service that you can use to manage containers on a cluster. This guide covers many of the most important operational best practices for Amazon ECS. It also describes several core concepts that are involved in how Amazon ECS based applications work. The goal is to provide a concrete and actionable approach to operating and troubleshooting Amazon ECS based applications.

This guide is revised regularly to incorporate new Amazon ECS best practices as they're established. If you have any questions or comments about any of the content in this guide, raise an issue in the GitHub repository. For more information, see Amazon ECS Best Practices Guide on GitHub.

• Best Practices - Running your application with Amazon ECS

• Best Practices - Networking

• Best Practices - Auto scaling and capacity management

• Best Practices - Persistent storage

• Speeding up deployment

• Best Practices - Security

Best Practices - Running your application with Amazon ECS

Before you run an application using Amazon Elastic Container Service, make sure that you understand how the various aspects of your application work with features in Amazon ECS. This guide covers the main Amazon ECS resource types, what they're used for, and best practices for using each of these resource types.

Container image

A container image holds your application code and all the dependencies that your application code requires to run. Application dependencies include the source code packages that your application code relies on, a language runtime for interpreted languages, and binary packages that your dynamically linked code relies on.

Container images go through a three-step process.

1. Build - Gather your application and all its dependencies into one container image.

2. Store - Upload the container image to a container registry.

3. Run - Download the container image on some compute, unpack it, and start the application.

When you create your own container image, keep in mind the best practices described in the following sections.

Make container images complete and static

Ideally, a container image is intended to be a complete snapshot of everything that the application requires to function. With a complete container image, you can run an application by downloading one container image from one place. You don't need to download several separate pieces from diﬀerent locations. Therefore, as a best practice, store all application dependencies as static ﬁles inside the container image.

At the same time, don't dynamically download libraries, dependencies, or critical data during application startup. Instead, include these things as static ﬁles in the container image. Later on, if you want to change something in the container image, build a new container image with the changes applied to it.

There are a few reasons why we recommend this best practice.

• Including all dependencies as static files in the container image reduces the number of potentially breaking events that can happen during deployments. As you scale out to tens, hundreds, or even thousands of copies of your container, downloading a single container image rather than downloading from two or three different places simplifies your workload by limiting potential breaking points.

For example, assume that you're deploying 100 copies of your application, and each copy of the application has to download pieces from three different sources. There are 300 downloads that can fail. If each copy downloads a single container image instead, there are only 100 downloads that can fail.

• Container image downloads are optimized for downloading the application dependencies in parallel. By default, a container image is made up of layers that can be downloaded and unpacked in parallel. This means that a container image download can get all of your dependencies onto your compute faster than a hand-coded script that downloads each dependency in series.

• By keeping all your dependencies inside of the image, your deployments are more reliable and reproducible. If you change a dynamically loaded dependency, it might break the application inside the container image. However, if the container is truly standalone, you can always redeploy it, even in the future. This is because it already has the right versions and right dependencies inside of it.
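The arithmetic behind the first reason can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the per-download failure rate is an assumed number chosen for the example, not a measured or published figure.

```python
def failure_points(copies: int, sources_per_copy: int) -> int:
    """Number of individual downloads that could fail during a deployment."""
    return copies * sources_per_copy


def chance_all_succeed(downloads: int, p_fail: float) -> float:
    """Probability that every download succeeds, assuming independent failures."""
    return (1.0 - p_fail) ** downloads


if __name__ == "__main__":
    p_fail = 0.001  # assumed per-download failure rate, for illustration only
    for label, sources in (("three separate sources", 3), ("one container image", 1)):
        n = failure_points(copies=100, sources_per_copy=sources)
        print(f"{label}: {n} downloads, "
              f"{chance_all_succeed(n, p_fail):.1%} chance that all succeed")
```

Whatever failure rate you assume, fewer downloads per copy always gives a higher chance that the whole deployment succeeds.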

Maintain fast container launch times by keeping container images as small as possible

Complete containers hold everything that's needed to run your application, but they don't need to include your build tools. Consider this example. Assume that you're building a container for a Node.js application. You must have the NPM package manager to download packages for your application. However, you no longer need NPM itself when the application runs. You can use a multistage Docker build to solve this.

The following is an example of what such a multistage Dockerfile might look like for a Node.js application that has dependencies in NPM.

FROM node:14 AS build
WORKDIR /srv
ADD package.json .
RUN npm install

FROM node:14-slim
COPY --from=build /srv .
ADD . .
EXPOSE 3000
CMD ["node", "index.js"]

The first stage uses a full Node.js environment that has NPM, and a compiler for building native code bindings for packages. The second stage includes nothing but the Node.js runtime. It can copy the downloaded NPM packages out of the first stage. The final product is a minimal image that has the Node.js runtime, the NPM packages, and the application code. It doesn't include the full NPM build toolchain.

Keep your container images as small as possible and use shared layers. For example, if you have multiple applications that use the same data set, you can create a shared base image that has that data set. Then, build two diﬀerent image variants oﬀ of the same shared base image. This allows the container image layer with the dataset to be downloaded one time, rather than twice.

The main beneﬁt of using smaller container images is that these images can be downloaded onto compute hardware faster. This allows your application to scale out faster and quickly recover from unexpected crashes or restarts.

Only run a single application process with a container image

In a traditional virtual machine environment, it's typical to run a high-level daemon like systemd as the root process. This daemon is then responsible for starting your application process, and restarting the application process if it crashes. We don't recommend this when using containers. Instead, only run a single application process with a container.

If the application process crashes or ends, the container also ends. If the application must be restarted on crash, let Amazon ECS manage the application restart externally. The Amazon ECS agent reports to the Amazon ECS control plane that the application container crashed. Then, the control plane determines whether to launch a replacement container, and if so where to launch it. The replacement container may be placed onto the same host, or onto a diﬀerent host in the cluster.

Treat containers as ephemeral resources. They only last for the lifetime of the main application process. Don't keep restarting application processes inside of a container to try to keep the container up and running. Let Amazon ECS replace containers as needed.

This best practice has two key beneﬁts.

• It mitigates scenarios where an application crashed because of a mutation to the local container filesystem. Instead of reusing the same mutated container environment, the orchestrator launches a new container based off the original container image. This means that you can be confident that the replacement container is running a clean, baseline environment.

• Crashed processes are replaced through a centralized decision making process in the Amazon ECS control plane. The control plane makes smarter choices about where to launch the replacement process. For example, the control plane can attempt to launch the replacement onto different hardware in a different Availability Zone. This makes the overall deployment more resilient than if each individual compute instance attempts to relaunch its own processes locally.

Handle SIGTERM within the application

When you're following the guidance of the previous section, you're allowing Amazon ECS to replace tasks elsewhere in the cluster, rather than restart the crashing application. There are other times when a task may be stopped that are outside the application's control. Tasks may be stopped due to application errors, health check failures, completion of business workﬂows or even manual termination by a user.

When a task is stopped by Amazon ECS, Amazon ECS follows the steps and configuration shown in SIGTERM responsiveness.

To prepare your application, identify how long it takes your application to complete its work, and ensure that your application handles the SIGTERM signal. Within the application's signal handling, stop the application from taking new work and complete the work that is in progress, or save unfinished work to storage outside of the task if it would take too long to complete.

After sending the SIGTERM signal, Amazon ECS waits for the time specified in the StopTimeout in the task definition. Then, the SIGKILL signal is sent. Set the StopTimeout long enough that your application completes the SIGTERM handler in all situations before the SIGKILL is sent.

For web applications, you also need to consider open connections that are idle. For more details, see Network Load Balancer in this guide.

If you use an init process in your container, use a lightweight init process such as tini. This init process takes on the responsibility of reaping zombie processes if your application spawns worker processes. If your application doesn't handle the SIGTERM signal properly, tini can catch that signal for you and terminate your application process. However, if your application process crashes, tini doesn't restart it. Instead, tini exits, allowing the container to end and be replaced by container orchestration. For more information, see tini on GitHub.
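The signal handling described in this section can be sketched as follows. This is a minimal illustration, not a definitive implementation: the Worker class, its method names, and the job list are hypothetical stand-ins for your own application logic.

```python
import signal
import threading


class Worker:
    """Hypothetical worker loop that drains cleanly when it receives SIGTERM."""

    def __init__(self) -> None:
        self.stopping = threading.Event()
        self.completed = []

    def handle_sigterm(self, signum, frame) -> None:
        # Only set a flag here; the main loop decides how to wind down.
        self.stopping.set()

    def run(self, jobs) -> list:
        """Process jobs until done or asked to stop; return the unfinished jobs."""
        pending = list(jobs)
        while pending and not self.stopping.is_set():
            self.completed.append(pending.pop(0))  # stand-in for real work
        return pending  # the caller would save these outside of the task


if __name__ == "__main__":
    worker = Worker()
    signal.signal(signal.SIGTERM, worker.handle_sigterm)
    leftover = worker.run(["job-1", "job-2", "job-3"])
    print("completed:", worker.completed, "unfinished:", leftover)
```

The handler does as little as possible and leaves the wind-down to the main loop. The loop stops taking new work once the flag is set, and it returns whatever is unfinished so that it can be saved before the stop timeout expires.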

Conﬁgure containerized applications to write logs to stdout and stderr

There are many different ways to do logging. For some application frameworks, it's common to use an application logging library that writes directly to disk files. It's also common to use one that streams logs directly to an ELK (Elasticsearch, Logstash, Kibana) stack or a similar logging setup. However, we recommend that, when an application is containerized, you configure it to write application logs directly to the stdout and stderr streams.

Docker includes a variety of logging drivers that take the stdout and stderr log streams and handle them. You can choose to write the streams to syslog, to disk on the local instance that's running the container, or use a logging driver to send the logs to Fluentd, Splunk, CloudWatch, and other destinations. With Amazon ECS, you can choose to configure the FireLens logging driver. This driver can attach Amazon ECS metadata to logs, filter logs, and route logs to different destinations based on criteria such as HTTP status code. For more information about Docker logging drivers, see Configure logging drivers. For more information about FireLens, see Using FireLens.

When you decouple log handling from your application code, it gives you greater ﬂexibility to adjust log handling at the infrastructure level. Assume that you want to switch from one logging system to another. You can do so by adjusting a few settings at the container orchestrator level, rather than having to change code in all your services, build a new container image, and deploy it.
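As a rough sketch of this recommendation, the following Python snippet configures the standard logging module to write to stdout instead of to files on disk. The logger name and the message format are arbitrary choices made for this example.

```python
import logging
import sys


def build_logger(name: str = "app") -> logging.Logger:
    """Return a logger that writes to stdout rather than to files on disk."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    build_logger().info("service started")
```

Because the application only writes to a stream, the choice of log destination stays with the logging driver that is configured for the container.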

Version container images using tags

Container images are stored in a container registry. Each image in a registry is identiﬁed by a tag.

There's a tag called latest. This tag functions as a pointer to the latest version of the application container image, similar to the HEAD in a git repository. We recommend that you use the latest tag only for testing purposes. As a best practice, tag container images with a unique tag for each build. We recommend that you tag your images using the git SHA for the git commit that was used to build the image.

You don’t need to build a container image for every commit. However, we recommend that you build a new container image each time you release a particular code commit to the production environment. We also recommend that you tag the image with a tag that corresponds to the git commit of the code that's inside the image. If you tagged the image with the git commit, you can more quickly ﬁnd which version of the code the image is running.

We also recommend that you enable immutable image tags in Amazon Elastic Container Registry. With this setting, you can't change the container image that a tag points at. Instead, Amazon ECR enforces that a new image must be uploaded to a new tag, rather than overwriting a pre-existing tag. For more information, see immutable image tags on the AWS Blog.
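One way to make the tagging convention concrete is a small helper that builds the image reference from the commit SHA. This is a sketch under assumptions: the function name, the registry host, and the repository name are placeholders, and the pattern only accepts hexadecimal SHA-1 style commit identifiers (abbreviated or full).

```python
import re


def image_uri(registry: str, repository: str, git_sha: str) -> str:
    """Build an image reference whose tag is the commit SHA of the build."""
    # Assumes SHA-1 style commit IDs: 7 to 40 lowercase hex characters.
    if not re.fullmatch(r"[0-9a-f]{7,40}", git_sha):
        raise ValueError(f"not a git SHA: {git_sha!r}")
    return f"{registry}/{repository}:{git_sha}"


if __name__ == "__main__":
    # Placeholder registry host and SHA, for illustration only.
    print(image_uri("registry.example.com", "storefront", "3f2a9c1"))
```

Rejecting anything that isn't a commit SHA keeps a mutable tag such as latest from slipping into a release pipeline by accident.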

Task deﬁnition

The task deﬁnition is a document that describes what container images to run together, and what settings to use when running the container images. These settings include the amount of CPU and memory that the container needs. They also include any environment variables that are supplied to the container and any data volumes that are mounted to the container. Task deﬁnitions are grouped based on the dimensions of family and revision.

Use each task deﬁnition family for only one business purpose

You can use an Amazon ECS task definition to specify multiple containers. All the containers that you specify are deployed onto the same compute capacity. Don't use this feature to add multiple application containers to the same task definition, because this prevents copies of each application from scaling separately. For example, assume that you have a web server container, an API container, and a worker service container. As a best practice, use a separate task definition family for each of these pieces of containerized code.

If you group multiple types of application container together in the same task definition, you can’t independently scale those containers. For example, it's unlikely that both a website and an API require scaling out at the same rate. As traffic increases, there will be a different number of web containers required than API containers. If these two containers are being deployed in the same task definition, every task runs the same number of web containers and API containers.

We recommend that you scale each type of container independently based on demand.

We don't recommend that you use multiple containers in a single task definition for grouping different types of application container. The purpose of having multiple containers in a single task definition is so that you can deploy sidecars, small addon containers that enhance a single type of container. A sidecar might help with logging and observability, traffic routing, or other addon features.

We recommend that you use sidecars to attach extra functionality, but that the task has a single business function.
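The following sketch shows the shape of a task definition that has one business function plus a sidecar. The field names (family, containerDefinitions, name, image, essential) follow the Amazon ECS task definition format. The container names, the sidecar image, and the helper functions are hypothetical.

```python
def web_task_definition(image: str) -> dict:
    """Task definition for the web family: one application container plus a sidecar."""
    return {
        "family": "web",
        "containerDefinitions": [
            {"name": "web", "image": image, "essential": True},
            # Hypothetical sidecar that adds log routing to the web container.
            {
                "name": "log-router",
                "image": "registry.example.com/log-router:1.0.0",
                "essential": True,
            },
        ],
    }


def application_containers(task_def: dict, sidecars: set) -> list:
    """Names of the containers that are not known sidecars."""
    return [
        c["name"]
        for c in task_def["containerDefinitions"]
        if c["name"] not in sidecars
    ]


if __name__ == "__main__":
    task_def = web_task_definition("registry.example.com/web:3f2a9c1")
    # A task with a single business function has exactly one application container.
    print(application_containers(task_def, sidecars={"log-router"}))
```

A check like application_containers could run in a build pipeline to flag task definitions that bundle more than one type of application container.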

Match each application version with a task deﬁnition revision within a task deﬁnition family

A task definition can be configured to point at any container image tag, including the “latest” tag. However, we don't recommend that you use the “latest” tag in your task definition. This is because the “latest” tag functions as a mutable pointer, so the contents of the image that it points at can change without Amazon ECS identifying the modification.

Within a task deﬁnition family, consider each task deﬁnition revision as a point in time snapshot of the settings for a particular container image. This is similar to how the container is a snapshot of all the things that are needed to run a particular version of your application code.

Make sure that there's a one-to-one mapping between a version of application code, a container image tag, and a task definition revision. A typical release process involves a git commit that gets turned into a container image that's tagged with the git commit SHA. Then, that container image tag gets its own Amazon ECS task definition revision. Last, the Amazon ECS service is updated to tell it to deploy the new task definition revision.

By using this approach, you can maintain consistency between settings and application code when rolling out new versions of your application. For example, assume that you make a new version of your application that uses a new environment variable. The new task deﬁnition that corresponds to that change also deﬁnes the value for the environment variable.
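The one-to-one mapping can be sketched as a function that turns a commit SHA and its settings into the content of a new task definition revision. The field names follow the Amazon ECS task definition format. The function name, repository name, and environment variable are assumptions made for this example. In practice the resulting dictionary would be submitted through the RegisterTaskDefinition API, which creates the next revision in the family.

```python
def task_definition_for_release(
    family: str, image_repo: str, git_sha: str, environment: dict
) -> dict:
    """Settings for the task definition revision that matches one code version."""
    return {
        "family": family,
        "containerDefinitions": [
            {
                "name": family,
                # The image tag is the commit SHA, never a mutable tag.
                "image": f"{image_repo}:{git_sha}",
                "essential": True,
                "environment": [
                    {"name": key, "value": value}
                    for key, value in sorted(environment.items())
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical release that introduces a new environment variable.
    print(
        task_definition_for_release(
            family="api",
            image_repo="registry.example.com/api",
            git_sha="3f2a9c1",
            environment={"FEATURE_FLAG": "on"},
        )
    )
```

Because the image tag and the environment variables live in the same revision, rolling back to an earlier revision restores both the code and its matching settings.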

Use diﬀerent IAM roles for each task deﬁnition family

You can define different IAM roles for different tasks in Amazon ECS. Use the task definition to specify an IAM role for that application. When the containers in that task definition are run, they can call AWS APIs based on the policies that are defined in the IAM role. For more information, see IAM roles for tasks.

Define each task definition with its own IAM role. Follow this recommendation in tandem with our recommendation to give each business component its own task definition family. By implementing both of these best practices, you can limit how much access each service has to resources in your AWS account. For example, you can give your authentication service access to connect to your passwords database. At the same time, you can also ensure that only your order service has access to the credit card payment information.

Amazon ECS service

Amazon ECS uses the service resource to group, monitor, replace, and scale identical tasks. The service resource determines which task definition and revision Amazon ECS launches. It also determines how many copies of the task definition are launched and what resources are connected to the launched tasks. These connected resources include load balancers and service discovery. The service resource also defines rules for networking and placement of the tasks on hardware.

Use awsvpc network mode and give each service its own security group

We recommend that you use awsvpc network mode for tasks on Amazon EC2. This allows each task to have a unique IP address with a service-level security group. Doing so creates per-service security group rules, instead of instance-level security groups that are used in other network modes. Using per-service security group rules, you can, for example, authorize one service to talk to an Amazon RDS database. Another service with a different security group is denied from opening a connection to that Amazon RDS database.
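As an illustration of a per-service rule, the following sketch builds the parameters for an ingress rule that admits traffic to a database security group only from one service's security group. The parameter shape follows the Amazon EC2 AuthorizeSecurityGroupIngress API. The function name, the security group IDs, and the port are placeholders.

```python
def allow_from_service(db_sg_id: str, service_sg_id: str, port: int) -> dict:
    """Ingress rule that lets only tasks in service_sg_id reach the database port."""
    return {
        "GroupId": db_sg_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                # Reference the service's security group instead of an IP range.
                "UserIdGroupPairs": [{"GroupId": service_sg_id}],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder security group IDs and a placeholder database port.
    print(allow_from_service("sg-0database", "sg-0orderservice", 5432))
```

Any service whose tasks use a different security group has no matching rule, so its connection attempts to the database are denied.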

Enable Amazon ECS managed tags and tag propagation

After you enable Amazon ECS managed tags and tag propagation, Amazon ECS can attach and propagate tags on the tasks that the service launches. You can customize these tags and use them to create tag dimensions such as environment=production or team=web or application=storefront. These tags are used in usage and billing reports. If you set up the tags correctly, you can use them to see how many vCPU hours or GB hours that a particular environment, team, or application used. This can help you to estimate the overall cost of your infrastructure along different dimensions.
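The tagging options might look like the following when expressed as service parameters. The parameter names (enableECSManagedTags, propagateTags, tags) follow the Amazon ECS CreateService API. The function name is hypothetical, and the tag values reuse the examples from the text.

```python
def service_tagging_options(environment: str, team: str, application: str) -> dict:
    """Service parameters that enable managed tags and propagate custom tags to tasks."""
    return {
        "enableECSManagedTags": True,
        # Copy the tags from the service onto each task that it launches.
        "propagateTags": "SERVICE",
        "tags": [
            {"key": "environment", "value": environment},
            {"key": "team", "value": team},
            {"key": "application", "value": application},
        ],
    }


if __name__ == "__main__":
    print(service_tagging_options("production", "web", "storefront"))
```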

Best Practices - Networking

Modern applications are typically built out of multiple distributed components that communicate with each other. For example, a mobile or web application might communicate with an API endpoint, and the API might be powered by multiple microservices that communicate over the internet.

This guide presents the best practices for building a network where the components of your application can communicate with each other securely and in a scalable manner.

Topics

• Connecting to the internet

• Receiving inbound connections from the internet

• Choosing a network mode

• Connecting to AWS services from inside your VPC

• Networking between Amazon ECS services in a VPC

• Networking services across AWS accounts and VPCs

• Optimizing and troubleshooting

Connecting to the internet

Most containerized applications have at least some components that need outbound access to the internet. For example, the backend for a mobile app requires outbound access to push notifications.

Amazon Virtual Private Cloud has two main methods for facilitating communication between your VPC and the internet.

Topics

• Using a public subnet and internet gateway

• Using a private subnet and NAT gateway

Using a public subnet and internet gateway

By using a public subnet that has a route to an internet gateway, your containerized application can run on a host inside a VPC on a public subnet. The host that runs your container is assigned a public IP address. This public IP address is routable from the internet. For more information, see Internet gateways in the Amazon VPC User Guide.

This network architecture facilitates direct communication between the host that runs your application and other hosts on the internet. The communication is bi-directional. This means that not only can you establish an outbound connection to any other host on the internet, but other hosts on the internet might also attempt to connect to your host. Therefore, you should pay close attention to your security group and ﬁrewall rules. This is to ensure that other hosts on the internet can’t open any connections that you don't want to be opened.

For example, if your application is running on Amazon EC2, make sure that port 22 for SSH access is not open. Otherwise, your instance could receive constant SSH connection attempts from malicious bots on the internet. These bots trawl through public IP addresses. After they ﬁnd an open SSH port, they attempt to brute-force passwords to try to access your instance. Because of this, many organizations limit the usage of public subnets and prefer to have most, if not all, of their resources inside of private subnets.

Using public subnets for networking is suitable for public applications that require large amounts of bandwidth or minimal latency. Applicable use cases include video streaming and gaming services.

This networking approach is supported both when you use Amazon ECS on Amazon EC2 and when you use it on AWS Fargate.

• Using Amazon EC2 — You can launch EC2 instances on a public subnet. Amazon ECS uses these EC2 instances as cluster capacity, and any containers that are running on the instances can use the underlying public IP address of the host for outbound networking. This applies to both the host and bridge network modes. However, the awsvpc network mode doesn't provide task ENIs with public IP addresses. Therefore, they can’t make direct use of an internet gateway.

• Using Fargate — When you create your Amazon ECS service, specify public subnets for the networking configuration of your service, and ensure that the Assign public IP address option is enabled. Each Fargate task is networked in the public subnet, and has its own public IP address for direct communication with the internet.

Using a private subnet and NAT gateway

By using a private subnet and a NAT gateway, you can run your containerized application on a host that's in a private subnet. As such, this host has a private IP address that's routable inside your VPC, but isn't routable from the internet. This means that other hosts inside the VPC can make connections to the host using its private IP address, but other hosts on the internet can't make any inbound communications to the host.

With a private subnet, you can use a Network Address Translation (NAT) gateway to enable a host inside a private subnet to connect to the internet. Hosts on the internet receive an inbound connection that appears to be coming from the public IP address of the NAT gateway that's inside a public subnet.

The NAT gateway is responsible for serving as a bridge between the internet and the private VPC. This conﬁguration is often preferred for security reasons because it means that your VPC is protected from direct access by attackers on the internet. For more information, see NAT gateways in the Amazon VPC User Guide.

This private networking approach is suitable for scenarios where you want to protect your containers from direct external access. Applicable scenarios include payment processing systems or containers storing user data and passwords. You're charged for creating and using a NAT gateway in your account. NAT gateway hourly usage and data processing rates also apply. For redundancy purposes, you should have a NAT gateway in each Availability Zone. This way, the loss in availability of a single Availability Zone doesn't compromise your outbound connectivity. Because of this, if you have a small workload, it might be more cost effective to use private subnets and NAT gateways.

This networking approach is supported both when using Amazon ECS on Amazon EC2 and when using it on AWS Fargate.

• Using Amazon EC2 — You can launch EC2 instances on a private subnet. The containers that run on these EC2 hosts use the underlying host's networking, and outbound requests go through the NAT gateway.

• Using Fargate — When you create your Amazon ECS service, specify private subnets for the networking configuration of your service, and don't enable the Assign public IP address option. Each Fargate task is hosted in a private subnet. Its outbound traffic is routed through any NAT gateway that you have associated with that private subnet.

Receiving inbound connections from the internet

If you run a public service, you must accept inbound traffic from the internet. For example, your public website must accept inbound HTTP requests from browsers. In such cases, other hosts on the internet must be able to initiate an inbound connection to the host of your application.

One approach to this problem is to launch your containers on hosts that are in a public subnet with a public IP address. However, we don't recommend this for large-scale applications. For these, a better approach is to have a scalable input layer that sits between the internet and your application. For this approach, you can use any of the AWS services listed in this section as an input.

Topics

• Application Load Balancer

• Network Load Balancer

• Amazon API Gateway HTTP API

Application Load Balancer

An Application Load Balancer functions at the application layer. It's the seventh layer of the Open Systems Interconnection (OSI) model. This makes an Application Load Balancer suitable for public HTTP services. If you have a website or an HTTP REST API, then an Application Load Balancer is a suitable load balancer for this workload. For more information, see What is an Application Load Balancer? in the User Guide for Application Load Balancers.

With this architecture, you create an Application Load Balancer in a public subnet so that it has a public IP address and can receive inbound connections from the internet. When the Application Load Balancer receives an inbound connection, or more speciﬁcally an HTTP request, it opens a connection to the application using its private IP address. Then, it forwards the request over the internal connection.

An Application Load Balancer has the following advantages.

• SSL/TLS termination — An Application Load Balancer can sustain secure HTTPS communication and certiﬁcates for communications with clients. It can optionally terminate the SSL connection at the load balancer level so that you don’t have to handle certiﬁcates in your own application.

• Advanced routing — An Application Load Balancer can have multiple DNS hostnames. It also has advanced routing capabilities to send incoming HTTP requests to different destinations based on criteria such as the hostname or the path of the request. This means that you can use a single Application Load Balancer as the input for many different internal services, or even microservices on different paths of a REST API.

• gRPC support and websockets — An Application Load Balancer can handle more than just HTTP. It can also load balance gRPC and websocket based services, with HTTP/2 support.

• Security — An Application Load Balancer helps protect your application from malicious traffic. It includes features such as HTTP desync mitigations, and is integrated with AWS Web Application Firewall (AWS WAF). AWS WAF can further filter out malicious traffic that might contain attack patterns, such as SQL injection or cross-site scripting.

Network Load Balancer

A Network Load Balancer functions at the fourth layer of the Open Systems Interconnection (OSI) model. It's suitable for non-HTTP protocols or scenarios where end-to-end encryption is necessary, but doesn’t have the same HTTP-specific features of an Application Load Balancer. Therefore, a Network Load Balancer is best suited for applications that don’t use HTTP. For more information, see What is a Network Load Balancer? in the User Guide for Network Load Balancers.

When a Network Load Balancer is used as an input, it functions similarly to an Application Load Balancer. This is because it's created in a public subnet and has a public IP address that can be accessed on the internet. The Network Load Balancer then opens a connection to the private IP address of the host running your container, and sends the packets from the public side to the private side.

Network Load Balancer features

Because the Network Load Balancer operates at a lower level of the networking stack, it doesn't have the same set of features that Application Load Balancer does. However, it does have the following important features.

• End-to-end encryption — Because a Network Load Balancer operates at the fourth layer of the OSI model, it doesn't read the contents of packets. This makes it suitable for load balancing communications that need end-to-end encryption.

• TLS encryption — In addition to end-to-end encryption, Network Load Balancer can also terminate TLS connections. This way, your backend applications don’t have to implement their own TLS.

• UDP support — Because a Network Load Balancer operates at the fourth layer of the OSI model, it's suitable for non-HTTP workloads and protocols other than TCP.

Closing connections

Because the Network Load Balancer does not observe the application protocol at the higher layers of the OSI model, it cannot send closure messages to the clients in those protocols. Unlike with the Application Load Balancer, those connections need to be closed by the application, or you can configure the Network Load Balancer to close the fourth-layer connections when a task is stopped or replaced. See the connection termination setting for Network Load Balancer target groups in the Network Load Balancer documentation.

Letting the Network Load Balancer close connections at the fourth layer can cause clients to display undesired error messages if they don't handle the closed connections. For more information on recommended client configuration, see the Builders Library.


The methods to close connections vary by application. However, one way is to ensure that the Network Load Balancer target deregistration delay is longer than the client connection timeout. The client then times out first and reconnects gracefully through the Network Load Balancer to the next task while the old task slowly drains all of its clients. For more information about the Network Load Balancer target deregistration delay, see the Network Load Balancer documentation.
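As a minimal sketch of the relationship described above, the following Python builds a list of target group attributes in which the deregistration delay is always longer than the client connection timeout. The two attribute keys are the documented Elastic Load Balancing target group attributes. The helper name and the 30-second margin are assumptions made for illustration only.

```python
MAX_DEREGISTRATION_DELAY_SECONDS = 3600  # upper bound accepted for the delay


def build_nlb_drain_attributes(client_timeout_seconds, margin_seconds=30,
                               terminate_connections=False):
    """Return target group attributes whose deregistration delay is longer
    than the client connection timeout (hypothetical helper)."""
    if client_timeout_seconds < 0 or margin_seconds <= 0:
        raise ValueError("timeout must be >= 0 and margin must be > 0")
    delay = client_timeout_seconds + margin_seconds
    if delay > MAX_DEREGISTRATION_DELAY_SECONDS:
        raise ValueError("deregistration delay cannot exceed 3600 seconds")
    return [
        # How long the load balancer keeps draining a deregistering target.
        {"Key": "deregistration_delay.timeout_seconds", "Value": str(delay)},
        # Whether the load balancer closes remaining connections itself
        # once the deregistration delay has elapsed.
        {"Key": "deregistration_delay.connection_termination.enabled",
         "Value": "true" if terminate_connections else "false"},
    ]


# Clients that time out after 60 seconds get a 90-second drain window.
attributes = build_nlb_drain_attributes(client_timeout_seconds=60)
```

A list in this shape could be supplied as the Attributes parameter when modifying target group attributes, for example through an AWS SDK.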

Amazon API Gateway HTTP API

Amazon API Gateway HTTP API is a serverless ingress that's suitable for HTTP applications with sudden bursts in request volumes or low request volumes. For more information, see What is Amazon API Gateway? in the API Gateway Developer Guide.

The pricing model for both Application Load Balancer and Network Load Balancer includes an hourly price to keep the load balancers available for accepting incoming connections at all times. In contrast, API Gateway charges for each request separately. This means that, if no requests come in, there are no charges. Under high traffic loads, an Application Load Balancer or Network Load Balancer can handle a greater volume of requests at a cheaper per-request price than API Gateway. However, if you have a low number of requests overall or have periods of low traffic, then the cumulative price for using API Gateway is likely to be lower than paying an hourly charge to maintain a load balancer that's underutilized. API Gateway can also cache API responses, which might result in lower backend request rates.
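The trade-off between an hourly charge and a per-request charge can be estimated with simple arithmetic. The following sketch computes the monthly request volume at which the two models cost the same. The prices are placeholders chosen for illustration, not actual AWS prices, and the model deliberately ignores other charges such as capacity units or data transfer.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def load_balancer_monthly_cost(hourly_price, hours=HOURS_PER_MONTH):
    """Fixed charge for keeping a load balancer running, regardless of traffic."""
    return hourly_price * hours


def api_gateway_monthly_cost(requests, price_per_million):
    """Pay-per-request charge; zero requests means zero cost."""
    return requests / 1_000_000 * price_per_million


def break_even_requests(hourly_price, price_per_million, hours=HOURS_PER_MONTH):
    """Monthly request count at which both options cost the same."""
    fixed = load_balancer_monthly_cost(hourly_price, hours)
    return fixed / price_per_million * 1_000_000


# Placeholder prices for illustration only; look up current pricing before deciding.
EXAMPLE_HOURLY_PRICE = 0.02
EXAMPLE_PRICE_PER_MILLION = 1.00

if __name__ == "__main__":
    threshold = break_even_requests(EXAMPLE_HOURLY_PRICE, EXAMPLE_PRICE_PER_MILLION)
    print(f"Break-even at roughly {threshold:,.0f} requests per month")
```

Below the break-even volume, the per-request model is cheaper in this simplified comparison. Above it, the hourly model is cheaper.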

API Gateway works by using a VPC link that allows the AWS managed service to connect to hosts inside the private subnets of your VPC, using their private IP addresses. It can detect these private IP addresses by looking at AWS Cloud Map service discovery records that are managed by Amazon ECS service discovery.

API Gateway supports the following features.

• The API Gateway operation is similar to a load balancer, but has additional capabilities unique to API management.

• The API Gateway provides additional capabilities around client authorization, usage tiers, and request/response modification. For more information, see Amazon API Gateway features.

• The API Gateway can support edge, regional, and private API gateway endpoints. Edge endpoints are available through a managed CloudFront distribution. Regional and private endpoints are both local to a Region.

• SSL/TLS termination

• Routing different HTTP paths to different backend microservices

Besides the preceding features, API Gateway also supports using custom Lambda authorizers that you can use to protect your API from unauthorized usage. For more information, see Field Notes: Serverless Container-based APIs with Amazon ECS and Amazon API Gateway.

Choosing a network mode

The approaches previously mentioned for architecting inbound and outbound network connections can apply to any of your workloads on AWS, even if they aren’t inside a container. When running containers on AWS, you need to consider another level of networking. One of the main advantages of using containers is that you can pack multiple containers onto a single host. When doing this, you need to choose how you want to network the containers that are running on the same host. The following are the options to choose from.

• Host mode - The host network mode is the most basic network mode that's supported in Amazon ECS.

• Bridge mode - The bridge network mode allows you to use a virtual network bridge to create a layer between the host and the networking of the container.

• AWSVPC mode - With the awsvpc network mode, Amazon ECS creates and manages an Elastic Network Interface (ENI) for each task and each task receives its own private IP address within the VPC.

Host mode

The host network mode is the most basic network mode that's supported in Amazon ECS. Using host mode, the networking of the container is tied directly to the underlying host that's running the container.


Assume that you're running a Node.js container with an Express application that listens on port 3000 similar to the one illustrated in the preceding diagram. When the host network mode is used, the container receives traffic on port 3000 using the IP address of the underlying host Amazon EC2 instance.

We do not recommend using this mode.

There are significant drawbacks to using this network mode. You can’t run more than a single instantiation of a task on each host. This is because only the first task can bind to its required port on the Amazon EC2 instance. There's also no way to remap a container port when it's using host network mode. For example, if an application needs to listen on a particular port number, you can't remap the port number directly. Instead, you must manage any port conflicts through changing the application configuration.

There are also security implications when using the host network mode. This mode allows containers to impersonate the host, and it allows containers to connect to private loopback network services on the host.

The host network mode is only supported for Amazon ECS tasks hosted on Amazon EC2 instances. It's not supported when using Amazon ECS on Fargate.

Bridge mode

With bridge mode, you're using a virtual network bridge to create a layer between the host and the networking of the container. This way, you can create port mappings that remap a host port to a container port. The mappings can be either static or dynamic.


With a static port mapping, you can explicitly define which host port you want to map to a container port. Using the example above, port 80 on the host is being mapped to port 3000 on the container. To communicate with the containerized application, you send traffic to port 80 at the Amazon EC2 instance's IP address. The containerized application sees that inbound traffic on port 3000.

If you only want to change the traffic port, then a static port mapping is suitable. However, this still has the same disadvantage as using the host network mode. You can't run more than a single instantiation of a task on each host. This is because a static port mapping only allows a single container to be mapped to port 80.

To solve this problem, consider using the bridge network mode with a dynamic port mapping as shown in the following diagram.


By not specifying a host port in the port mapping, you can have Docker choose a random, unused port from the ephemeral port range and assign it as the public host port for the container. For example, the Node.js application listening on port 3000 on the container might be assigned a random high number port such as 47760 on the Amazon EC2 host. Doing this means that you can run multiple copies of that container on the host. Moreover, each container can be assigned its own port on the host. Each copy of the container receives traffic on port 3000. However, clients that send traffic to these containers use the randomly assigned host ports.

Amazon ECS helps you to keep track of the randomly assigned ports for each task. It does this by automatically updating load balancer target groups and AWS Cloud Map service discovery to have the list of task IP addresses and ports. This makes it easier to use services operating using bridge mode with dynamic ports.
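The difference between static and dynamic port mappings comes down to whether a host port is specified in the container definition. The following sketch builds both variants as plain Python dictionaries that mirror the portMappings structure of an Amazon ECS task definition. The helper function, container name, and image name are hypothetical and exist only for illustration.

```python
def port_mapping(container_port, host_port=None, protocol="tcp"):
    """Build one entry for the portMappings list of a container definition.

    Leaving host_port unset requests a dynamic mapping in bridge mode,
    so the host port is chosen when the container starts.
    """
    mapping = {"containerPort": container_port, "protocol": protocol}
    if host_port is not None:
        mapping["hostPort"] = host_port
    return mapping


# Static mapping: host port 80 is always forwarded to container port 3000,
# so only one copy of the container can run on each host.
static_mapping = port_mapping(3000, host_port=80)

# Dynamic mapping: no host port is given, so several copies can share a host.
dynamic_mapping = port_mapping(3000)

container_definition = {
    "name": "node-app",               # hypothetical container name
    "image": "example/node-app:1.0",  # hypothetical image
    "portMappings": [dynamic_mapping],
}

task_definition = {
    "family": "node-app",             # hypothetical family name
    "networkMode": "bridge",
    "containerDefinitions": [container_definition],
}
```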

However, one disadvantage of using the bridge network mode is that it's difficult to lock down service-to-service communications. Because services might be assigned to any random, unused port, it's necessary to open broad port ranges between hosts. It's also not easy to create specific rules so that a particular service can only communicate with one other specific service, because the services have no specific ports to use for security group networking rules.

The bridge network mode is only supported for Amazon ECS tasks hosted on Amazon EC2 instances. It is not supported when using Amazon ECS on Fargate.

AWSVPC mode

With the awsvpc network mode, Amazon ECS creates and manages an Elastic Network Interface (ENI) for each task and each task receives its own private IP address within the VPC. This ENI is separate from the underlying host's ENI. If an Amazon EC2 instance is running multiple tasks, then each task's ENI is separate as well.


In the preceding example, the Amazon EC2 instance is assigned to an ENI. The ENI represents the IP address of the EC2 instance used for network communications at the host level. Each task also has a corresponding ENI and a private IP address. Because each ENI is separate, each container can bind to port 80 on the task ENI. Therefore, you don't need to keep track of port numbers. Instead, you can send traffic to port 80 at the IP address of the task ENI.

The advantage of using the awsvpc network mode is that each task has a separate security group to allow or deny traffic. This means you have greater flexibility to control communications between tasks and services at a more granular level. You can also configure a task to deny incoming traffic from another task located on the same host.

The awsvpc network mode is supported for Amazon ECS tasks hosted on both Amazon EC2 and Fargate.

Be mindful that, when using Fargate, the awsvpc network mode is required.

When using the awsvpc network mode there are a few challenges you should be mindful of.

Increasing task density with ENI Trunking

The biggest disadvantage of using the awsvpc network mode with tasks that are hosted on Amazon EC2 instances is that EC2 instances have a limit on the number of ENIs that can be attached to them. This limits how many tasks you can place on each instance. Amazon ECS provides the ENI trunking feature which increases the number of available ENIs to achieve more task density.


When using ENI trunking, two ENI attachments are used by default. The first is the primary ENI of the instance, which is used for any host level processes. The second is the trunk ENI, which Amazon ECS creates. This feature is only supported on specific Amazon EC2 instance types.

Consider this example. Without ENI trunking, a c5.large instance that has two vCPUs can only host two tasks. However, with ENI trunking, a c5.large instance that has two vCPUs can host up to ten tasks.

Each task has a different IP address and security group. For more information about available instance types and their density, see Supported Amazon EC2 instance types in the Amazon Elastic Container Service Developer Guide.

ENI trunking has no impact on runtime performance in terms of latency or bandwidth. However, it increases task startup time. You should ensure that, if ENI trunking is used, your autoscaling rules and other workloads that depend on task startup time still function as you expect them to.

For more information, see Elastic network interface trunking in the Amazon Elastic Container Service Developer Guide.
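ENI trunking is an opt-in account setting. The following sketch shows one way the opt-in could be made from Python, and how the task density without trunking follows from the ENI limit of an instance. It assumes a client object that exposes the Amazon ECS PutAccountSetting operation as put_account_setting, as the AWS SDK for Python does. The two helper names are hypothetical.

```python
def enable_eni_trunking(ecs_client):
    """Opt the calling identity in to ENI trunking.

    ecs_client is expected to behave like an AWS SDK for Python ECS client,
    for example the object returned by boto3.client("ecs").
    """
    return ecs_client.put_account_setting(name="awsvpcTrunking", value="enabled")


def max_tasks_without_trunking(max_enis_per_instance):
    """Without trunking, one ENI is taken by the instance itself and every
    awsvpc task needs one of the remaining ENIs."""
    return max(max_enis_per_instance - 1, 0)


# A c5.large supports three ENIs, which leaves room for two awsvpc tasks.
c5_large_tasks = max_tasks_without_trunking(3)
```

The setting only affects instances that are launched after the opt-in, so existing capacity may need to be replaced before the higher task density is available.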

Preventing IP address exhaustion

By assigning a separate IP address to each task, you can simplify your overall infrastructure and maintain security groups that provide a great level of security. However, this configuration can lead to IP exhaustion.

The default VPC on your AWS account has pre-provisioned subnets that have a /20 CIDR range. This means each subnet has 4,091 available IP addresses. Note that several IP addresses within the /20 range are reserved for AWS specific usage. Consider this example. You distribute your applications across three subnets in three Availability Zones for high availability. In this case, you can use approximately 12,000 IP addresses across the three subnets.
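The figures above can be reproduced with the Python standard library. This sketch assumes that five addresses are reserved in every subnet, which is what yields 4,091 usable addresses for a /20 range. The three subnet ranges are example values.

```python
import ipaddress

# Addresses in each subnet that are reserved and can't be assigned to ENIs.
RESERVED_ADDRESSES_PER_SUBNET = 5


def usable_addresses(cidr):
    """Number of addresses in a subnet that remain available for ENIs."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses - RESERVED_ADDRESSES_PER_SUBNET


# Example /20 subnets, one per Availability Zone.
subnets = ["172.31.0.0/20", "172.31.16.0/20", "172.31.32.0/20"]

per_subnet = usable_addresses(subnets[0])
total = sum(usable_addresses(cidr) for cidr in subnets)

if __name__ == "__main__":
    print(f"{per_subnet} usable addresses per subnet, {total} in total")
```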


Using ENI trunking, each Amazon EC2 instance that you launch requires two IP addresses. One IP address is used for the primary ENI, and the other IP address is used for the trunk ENI. Each Amazon ECS task on the instance requires one IP address. If you're launching an extremely large workload, you could run out of available IP addresses. This might result in Amazon EC2 launch failures or task launch failures. These errors occur because the ENIs can't add IP addresses inside the VPC if there are no available IP addresses.
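A rough estimate of IP address consumption follows directly from this description: two addresses for each instance plus one for each task. The following sketch compares that estimate with the available address space. The function names and the workload figures are hypothetical.

```python
def addresses_required(instance_count, task_count, addresses_per_instance=2):
    """Estimate IP usage with ENI trunking: a primary ENI and a trunk ENI
    for each instance, plus one address for each task."""
    return instance_count * addresses_per_instance + task_count


def fits_in_address_space(instance_count, task_count, available_addresses):
    """Return True if the estimated usage fits in the available addresses."""
    return addresses_required(instance_count, task_count) <= available_addresses


# Hypothetical workload: 1,000 instances that each run 10 tasks, placed in
# three /20 subnets that provide 4,091 usable addresses each.
available = 3 * 4091
needed = addresses_required(instance_count=1000, task_count=10_000)
workload_fits = fits_in_address_space(1000, 10_000, available)
```

The estimate leaves out other consumers of addresses in the same subnets, such as load balancers and VPC endpoints, so it is best treated as a lower bound.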

When using the awsvpc network mode, you should evaluate your IP address requirements and ensure that your subnet CIDR ranges meet your needs. If you have already started using a VPC that has small subnets and it begins to run out of address space, you can add a secondary subnet.

By using ENI trunking, the Amazon VPC CNI can be configured to use ENIs in a different IP address space than the host. By doing this, you can give your Amazon EC2 host and your tasks different IP address ranges that don't overlap. In the example diagram, the EC2 host IP address is in a subnet that has the 172.31.16.0/20 IP range. However, tasks that are running on the host are assigned IP addresses in the 100.64.0.0/19 range. By using two independent IP ranges, you don’t need to worry about tasks consuming too many IP addresses and not leaving enough IP addresses for instances.
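Whether two ranges are independent can be checked with the standard library before they are put into use. The sketch below uses the two ranges from the example and confirms that they don't overlap.

```python
import ipaddress

# Ranges taken from the example: one for the hosts, one for the tasks.
host_range = ipaddress.ip_network("172.31.16.0/20")
task_range = ipaddress.ip_network("100.64.0.0/19")

# True only if the two ranges share at least one address.
ranges_overlap = host_range.overlaps(task_range)

# Number of addresses in the range that is dedicated to tasks.
task_range_size = task_range.num_addresses

if __name__ == "__main__":
    print(f"overlap: {ranges_overlap}, task range size: {task_range_size}")
```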

Using IPv6 dual stack mode

The awsvpc network mode is compatible with VPCs that are configured for IPv6 dual stack mode. A VPC using dual stack mode can communicate over IPv4, IPv6, or both. Each subnet in the VPC can have both an IPv4 CIDR range and an IPv6 CIDR range. For more information, see IP addressing in your VPC in the Amazon VPC User Guide.

You can't disable IPv4 support for your VPC and subnets to address IPv4 exhaustion issues. However, with the IPv6 support, you can use some new capabilities, specifically the egress-only internet gateway.

An egress-only internet gateway allows tasks to use their publicly routable IPv6 address to initiate outbound connections to the internet. But the egress-only internet gateway doesn't allow connections from the internet. For more information, see Egress-only internet gateways in the Amazon VPC User Guide.


Connecting to AWS services from inside your VPC

For Amazon ECS to function properly, the ECS container agent that runs on each host must communicate with the Amazon ECS control plane. If you're storing your container images in Amazon ECR, the Amazon EC2 hosts must communicate to the Amazon ECR service endpoint, and to Amazon S3, where the image layers are stored. If you use other AWS services for your containerized application, such as persisting data stored in DynamoDB, double-check that these services also have the necessary networking support.

Topics

• NAT gateway (p. 26)

• AWS PrivateLink (p. 27)

NAT gateway

Using a NAT gateway is the easiest way to ensure that your Amazon ECS tasks can access other AWS services. For more information about this approach, see Using a private subnet and NAT gateway (p. 14).

The following are the disadvantages to using this approach:

• You can't limit what destinations the NAT gateway can communicate with. You also can't limit which destinations your backend tier can communicate to without disrupting all outbound communications from your VPC.

• NAT gateways charge for every GB of data that passes through. If you use the NAT gateway for downloading large files from Amazon S3, or doing a high volume of database queries to DynamoDB, you're charged for every GB of bandwidth. Additionally, NAT gateways support 5 Gbps of bandwidth and automatically scale up to 45 Gbps. If you route through a single NAT gateway, applications that require very high bandwidth connections might encounter networking constraints. As a workaround, you can divide your workload across multiple subnets and give each subnet its own NAT gateway.


AWS PrivateLink

AWS PrivateLink provides private connectivity between VPCs, AWS services, and your on-premises networks without exposing your traffic to the public internet.

One of the technologies used to accomplish this is the VPC endpoint. A VPC endpoint enables private connections between your VPC and supported AWS services and VPC endpoint services. Traffic between your VPC and the other service doesn't leave the Amazon network. A VPC endpoint doesn't require an internet gateway, virtual private gateway, NAT device, VPN connection, or AWS Direct Connect connection. Amazon EC2 instances in your VPC don't require public IP addresses to communicate with resources in the service.

The following diagram shows how communication to AWS services works when you are using VPC endpoints instead of an internet gateway. AWS PrivateLink provisions elastic network interfaces (ENIs) inside of the subnet, and VPC routing rules are used to send any communication to the service hostname through the ENI, directly to the destination AWS service. This traffic no longer needs to use the NAT gateway or internet gateway.

The following are some of the common VPC endpoints that are used with the Amazon ECS service.

• S3 gateway VPC endpoint

• DynamoDB VPC endpoint

• Amazon ECS VPC endpoint

• Amazon ECR VPC endpoint

Many other AWS services support VPC endpoints. If you make heavy usage of any AWS service, you should look up the specific documentation for that service and how to create a VPC endpoint for that traffic.
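The endpoints in the preceding list map to Region-specific service names. The following sketch generates those names together with the endpoint type for each one, in a shape that resembles the parameters used when creating a VPC endpoint. The helper name is hypothetical, and the selection of services is an assumption about a typical Amazon ECS setup that pulls images from Amazon ECR. Your workload might need more or fewer endpoints.

```python
# Gateway endpoints are added to route tables and need no ENIs.
GATEWAY_SERVICES = ["s3", "dynamodb"]

# Interface endpoints are provisioned by AWS PrivateLink as ENIs in your subnets.
INTERFACE_SERVICES = ["ecs", "ecs-agent", "ecs-telemetry", "ecr.api", "ecr.dkr"]


def ecs_vpc_endpoints(region):
    """Return service names and endpoint types for one Region (hypothetical helper)."""
    endpoints = []
    for service in GATEWAY_SERVICES:
        endpoints.append({
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "VpcEndpointType": "Gateway",
        })
    for service in INTERFACE_SERVICES:
        endpoints.append({
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "VpcEndpointType": "Interface",
        })
    return endpoints


if __name__ == "__main__":
    for endpoint in ecs_vpc_endpoints("us-east-1"):
        print(endpoint["VpcEndpointType"], endpoint["ServiceName"])
```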

Networking between Amazon ECS services in a VPC

Using Amazon ECS containers in a VPC, you can split monolithic applications into separate parts that can be deployed and scaled independently in a secure environment. However, it can be challenging to make sure that all of these parts, both in and outside of a VPC, can communicate with each other. There are several approaches for facilitating communication, all with different advantages and disadvantages.

Using service discovery

One approach for service-to-service communication is direct communication using service discovery. In this approach, you can use the AWS Cloud Map service discovery integration with Amazon ECS. Using service discovery, Amazon ECS syncs the list of launched tasks to AWS Cloud Map, which maintains a DNS hostname that resolves to the internal IP addresses of one or more tasks from that particular service. Other services in the Amazon VPC can use this DNS hostname to send traffic directly to another container using its internal IP address. For more information, see Service discovery in the Amazon Elastic Container Service Developer Guide.

In the preceding diagram, there are three services. serviceA has one container and communicates with serviceB, which has two containers. serviceB must also communicate with serviceC, which has one container. Each container in all three of these services can use the internal DNS names from AWS Cloud Map to find the internal IP addresses of a container from the downstream service that it needs to communicate with.

This approach to service-to-service communication provides low latency. At first glance, it's also simple as there are no extra components between the containers. Traffic travels directly from one container to the other container.

This approach is suitable when using the awsvpc network mode, where each task has its own unique IP address. Most software only supports the use of DNS A records, which resolve directly to IP addresses.

When using the awsvpc network mode, the IP address for each task is stored in an A record. However, if you're using bridge network mode, multiple containers could be sharing the same IP address. Additionally, dynamic port mappings cause the containers to be randomly assigned port numbers on that single IP address. At this point, an A record is no longer enough for service discovery. You must also use an SRV record. This type of record can keep track of both IP addresses and port numbers but requires that you configure applications appropriately. Some prebuilt applications that you use might not support SRV records.

Another advantage of the awsvpc network mode is that you have a unique security group for each service. You can configure this security group to allow incoming connections from only the specific upstream services that need to talk to that service.

The main disadvantage of direct service-to-service communication using service discovery is that you must implement extra logic to have retries and deal with connection failures. DNS records have a time-to-live (TTL) period that controls how long they are cached. It takes some time for the DNS record to be updated and for the cache to expire so that your applications can pick up the latest version of the DNS record. So, your application might end up resolving the DNS record to point at another container that's no longer there. Your application needs to handle retries and have logic to ignore bad backends.
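The retry logic described above might look like the following sketch. It resolves the service hostname again on every attempt, tries each returned address in turn, and remembers addresses that refused a connection so that they are tried last. The function names, the number of attempts, and the timeout are assumptions for illustration. A production client would likely also add a backoff between attempts.

```python
import socket


def resolve(hostname, port):
    """Return the distinct IPv4 addresses currently published for hostname."""
    infos = socket.getaddrinfo(hostname, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def connect_with_retries(hostname, port, attempts=3, timeout=2.0,
                         resolver=resolve, connector=socket.create_connection):
    """Open a connection to any healthy backend behind a DNS name."""
    bad_addresses = set()
    last_error = None
    for _ in range(attempts):
        try:
            # Resolve again on each attempt so that updated records are picked up.
            addresses = resolver(hostname, port)
        except OSError as error:
            last_error = error
            continue
        # Prefer addresses that haven't failed; fall back to all of them.
        candidates = [a for a in addresses if a not in bad_addresses] or addresses
        for address in candidates:
            try:
                return connector((address, port), timeout=timeout)
            except OSError as error:
                # The task behind this address may already be gone.
                bad_addresses.add(address)
                last_error = error
    raise ConnectionError(f"no healthy backend for {hostname}:{port}") from last_error
```

The resolver and connector parameters exist so that the logic can be exercised without a network, for example in unit tests.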

Using an internal load balancer

Another approach to service-to-service communication is to use an internal load balancer. An internal load balancer exists entirely inside of your VPC and is only accessible to services inside of your VPC.

The load balancer maintains high availability by deploying redundant resources into each subnet.

When a container from serviceA needs to communicate with a container from serviceB, it opens a connection to the load balancer. The load balancer then opens a connection to a container from serviceB. The load balancer serves as a centralized place for managing all connections between each service.

If a container from serviceB stops, then the load balancer can remove that container from the pool. The load balancer also does health checks against each downstream target in its pool and can automatically remove bad targets from the pool until they become healthy again. The applications no longer need to be aware of how many downstream containers are there. They just open their connections to the load balancer.


This approach works well with all network modes. The load balancer can keep track of task IP addresses when using the awsvpc network mode, as well as more advanced combinations of IP address and port when using the bridge network mode. It evenly distributes traffic across all the IP address and port combinations, even if several containers are actually hosted on the same Amazon EC2 instance, just on different ports.

The one disadvantage of this approach is cost. To be highly available, the load balancer needs to have resources in each Availability Zone. This adds extra cost because of the overhead of paying for the load balancer and for the amount of traffic that goes through the load balancer.

However, you can reduce overhead costs by having multiple services share a load balancer. This is particularly suitable for REST services that use an Application Load Balancer. You can create path-based routing rules that route traffic to different services. For example, /api/user/* might route to a container that's part of the user service, whereas /api/order/* might route to the associated order service. With this approach, you only pay for one Application Load Balancer, and have one consistent URL for your API, while still splitting the traffic off to various microservices on the backend.
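To illustrate how path-based rules divide traffic among services, the following sketch simulates the rule evaluation in plain Python. It isn't how the Application Load Balancer is implemented. It only mimics the idea that rules are checked in priority order and that the first matching path pattern decides the target. The priorities and service names are hypothetical.

```python
from fnmatch import fnmatchcase

# (priority, path pattern, target service); lower priority numbers are checked first.
RULES = [
    (10, "/api/user/*", "user-service"),
    (20, "/api/order/*", "order-service"),
]


def route(path, rules=RULES, default_target="default-service"):
    """Return the target of the first rule whose pattern matches the path."""
    for _priority, pattern, target in sorted(rules):
        if fnmatchcase(path, pattern):
            return target
    # No rule matched, so the default action applies.
    return default_target


if __name__ == "__main__":
    for request_path in ["/api/user/42", "/api/order/7/items", "/health"]:
        print(request_path, "->", route(request_path))
```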

Using a service mesh

AWS App Mesh is a service mesh that can help you manage a large number of services and have better control of how traffic gets routed among services. App Mesh functions as an intermediary between basic service discovery and load balancing. With App Mesh, applications don't directly interact with each other, but they also don't use a centralized load balancer either. Instead, each copy of your task is accompanied by an Envoy proxy sidecar. For more information, see What is AWS App Mesh? in the AWS App Mesh User Guide.

In the preceding diagram, each task has an Envoy proxy sidecar. This sidecar is responsible for proxying all inbound and outbound traffic for the task. The App Mesh control plane uses AWS Cloud Map to get the list of available services and the IP addresses of specific tasks. Then App Mesh delivers the configuration to the Envoy proxy sidecar. This configuration includes the list of available containers that can be connected to. The Envoy proxy sidecar also conducts health checks against each target to ensure that they're available.

This approach provides the features of service discovery, with the ease of the managed load balancer.

Applications don't implement as much load balancing logic within their code because the Envoy proxy sidecar handles that load balancing. The Envoy proxy can be configured to detect failures and retry failed requests. Additionally, it can be configured to use mTLS to encrypt traffic in transit, and ensure that your applications are communicating to a verified destination.

There are a few differences between an Envoy proxy and a load balancer. In short, with Envoy proxy, you're responsible for deploying and managing your own Envoy proxy sidecar. The Envoy proxy sidecar uses some of the CPU and memory that you allocate to the Amazon ECS task. This adds some overhead to the task resource consumption, and additional operational workload to maintain and update the proxy when needed.

App Mesh and an Envoy proxy allow for extremely low latency between tasks. This is because the Envoy proxy runs collocated with each task. There's only one instance-to-instance network jump, between one Envoy proxy and another Envoy proxy. This means there's also less network overhead compared to when using load balancers. When using load balancers, there are two network jumps. The first is from the upstream task to the load balancer, and the second is from the load balancer to the downstream task.

Networking services across AWS accounts and VPCs

If you're part of an organization with multiple teams and divisions, you probably deploy services independently into separate VPCs inside a shared AWS account or into VPCs that are associated with multiple individual AWS accounts. No matter which way you deploy your services, we recommend that you supplement your networking components to help route traffic between VPCs. For this, several AWS services can be used to supplement your existing networking components.

• AWS Transit Gateway — You should consider this networking service ﬁrst. This service serves as a central hub for routing your connections between Amazon VPCs, AWS accounts, and on-premises networks. For more information, see What is a transit gateway? in the Amazon VPC Transit Gateways Guide.

• Amazon VPC and VPN support — You can use this service to create site-to-site VPN connections for connecting on-premises networks to your VPC. For more information, see What is AWS Site-to-Site VPN? in the AWS Site-to-Site VPN User Guide.

• Amazon VPC — You can use Amazon VPC peering to help you to connect multiple VPCs, either in the same account, or across accounts. For more information, see What is VPC peering? in the Amazon VPC Peering Guide.

• Shared VPCs — You can use a VPC and VPC subnets across multiple AWS accounts. For more information, see Working with shared VPCs in the Amazon VPC User Guide.

Optimizing and troubleshooting

The following services and features can help you to gain insights about your network and service configurations. You can use this information to troubleshoot networking issues and better optimize your services.

CloudWatch Container Insights

CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from your containerized applications and microservices. Metrics include the utilization of resources such as CPU, memory, disk, and network. They're available in CloudWatch automatic dashboards. For more information, see Setting up Container Insights on Amazon ECS in the Amazon CloudWatch User Guide.


AWS X-Ray

AWS X-Ray is a tracing service that you can use to collect information about the network requests that your application makes. You can use the SDK to instrument your application and capture timings and response codes of traffic between your services, and between your services and AWS service endpoints.

For more information, see What is AWS X-Ray in the AWS X-Ray Developer Guide.

You can also explore AWS X-Ray graphs of how your services network with each other. Or, use them to explore aggregate statistics about how each service-to-service link is performing. Last, you can dive deeper into any specific transaction to see how segments representing network calls are associated with that particular transaction.

You can use these features to identify if there's a networking bottleneck or if a specific service within your network isn't performing as expected.


VPC Flow Logs

You can use Amazon VPC flow logs to analyze network performance and debug connectivity issues.

With VPC flow logs enabled, you can capture a log of all the connections in your VPC. These include connections to networking interfaces that are associated with Elastic Load Balancing, Amazon RDS, NAT gateways, and other key AWS services that you might be using. For more information, see VPC Flow Logs in the Amazon VPC User Guide.

Network tuning tips

There are a few settings that you can fine-tune in order to improve your networking.

nofile ulimit

If you expect your application to have high traffic and handle many concurrent connections, you should take into account the system quota for the number of files allowed. When there are a lot of network sockets open, each one must be represented by a file descriptor. If your file descriptor quota is too low, it limits the number of network sockets that you can open. This results in failed connections or errors. You can update the container specific quota for the number of files in the Amazon ECS task definition. If you're running on Amazon EC2 (instead of AWS Fargate), then you might also need to adjust these quotas on your underlying Amazon EC2 instance.
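In a task definition, the container specific quota is expressed through the ulimits list of a container definition. The following sketch builds such an entry as a Python dictionary and includes a small helper that an application could use to check the limits it actually received. The limit values, container name, and image name are placeholders rather than recommendations. The resource module is only available on Unix-like systems.

```python
import resource  # Unix only


def nofile_ulimit(soft_limit, hard_limit):
    """Build a ulimits entry that sets the open file descriptor quota."""
    if soft_limit > hard_limit:
        raise ValueError("the soft limit cannot exceed the hard limit")
    return {"name": "nofile", "softLimit": soft_limit, "hardLimit": hard_limit}


def current_nofile_limits():
    """Return the (soft, hard) file descriptor limits of the running process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


container_definition = {
    "name": "web",                  # hypothetical container name
    "image": "example/web:1.0",     # hypothetical image
    "ulimits": [nofile_ulimit(soft_limit=65536, hard_limit=65536)],
}
```

Calling current_nofile_limits() from inside the container is one way to confirm that the quota from the task definition took effect.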

sysctl net

Another category of tunable setting is the sysctl net settings. You should refer to the specific settings for your Linux distribution of choice. Many of these settings adjust the size of read and write buffers. This can help in some situations when running large-scale Amazon EC2 instances that have a lot of containers on them.


Best Practices - Auto scaling and capacity management

Amazon ECS is used to run containerized application workloads of all sizes. This includes both the extremes of minimal testing environments and large production environments operating at a global scale.

With Amazon ECS, like all AWS services, you pay only for what you use. When architected appropriately, you can save costs by having your application consume only the resources that it needs at the time that it needs them. This best practices guide shows how to run your Amazon ECS workloads in a way that meets your service-level objectives while still operating in a cost-effective manner.

Topics

• Determining task size (p. 34)

• Configuring service auto scaling (p. 35)

• Capacity and availability (p. 38)

• Cluster capacity (p. 41)

• Choosing Fargate task sizes (p. 42)

• Choosing the Amazon EC2 instance type (p. 42)

• Using Amazon EC2 Spot and FARGATE_SPOT (p. 43)

Determining task size

One of the most important choices to make when deploying containers on Amazon ECS is your container and task sizes. Your container and task sizes are both essential for scaling and capacity planning. In Amazon ECS, there are two resource metrics used for capacity: CPU and memory. CPU is measured in units of 1/1024 of a full vCPU (where 1024 units is equal to 1 whole vCPU). Memory is measured in megabytes. In your task definition, you can declare resource reservations and limits.

When you declare a reservation, you're declaring the minimum amount of resources that a task requires. Your task receives at least the amount of resources requested. Your application might be able to use more CPU or memory than the reservation that you declare. However, this is subject to any limits that you also declared. Using more than the reservation amount is known as bursting. In Amazon ECS, reservations are guaranteed. For example, if you use Amazon EC2 instances to provide capacity, Amazon ECS doesn't place a task on an instance where the reservation can't be fulfilled.

A limit is the maximum amount of CPU units or memory that your container or task can use. Any attempt to use more CPU than this limit results in throttling. Any attempt to use more memory than this limit results in your container being stopped.
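The relationship between reservations, bursting, and limits can be illustrated with a simplified model. This is a sketch for reasoning about the concepts, not a description of how the Amazon ECS scheduler is implemented. The `Resources` type, the function names, and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_units: int  # 1024 units equal one vCPU
    memory: int     # same memory unit as the task definition


def can_place(remaining: Resources, reservation: Resources) -> bool:
    """A task fits only if its full reservation can be fulfilled."""
    return (reservation.cpu_units <= remaining.cpu_units
            and reservation.memory <= remaining.memory)


def memory_outcome(used: int, reservation: int, limit: int) -> str:
    """Classify memory usage against a reservation and a limit."""
    if used > limit:
        return "stopped"    # exceeding the memory limit stops the container
    if used > reservation:
        return "bursting"   # above the reservation, still under the limit
    return "within reservation"


instance = Resources(cpu_units=2048, memory=4096)
task = Resources(cpu_units=512, memory=1024)

print(can_place(instance, task))         # True
print(memory_outcome(1500, 1024, 2048))  # bursting
print(memory_outcome(2500, 1024, 2048))  # stopped
```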

Choosing these values can be challenging because the values that are best suited for your application depend greatly on its resource requirements. Load testing your application is the key to understanding those requirements and to successful resource planning.

Stateless applications

For stateless applications that scale horizontally, such as an application behind a load balancer, we recommend that you ﬁrst determine the amount of memory that your application consumes when it


Table of Contents

Introduction

Best Practices - Running your application with Amazon ECS

Container image

Make container images complete and static

Maintain fast container launch times by keeping container images as small as possible

Only run a single application process with a container image

Handle SIGTERM within the application

Conﬁgure containerized applications to write logs to stdout and stderr

Version container images using tags

Task deﬁnition

Use each task deﬁnition family for only one business purpose

Match each application version with a task deﬁnition revision within a task deﬁnition family

Use diﬀerent IAM roles for each task deﬁnition family

Amazon ECS service

Use awsvpc network mode and give each service its own security group

Enable Amazon ECS managed tags and tag propagation

Best Practices - Networking

Connecting to the internet

Using a public subnet and internet gateway

Using a private subnet and NAT gateway

Receiving inbound connections from the internet

Application Load Balancer

Network Load Balancer

Amazon API Gateway HTTP API

Choosing a network mode

Host mode

Bridge mode

AWSVPC mode

Increasing task density with ENI Trunking

Preventing IP address exhaustion

Using IPv6 dual stack mode

Connecting to AWS services from inside your VPC

NAT gateway

AWS PrivateLink

Networking between Amazon ECS services in a VPC

Using service discovery

Using an internal load balancer

Using a service mesh

Networking services across AWS accounts and VPCs

Optimizing and troubleshooting

CloudWatch Container Insights

AWS X-Ray

VPC Flow Logs

Network tuning tips

noﬁle ulimit

sysctl net

Best Practices - Auto scaling and capacity management

Determining task size

Stateless applications