AWS Best Practices

This page highlights the principles to consider when migrating existing applications to AWS or designing new applications for the cloud. The information on this page is taken from the Architecting for the Cloud whitepaper.

The Cloud Computing Difference

IT assets become programmable resources
- You don't need to provision capacity based on a guess of a theoretical maximum peak. You access as much or as little as you need and dynamically scale.
Global
- Choose any AWS Region.
Available
- Spread across multiple data centers with AWS Availability Zones.
- Reduce latency with CloudFront.
Unlimited capacity
Higher level managed services
Security built in

Design Principles

Scalability

A scalable architecture can support growth in users, traffic, or data size with no drop in performance. There are two ways to scale an IT architecture: vertically and horizontally. Scaling vertically takes place through an increase in the specifications of an individual resource. Scaling horizontally takes place through an increase in the number of resources.

Applications and components can be stateless or stateful. A stateless application can scale horizontally because any request can be serviced by any of the available compute resources. Some applications need to maintain state information. You can still make a portion of these architectures stateless by storing that information in a shared database.
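
As a rough illustration of moving state out of the compute layer, the sketch below uses a hypothetical in-memory SharedStore as a stand-in for a shared database. The class names, the cart example, and the session ID are assumptions made for this example and are not from the whitepaper.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Hypothetical stand-in for a shared database; in-memory only."""
    sessions: dict = field(default_factory=dict)


@dataclass
class Worker:
    """A stateless worker: it keeps no session data between requests."""
    name: str
    store: SharedStore

    def add_to_cart(self, session_id: str, item: str) -> list:
        # Read the state from the shared store, update it, write it back.
        cart = self.store.sessions.get(session_id, [])
        cart = cart + [item]
        self.store.sessions[session_id] = cart
        return cart


store = SharedStore()
workers = [Worker("a", store), Worker("b", store)]

# Two consecutive requests from one session land on different workers.
workers[0].add_to_cart("session-1", "book")
cart = workers[1].add_to_cart("session-1", "pen")
print(cart)
```

Because neither worker holds the cart itself, either one can be added, removed, or replaced without losing the session.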

Distributed processing can help scalability. By dividing a task and its data into many small fragments of work, you can execute each of them in any of a larger set of available compute resources.
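
The idea can be sketched on a single machine, with a thread pool standing in for a fleet of compute resources. The function names, the fragment size, and the sum-of-squares task are illustrative assumptions only.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, fragment_size):
    """Divide the data into small, independent fragments of work."""
    return [data[i:i + fragment_size] for i in range(0, len(data), fragment_size)]


def process_fragment(fragment):
    """Work done on one fragment; any available worker can run it."""
    return sum(x * x for x in fragment)


def distributed_sum_of_squares(data, fragment_size=100, workers=4):
    fragments = split(data, fragment_size)
    # Each fragment is handed to whichever worker is free.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_fragment, fragments))
    # Combine the partial results into the final answer.
    return sum(partials)


total = distributed_sum_of_squares(list(range(1000)))
print(total)
```

Since the fragments do not depend on each other, adding workers shortens the job without changing the result.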

Disposable Resources Instead of Fixed Servers

In a traditional infrastructure environment, you have to work with fixed resources due to the upfront cost and lead time of introducing new hardware. When designing with AWS, you have the opportunity to reset that mindset and take advantage of the dynamically provisioned nature of cloud computing. Think of servers and other components as temporary resources.

Make setting up new resources an automated and repeatable process. Bootstrap instances with scripts that contain installation steps and configuration settings. You can also start from a golden image, which is a snapshot of a particular state of a resource. Amazon Machine Images (AMIs) are an example. Docker is another tool for this purpose. You can also combine bootstrapping and golden images in a hybrid approach.

Lastly, you should treat infrastructure as code to make your infrastructure reusable, maintainable, extensible, and testable. AWS CloudFormation templates give developers an easy way to create and manage a collection of related AWS resources, and to provision and update them in an orderly and predictable fashion.
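
One way to picture infrastructure as code is to build a template programmatically and keep it under version control. The sketch below assembles a minimal CloudFormation template as a Python dictionary and serializes it to JSON. The build_template helper, the WebServer logical ID, and the t3.micro default are assumptions for illustration; the AMI is left as a template parameter rather than guessed.

```python
import json


def build_template(instance_type="t3.micro"):
    """Return a minimal CloudFormation template as a plain dictionary."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative single-instance stack",
        "Parameters": {
            # The AMI ID is supplied when the stack is created.
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
        },
        "Resources": {
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": instance_type,
                },
            },
        },
    }


template_json = json.dumps(build_template(), indent=2)
print(template_json)
```

Because the template is ordinary data, it can be reviewed, diffed, and tested like any other code before it is deployed.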

Automation

Some AWS tools for this purpose:

AWS Elastic Beanstalk
Amazon EC2 Auto Recovery
Auto Scaling, which automatically starts and stops EC2 instances
Amazon CloudWatch Alarms and Events
AWS OpsWorks lifecycle events
AWS Lambda scheduled events

Loose Coupling

As application complexity increases, a desirable attribute of an IT system is that it can be broken into smaller, loosely coupled components. This means that IT systems should be designed in a way that reduces interdependencies - a change or a failure in one component should not cascade to other components.

Key elements:

Well-defined interfaces
- e.g. RESTful APIs
Service discovery
- For example, don't hardcode the IP address of a compute resource. Use a DNS name managed in Amazon Route 53, or an ELB endpoint, instead.
Asynchronous integration
- This model is suitable for any interaction that does not need an immediate response and where an acknowledgment that a request has been registered will suffice. It involves one component that generates events and another that consumes them. The two components usually integrate through an intermediate durable storage layer (e.g. an SQS queue). This decouples the components and adds resiliency.
Graceful failure
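
The asynchronous integration pattern above can be sketched with Python's standard queue module. The in-memory queue here is only a stand-in for a durable layer such as an SQS queue (it is not durable itself), and the submit and worker names are hypothetical.

```python
import queue
import threading

events = queue.Queue()  # stand-in for a durable queue; not durable itself
processed = []


def submit(order_id):
    """Producer: register the request and acknowledge it immediately."""
    events.put(order_id)
    return f"accepted {order_id}"


def worker():
    """Consumer: drain the queue at its own pace."""
    while True:
        order_id = events.get()
        if order_id is None:  # sentinel value: stop the worker
            events.task_done()
            break
        processed.append(f"processed {order_id}")
        events.task_done()


# Requests are accepted before any consumer is running.
acks = [submit(i) for i in (1, 2, 3)]

consumer = threading.Thread(target=worker)
consumer.start()
events.put(None)
consumer.join()

print(acks)
print(processed)
```

The producer never waits on the consumer, so a slow or temporarily missing consumer delays processing but does not cause requests to be rejected.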

Services, Not Servers

Use managed services to increase productivity and operational efficiency. Services can provide the building blocks so developers move faster. Serverless architectures can also reduce the operational complexity.

Layers of abstraction baby.

Databases

On AWS, users can have a polyglot data layer and choose the right technology for each workload.

Relational databases offer powerful querying functionality and strong integrity. Relational databases scale vertically. You can also scale reads horizontally with read replicas.

NoSQL databases trade some of the query and transaction capabilities of relational databases for a more flexible data model that seamlessly scales horizontally.

Data warehouses are a specialized type of relational database, optimized for analysis and reporting of large amounts of data.

Lastly, applications that require sophisticated search functionality will typically outgrow the capabilities of relational or NoSQL databases. A search service can be used to index and search both structured and free-text data, and can support functionality that is not available in other databases, such as customizable result ranking, faceting for filtering, synonyms, and stemming.

Removing Single Points of Failure

Single points of failure can be removed by introducing redundancy, which means having multiple resources for the same task. Redundancy can be implemented in either standby or active mode. In standby redundancy, when a resource fails, functionality is recovered on a secondary resource using a process called failover. In active redundancy, requests are distributed to multiple redundant compute resources, and when one of them fails, the rest can simply absorb a larger share of the workload.

Detect failure! You should aim to build as much automation as possible into both detecting and reacting to failure. You can use services like ELB and Amazon Route 53 to configure health checks and mask failure by routing traffic to healthy endpoints. In addition, Auto Scaling can be configured to automatically replace unhealthy nodes.
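
A minimal sketch of active redundancy with health checks follows. It does not reflect how ELB or Route 53 are implemented; the HealthAwareBalancer class, the node names, and the dictionary-based health check are all assumptions made for illustration.

```python
import itertools


class HealthAwareBalancer:
    """Round-robin over the endpoints that currently pass a health check."""

    def __init__(self, endpoints, health_check):
        self.endpoints = list(endpoints)
        self.health_check = health_check
        self._counter = itertools.count()

    def healthy(self):
        return [e for e in self.endpoints if self.health_check(e)]

    def route(self):
        healthy = self.healthy()
        if not healthy:
            raise RuntimeError("no healthy endpoints")
        # Failed endpoints are skipped; the rest absorb their share.
        return healthy[next(self._counter) % len(healthy)]


# Hypothetical health status table, normally filled by periodic probes.
status = {"node-a": True, "node-b": True, "node-c": True}
balancer = HealthAwareBalancer(status, lambda endpoint: status[endpoint])

before = [balancer.route() for _ in range(3)]
status["node-b"] = False  # simulate a failure
after = [balancer.route() for _ in range(4)]

print(before)
print(after)
```

After node-b is marked unhealthy, every request goes to the two remaining nodes and callers never see the failed endpoint.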

Data replication is the technique that introduces redundant copies of data. It can help horizontally scale read capacity, and it also increases data durability and availability. Replication can be synchronous or asynchronous.
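
The difference between the two modes can be shown with a toy model. The Primary and Replica classes and the explicit flush step are hypothetical simplifications; real databases handle acknowledgment, ordering, and failure in far more detail.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Primary:
    def __init__(self, replicas, synchronous):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = []  # writes not yet applied to the replicas

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Synchronous: the write completes only after every replica has it.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Asynchronous: the write completes now; replicas catch up later.
            self.pending.append((key, value))

    def flush(self):
        """Apply the queued writes to the replicas (the replication lag ends)."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


sync_replica = Replica()
Primary([sync_replica], synchronous=True).write("k", 1)

async_replica = Replica()
async_primary = Primary([async_replica], synchronous=False)
async_primary.write("k", 1)
lagging_read = async_replica.data.get("k")  # replica has not caught up yet
async_primary.flush()

print(sync_replica.data, lagging_read, async_replica.data)
```

Synchronous replication keeps replicas current at the cost of slower writes, while asynchronous replication keeps writes fast but lets replica reads return stale data until the lag is cleared.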