Amazon S3

S3 (Simple Storage Service) provides secure, durable, highly-scalable object storage in the cloud. Object storage (also known as object-based storage) saves flat files. S3 is not block-based storage and thus doesn't allow you to run an operating system or a database. You can have 100 buckets per account by default.

See the FAQ for more information.

Properties

  • S3 is a universal namespace, that is, names must be unique globally.
  • S3 creates a DNS: https://s3-<region>.amazonaws.com/<bucket>.
  • When you upload a file to S3 you will receive an HTTP 200 code if the upload was successful.

Objects

S3 provides simple key, value store. Files can be from 0 bytes to 5 TB. S3 gives unlimited storage.

Objects consist of the following:

  • Key (name)
  • Value (data: a sequence of bytes).
  • Version ID
  • Metadata (data about the data. e.g. date)
  • Subresources (bucket-specific configuration)
    • Bucket Policies, Access Control Lists (ACL).
    • Cross Origin Resource Sharing (CORS)
    • Transfer Acceleration

You can load files to S3 much faster by enabling multipart upload.

Buckets

Objects are stored in Buckets. A Bucket is similar to a folder.

Buckets reside in a certain AWS region.

Data Consistency Model

  • Read after write consistency for PUTS of new objects.
  • Eventual consistency for overwrite PUTS and DELETE (can take some time to propagate).

Updates to S3 are atomic. You will either get the old file or the new file. You won't get partial or corrupted data.

Tolerance

Data is spread across multiple devices and facilities; S3 is designed to withstand failure.

  • Built for 99.99% availability for the S3 platform.
  • Amazon guarantee 99.9% availability (note 3x9).
  • Amazon guarantees 99.999999999% durability for S3 information (note 11x9). This is the eleven 9s durability agreement.

Security

  • By default, all newly created buckets are PRIVATE.
  • You can set up access control to your buckets using;
    • Bucket Policies
      • Applied at the bucket level.
    • Access Control Lists (ACL)
      • Applied at an object level.
  • S3 buckets can be configured to create access logs which log all requests made to the S3 bucket. This logs can be setup to be saved to another bucket. Of course, logging is important for security.

Encryption

  • In Transit;
    • SSL/TLS (HTTPS)
  • At Rest;
    • Server Side Encryption
      1. S3 Managed Keys (SSE-S3)
        • Most common method. Uses AES-256.
      2. AWS Key Management Service, Managed Keys (SSE-KMS)
        • More expensive method. Provides an extra layer of protection using an envelope key (which has separate permissions). This key protects your encryption key. In addition, SSE-KMS provides an audit trail of when keys were used and by whom. Lastly, you can also manage your keys yourself.
      3. Customer Provided Keys (SSE-C)
        • You manage the keys but Amazon manages the encryption.
  • Client Side Encryption
    • Encrypt on your computer and then upload.

Enforcing Encryption

Every time a file is uploaded to S3, a PUT request is initiated. If the file is to be encrypted at upload time, the x-amz-server-side-encryption parameter will be included in the request header. You can specify either AWS256 (for SSE-S3) or ams:kms (for SSE-KMS). You can enforce the use of server-side encryption by using a bucket policy which denies any S3 PUT request which doesn't include the x-amz-server-side-encryption parameter in the request header.

Storage Tiers/Classes

  • S3
    • Durable, immediately available, frequently accessed.
    • 99.99% availability, 99.99999999999% durability, stored redundantly across multiple devices in multiple facilities and is designed to sustain the loss of 2 facilities concurrently.
  • S3 - IA (Infrequently Accessed)
    • Durable, immediately available, infrequently accessed.
    • Lower fee than S3, but you are charged a retrieval fee.
  • S3 - One Zone IA
    • Similar to IA, but data is stored in a single Availability Zone. Durability is the same, but we have 99.5% availability.
    • Cost is 20% less than regular S3 - IA.
  • RRS (Reduced Redundancy Storage)
    • Data that is easily reproducible, such as thumnails etc.
    • Designed to provide 99.99% durability and 99.99% availability of objects over a given year. We can only lose 1 facility.
  • Glacier
    • Very cheap, but used for archival only. It takes 3-5 hours to restore from Glacier. First byte latency is minutes or hours instead of milliseconds like S3 services.
    • Costs as little as $0.01 per gigabyte per month.
  S3 S3 IA S3 One Zone IA RRS Glacier
Durability 99.999999999% 99.999999999% 99.999999999% 99.99% 99.999999999%
Availability 99.99% 99.9% 99.5% 99.99% N/A
Availability SLA 99.9% 99% 99% 99.9% N/A
Concurrent facility fault tolerance 2 2 1 1 2
First byte latency Milliseconds Milliseconds Milliseconds Milliseconds Minutes or hours

Charges

In S3, costs can come from

  • Storage per GB.
  • Requests (Get, Put, Copy, etc)
  • Storage Management Pricing (cost tracking)
  • Data Management Pricing (mostly just transferring out of S3)
  • Transfer Acceleration

Transfer Acceleration

Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your end users and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront's globally distributes edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path. Note that Transfer Acceleration does not provide caching, it's only for data transfer.

A distinct URL is used for transfer acceleration: <bucket>.s3-accelerate.amazonaws.com.

You can enable Transfer Acceleration by clicking on a bucket and selecting Properties -> Transfer acceleration.

Tags

You can tag buckets and objects. Individual objects don't inherit the tag from the bucket.

Versioning

"Versioning is a means of keeping multiple variants of an object in the same bucket" (https://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html). It's a great backup tool.

Every time a file changes there will be a copy of both the original and the change. Of course, versioning can increase S3 costs. Even deleting is a version (a delete marker is made)!

Once enabled, versioning cannot be disabled. Versioning can only be suspended after turned on.

Versioning integrates with lifecycle rules.

MFA Delete helps stop people from accidentally deleting a version. This multi-factor authentication provides an extra layer of security.

Cross Region Replication

Cross-region replication enables automatic and asynchronous copying of objects across buckets in different AWS regions. Enable in the S3 console by clicking on a bucket then Management then Replication. Note that cross-region replication requires versioning enabled on both the source and destination buckets. Regions must be unique.

Note that only new objects will be replicated. Files in an existing bucket are not replicated automatically.

Currently, cross region replication cannot replicate to multiple buckets or use daisy chaining.

Version deletes are not replicated.

Lifecycle Management

Click a Bucket then select Management->/Lifecycle/. Here you can define lifecycle rules to automatically manage S3 objects. The following actions can be done:

  • Transition to Standard - Infrequent Access Storage Class (after a minimum of 30 days. Data must also be at least 128KB).
  • Archive to the Glacier Storage Class (after any number of days or a minimum of 30 days after transition to Standard-IA).
  • Permanently delete.

Lifecycle management can be used in conjunction with vertioning. It can be applied to current versions and previous versions. Can also be used without versioning.

Static Website Hosting

S3 allows static website hosting. These websites may contain client-side scripts. Dynamic, service-side processing is not supported.

The URL for static website hosting looks like: <bucket>.s3-website-<region>.amazonaws.com. For example http://elliotp.s3-website-us-east-1.amazonaws.com.

S3 static website hosting scales automatically and is very cheap.

Cross-Origin Resource Sharing (CORS)

Cross-origin resource sharing (CORS) defines a way for client web applications that are loaded in one domain to interact with resources in a different domain. By default resources in one bucket cannot access resources located in another. You can configure your bucket to allow cross-origin requests by creating a CORS configuration.

Performance Optimization

If your S3 buckets are routinely receiving >100 PUT/LIST/DELETE or >300 GET requests per second, then there are some best practice guidelines that will help optimize S3 performance.

Optimizing GET-Intensive Workloads

Use CloudFront. CloudFront will cache your most frequently accessed objects and will reduce latency for your GET requests.

Optimizing Mixed Request Type Workloads

S3 is lexographical. Objects are stored in alphabetical order; key name determines which partition will store the object.

The key names you use for your objects can impact performance for intensive workloads. The use of sequential key names e.g. names prefixed with a time stamp or alphabetical sequence increases the likelihood of having multiple objects stored on the same partition. For heavy workloads, this can cause I/O issues and contention.

Avoid sequential key names for you S3 objects! Add a random prefix (like a hex hash) to the key name to prevent multiple objects from being stored in the same partition.