3.14 Storage Summary
Remember that S3 is object-based, i.e. it allows you to upload files.
Files can be from 0 bytes to 5 TB.
There is unlimited storage.
Files are stored in Buckets.
S3 uses a universal namespace, i.e. bucket names must be globally unique.
Bucket name format:
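Read-after-write consistency for PUTs of new objects
Eventual consistency for overwrite PUTs and DELETEs (changes can take some time to propagate)
Note: since December 2020, S3 provides strong read-after-write consistency for all PUTs (new and overwrite) and DELETEs.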
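Bucket names have to be globally unique, and they also have to follow S3's naming rules. The checker below is a minimal sketch that covers only a subset of those rules: 3 to 63 characters, only lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit, no adjacent dots, and not formatted like an IP address. The function name `is_valid_bucket_name` is hypothetical.

```python
import ipaddress
import re

# Subset of the S3 bucket naming rules: 3-63 characters made of lowercase
# letters, digits, dots and hyphens, starting and ending with a letter or digit.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes this (incomplete) set of naming checks."""
    if not _NAME_RE.fullmatch(name):
        return False
    if ".." in name:
        return False  # adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True  # not formatted like an IP address
    return False  # names such as "192.168.5.4" are rejected
```

Passing these checks does not mean the name is available: because the namespace is global, another account may already own it.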
S3 storage classes/tiers
S3 Standard (durable, immediately available, frequently accessed)
S3 - IA (durable, immediately available, infrequently accessed)
S3 - RRS (Reduced Redundancy Storage: less durable, for data that is easily reproducible, such as thumbnails)
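To put "less durability" in numbers: S3 Standard and S3 - IA are designed for 99.999999999% (11 nines) annual durability, and RRS is designed for 99.99%. The sketch below treats the expected number of objects lost per year as the object count multiplied by (1 - durability). The dictionary and function names are illustrative, not part of any AWS SDK.

```python
from decimal import Decimal

# Designed annual durability per storage class (Decimal avoids float rounding).
DURABILITY = {
    "STANDARD": Decimal("0.99999999999"),      # 11 nines
    "STANDARD_IA": Decimal("0.99999999999"),   # 11 nines
    "REDUCED_REDUNDANCY": Decimal("0.9999"),   # 4 nines
}


def expected_annual_loss(num_objects: int, storage_class: str) -> Decimal:
    """Average number of objects expected to be lost per year."""
    return num_objects * (1 - DURABILITY[storage_class])
```

For 10,000,000 objects, this gives 0.0001 objects a year on S3 Standard (about one object every 10,000 years) and 1,000 objects a year on RRS. That gap is why RRS only suits data you can regenerate.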
Core fundamentals of S3:
Key (name)
Value (data)
Version ID (for versioning)
Metadata
Access control lists
Object-based storage only (for files); not suitable for installing an OS (use EBS for that)
Versioning
Stores all versions of an object (including all writes, and even if you delete the object).
You pay for every version of your object.
Great backup tool (deleting an object only adds a delete marker, so earlier versions can still be restored).
Once enabled, versioning cannot be disabled, only suspended.
Integrates with Lifecycle rules.
Versioning's MFA Delete capability, which requires multi-factor authentication to permanently delete a version or change the bucket's versioning state, provides an additional layer of security.
Cross Region Replication requires versioning to be enabled on both the source and destination buckets.
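The versioning behaviour described above (every write is kept, and a delete only adds a delete marker) can be modelled with a stack of versions per key. This is a toy in-memory sketch, not the S3 API. The class names, the `v1`, `v2`, ... version IDs, and the method names are all hypothetical, since real S3 version IDs are opaque strings.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Version:
    version_id: str
    data: Optional[bytes]  # None means this version is a delete marker


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}  # key -> versions, oldest first
        self._ids = itertools.count(1)

    def _add(self, key: str, data: Optional[bytes]) -> str:
        version = Version(f"v{next(self._ids)}", data)
        self._versions.setdefault(key, []).append(version)
        return version.version_id

    def put(self, key: str, data: bytes) -> str:
        # Every write creates a new version; older versions are kept (and billed).
        return self._add(key, data)

    def delete(self, key: str) -> str:
        # A simple delete only stacks a delete marker on top of the existing versions.
        return self._add(key, None)

    def get(self, key: str) -> Optional[bytes]:
        # The newest version is current; if it is a delete marker, the object looks deleted.
        versions = self._versions.get(key, [])
        return versions[-1].data if versions else None

    def delete_version(self, key: str, version_id: str) -> None:
        # Permanently removes one version; removing the delete marker restores the object.
        self._versions[key] = [
            v for v in self._versions.get(key, []) if v.version_id != version_id
        ]

    def version_count(self, key: str) -> int:
        return len(self._versions.get(key, []))
```

After `put`, `put`, `delete` on the same key, `get` returns `None`, but three versions are still stored (and billed). Deleting the marker's version brings the latest write back.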
Lifecycle management
Can be used in conjunction with versioning or independently.
Can be applied to current versions and previous versions.
The following actions can be performed:
Transition to the Standard - Infrequent Access storage class (objects of at least 128 KB, at least 30 days after the creation date)
Archive to the Glacier storage class (30 days after the transition to IA, if relevant)
Permanently delete
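The lifecycle actions above can be sketched as a function that decides where an object of a given size and age would end up under one rule. The 128 KB and 30-day limits come from the notes. The 60-day Glacier default, the optional expiry, and the name `lifecycle_target` are illustrative assumptions, not AWS defaults.

```python
from typing import Optional

MIN_IA_SIZE = 128 * 1024  # objects smaller than 128 KB are not moved to Standard-IA
MIN_IA_AGE_DAYS = 30      # earliest Standard-IA transition after creation


def lifecycle_target(size_bytes: int, age_days: int, ia_after: int = 30,
                     glacier_after: Optional[int] = 60,
                     expire_after: Optional[int] = None) -> str:
    """Return where this (hypothetical) rule would place an object."""
    if ia_after < MIN_IA_AGE_DAYS:
        raise ValueError("Standard-IA transitions need at least 30 days after creation")
    if glacier_after is not None and glacier_after < ia_after + 30:
        raise ValueError("in this sketch, Glacier comes at least 30 days after IA")
    if expire_after is not None and age_days >= expire_after:
        return "DELETED"        # permanently delete
    if glacier_after is not None and age_days >= glacier_after:
        return "GLACIER"        # archive
    if age_days >= ia_after and size_bytes >= MIN_IA_SIZE:
        return "STANDARD_IA"    # transition to infrequent access
    return "STANDARD"
```

A 1 MB object stays in Standard for its first 30 days, then moves to Standard-IA and then to Glacier at day 60. A 1 KB object never moves to Standard-IA because it is below the 128 KB minimum.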
Securing your buckets
By default, all newly created buckets are private.
You can set up access control to your buckets using:
Bucket policies (bucket level)
Access control lists (object level)
Server Access Logging: to track requests for access to your bucket, you can enable access logging, which records every request made to the bucket. Logs are delivered to a target bucket, which can be a different bucket from the one being logged.
Encryption:
In transit:
SSL/TLS
At rest:
Server-side encryption:
S3 Managed Keys - SSE-S3
AWS Key Management Service managed keys - SSE-KMS
Server-side encryption with customer provided keys - SSE-C
Client-side encryption: encrypt your object then upload it to AWS S3.
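Encryption in transit can also be enforced with a bucket policy that denies any request not made over TLS, using the `aws:SecureTransport` condition key. The sketch below only builds the policy document as JSON. The bucket name `example-bucket` and the `Sid` value are placeholders.

```python
import json


def deny_insecure_transport_policy(bucket: str) -> str:
    """Build a bucket policy that denies every request not made over TLS."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",  # placeholder statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # the bucket itself
                    f"arn:aws:s3:::{bucket}/*",  # every object in it
                ],
                # aws:SecureTransport is "false" when the request used plain HTTP.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(deny_insecure_transport_policy("example-bucket"))
```

Because it is an explicit Deny, the statement overrides any Allow that would otherwise let an unencrypted HTTP request through.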
S3 Transfer Acceleration
You can speed up transfers to S3 using S3 Transfer Acceleration, which routes uploads through CloudFront's edge locations. It costs extra and has the greatest impact for users in far-away locations.
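Once Transfer Acceleration is enabled on a bucket, clients send requests to a separate global endpoint, `<bucket>.s3-accelerate.amazonaws.com`, instead of the regional one. Acceleration also requires a DNS-compliant bucket name without periods. The helper below is a hypothetical sketch that picks between the two virtual-hosted-style endpoints.

```python
def upload_endpoint(bucket: str, region: str, accelerate: bool = False) -> str:
    """Pick the virtual-hosted-style endpoint to send uploads to."""
    if not accelerate:
        return f"https://{bucket}.s3.{region}.amazonaws.com"
    # Transfer Acceleration needs a DNS-compliant bucket name without periods,
    # and it uses one global endpoint instead of a regional one.
    if "." in bucket:
        raise ValueError("Transfer Acceleration does not support bucket names containing '.'")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"
```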
S3 static website
You can use S3 to host static websites
It is Serverless
Very cheap, scales automatically
Static only, cannot host dynamic sites
The URL (endpoint) of your static website is "http://[bucketname].s3-website-[Region].amazonaws.com".
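The endpoint format above can be built from the bucket name and Region. Not every Region uses a dash: older Regions such as us-east-1 use `s3-website-<Region>`, while some newer ones such as eu-central-1 use `s3-website.<Region>`. The sketch therefore takes the style as a parameter instead of hard-coding a Region list. The function name is hypothetical.

```python
def website_endpoint(bucket: str, region: str, dash_style: bool = True) -> str:
    """Build the static website endpoint URL for `bucket` in `region`."""
    # Older Regions (e.g. us-east-1) use "s3-website-<region>";
    # some newer Regions (e.g. eu-central-1) use "s3-website.<region>".
    separator = "-" if dash_style else "."
    # Website endpoints are plain HTTP; S3 does not serve them over HTTPS.
    return f"http://{bucket}.s3-website{separator}{region}.amazonaws.com"
```

For HTTPS on a static site, the usual approach is to put CloudFront in front of the bucket.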
Glacier
Used for archiving data where you can wait 3 to 5 hours before accessing it.
A single archive can be as large as 40 TB.
You can store an unlimited number of archives and an unlimited amount of data in Amazon Glacier.
Each archive is assigned a unique archive ID at the time of creation.
The content of an archive is immutable: once an archive is created, it cannot be updated.
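The archive model above (fixed content plus an ID assigned at creation) maps naturally onto an immutable record. This is a toy sketch. The `Archive` class is hypothetical, and `uuid4` stands in for Glacier's real archive IDs, which are long opaque strings generated by the service.

```python
import uuid
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class Archive:
    """Toy model of a Glacier archive: immutable content plus a unique ID."""
    content: bytes
    # A fresh ID is generated when the archive is created (stand-in for Glacier's IDs).
    archive_id: str = field(default_factory=lambda: uuid.uuid4().hex)
```

Assigning to `content` on an existing `Archive` raises `FrozenInstanceError`. As with Glacier, the only way to "change" an archive is to create a new one (with a new ID) and delete the old one.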
CloudFront
Edge Location - the location where content is cached. This is separate from an AWS Region/AZ.
Origin - the origin of all the files that the CDN will distribute. This can be an S3 bucket, an EC2 instance, an Elastic Load Balancer, or another (custom) HTTP server.
Distribution - the name given to the CDN, which consists of a collection of Edge Locations. A Distribution can have multiple Origins.
Web Distribution - typically used for websites
RTMP - used for media streaming (CloudFront discontinued RTMP distributions at the end of 2020)
Edge Locations are not read-only: you can also write to them (e.g. PUT an object), and the write is forwarded to the origin.
Objects are cached for the life of the TTL (the default is 24 hours; you can change it when you configure the Distribution).
You can clear (invalidate) cached objects from Edge Locations before the TTL expires, but invalidations beyond a monthly free allowance are charged.
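The TTL and invalidation behaviour can be sketched as a small cache in front of an origin. This is a simplified model: real CloudFront also honours `Cache-Control` headers and minimum/maximum TTL settings, which are ignored here. The class and method names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from typing import Callable, Dict, Tuple

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # CloudFront's default TTL: 24 hours


class EdgeCache:
    """Toy model of one edge location caching objects fetched from an origin."""

    def __init__(self, origin: Callable[[str], bytes], ttl: int = DEFAULT_TTL_SECONDS) -> None:
        self._origin = origin
        self._ttl = ttl
        self._entries: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, expiry time)
        self.origin_fetches = 0

    def get(self, path: str, now: float) -> bytes:
        cached = self._entries.get(path)
        if cached is not None and now < cached[1]:
            return cached[0]          # cache hit, still within the TTL
        body = self._origin(path)     # miss or expired: fetch from the origin
        self.origin_fetches += 1
        self._entries[path] = (body, now + self._ttl)
        return body

    def invalidate(self, path: str) -> None:
        # Drops the cached copy before its TTL runs out (the billed operation).
        self._entries.pop(path, None)
```

Repeated requests within 24 hours are served from the edge. The origin is contacted again only once the TTL has expired or after an invalidation.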
Storage Gateway
File Gateway: for flat files, stored directly on S3.
Volume Gateway:
Stored volumes - the entire dataset is stored on site and is asynchronously backed up to S3.
Cached volumes - the entire dataset is stored on S3 and the most frequently accessed data is cached on site.
Gateway Virtual Tape Library (VTL):
Used for backups with popular backup applications such as NetBackup, Backup Exec and Veeam.
Snowball comes in three forms:
Snowball
Snowball Edge
Snowmobile
Snowball can:
Import to S3
Export from S3
Write to S3 - HTTP 200 for a successful write. Objects are placed in S3 over HTTP with a POST or PUT object request, and a successful write returns an HTTP 200 response. Because a 200 response can also contain error information, checking the MD5 checksum confirms whether the request actually succeeded.
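The MD5 check can be done in two places. First, the client can send a `Content-MD5` header (the base64-encoded MD5 digest of the body), in which case S3 rejects the write if the data it received does not match. Second, for a single-part upload that is unencrypted or uses SSE-S3, the returned ETag is the hex MD5 of the object and can be compared afterwards. This does not hold for multipart uploads or SSE-KMS/SSE-C objects. Both helper names are hypothetical.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Base64-encoded MD5 digest, the format used in the Content-MD5 header."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def etag_matches(data: bytes, etag: str) -> bool:
    """Compare the local MD5 with the ETag S3 returned for a single-part upload."""
    # S3 returns the ETag wrapped in double quotes; strip them before comparing.
    return hashlib.md5(data).hexdigest() == etag.strip('"')
```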
You can upload files to S3 much faster with multipart upload, which sends parts of an object in parallel. No single object can be larger than the 5 TB maximum; to move very large amounts of data into S3, use Snowball (the successor to AWS Import/Export).
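Multipart upload has its own limits: each part must be between 5 MiB and 5 GiB (the last part may be smaller), and an upload can have at most 10,000 parts. The sketch below picks the smallest part size that fits an object into 10,000 parts, using integer ceiling division to avoid float rounding. `plan_parts` is a hypothetical helper, not an SDK function.

```python
MIN_PART = 5 * 1024 ** 2  # 5 MiB minimum part size (except the last part)
MAX_PART = 5 * 1024 ** 3  # 5 GiB maximum part size
MAX_PARTS = 10_000        # maximum number of parts per upload


def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def plan_parts(object_size: int) -> tuple:
    """Return (part_size, part_count) for a multipart upload of object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Smallest part size that still fits the whole object into 10,000 parts.
    part_size = max(MIN_PART, _ceil_div(object_size, MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("object too large for a single multipart upload")
    return part_size, _ceil_div(object_size, part_size)
```

A 100 MiB file becomes 20 parts of 5 MiB. A 5 TiB object needs parts of about 524 MiB to stay within 10,000 parts.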
You can use Multi-Object Delete to delete large numbers of objects from S3. This feature allows you to send multiple object keys in a single request to speed up your deletion. Amazon does not charge you for using Multi-Object Delete.
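A single Multi-Object Delete request accepts up to 1,000 keys, so deleting more objects means splitting the keys into batches. The sketch below only builds the request bodies. Each dictionary has the shape boto3's `delete_objects` expects for its `Delete` argument (`Objects` list plus `Quiet` flag). The generator names are hypothetical, and no AWS call is made.

```python
from typing import Iterable, Iterator, List

MAX_KEYS_PER_REQUEST = 1000  # Multi-Object Delete accepts up to 1,000 keys per request


def _body(batch: List[str], quiet: bool) -> dict:
    # Quiet mode asks S3 to report only the keys that failed to delete.
    return {"Objects": [{"Key": k} for k in batch], "Quiet": quiet}


def delete_batches(keys: Iterable[str], quiet: bool = True) -> Iterator[dict]:
    """Yield request bodies, each holding at most 1,000 keys."""
    batch: List[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == MAX_KEYS_PER_REQUEST:
            yield _body(batch, quiet)
            batch = []
    if batch:
        yield _body(batch, quiet)
```

Each yielded body could then be passed as `Delete=` to `s3.delete_objects(Bucket=...)` with a boto3 S3 client. For 2,500 keys this produces three requests of 1,000, 1,000 and 500 keys.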
Read the S3 FAQ before taking the exam. It comes up A LOT!