3.1 S3 Introduction
S3 provides developers with secure, highly durable, highly scalable, and highly available object storage.
S3 Basics:
It is a safe place to store your files.
It is object-based storage. Objects are videos, photos, documents, and so on. S3 cannot be used to run operating systems or databases; you need block-based storage for that.
It is a kind of durable key-value store. Key is the name you assign to an object and the value is the content of your object.
The data is spread across multiple devices and facilities, so it is designed to withstand failure.
The size of a single file can range from 0 bytes to 5 TB.
Unlimited storage.
Files are stored in Buckets (folders).
Objects are files, Buckets are folders.
S3 uses a universal namespace, which means bucket names must be globally unique: when you create a bucket, you actually create a DNS address.
If you upload a file to S3, you will receive an HTTP 200 status code if the upload was successful.
Tiered Storage Available. (Different types of storage).
Lifecycle Management. (Manage your data based on their age).
Versioning.
Encryption.
Secure your data using Access Control Lists and Bucket Policies.
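The HTTP 200 check mentioned above can be sketched with boto3, the AWS SDK for Python. This is a minimal sketch, not a production uploader; the bucket and key names are hypothetical placeholders.

```python
def upload_succeeded(response: dict) -> bool:
    # A successful PUT carries HTTP 200 in the response metadata.
    return response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 200

def upload_object(bucket: str, key: str, data: bytes) -> bool:
    # boto3 is imported here so the helper above has no SDK dependency.
    import boto3
    s3 = boto3.client("s3")
    response = s3.put_object(Bucket=bucket, Key=key, Body=data)
    return upload_succeeded(response)

# Example (requires AWS credentials and a globally unique bucket name):
# upload_object("my-unique-bucket-name", "photos/cat.jpg", b"...")
```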
Data Consistency Model for S3:
Read-after-write consistency for PUTS (creates) of new objects. You can read a new object immediately after creating it.
Eventual consistency for overwrite PUTS (updates) and DELETES (changes can take some time to propagate). If you update or delete an object, a read immediately afterwards may still return the old data. The delay is usually a few milliseconds.
Updates to S3 are atomic: you get either the new data or the old data, never a partial mix of the two.
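The two behaviours above can be illustrated with a toy in-memory model (this is not real S3 code, just a simulation of the semantics): new keys are readable immediately, while overwrites only become visible after a short propagation delay.

```python
import time

class EventuallyConsistentStore:
    """Toy model of S3's consistency semantics as described above:
    read-after-write for new keys, eventual consistency for overwrites."""

    def __init__(self, delay: float = 0.05):
        self.delay = delay
        self.visible = {}  # what readers currently see
        self.pending = {}  # key -> (time new value becomes visible, value)

    def put(self, key, value):
        if key in self.visible:
            # Overwrite: old value stays readable while the new one propagates.
            self.pending[key] = (time.monotonic() + self.delay, value)
        else:
            # New object: visible immediately (read-after-write consistency).
            self.visible[key] = value

    def get(self, key):
        if key in self.pending:
            ready_at, value = self.pending[key]
            if time.monotonic() >= ready_at:
                self.visible[key] = value  # propagation finished
                del self.pending[key]
        return self.visible.get(key)
```

Note that a read never returns a half-written value (atomicity): it is always either the complete old value or the complete new one.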
S3 is a simple key-value store:
S3 is object based. Objects consist of the following:
Value (This is simply the data and is made up of a sequence of bytes).
Version ID (Important for versioning)
Metadata (Data about the data you are storing)
Subresources (they exist underneath an object). They consist of Access Control Lists (who can access this object; ACLs allow fine-grained permissions at the bucket or individual object level) and Torrent (S3 supports the BitTorrent protocol).
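The object parts listed above (value, version ID, metadata) can be inspected with a HEAD request. A hedged boto3 sketch, where the field names follow boto3's `head_object` response shape and the bucket/key names are placeholders:

```python
def summarize(head_response: dict) -> dict:
    # Pull out the object components described above from a HEAD response.
    return {
        "version_id": head_response.get("VersionId"),      # versioning
        "size_bytes": head_response.get("ContentLength"),  # size of the value
        "metadata": head_response.get("Metadata", {}),     # user metadata
    }

def describe_object(bucket: str, key: str) -> dict:
    import boto3  # imported here so summarize() has no SDK dependency
    s3 = boto3.client("s3")
    return summarize(s3.head_object(Bucket=bucket, Key=key))

# describe_object("my-unique-bucket-name", "photos/cat.jpg")
```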
S3 - Storage Tiers / Classes
S3 Standard - 99.99% availability and 99.999999999% (11 9's) durability. Data is stored redundantly across multiple devices in multiple facilities and is designed to sustain the concurrent loss of 2 facilities (data is replicated across 3 facilities).
S3 - IA (Infrequent Access). For data that is accessed less frequently but requires rapid access when needed. Lower storage fee than S3 Standard, but you are charged a retrieval fee.
RRS (Reduced Redundancy Storage) - Designed to provide 99.99% durability and 99.99% availability of objects over a given year. Cheaper than S3 Standard. If the data can easily be reproduced when lost or corrupted (for example, thumbnails), RRS is a good choice.
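The trade-offs above can be expressed as a simple decision rule. The rule itself is an illustrative simplification, but the storage-class strings are the values boto3's `put_object` accepts in its `StorageClass` parameter.

```python
def choose_storage_class(accessed_often: bool, easily_reproducible: bool) -> str:
    # Simplified decision rule for the classes described above.
    if easily_reproducible:
        return "REDUCED_REDUNDANCY"  # RRS: cheaper, lower durability
    if accessed_often:
        return "STANDARD"            # S3 Standard
    return "STANDARD_IA"             # Infrequent Access: retrieval fee applies

# The result can be passed straight to put_object, e.g.:
# s3.put_object(Bucket=..., Key=..., Body=...,
#               StorageClass=choose_storage_class(False, True))
```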
Amazon Glacier
It is an extremely low-cost storage service for data archival, at around $0.01 per GB per month. Retrieval time is 3 - 5 hours.
Data is stored in Amazon Glacier in "archives." An archive can contain any data, such as photos, videos, or documents. You can upload a single file as an archive, or aggregate multiple files into a TAR or ZIP file and upload that as one archive. Amazon Glacier uses "vaults" as containers for archives. Under a single AWS account, you can have up to 1,000 vaults.
A single archive can be as large as 40 terabytes. You can store an unlimited number of archives and an unlimited amount of data in Amazon Glacier. Each archive is assigned a unique archive ID at the time of creation, and the content of the archive is immutable, meaning that after an archive is created it cannot be updated.
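A minimal sketch of uploading an archive with boto3's Glacier client, enforcing the 40 TB single-archive limit described above. The vault name is a hypothetical placeholder.

```python
MAX_ARCHIVE_BYTES = 40 * 1024**4  # single-archive limit: 40 TB

def within_archive_limit(size_bytes: int) -> bool:
    return 0 < size_bytes <= MAX_ARCHIVE_BYTES

def upload_archive(vault_name: str, data: bytes) -> str:
    """Returns the unique archive ID assigned at creation time.
    The archive content is immutable once created."""
    import boto3  # imported here so the size check has no SDK dependency
    if not within_archive_limit(len(data)):
        raise ValueError("archive size out of range")
    glacier = boto3.client("glacier")
    response = glacier.upload_archive(vaultName=vault_name, body=data)
    return response["archiveId"]

# upload_archive("my-vault", open("backup.zip", "rb").read())
```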
S3 - Charges
Charge for: storage size, number of requests, storage management pricing, data transfer pricing, transfer acceleration.
Transfer Acceleration: enables fast, easy, and secure transfer of files over long distances between your end users and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront's globally distributed edge locations: as data arrives at an edge location, it is routed to Amazon S3 over an optimized network path.
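A hedged sketch of enabling Transfer Acceleration on a bucket with boto3 and constructing the accelerate endpoint that uploads are then sent to. The bucket name is a placeholder.

```python
def accelerate_endpoint(bucket: str) -> str:
    # Accelerated transfers target this endpoint instead of the
    # bucket's regular regional S3 endpoint.
    return f"https://{bucket}.s3-accelerate.amazonaws.com"

def enable_transfer_acceleration(bucket: str) -> None:
    import boto3  # imported here so the helper above has no SDK dependency
    s3 = boto3.client("s3")
    s3.put_bucket_accelerate_configuration(
        Bucket=bucket,
        AccelerateConfiguration={"Status": "Enabled"},
    )

# enable_transfer_acceleration("my-unique-bucket-name")
```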
"Strong Consistency" vs. "Read-after-write Consistency"
Q: Why does AWS say "strong consistency" in DynamoDB and "read-after-write consistency" for S3? From what I've learned in this course and read in the FAQs, they seem to mean more or less the same thing.
A: Yes, they are essentially the same thing. DynamoDB: "A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read." S3: "Amazon S3 buckets in all Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES." There are a couple of general differences: DynamoDB offers consistency options (you can choose per read), whereas S3's behavior is fixed. Also, as detailed above, S3's consistency depends on whether the object is new or being overwritten.
"Availability" vs. "Durability"
Data availability: a term describing products and services that ensure data remains available at a required level of performance in situations ranging from normal to "disastrous."
Data durability: a measure of how unlikely it is that your data will be lost. For example, Amazon S3 is designed to provide 99.999999999% durability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.000000001% of objects.
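The arithmetic behind that durability figure, as a quick sketch:

```python
def expected_annual_loss(num_objects: int, durability: float) -> float:
    # Average number of objects expected to be lost per year.
    return num_objects * (1.0 - durability)

# With 11 9's of durability, storing 10,000,000 objects gives an
# expected loss of about 0.0001 objects per year, i.e. roughly one
# object every 10,000 years on average.
loss = expected_annual_loss(10_000_000, 0.99999999999)
print(loss)
```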