AWS - S3

February 19, 2022

  • Allows storage of objects (files) into buckets
  • Each bucket must have a globally unique name
  • Naming convention
    • No uppercase
    • No underscores
    • 3-63 characters
    • Not an IP
    • Must start with a lowercase letter or number
  • Max. object size is 5 TB; uploads larger than 5 GB must use multi-part upload
  • Can version files, enabled at the bucket level
  • Versioning protects against unintended deletes (can restore a version)
  • Easy rollback to previous version
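
The naming convention listed above can be sketched as a small validator. This is a minimal sketch that checks only the rules in these notes; the function name `bucket_name_problems` is made up for illustration, and the real S3 rule set has further restrictions not covered here.

```python
import re

# Matches names formatted like an IPv4 address, e.g. 192.168.5.4
_IP_LIKE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def bucket_name_problems(name: str) -> list:
    """Return the naming rules from the notes that `name` breaks (empty if none)."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be 3-63 characters")
    if any(ch.isupper() for ch in name):
        problems.append("no uppercase letters")
    if "_" in name:
        problems.append("no underscores")
    if _IP_LIKE.match(name):
        problems.append("must not be formatted as an IP address")
    if not re.match(r"^[a-z0-9]", name):
        problems.append("must start with a lowercase letter or number")
    return problems


print(bucket_name_problems("my-app-logs-2022"))  # []
print(bucket_name_problems("192.168.5.4"))
print(bucket_name_problems("My_Bucket"))
```

Returning a list of broken rules, rather than a plain True/False, makes it easier to tell a user why a name was rejected.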

S3 Object Encryption

  • 4 ways to encrypt objects

SSE-S3

  • Encryption using keys handled and managed by AWS
  • Object encrypted server-side
  • AES-256 encryption type
  • Must set header "x-amz-server-side-encryption": "AES256"

SSE-KMS

  • Encryption using keys handled and managed in AWS KMS (Key Management Service)
  • KMS advantages: user control and audit trail
  • Object encrypted server-side
  • Must set header "x-amz-server-side-encryption": "aws:kms"
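
As a rough illustration of the two header values for SSE-S3 and SSE-KMS, the sketch below builds the request headers for an upload. The helper name `sse_headers` and the key alias `alias/example-key` are hypothetical; the value S3 expects for SSE-S3 is spelled `AES256`, without a hyphen.

```python
from typing import Optional


def sse_headers(mode: str, kms_key_id: Optional[str] = None) -> dict:
    """Build server-side encryption headers for an S3 upload request."""
    if mode == "sse-s3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "sse-kms":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id is not None:
            # Optional: name a specific KMS key for this object.
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    raise ValueError(f"unknown encryption mode: {mode!r}")


print(sse_headers("sse-s3"))
print(sse_headers("sse-kms", kms_key_id="alias/example-key"))  # hypothetical alias
```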

SSE-C

  • Server side encryption using data keys fully managed by the customer outside of AWS
  • S3 does not store encryption key
  • HTTPS must be used
  • Key must be provided in HTTP headers for every request made
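
The per-request key headers for SSE-C might be built roughly as follows. This is a sketch under the assumption of a 256-bit key generated locally; `sse_c_headers` is an illustrative name, and the headers should only ever be sent over HTTPS.

```python
import base64
import hashlib
import os


def sse_c_headers(key: bytes) -> dict:
    """Build the SSE-C headers that must accompany every request for the object."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    key_b64 = base64.b64encode(key).decode("ascii")
    # S3 uses the MD5 digest of the key to check it arrived intact.
    md5_b64 = base64.b64encode(hashlib.md5(key).digest()).decode("ascii")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": key_b64,
        "x-amz-server-side-encryption-customer-key-MD5": md5_b64,
    }


key = os.urandom(32)  # the customer must store this key; S3 does not keep it
for name in sse_c_headers(key):
    print(name)
```

Because S3 discards the key after each request, the same headers are needed again to read the object back; losing the key means losing the data.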

Client-side encryption

  • Client library such as the S3 encryption client
  • Clients must encrypt data before sending
  • Clients must decrypt data when retrieving
  • Customer fully manages keys & encryption cycle

S3 encryption in transit

  • S3 exposes both HTTP & HTTPS endpoints
  • Can use either but HTTPS is recommended
  • HTTPS mandatory for SSE-C
  • Encryption in flight/transit is provided by SSL/TLS
  • Most clients use HTTPS by default

S3 Object lock

  • Store objects using write-once-read-many (WORM)
  • Prevents objects from being overwritten or deleted for a fixed amount of time or indefinitely
  • Helps meet regulatory requirements
  • Provides two ways to manage retention
    • Retention period: Specifies a fixed period of time during which an object is locked
    • Legal hold: Same protection as a retention period, but with no expiration; stays in place until explicitly removed
  • Only works on versioned buckets
  • A new version of object can still be created
  • An object can have both a retention period and a legal hold, either one alone, or neither.
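
How the two retention mechanisms combine can be modelled in a few lines. This is a simplified sketch of the rule described above, not of the S3 API; the function name `is_locked` and the dates are made up for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def is_locked(retain_until: Optional[datetime], legal_hold: bool, now: datetime) -> bool:
    """Return True if this object version is still protected from deletion."""
    if legal_hold:
        # A legal hold has no expiry; it applies until someone removes it.
        return True
    # A retention period protects the version until its retain-until date.
    return retain_until is not None and now < retain_until


now = datetime(2022, 2, 19, tzinfo=timezone.utc)
until = datetime(2023, 2, 19, tzinfo=timezone.utc)

print(is_locked(until, False, now))  # True: retention period still running
print(is_locked(None, True, now))    # True: legal hold only
print(is_locked(None, False, now))   # False: neither is set
```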

S3 Bucket policies

  • JSON based policies

    • Resources: buckets and objects
    • Actions: Set of APIs to ALLOW or DENY
    • Principal: The account or users to apply the policy to
  • Use S3 bucket policy for:

    • Grant public access to bucket
    • Force objects to be encrypted
    • Grant access to another account (cross account)
  • Block Public Access settings:
    • Block public access to buckets and objects granted through new ACLs
    • Block public access to buckets and objects granted through any ACLs
    • Block public access to buckets and objects granted through new public bucket or access point policies
    • Block public and cross-account access to buckets & objects through any public bucket or access point policies
  • These settings were created to prevent company data leaks
  • If you know the bucket should never be public, leave these settings on. They can also be set at the account level
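
For the "force objects to be encrypted" use case, a bucket policy could look roughly like the one built below. This is a sketch, not the only way to write such a policy: the bucket name `example-bucket` and the helper `deny_unencrypted_uploads` are hypothetical, and the statement simply denies any upload that arrives without the server-side encryption header.

```python
import json


def deny_unencrypted_uploads(bucket: str) -> dict:
    """Build a bucket policy that rejects PutObject calls lacking the SSE header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",  # applies to every caller
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",  # all objects in the bucket
                "Condition": {
                    # True when the request carries no encryption header at all
                    "Null": {"s3:x-amz-server-side-encryption": "true"}
                },
            }
        ],
    }


policy = deny_unencrypted_uploads("example-bucket")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```

The statement shows the three elements listed above: a Resource (the objects), an Action with its ALLOW or DENY effect, and a Principal.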

S3 - Moving between classes

  • Objects can transition between storage classes
  • Moving objects can be automated using a lifecycle configuration

S3 - Lifecycle rules

  • Transition actions
    • Defines when objects are transitioned to another storage class
  • Expiration actions
    • Configure objects to delete after some time
    • Can be used to delete old versions of files (if versioning enabled)
  • Rules can be created for a certain prefix
  • Rules can be created for certain object tags
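
A lifecycle rule with both kinds of action might be laid out as below, in the dictionary shape used by the S3 lifecycle configuration. The rule ID, the `logs/` prefix, and the day counts are illustrative assumptions, and `storage_class_at` is a hypothetical helper that only approximates what S3 does (S3 applies lifecycle actions asynchronously).

```python
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},      # rule applies to this prefix only
            "Transitions": [                    # transition actions
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},        # expiration action
            # Clean up old versions (only meaningful with versioning enabled)
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        }
    ]
}


def storage_class_at(rule: dict, age_days: int) -> str:
    """Approximate class of a current object version at `age_days`, or 'EXPIRED'."""
    if age_days >= rule["Expiration"]["Days"]:
        return "EXPIRED"
    current = "STANDARD"
    for step in sorted(rule["Transitions"], key=lambda s: s["Days"]):
        if age_days >= step["Days"]:
            current = step["StorageClass"]
    return current


rule = lifecycle["Rules"][0]
for age in (10, 30, 100, 400):
    print(age, storage_class_at(rule, age))
```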

S3 - Performance

  • S3 autoscales to high request rates
  • Your app can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix
  • No limit on the number of prefixes in a bucket
  • If you spread reads evenly across four prefixes, you can achieve 22,000 requests per second for GET and HEAD requests (4 x 5,500)
  • Multi-part upload:
    • Recommended for files > 100 MB
    • Required for files > 5 GB
    • Can help parallelize uploads
    • File is divided into parts that are uploaded separately
  • S3 Transfer Acceleration
    • Increases transfer speed by sending the file to an AWS edge location, which forwards the data to the S3 bucket in the target region
    • Compatible with multi-part upload
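
The per-prefix arithmetic and the way a file is split for multi-part upload can be worked through in a short sketch. The 22,000 figure corresponds to four prefixes at 5,500 reads each. The function names and the 100 MiB default part size are assumptions for illustration; the 5 MiB minimum part size and the 10,000-part cap are S3 limits.

```python
import math

GET_PER_PREFIX = 5500      # GET/HEAD requests per second per prefix
MIB = 1024 ** 2
MIN_PART = 5 * MIB         # minimum part size (the last part may be smaller)
MAX_PARTS = 10_000         # maximum number of parts in one upload


def read_capacity(prefixes: int) -> int:
    """Aggregate GET/HEAD rate when reads are spread evenly over prefixes."""
    return prefixes * GET_PER_PREFIX


def plan_parts(size_bytes: int, part_size: int = 100 * MIB) -> list:
    """Split a file into (offset, length) byte ranges, one per part."""
    if part_size < MIN_PART:
        raise ValueError("part size is below the 5 MiB minimum")
    if math.ceil(size_bytes / part_size) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part size")
    return [
        (offset, min(part_size, size_bytes - offset))
        for offset in range(0, size_bytes, part_size)
    ]


print(read_capacity(4))          # 22000
parts = plan_parts(250 * MIB)
print(len(parts))                # 3
print(parts[-1][1] // MIB)       # 50 (the final, smaller part)
```

Each range could then be uploaded by a separate worker, which is where the parallelism comes from.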

S3 Security

  • User based
    • IAM policies: which API calls should be allowed for a specific user
  • Resource based
    • Bucket policies: bucket-wide rules set from the S3 console; allow cross-account access
    • Object Access Control List (ACL) - finer grain
    • Bucket Access Control List (ACL) - less common
  • An IAM principal can access an object if:
    • (the user permissions allow it OR the resource policy allows it) AND there is no explicit DENY
  • Networking
    • Supports VPC endpoints
  • Logging & Audit
    • S3 Access logs can be stored in another bucket
    • API calls logged in CloudTrail
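
The access rule stated above can be written as a one-line boolean check, which makes the precedence explicit: the explicit DENY is applied last and always wins. This mirrors the rule only as these notes state it; cross-account access has additional requirements that are not modelled here, and `can_access` is an illustrative name.

```python
def can_access(iam_allows: bool, resource_policy_allows: bool, explicit_deny: bool) -> bool:
    """Either kind of policy may grant access, but any explicit DENY overrides both."""
    return (iam_allows or resource_policy_allows) and not explicit_deny


print(can_access(True, False, False))   # True: IAM policy grants access
print(can_access(False, True, False))   # True: bucket policy grants access
print(can_access(True, True, True))     # False: explicit DENY wins
print(can_access(False, False, False))  # False: nothing grants access
```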

S3 CORS

  • If a client makes a cross-origin request to our S3 bucket, we need to enable the correct CORS headers
  • Can allow for a specific origin or all origins
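
A CORS configuration for a bucket might look like the dictionary below, in the shape S3 uses for CORS rules. The origin `https://www.example.com` is a placeholder, and `origin_allowed` is a hypothetical, simplified check (exact match or `*` only) meant to show the idea rather than reproduce S3's matching.

```python
cors_configuration = {
    "CORSRules": [
        {
            "AllowedOrigins": ["https://www.example.com"],  # or ["*"] for all origins
            "AllowedMethods": ["GET", "HEAD"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,  # how long browsers may cache the preflight result
        }
    ]
}


def origin_allowed(config: dict, origin: str, method: str) -> bool:
    """Simplified check: does any rule allow this origin and HTTP method?"""
    for rule in config["CORSRules"]:
        origins = rule["AllowedOrigins"]
        origin_ok = "*" in origins or origin in origins
        if origin_ok and method in rule["AllowedMethods"]:
            return True
    return False


print(origin_allowed(cors_configuration, "https://www.example.com", "GET"))  # True
print(origin_allowed(cors_configuration, "https://other.example", "GET"))    # False
print(origin_allowed(cors_configuration, "https://www.example.com", "PUT"))  # False
```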

S3 Consistency Model

  • After a successful write of a new object, an overwrite, or a delete, any subsequent read request immediately receives the latest version of the object (read-after-write consistency)
  • Any subsequent list request immediately reflects changes (list consistency)

S3 Replication (CRR & SRR)

  • Must enable versioning in source and destination
  • Cross region replication (CRR)
  • Same Region replication (SRR)
  • Buckets can be in different accounts
  • Copying is asynchronous
  • Needs proper IAM permissions
  • CRR use cases:
    • compliance, lower latency access, replication across accounts
  • SRR use cases:
    • log aggregation, live replication between test and production accounts
  • After replication is enabled, only new objects are replicated
  • For DELETE operations:
    • Can replicate delete markers from source to target
    • Deletions with a version ID are not replicated
  • No chaining of replication
    • e.g. If bucket 1 replicates to bucket 2, which replicates to bucket 3, objects created in bucket 1 are not replicated to bucket 3.
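
The "no chaining" rule can be illustrated with a toy simulation. This is a model of the behaviour described above, not of the S3 API: the bucket names, the `rules` mapping, and `put_object` are all made up, and replication is shown as immediate even though real copying is asynchronous.

```python
from collections import defaultdict

# source bucket -> destination bucket (hypothetical names)
rules = {"bucket-1": "bucket-2", "bucket-2": "bucket-3"}


def put_object(contents: dict, rules: dict, bucket: str, key: str) -> None:
    """Simulate a client upload to `bucket`.

    The object is copied to the bucket's direct destination only; the
    replica that lands there does not trigger another replication.
    """
    contents[bucket].add(key)
    destination = rules.get(bucket)
    if destination is not None:
        contents[destination].add(key)  # replica, not re-replicated


contents = defaultdict(set)
put_object(contents, rules, "bucket-1", "report.csv")

print("report.csv" in contents["bucket-2"])  # True: direct destination
print("report.csv" in contents["bucket-3"])  # False: no chaining
```

An object uploaded directly to bucket 2 would still reach bucket 3, since that is a new write on bucket 2 rather than a replica.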

S3 Storage classes

  • Standard General purpose
  • Standard Infrequent access
  • One-Zone infrequent access
  • Intelligent tiering
  • Glacier
  • Glacier Deep archive

S3 Standard - General purpose

  • High durability of objects
  • High availability
  • Use cases: Big Data, Analytics, Mobile/gaming, content distribution

S3 Standard - Infrequent Access (IA)

  • For data that is accessed less frequently but requires rapid access when needed
  • High durability
  • High availability
  • Use cases: Data store for disaster recovery, backups

S3 One Zone - IA

  • Same as standard IA, but data is stored in single AZ
  • High durability, but data is lost if the AZ is destroyed
  • 99.5% availability
  • Low latency and high throughput performance
  • Supports SSL for data in transit and encryption at rest
  • Lower cost than Standard IA
  • Use cases: Storing secondary backups, storing data you can recreate

Amazon Glacier

  • Low cost storage meant for archiving/backups
  • Data is retained for the longer term (tens of years)
  • Alternative to on-premises magnetic tape storage
  • Avg. annual durability is 99.999999999% (11 nines)
  • Each item in Glacier is called an archive
  • Archives are stored in “vaults”

Amazon Glacier & Glacier Deep Archive

  • Amazon Glacier - 3 retrieval options
    • Expedited (1-5 mins)
    • Standard (3-5 hrs)
    • Bulk (5 - 12 hrs)
    • Minimum storage duration of 90 days
  • Amazon Glacier Deep Archive - cheaper, for longer-term storage - 2 retrieval options
    • Standard (12 hrs)
    • Bulk (48 hrs)
    • Min. storage duration of 180 days

S3 Intelligent tiering

  • Same low latency & high throughput performance as standard S3
  • Small monthly monitoring & auto-tiering fee
  • Automatically moves objects between two access tiers based on changing access patterns
  • High durability of objects
  • Resilient against events that impact an entire AZ
  • Designed for 99.9% availability over a given year

© 2022 JLavs Notes