AWS - DynamoDB

February 19, 2022

  • Fully managed, highly available with replication across 3 AZs
  • NoSQL database
  • Scales to massive workloads
  • Millions of requests per second
  • Fast and consistent in performance
  • Integrated with IAM for security, authentication and admin
  • Enables event driven programming with DynamoDB streams
  • Low cost autoscaling capabilities

DynamoDB - Primary Keys

  • Option 1: (Partition key only)
    • Must be unique for each item
    • Must be ‘diverse’ so data is distributed
  • Option 2: (Partition key + Sort key)
    • Combination must be unique
    • Data is grouped
    • Sort key == range key

DynamoDB - Provisioned Throughput

  • Must have provisioned read & write capacity units
  • Read Capacity Units (RCU) - throughput for reads
  • Write Capacity Units (WCU) - throughput for writes
  • Option to setup autoscaling to meet demand
  • Throughput can be exceeded temporarily using ‘burst credit’
  • If burst is empty, you will get ProvisionedThroughputExceededException
  • It’s then advised to do an exponential back-off retry

DynamoDB - Write Capacity Units

  • 1 WCU represents one write per second for an item up to 1KB in size
  • If the items are larger than 1KB more WCU are consumed
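The WCU rule above can be worked through with a small helper. This is a minimal sketch for standard (non-transactional) writes; the function name and the example numbers are illustrative and not part of any AWS SDK.

```python
import math


def write_capacity_units(item_size_kb: float, writes_per_second: int) -> int:
    """Estimate WCU for standard writes.

    Each write costs 1 WCU per 1 KB of item size, rounded up per item.
    """
    units_per_write = math.ceil(item_size_kb / 1)
    return units_per_write * writes_per_second


# 10 writes per second of 4.5 KB items: each write rounds up to 5 WCU.
print(write_capacity_units(4.5, 10))
```

Note that the rounding happens per item, so many small items just over a 1 KB boundary cost noticeably more than their total size suggests.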

Strongly consistent read vs. Eventually consistent read

  • Eventually consistent read
    • If we read just after a write, we may get stale data because replication has not finished yet
  • Strongly consistent read
    • If we read just after a write, we will get correct data
  • By default, DynamoDB uses eventually consistent reads, but you can set the ConsistentRead flag to true for some operations

Read capacity units

  • One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4KB in size
  • If items are greater than 4KB more RCU are consumed
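The RCU rule can be checked the same way. A minimal sketch, assuming the rounding described above (4 KB blocks per read, eventually consistent reads costing half); the function name is illustrative.

```python
import math


def read_capacity_units(item_size_kb: float, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """Estimate RCU: item size rounds up to the next 4 KB block per read."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        total = math.ceil(total / 2)
    return total


# 10 strongly consistent reads per second of 6 KB items (6 KB rounds up to 8 KB).
print(read_capacity_units(6, 10))
# 16 eventually consistent reads per second of 12 KB items.
print(read_capacity_units(12, 16, strongly_consistent=False))
```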

DynamoDB - Partitions Internal

  • Data is divided into partitions
  • Partition keys go through a hashing algorithm to know which partition they go to
  • To compute the number of partitions:
    • By capacity: (Total RCU/3000) + (Total WCU/1000)
    • By size: (Total size / 10 GB)
    • Total partitions: CEIL(MAX(capacity,size))
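The partition estimate can be sketched as a function. This assumes the commonly cited per-partition limits of 3000 RCU, 1000 WCU and 10 GB; these are passed as parameters because they are a rule of thumb, and DynamoDB does not expose the actual partition count, so treat the result as an estimate only.

```python
import math


def estimate_partitions(total_rcu: float, total_wcu: float, total_size_gb: float,
                        rcu_per_partition: float = 3000,
                        wcu_per_partition: float = 1000,
                        gb_per_partition: float = 10) -> int:
    """Estimate partitions as CEIL(MAX(by capacity, by size))."""
    by_capacity = total_rcu / rcu_per_partition + total_wcu / wcu_per_partition
    by_size = total_size_gb / gb_per_partition
    return math.ceil(max(by_capacity, by_size))


# Capacity-bound table: 6000 RCU, 2000 WCU, 25 GB.
print(estimate_partitions(6000, 2000, 25))
# Size-bound table: 1500 RCU, 500 WCU, 35 GB.
print(estimate_partitions(1500, 500, 35))
```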

DynamoDB - Throttling

  • If we exceed our RCU or WCU, we get ProvisionedThroughputExceededException
  • Reasons:
    • Hot keys - one partition key is being read too many times
    • Hot partitions
    • Very large items: RCU & WCU depend on the size of items
  • Solutions:
    • Exponential backoff when exception is encountered
    • Distribute partition keys as much as possible
    • If RCU issue, we can use DynamoDB Accelerator (DAX)
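Exponential backoff could look roughly like the sketch below. It is self-contained, so ThrottledError is a hypothetical stand-in for ProvisionedThroughputExceededException, and the base delay, cap and retry count are arbitrary assumptions. The AWS SDKs already retry throttled calls on their own, so a helper like this is mainly useful to understand the idea or to add retries on top.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def backoff_delays(max_retries, base=0.05, cap=5.0):
    """Upper bound of the wait before each retry: base * 2^attempt, capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def call_with_backoff(operation, max_retries=5, sleep=time.sleep):
    """Run operation(), retrying with jittered exponential backoff when throttled."""
    for delay in backoff_delays(max_retries):
        try:
            return operation()
        except ThrottledError:
            # Full jitter: wait a random time up to the current bound so that
            # many throttled clients do not all retry at the same moment.
            sleep(random.uniform(0, delay))
    # Final attempt: if it is still throttled, let the exception propagate.
    return operation()
```

The sleep function is injectable so the helper can be exercised in tests without actually waiting.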

DynamoDB - Writing Data

  • PutItem: Write data to DynamoDB
    • Consumes WCU
  • UpdateItem: Update data in DynamoDB
  • Conditional writes:
    • Accept a write/update only if conditions are respected otherwise rejected
    • Helps with concurrent access to items
    • No performance impact

DynamoDB - LSI (Local secondary index)

  • Alternative range key for your table, local to the hash key
  • Up to 5 local secondary indexes per table
  • The sort key consists of exactly one scalar attribute
  • The attribute you choose must be a scalar String, Number or Binary
  • LSI must be defined at creation time

DynamoDB - GSI (Global secondary index)

  • To speed up queries on non-key attributes, use a GSI
  • GSI = partition key + optional sort key
  • The index is a new ‘table’ and we can project attributes on it
    • The partition key and sort key of the original table are always projected
    • Can specify extra attributes to project (INCLUDE)
    • Can use all attributes from main table (ALL)

Indexes and Throttling

  • GSI
    • If the writes are throttled on the GSI, main table will also be throttled
    • Even if the WCU on the main tables are fine
    • Choose the GSI partition key carefully
  • LSI
    • Uses the WCU and RCU of the main table
    • No special throttling considerations

Deleting data

  • DeleteItem
    • Delete an individual row
    • Ability to perform a conditional delete
  • DeleteTable
    • Delete a whole table and all its items
    • Much quicker deletion than calling DeleteItem on all items

Batching writes

  • BatchWriteItem
    • Up to 25 PutItem and/or DeleteItem in one call
    • Up to 16MB of data written
    • Up to 400KB of data per item
  • Batching allows you to save in latency by reducing the no. of API calls done against DynamoDB
  • Operations are done in parallel for better efficiency
  • It is possible for part of a batch to fail, in which case we have to retry the failed items (using exponential back-off algorithm)
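Because a single call accepts at most 25 requests, a larger set of writes has to be split up first. A minimal sketch of that chunking step; the function name is illustrative, and sending each chunk (and retrying whatever the response reports as unprocessed) is left out.

```python
def chunk_for_batch_write(requests, batch_size=25):
    """Split a list of put/delete requests into groups of at most batch_size."""
    return [requests[i:i + batch_size] for i in range(0, len(requests), batch_size)]


# 60 pending writes become three batches of 25, 25 and 10.
pending = [{"id": n} for n in range(60)]
print([len(batch) for batch in chunk_for_batch_write(pending)])
```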

Reading data

  • GetItem
    • Read based on primary key
    • Primary Key - HASH or HASH-RANGE
    • Eventually consistent read by default
    • Option to use strongly consistent reads (more RCU - may take longer)
    • ProjectionExpression can be specified to include only certain attributes
  • BatchGetItem
    • Up to 100 items
    • Up to 16MB of data
    • Items are retrieved in parallel to minimize latency
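The GetItem options above map onto request parameters roughly as follows. This sketch only builds the parameter dictionary in the shape the low-level API expects (for example for a boto3 client's get_item call) and does not contact AWS; the table name, key and attribute names in the example are made up.

```python
def get_item_params(table, key, attributes=None, strongly_consistent=False):
    """Build GetItem parameters with optional projection and consistent read."""
    params = {
        "TableName": table,
        "Key": key,
        "ConsistentRead": strongly_consistent,
    }
    if attributes:
        # Placeholders (#a0, #a1, ...) avoid clashes with reserved words.
        names = {f"#a{i}": name for i, name in enumerate(attributes)}
        params["ProjectionExpression"] = ", ".join(names)
        params["ExpressionAttributeNames"] = names
    return params


params = get_item_params(
    "Users",                          # hypothetical table
    {"user_id": {"S": "u-123"}},      # hypothetical HASH key
    attributes=["name", "email"],
    strongly_consistent=True,
)
print(params["ProjectionExpression"])
```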

DynamoDB as a session state cache

  • Common to use DynamoDB to store session state
  • vs. ElastiCache
    • ElastiCache is in memory, DynamoDB is serverless
  • vs. EFS
    • Must be attached to EC2 instances as a network drive
  • vs. EBS & Instance Store
    • EBS & Instance store can only be used for local caching not shared caching
  • vs. S3
    • S3 is higher latency and not meant for small objects

DynamoDB Write sharding

  • eg. Imagine we have a voting app with two candidates: A & B
  • If we use a partition key of candidate_id we will run into hot partition issues, as we only have two possible values
  • Solutions: Add a suffix (usually random, sometimes calculated)
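The suffix idea for the voting example could be sketched like this. The shard count of 10, the "#" separator and the voter_id input are assumptions made for illustration. A calculated suffix makes the shard reproducible for a given voter, while a random suffix spreads writes without needing any extra input; in both cases, reading a candidate's total means querying every shard key and summing.

```python
import hashlib
import random


def random_sharded_key(candidate_id, shard_count=10):
    """Partition key with a random suffix, e.g. 'A#7'."""
    return f"{candidate_id}#{random.randrange(shard_count)}"


def calculated_sharded_key(candidate_id, voter_id, shard_count=10):
    """Partition key with a suffix derived from a hash of the voter id."""
    digest = hashlib.sha256(voter_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{candidate_id}#{shard}"


def all_shard_keys(candidate_id, shard_count=10):
    """Every key that must be read to total the votes for one candidate."""
    return [f"{candidate_id}#{n}" for n in range(shard_count)]
```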

DynamoDB Write Types

  • Concurrent writes

  • Conditional writes

  • Atomic writes

  • Batch writes

DynamoDB - Large objects pattern

DynamoDB - Operations

  • Table cleanup
    • Option 1: Scan & Delete -> Very slow, expensive, consumes RCU & WCU
    • Option 2: Drop table & recreate -> fast, cheap, efficient
  • Copying a table
    • Option 1: Use AWS DataPipeline (uses EMR)
    • Option 2: Create a backup and restore the backup into a new table (can take time)
    • Option 3: Write own code -> Scan & Write

DynamoDB - Concurrency

  • DynamoDB has a feature called ‘conditional update/delete’
  • Means you can ensure an item hasn’t changed before deleting/updating it
  • This makes DynamoDB an optimistic locking/concurrency database
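One common way to apply this is a version attribute: the update only succeeds if the stored version still matches the one that was read. The sketch below builds UpdateItem parameters in the low-level API shape without calling AWS; the attribute names status and version and the table and key in the example are assumptions.

```python
def versioned_update_params(table, key, new_status, expected_version):
    """Build UpdateItem parameters that succeed only if the version is unchanged."""
    return {
        "TableName": table,
        "Key": key,
        # Write the new value and bump the version in the same request.
        "UpdateExpression": "SET #s = :new_status, #v = :next_version",
        # Rejected if another writer has already changed the version.
        "ConditionExpression": "#v = :expected_version",
        "ExpressionAttributeNames": {"#s": "status", "#v": "version"},
        "ExpressionAttributeValues": {
            ":new_status": {"S": new_status},
            ":next_version": {"N": str(expected_version + 1)},
            ":expected_version": {"N": str(expected_version)},
        },
    }


params = versioned_update_params(
    "Orders", {"order_id": {"S": "o-1"}}, "SHIPPED", expected_version=3
)
print(params["ConditionExpression"])
```

When the condition does not hold, DynamoDB rejects the write with a ConditionalCheckFailedException, and the caller would typically re-read the item and try again.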

DynamoDB - DAX

  • DynamoDB Accelerator
  • Seamless cache for DynamoDB, no application re-write
  • Writes go through DAX
  • Micro second latency for cached reads & queries
  • Solves hot-key partitions (too many reads)
  • 5 mins TTL for cache by default
  • Up to 10 nodes in cluster
  • Multi-AZ (3 nodes min. recommended for production)
  • Secure (encryption at rest with KMS, VPC, IAM, CloudTrail)

DynamoDB Streams

  • Changes in the DB can end up on a DynamoDB stream
  • Stream can be read by AWS Lambda & EC2 instances and then we can do:
    • react to changes in real time
    • Analytics
    • Create derivative tables
    • Insert into ElasticSearch
  • Streams are made of shards, automated by AWS
  • Could implement cross region replication using streams
  • Stream has 24 hr of data retention
  • Choose from the information that will be written to the stream whenever data is modified:
    • KEYS_ONLY: Only the key attributes of the modified item
    • NEW_IMAGE: The entire item, after it was modified
    • OLD_IMAGE: The entire item, before it was modified
    • NEW_AND_OLD_IMAGES: Entire new and old images

DynamoDB Streams & Lambda

  • Need to define an event source mapping to read from a stream
  • Need to ensure Lambda has appropriate permissions
  • Lambda function invoked synchronously
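A Lambda function attached to a stream receives batches of records, each with an eventName of INSERT, MODIFY or REMOVE. The handler below is a minimal sketch that only counts the record types; the handler name and the sample event are illustrative, and real processing logic would replace the counting.

```python
def handler(event, context):
    """Count the stream records in one batch by event type."""
    summary = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        name = record["eventName"]
        summary[name] = summary.get(name, 0) + 1
    return summary


# Trimmed-down sample of the event shape Lambda passes in.
sample_event = {
    "Records": [
        {"eventName": "INSERT", "dynamodb": {"Keys": {"id": {"S": "1"}}}},
        {"eventName": "MODIFY", "dynamodb": {"Keys": {"id": {"S": "1"}}}},
        {"eventName": "INSERT", "dynamodb": {"Keys": {"id": {"S": "2"}}}},
    ]
}
print(handler(sample_event, None))
```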

DynamoDB - TTL (time-to-live)

  • Automatically delete an item after an expiry date/time
  • Provided at no extra cost, deletions do not use WCU/RCU
  • Background task operated by DynamoDB itself
  • Helps reduce storage & manage table size over time
  • Helps to adhere to regulatory norms
  • Enabled per row (add a TTL column and add a date there)
  • Typically deletes expired items within 48 hrs of expiration
  • Items deleted due to TTL are also deleted in GSI/LSI
  • Streams can help recover expired items
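The TTL value is stored on the item as a number holding a Unix epoch timestamp in seconds. A minimal sketch of stamping an item before writing it; the attribute name expires_at is an assumption and must match whatever attribute TTL is enabled on for the table.

```python
import time


def with_ttl(item, ttl_seconds, now=None, attribute="expires_at"):
    """Return a copy of item with an expiry timestamp (epoch seconds) added."""
    now = int(time.time()) if now is None else int(now)
    return {**item, attribute: now + ttl_seconds}


# A session that should expire one hour after a fixed reference time.
session = with_ttl({"session_id": "abc"}, ttl_seconds=3600, now=1_700_000_000)
print(session)
```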

DynamoDB - Transactions

  • Ability to create/update/delete multiple rows in different tables at the same time
  • ‘All or nothing’ operation
  • Write modes: Standard/Transactional
  • Read modes: Eventual consistency, strong consistency, transactional
  • Consume 2x of WCU/RCU
  • APIs: TransactWriteItems, TransactGetItems

Transactional capacity computations

  • 5KB item size
  • Transactional Item Writes/second: 3
    • [5KB/1KB per WCU] x 2 (cost of transactional) x 3(writes per second) = 30 WCU
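The same computation as a small helper, with a read variant added for comparison. This is a sketch that assumes the 2x transactional cost and the usual rounding (1 KB per write unit, 4 KB per read unit); the function names are illustrative.

```python
import math


def transactional_wcu(item_size_kb: float, writes_per_second: int) -> int:
    """WCU for transactional writes: size rounded up to 1 KB, then doubled."""
    return math.ceil(item_size_kb / 1) * 2 * writes_per_second


def transactional_rcu(item_size_kb: float, reads_per_second: int) -> int:
    """RCU for transactional reads: size rounded up to 4 KB, then doubled."""
    return math.ceil(item_size_kb / 4) * 2 * reads_per_second


# The example above: 3 transactional writes per second of 5 KB items.
print(transactional_wcu(5, 3))
# 5 transactional reads per second of the same items (5 KB rounds up to 8 KB).
print(transactional_rcu(5, 5))
```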

DynamoDB - Security & Other features

  • Security
    • VPC endpoints available to access DynamoDB without internet
    • Access fully controlled by IAM
    • Encryption at rest using KMS
    • Encryption in transit using SSL/TLS
  • Backup and restore feature available
    • Point-in-time restore like RDS
    • No performance impact
  • Global Tables
    • Multi-region, fully replicated, high performance
  • AWS DMS can be used to migrate to DynamoDB (from Mongo, Oracle, MySQL etc.)
  • Can launch a local DynamoDB on your computer

DynamoDB - Fine-Grained Access Control

  • Using Web Identity Federation or Cognito Identity Pools, each user gets AWS credentials
  • You can assign an IAM role to these users with a condition to limit their API access to DynamoDB
  • LeadingKeys - limit row-level access for users on the primary key
  • Attributes - limit specific attributes the user can see
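A policy using both conditions could look roughly like the sketch below, built as a Python dictionary and printed as JSON. The table ARN, the list of actions and the attribute names are placeholders; the policy variable assumes Cognito Identity Pools, where each user's identity id becomes the only partition key value they may touch.

```python
import json


def fine_grained_policy(table_arn, allowed_attributes):
    """Build an IAM policy limiting a user to their own items and some attributes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:Query",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                ],
                "Resource": table_arn,
                "Condition": {
                    "ForAllValues:StringEquals": {
                        # Partition key must equal the caller's Cognito identity id.
                        "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"],
                        # Only these attributes may be read or written.
                        "dynamodb:Attributes": list(allowed_attributes),
                    },
                    # Reads must request specific attributes, not the whole item.
                    "StringEqualsIfExists": {"dynamodb:Select": "SPECIFIC_ATTRIBUTES"},
                },
            }
        ],
    }


policy = fine_grained_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/GameScores",  # placeholder ARN
    ["user_id", "score"],                                        # placeholder attributes
)
print(json.dumps(policy, indent=2))
```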

