Fully managed, highly available with replication across 3 AZs
NoSQL database
Scales to massive workloads
Millions of requests per second
Fast and consistent in performance
Integrated with IAM for security, authorization and administration
Enables event driven programming with DynamoDB streams
Low cost autoscaling capabilities
DynamoDB - Primary Keys
Option 1: (Partition key only)
Must be unique for each item
Must be 'diverse' (high cardinality) so that data is distributed evenly across partitions
Option 2: (Partition key + Sort key)
Combination must be unique
Data is grouped by partition key and ordered by sort key
Sort key == range key
DynamoDB - Provisioned Throughput
Must have provisioned read & write capacity units
Read Capacity Units (RCU) - throughput for reads
Write Capacity Units (WCU) - throughput for writes
Option to set up autoscaling to meet demand
Throughput can be exceeded temporarily using ‘burst credit’
If the burst credit is exhausted, you will get a ProvisionedThroughputExceededException
It is then advised to retry with exponential backoff
DynamoDB - Write Capacity Units
1 WCU represents one write per second for an item up to 1KB in size
If items are larger than 1KB, more WCU are consumed (item size is rounded up to the next 1KB)
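As a worked example of the WCU rule above, here is a minimal Python sketch. The function name wcu_needed and its parameters are illustrative only and are not part of any AWS API.

```python
import math


def wcu_needed(writes_per_second: int, item_size_kb: float) -> int:
    """Estimate WCU for a steady write rate.

    Each write costs 1 WCU per 1 KB of item size, with the size
    rounded up to the next whole KB.
    """
    units_per_write = math.ceil(item_size_kb / 1)
    return writes_per_second * units_per_write


# 10 writes per second of 2 KB items -> 10 * 2 = 20 WCU
print(wcu_needed(10, 2))
# 6 writes per second of 4.5 KB items -> 6 * 5 = 30 WCU (4.5 KB rounds up to 5 KB)
print(wcu_needed(6, 4.5))
```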
Strongly consistent read vs. eventually consistent read
Eventually consistent read
If we read just after a write, we may get stale data because replication has not yet completed
Strongly consistent read
If we read just after a write, we will get the most recent data
By default, DynamoDB uses eventually consistent reads, but you can set the ConsistentRead parameter to true on GetItem, BatchGetItem, Query and Scan
Read capacity units
One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4KB in size
If items are larger than 4KB, more RCU are consumed (item size is rounded up to the next 4KB)
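The RCU rule can be checked with a small calculation. This is a sketch under the rules stated above; rcu_needed is a hypothetical helper name, not an AWS API.

```python
import math


def rcu_needed(reads_per_second: int, item_size_kb: float,
               strongly_consistent: bool) -> int:
    """Estimate RCU for a steady read rate.

    Each strongly consistent read costs 1 RCU per 4 KB (rounded up).
    Eventually consistent reads cost half as much.
    """
    units_per_read = math.ceil(item_size_kb / 4)
    total = reads_per_second * units_per_read
    if strongly_consistent:
        return total
    return math.ceil(total / 2)


# 10 strongly consistent reads per second of 4 KB items -> 10 RCU
print(rcu_needed(10, 4, strongly_consistent=True))
# 16 eventually consistent reads per second of 12 KB items -> 16 * 3 / 2 = 24 RCU
print(rcu_needed(16, 12, strongly_consistent=False))
# 10 strongly consistent reads per second of 6 KB items -> 10 * 2 = 20 RCU
print(rcu_needed(10, 6, strongly_consistent=True))
```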
DynamoDB - Partition Internals
Data is divided into partitions
Partition keys go through a hashing algorithm that determines which partition an item goes to
To compute the number of partitions:
By capacity: (Total RCU/3000) + (Total WCU/1000)
By size: (Total size / 10 GB)
Total partitions: CEIL(MAX(capacity,size))
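The partition estimate can be sketched in Python as follows. It assumes per-partition limits of 3000 RCU, 1000 WCU and 10 GB. The function name is illustrative. DynamoDB manages partitions internally and does not expose the real count, so treat the result as a rule of thumb only.

```python
import math


def estimate_partitions(total_rcu: int, total_wcu: int,
                        total_size_gb: float) -> int:
    """Rule-of-thumb partition count: CEIL(MAX(capacity, size))."""
    by_capacity = total_rcu / 3000 + total_wcu / 1000
    by_size = total_size_gb / 10
    return math.ceil(max(by_capacity, by_size))


# 6000 RCU, 2000 WCU, 50 GB: capacity = 2 + 2 = 4, size = 5 -> 5 partitions
print(estimate_partitions(6000, 2000, 50))
# 9000 RCU, 3000 WCU, 20 GB: capacity = 3 + 3 = 6, size = 2 -> 6 partitions
print(estimate_partitions(9000, 3000, 20))
```

Provisioned RCU and WCU are spread evenly across partitions, so a table with 5 partitions gives each partition only one fifth of the table's throughput.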
DynamoDB - Throttling
If we exceed our RCU or WCU, we get ProvisionedThroughputExceededException
Reasons:
Hot keys - one partition key is being read too many times
Hot partitions
Very large items: RCU & WCU depend on the size of items
Solutions:
Exponential backoff when the exception is encountered
Distribute partition keys as much as possible
If it is an RCU issue, we can use DynamoDB Accelerator (DAX) to cache reads
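A minimal sketch of exponential backoff with jitter is shown below. ThrottledError is a hypothetical stand-in for ProvisionedThroughputExceededException so that the example runs without AWS access. The delay values are assumptions chosen for illustration.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05,
                      max_delay=2.0, sleep=time.sleep):
    """Call operation(), retrying on ThrottledError with growing delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # The delay cap doubles on every attempt (base, 2x, 4x, ...).
            cap = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": wait a random time up to the cap so that
            # many throttled clients do not all retry at the same instant.
            sleep(random.uniform(0, cap))
```

The sleep parameter is injectable only to make the function easy to test. In normal use the default time.sleep applies.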
DynamoDB - Writing Data
PutItem: Write data to DynamoDB (create or fully replace an item)
Consumes WCU
UpdateItem: Update data in DynamoDB (partial update of attributes)
Conditional writes:
Accept a write/update only if the conditions are respected, otherwise reject it
Helps with concurrent access to items
No performance impact
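One common use of conditional writes is optimistic locking with a version attribute. The sketch below only builds the request parameters, in the shape accepted by the low-level UpdateItem API (for example boto3's client.update_item(**params)). The table name, the price and version attributes, and the helper name are all assumptions for illustration.

```python
def build_versioned_update(table_name, key, new_price, expected_version):
    """Build UpdateItem parameters that succeed only if the stored
    version still equals expected_version (optimistic locking)."""
    return {
        "TableName": table_name,
        "Key": key,
        # Write the new price and bump the version in one atomic update.
        "UpdateExpression": "SET #price = :price, #version = :next",
        # The write is rejected if another writer changed the version first.
        "ConditionExpression": "#version = :expected",
        "ExpressionAttributeNames": {"#price": "price", "#version": "version"},
        "ExpressionAttributeValues": {
            ":price": {"N": str(new_price)},
            ":expected": {"N": str(expected_version)},
            ":next": {"N": str(expected_version + 1)},
        },
    }


params = build_versioned_update(
    "Products", {"product_id": {"S": "p-1"}}, new_price=20, expected_version=3
)
print(params["ConditionExpression"])
```

When the condition fails, DynamoDB rejects the request with a ConditionalCheckFailedException, and the caller can re-read the item and try again.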
DynamoDB - LSI (Local secondary index)
Alternative range key for your table, local to the hash key
Up to 5 local secondary indexes per table
The sort key consists of exactly one scalar attribute
The attribute you choose must be a scalar String, Number or Binary
LSI must be defined at table creation time
DynamoDB - GSI (Global secondary index)
To speed up queries on non-key attributes, use a GSI
GSI = partition key + optional sort key
The index is a new ‘table’ and we can project attributes on it
The partition key and sort key of the original table are always projected (KEYS_ONLY)
Can specify extra attributes to project (INCLUDE)
Can use all attributes from main table (ALL)
Indexes and Throttling
GSI
If writes are throttled on the GSI, the main table will also be throttled
Even if the WCU on the main table are fine
Choose the GSI partition key carefully
LSI
Uses the WCU and RCU of the main table
No special throttling considerations
Deleting data
DeleteItem
Delete an individual item (row)
Ability to perform a conditional delete
DeleteTable
Delete a whole table and all its items
Much quicker deletion than calling DeleteItem on all items
Batching writes
BatchWriteItem
Up to 25 PutItem and/or DeleteItem in one call
Up to 16MB of data written
Up to 400KB of data per item
Batching reduces latency by reducing the number of API calls made against DynamoDB
Operations are done in parallel for better efficiency
It is possible for part of a batch to fail, in which case we have to retry the failed items (returned as UnprocessedItems) using an exponential backoff algorithm
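The batching rules above can be sketched as a chunk-and-retry loop. Here send_batch is a hypothetical callable that submits at most 25 items and returns the ones that were not processed. In real code it would wrap a BatchWriteItem call and return its UnprocessedItems. The delay values are assumptions.

```python
import time

MAX_BATCH_SIZE = 25  # BatchWriteItem limit per call


def chunks(items, size=MAX_BATCH_SIZE):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_all(items, send_batch, max_attempts=5, base_delay=0.05,
              sleep=time.sleep):
    """Write items in batches, retrying only the unprocessed ones."""
    for batch in chunks(items):
        pending = batch
        for attempt in range(max_attempts):
            pending = send_batch(pending)  # returns the unprocessed items
            if not pending:
                break
            # Back off exponentially before retrying the leftovers.
            sleep(base_delay * 2 ** attempt)
        else:
            raise RuntimeError(f"{len(pending)} items still unprocessed")
```

Only the leftover items are resent on each retry, so a partial failure does not repeat writes that already succeeded.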
Reading data
GetItem
Read based on primary key
Primary Key - HASH or HASH-RANGE
Eventually consistent read by default
Option to use strongly consistent reads (more RCU - may take longer)
ProjectionExpression can be specified to include only certain attributes
BatchGetItem
Up to 100 items
Up to 16MB of data
Items are retrieved in parallel to minimize latency
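The GetItem options listed above map onto request parameters roughly as follows. This sketch only builds the parameter dictionary, in the shape accepted by the low-level GetItem API (for example boto3's client.get_item(**params)). The table name, key and attribute names are assumptions for illustration.

```python
def build_get_item(table_name, key, attributes=None,
                   strongly_consistent=False):
    """Build GetItem parameters for a read by primary key."""
    params = {
        "TableName": table_name,
        "Key": key,
        # False (the default) means an eventually consistent read.
        "ConsistentRead": strongly_consistent,
    }
    if attributes:
        # Return only the listed attributes instead of the whole item.
        params["ProjectionExpression"] = ", ".join(attributes)
    return params


params = build_get_item(
    "Sessions",
    {"session_id": {"S": "abc123"}},
    attributes=["email", "last_login"],
    strongly_consistent=True,
)
print(params["ProjectionExpression"])  # email, last_login
```

Attribute names that clash with DynamoDB reserved words would additionally need ExpressionAttributeNames placeholders, which this sketch leaves out.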
DynamoDB as a session state cache
Common to use DynamoDB to store session state
vs. ElastiCache
ElastiCache is in memory, DynamoDB is serverless
vs. EFS
EFS must be attached to EC2 instances as a network drive
vs. EBS & Instance Store
EBS & Instance Store can only be used for local caching, not shared caching
vs. S3
S3 has higher latency and is not meant for small objects
DynamoDB write sharding
e.g. Imagine we have a voting app with two candidates: A & B
If we use a partition key of candidate_id we will run into hot partition issues, as there are only two possible values
Solution: add a suffix to the partition key (usually random, sometimes calculated)
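Both suffix strategies can be sketched in Python. The shard count of 10, the '#' separator, the voter_id input and the function names are assumptions for illustration.

```python
import random
import zlib

SHARD_COUNT = 10  # assumed number of suffixes per candidate


def random_sharded_key(candidate_id, shard_count=SHARD_COUNT):
    """Random suffix: spreads writes evenly, e.g. 'A#7'."""
    return f"{candidate_id}#{random.randrange(shard_count)}"


def calculated_sharded_key(candidate_id, voter_id, shard_count=SHARD_COUNT):
    """Calculated suffix: derived from another attribute, so the same
    voter always maps to the same shard."""
    shard = zlib.crc32(voter_id.encode("utf-8")) % shard_count
    return f"{candidate_id}#{shard}"


def all_shard_keys(candidate_id, shard_count=SHARD_COUNT):
    """Every partition key to read when totalling a candidate's votes."""
    return [f"{candidate_id}#{n}" for n in range(shard_count)]


print(all_shard_keys("A", shard_count=3))  # ['A#0', 'A#1', 'A#2']
```

The trade-off is on the read side: counting the votes for one candidate means querying every shard key and summing the results.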