AWS DynamoDB - Complete Deep Dive

What is DynamoDB?

Amazon DynamoDB is a fully managed, serverless NoSQL database service that enables:

  • Automatic scaling: Capacity adjusts to traffic automatically
  • High performance: Single-digit millisecond latency
  • Fully managed: No servers, patches, or backups to manage
  • Multi-region: Built-in global tables and replication
  • Pay-per-request or provisioned: Flexible pricing models
  • ACID transactions: Single and multi-item transactions

Core Characteristics

| Aspect | Benefit |
|---|---|
| Serverless | No infrastructure to manage |
| Auto-scaling | Capacity adjusts to demand |
| High performance | < 10ms latency at any scale |
| Managed backup | Built-in backup and point-in-time recovery |
| Global tables | Multi-region replication with millisecond latency |
| ACID transactions | Single- and multi-item transactional support |
| Security | Encryption, IAM, VPC integration |

DynamoDB Architecture Overview

AWS Region (US-East-1)
  ├─ DynamoDB Partition 0
  │  └─ Replica 1, Replica 2, Replica 3 (3-way replication)
  ├─ DynamoDB Partition 1
  │  └─ Replica 1, Replica 2, Replica 3
  └─ DynamoDB Partition N
     └─ Replica 1, Replica 2, Replica 3
       ↓
  Global Tables (Multi-region)
  ├─ Region 2 (EU-West-1)
  └─ Region 3 (AP-Southeast-1)

Key concepts:

  • Table: Collection of items (like a database table)
  • Item: Single record (like a row in SQL)
  • Attribute: Field in an item (like a column)
  • Partition: Logical unit containing items (distributed across nodes)
  • Partition Key: Determines which partition stores item
  • Replica: Copy of partition (for HA and read scaling)

Core Components

1. Tables and Items

Table: Collection of items; only the key attributes have a defined schema (all other attributes are flexible)

import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')

# Create table
table = dynamodb.create_table(
    TableName='users',
    KeySchema=[
        {'AttributeName': 'user_id', 'KeyType': 'HASH'},      # Partition key
        {'AttributeName': 'created_at', 'KeyType': 'RANGE'}   # Sort key
    ],
    AttributeDefinitions=[
        {'AttributeName': 'user_id', 'AttributeType': 'S'},   # String
        {'AttributeName': 'created_at', 'AttributeType': 'S'} # String
    ],
    BillingMode='PAY_PER_REQUEST'  # On-demand (no capacity planning)
)

Item: Single record in table

# Put item
table.put_item(
    Item={
        'user_id': 'user123',
        'created_at': '2024-01-05T10:00:00Z',
        'email': 'user@example.com',
        'name': 'John Doe',
        'profile': {
            'age': 30,
            'location': 'NYC'
        },
        'tags': ['vip', 'verified']
    }
)

# Get item
response = table.get_item(
    Key={
        'user_id': 'user123',
        'created_at': '2024-01-05T10:00:00Z'
    }
)

2. Primary Keys

Partition Key (HASH): Determines which partition stores item

Partition key: user_id
- Values: user1, user2, user3, ...
- A hash of the partition key determines the partition (conceptually hash(user_id) mod num_partitions)
- All items with the same user_id live in the same partition
- Enables fast equality lookups

Sort Key (RANGE): Enables range queries within partition

KeySchema:
  HASH: user_id
  RANGE: created_at

Query patterns:
  - user_id = 'user1' AND created_at = '2024-01-05'  (exact)
  - user_id = 'user1' AND created_at > '2024-01-01'  (range)
  - user_id = 'user1' AND created_at BETWEEN ... AND ...
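The range patterns above work because DynamoDB orders string sort keys by UTF-8 byte value, and ISO-8601 timestamps (as used throughout this document) sort chronologically under that ordering. A quick offline illustration (plain Python, no DynamoDB calls):

```python
# ISO-8601 timestamps sort chronologically when compared as strings,
# which is why they make good string sort keys.
timestamps = [
    '2024-01-05T10:00:00Z',
    '2023-12-31T23:59:59Z',
    '2024-01-01T00:00:00Z',
]

# DynamoDB orders string sort keys by UTF-8 byte value -- the same
# ordering Python's sort produces for these strings
ordered = sorted(timestamps)

# A range condition like created_at > '2024-01-01' is then a plain
# string comparison:
after_new_year = [ts for ts in ordered if ts > '2024-01-01']
```

The same property is why zero-padded numbers (`'007'`, not `'7'`) are needed when numeric values are stored in string sort keys.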

Attribute types:

# String (S)
'user_id': 'user123'

# Number (N)
'age': 30

# Binary (B)
'image': b'binary_data'

# Boolean (BOOL)
'is_active': True

# Null (NULL)
'phone': None

# Map (M)
'profile': {'age': 30, 'location': 'NYC'}

# List (L)
'tags': ['vip', 'verified']

# String Set (SS)
'colors': {'red', 'blue', 'green'}

# Number Set (NS)
'scores': {100, 200, 300}

3. Secondary Indexes

Global Secondary Index (GSI): Query on different attributes

# Create GSI on email
table = dynamodb.create_table(
    TableName='users',
    KeySchema=[
        {'AttributeName': 'user_id', 'KeyType': 'HASH'},
    ],
    GlobalSecondaryIndexes=[
        {
            'IndexName': 'email-index',
            'KeySchema': [
                {'AttributeName': 'email', 'KeyType': 'HASH'}
            ],
            'Projection': {'ProjectionType': 'ALL'}
            # GSIs inherit the table's billing mode; under provisioned
            # billing they need their own ProvisionedThroughput instead
        }
    ],
    AttributeDefinitions=[
        {'AttributeName': 'user_id', 'AttributeType': 'S'},
        {'AttributeName': 'email', 'AttributeType': 'S'}
    ],
    BillingMode='PAY_PER_REQUEST'
)

# Query by email
response = table.query(
    IndexName='email-index',
    KeyConditionExpression='email = :email',
    ExpressionAttributeValues={
        ':email': 'user@example.com'
    }
)

Local Secondary Index (LSI): Alternative sort key on the same partition key (can only be defined at table creation)

# Create LSI with a different sort key (must be defined when the table is created)
table = dynamodb.create_table(
    TableName='user_activity',
    KeySchema=[
        {'AttributeName': 'user_id', 'KeyType': 'HASH'},
        {'AttributeName': 'timestamp', 'KeyType': 'RANGE'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'user_id', 'AttributeType': 'S'},
        {'AttributeName': 'timestamp', 'AttributeType': 'S'},
        {'AttributeName': 'activity_type', 'AttributeType': 'S'}
    ],
    LocalSecondaryIndexes=[
        {
            'IndexName': 'activity-type-index',
            'KeySchema': [
                {'AttributeName': 'user_id', 'KeyType': 'HASH'},
                {'AttributeName': 'activity_type', 'KeyType': 'RANGE'}
            ],
            'Projection': {'ProjectionType': 'ALL'}
        }
    ],
    BillingMode='PAY_PER_REQUEST'
)

# Query activities of type 'login'
response = table.query(
    IndexName='activity-type-index',
    KeyConditionExpression='user_id = :id AND activity_type = :type',
    ExpressionAttributeValues={
        ':id': 'user123',
        ':type': 'login'
    }
)

GSI vs LSI:

| Aspect | GSI | LSI |
|---|---|---|
| Partition key | Can be different | Must be same as table |
| Size limit | None | 10GB per partition key value (item collection) |
| Throughput | Separate from table | Shared with table |
| Consistency | Eventually consistent only | Strongly consistent reads available |
| Creation | Any time | Only at table creation |
| Sparse indexes | Recommended | Less useful |

4. Write Operations

Put Item (insert or replace):

table.put_item(
    Item={'user_id': 'user123', 'email': 'new@example.com'},
    ConditionExpression='attribute_not_exists(user_id)'  # Only if not exists
)

Update Item (modify attributes):

table.update_item(
    Key={'user_id': 'user123'},
    UpdateExpression='SET #email = :email, updated_at = :now',
    ExpressionAttributeNames={'#email': 'email'},
    ExpressionAttributeValues={
        ':email': 'newemail@example.com',
        ':now': '2024-01-05T11:00:00Z'
    },
    ReturnValues='ALL_NEW'
)

Delete Item:

table.delete_item(
    Key={'user_id': 'user123'},
    ConditionExpression='attribute_exists(user_id)'
)

Batch Write (high throughput):

# batch_writer buffers items and flushes 25 per request automatically
# (it takes no batch_size argument)
with table.batch_writer() as batch:
    for i in range(1000):
        batch.put_item(
            Item={'user_id': f'user{i}', 'email': f'user{i}@example.com'}
        )

5. Read Operations

Get Item (single item, fast):

response = table.get_item(
    Key={'user_id': 'user123'},
    ConsistentRead=True  # Strong consistency
)
item = response.get('Item')

Query (partition key + optional sort key):

response = table.query(
    KeyConditionExpression='user_id = :id AND created_at > :date',
    ExpressionAttributeValues={
        ':id': 'user123',
        ':date': '2024-01-01'
    },
    Limit=10
)

Scan (full table scan, slow):

response = table.scan(
    FilterExpression='email = :email',
    ExpressionAttributeValues={
        ':email': 'user@example.com'
    }
)
# Avoid in production (scans entire table)

Batch Get (get multiple items):

response = dynamodb.batch_get_item(
    RequestItems={
        'users': {
            'Keys': [
                {'user_id': 'user1'},
                {'user_id': 'user2'},
                {'user_id': 'user3'}
            ]
        }
    }
)

Throughput and Capacity Planning

Billing Modes

Provisioned Capacity (pay for reserved capacity):

table = dynamodb.create_table(
    TableName='users',
    # KeySchema / AttributeDefinitions omitted for brevity
    BillingMode='PROVISIONED',
    ProvisionedThroughput={
        'ReadCapacityUnits': 100,    # 100 strongly consistent reads/sec
        'WriteCapacityUnits': 50     # 50 writes/sec
    }
)

# Cost calculation (us-east-1 list prices, subject to change):
# Read: 100 RCU × $0.00013 per RCU-hour = $0.013/hour
# Write: 50 WCU × $0.00065 per WCU-hour = $0.0325/hour
# Total: ~$0.046/hour ≈ $33/month

On-Demand Capacity (pay per request):

table = dynamodb.create_table(
    TableName='users',
    # KeySchema / AttributeDefinitions omitted for brevity
    BillingMode='PAY_PER_REQUEST'
)

# Cost calculation (us-east-1 list prices, subject to change):
# Read: $0.25 per 1M read request units
# Write: $1.25 per 1M write request units

# Example: 100M reads + 10M writes/month
# Cost: (100 × $0.25) + (10 × $1.25) = $25 + $12.50 = $37.50

When to use:

| Mode | Best For |
|---|---|
| Provisioned | Predictable traffic, cost-conscious |
| On-Demand | Unpredictable spikes, rapid scaling |

Capacity Units

Read Capacity Unit (RCU):

1 RCU = 1 strongly consistent read/sec (up to 4KB item)
      = 2 eventually consistent reads/sec (up to 4KB item)

Example:
  Item size: 4KB
  Strongly consistent: 100 reads/sec → 100 RCU
  Eventually consistent: 100 reads/sec → 50 RCU
  
  Item size: 12KB (>4KB)
  RCU needed: ceil(12KB / 4KB) = 3 RCU per read
  100 reads/sec → 300 RCU

Write Capacity Unit (WCU):

1 WCU = 1 write/sec (1KB item)

Example:
  Item size: 1KB
  100 writes/sec → 100 WCU
  
  Item size: 5KB (>1KB)
  WCU needed: ceil(5KB / 1KB) = 5 WCU per write
  100 writes/sec → 500 WCU

Capacity Estimation

Requirement: 10K reads/sec, 1K writes/sec, 5KB items

Read capacity:
  Strongly consistent: ceil(5KB / 4KB) × 10K = 2 × 10K = 20K RCU
  Eventually consistent: 20K ÷ 2 = 10K RCU
  
Write capacity:
  ceil(5KB / 1KB) × 1K = 5 × 1K = 5K WCU
  
Total: 20K RCU + 5K WCU (provisioned mode)
  Cost: (20K × $0.00013) + (5K × $0.00065) = $2.60 + $3.25 = $5.85/hour
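The estimate above follows directly from the rounding rules; a small helper makes them explicit (prices are us-east-1 list prices and an assumption of this sketch, not a stable constant):

```python
import math

def read_capacity_units(item_kb: float, reads_per_sec: int,
                        strongly_consistent: bool = True) -> int:
    """Each strongly consistent read covers up to 4KB (rounded up);
    eventually consistent reads cost half as much."""
    units = math.ceil(item_kb / 4) * reads_per_sec
    return units if strongly_consistent else math.ceil(units / 2)

def write_capacity_units(item_kb: float, writes_per_sec: int) -> int:
    """Each write covers up to 1KB, rounded up."""
    return math.ceil(item_kb / 1) * writes_per_sec

# The worked example: 10K reads/sec and 1K writes/sec of 5KB items
rcu = read_capacity_units(5, 10_000)         # ceil(5/4) = 2 per read
wcu = write_capacity_units(5, 1_000)         # ceil(5/1) = 5 per write
hourly_cost = rcu * 0.00013 + wcu * 0.00065  # assumed us-east-1 prices
```

Running the same helper with `strongly_consistent=False` reproduces the 10K RCU figure for eventually consistent reads.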

Consistency Models

Strong Consistency vs Eventually Consistent

# Eventually consistent (default, faster)
response = table.get_item(
    Key={'user_id': 'user123'},
    ConsistentRead=False  # Default
)
# Reads from any replica (might be stale)

# Strongly consistent (slower, latest data)
response = table.get_item(
    Key={'user_id': 'user123'},
    ConsistentRead=True
)
# Reads from primary replica only

Consistency model:

Write to DynamoDB:
  1. Write goes to the partition's leader replica
  2. The write is persisted on at least 2 of 3 replicas before success is returned
  3. The remaining replica catches up asynchronously (typically within milliseconds)

Immediately after write:
  Strongly consistent read: sees the new value (served by the leader)
  Eventually consistent read: might see the old value (if it hits the lagging replica)
  
Shortly after (usually well under a second):
  Both reads see the new value (replication caught up)

Transactions

Conditional Writes (single-item atomicity):

table.update_item(
    Key={'user_id': 'user123'},
    UpdateExpression='SET balance = balance - :amount',
    ConditionExpression='balance >= :amount',
    ExpressionAttributeValues={
        ':amount': 100
    }
)

Multi-Item Transactions (ACID across items/tables):

import boto3

# transact_write_items is a low-level client API (not on the resource),
# so item values use the typed format ({'S': ...}, {'N': ...})
client = boto3.client('dynamodb')

client.transact_write_items(
    TransactItems=[
        {
            'Put': {
                'TableName': 'accounts',
                'Item': {
                    'account_id': {'S': 'acc1'},
                    'balance': {'N': '900'}
                }
            }
        },
        {
            'Put': {
                'TableName': 'accounts',
                'Item': {
                    'account_id': {'S': 'acc2'},
                    'balance': {'N': '1100'}
                }
            }
        },
        {
            'Put': {
                'TableName': 'transactions',
                'Item': {
                    'transaction_id': {'S': 'txn1'},
                    'from': {'S': 'acc1'},
                    'to': {'S': 'acc2'},
                    'amount': {'N': '100'}
                }
            }
        }
    ]
)
# All succeed or all fail (atomic)

Performance Optimization

Hot Partitions

Problem: Uneven distribution of traffic

Partition Key: user_id
Users: 1M
Requests/sec: 100K

If distribution is even:
  100K / 1M users = 0.1 requests/sec per user on average
  
If one user is a celebrity receiving 90K requests/sec:
  Celebrity partition: 90K requests → HIGH LOAD
  All other partitions combined: 10K requests → LOW LOAD
  
Result: a single partition becomes the bottleneck (and gets throttled)
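The skew can be reproduced with a quick offline simulation (plain Python, no DynamoDB calls); the 90K celebrity requests and 100-shard count mirror the example above:

```python
import random
from collections import Counter

random.seed(42)  # deterministic for the illustration

NUM_SHARDS = 100
CELEBRITY_REQUESTS = 90_000

# Without sharding: every celebrity write lands on one partition key
unsharded = Counter('celebrity' for _ in range(CELEBRITY_REQUESTS))
hot_load = max(unsharded.values())      # 90,000 on a single partition

# With write sharding: the same traffic spread over 100 suffixed keys
sharded = Counter(
    f'celebrity#{random.randint(0, NUM_SHARDS - 1)}'
    for _ in range(CELEBRITY_REQUESTS)
)
sharded_load = max(sharded.values())    # roughly 900 per shard
```

Per-shard load drops by about two orders of magnitude, which is exactly the effect the write-sharding technique in the next snippet achieves against real partitions.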

Solution: Write Sharding

import random

# Before (hot partition)
user_id = 'celebrity'
# All writes go to the same partition

# After (distribute across shards)
num_shards = 100
shard_id = random.randint(0, num_shards - 1)
partition_key = f'{user_id}#{shard_id}'

# Put item
table.put_item(
    Item={
        'user_id': partition_key,  # e.g. 'celebrity#42'
        'timestamp': '2024-01-05T10:00:00Z',
        'action': 'view'
    }
)

# Reading back requires querying every shard
responses = []
for shard in range(num_shards):
    response = table.query(
        KeyConditionExpression='user_id = :id',
        ExpressionAttributeValues={
            ':id': f'{user_id}#{shard}'
        }
    )
    responses.extend(response['Items'])

Query Optimization

# ❌ SLOW: Full table scan
response = table.scan(
    FilterExpression='email = :email',
    ExpressionAttributeValues={':email': 'user@example.com'}
)

# ✅ FAST: Use GSI on email
response = table.query(
    IndexName='email-index',
    KeyConditionExpression='email = :email',
    ExpressionAttributeValues={':email': 'user@example.com'}
)

# ❌ SLOW: Fetch all attributes
response = table.query(
    KeyConditionExpression='user_id = :id',
    ExpressionAttributeValues={':id': 'user123'}
)

# ✅ FAST: Project only needed attributes ('name' is a reserved word, so alias it)
response = table.query(
    KeyConditionExpression='user_id = :id',
    ProjectionExpression='user_id, email, #n',
    ExpressionAttributeNames={'#n': 'name'},
    ExpressionAttributeValues={':id': 'user123'}
)

Batch Operations

# ❌ SLOW: Individual writes (sequential)
for item in items:
    table.put_item(Item=item)
    # Each write = network round trip

# ✅ FAST: batch_writer (flushes 25 items per request automatically)
with table.batch_writer() as batch:
    for item in items:
        batch.put_item(Item=item)

# ❌ SLOW: Individual reads
for user_id in user_ids:
    response = table.get_item(Key={'user_id': user_id})

# ✅ FAST: Batch get (up to 100 items per request)
response = dynamodb.batch_get_item(
    RequestItems={
        'users': {
            'Keys': [{'user_id': uid} for uid in user_ids]
        }
    }
)

Scalability and High Availability

Auto-Scaling (Provisioned Mode)

autoscaling = boto3.client('application-autoscaling')

# Register DynamoDB table for auto-scaling
autoscaling.register_scalable_target(
    ServiceNamespace='dynamodb',
    ResourceId='table/users',
    ScalableDimension='dynamodb:table:WriteCapacityUnits',
    MinCapacity=10,
    MaxCapacity=10000
)

# Create scaling policy
autoscaling.put_scaling_policy(
    PolicyName='users-scaling',
    ServiceNamespace='dynamodb',
    ResourceId='table/users',
    ScalableDimension='dynamodb:table:WriteCapacityUnits',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,  # Keep utilization at 70%
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'DynamoDBWriteCapacityUtilization'
        }
    }
)

Global Tables (Multi-Region)

# Create table in US-East (streams are required for global tables)
us_table = dynamodb.create_table(
    TableName='users',
    KeySchema=[{'AttributeName': 'user_id', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'user_id', 'AttributeType': 'S'}],
    BillingMode='PAY_PER_REQUEST',
    StreamSpecification={
        'StreamEnabled': True,
        'StreamViewType': 'NEW_AND_OLD_IMAGES'
    }
)

# Add replicas (create_global_table is the legacy 2017.11.29 API;
# current-version global tables add replicas via update_table ReplicaUpdates)
client = boto3.client('dynamodb')
client.create_global_table(
    GlobalTableName='users',
    ReplicationGroup=[
        {'RegionName': 'us-east-1'},
        {'RegionName': 'eu-west-1'},
        {'RegionName': 'ap-southeast-1'}
    ]
)

# Benefits:
# - Local reads (< 10ms latency in each region)
# - Local writes (replicated asynchronously)
# - Automatic failover
# - Multi-region writes (last-writer-wins conflict resolution)

Use Cases

1. User Sessions (High Read/Write)

import time

table = dynamodb.create_table(
    TableName='sessions',
    KeySchema=[
        {'AttributeName': 'session_id', 'KeyType': 'HASH'},
        {'AttributeName': 'created_at', 'KeyType': 'RANGE'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'session_id', 'AttributeType': 'S'},
        {'AttributeName': 'created_at', 'AttributeType': 'S'}
    ],
    BillingMode='PAY_PER_REQUEST'
)

# Write session (fast, volatile data)
table.put_item(
    Item={
        'session_id': 'sess-abc123',
        'created_at': '2024-01-05T10:00:00Z',
        'user_id': 'user123',
        'data': {'cart': ['item1', 'item2']},
        'ttl': int(time.time()) + 3600  # Auto-expire (enable TTL on the table)
    }
)

# Read session (with a composite key, both key attributes are required)
response = table.get_item(
    Key={'session_id': 'sess-abc123', 'created_at': '2024-01-05T10:00:00Z'}
)

2. Real-time Analytics (Time-series)

from decimal import Decimal

table = dynamodb.create_table(
    TableName='metrics',
    KeySchema=[
        {'AttributeName': 'metric_name', 'KeyType': 'HASH'},
        {'AttributeName': 'timestamp', 'KeyType': 'RANGE'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'metric_name', 'AttributeType': 'S'},
        {'AttributeName': 'timestamp', 'AttributeType': 'S'}
    ],
    BillingMode='PAY_PER_REQUEST'
)

# Write metric (the boto3 resource layer rejects floats; use Decimal)
table.put_item(
    Item={
        'metric_name': 'cpu-usage#server1',
        'timestamp': '2024-01-05T10:00:00Z',
        'value': Decimal('85.5')
    }
)

# Query metrics (range)
response = table.query(
    KeyConditionExpression='metric_name = :name AND #ts BETWEEN :start AND :end',
    ExpressionAttributeNames={'#ts': 'timestamp'},
    ExpressionAttributeValues={
        ':name': 'cpu-usage#server1',
        ':start': '2024-01-05T09:00:00Z',
        ':end': '2024-01-05T10:00:00Z'
    }
)

3. Document Store (Flexible Schema)

table = dynamodb.create_table(
    TableName='documents',
    KeySchema=[
        {'AttributeName': 'doc_id', 'KeyType': 'HASH'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'doc_id', 'AttributeType': 'S'}
    ],
    BillingMode='PAY_PER_REQUEST'
)

# Store flexible document
table.put_item(
    Item={
        'doc_id': 'doc-123',
        'title': 'Article',
        'content': 'Lorem ipsum...',
        'metadata': {
            'author': 'John',
            'tags': ['python', 'dynamodb'],
            'ratings': [5, 4, 5]
        }
    }
)

# Document can have any shape
table.put_item(
    Item={
        'doc_id': 'doc-456',
        'type': 'video',
        'url': 'https://example.com/video.mp4',
        'duration': 3600,
        'transcodes': ['360p', '720p', '1080p']
    }
)

Interview Questions & Answers

Q1: Design a ride-sharing backend for 1M daily active users, 100K concurrent rides

Requirements:

  • Real-time ride tracking
  • Driver and rider matching
  • Trip history
  • Payments
  • 99.99% uptime

Solution Architecture:

# Rides table (current/active rides)
# Note: key attributes are immutable, so using 'status' as the sort key
# makes a status change a delete + put rather than an in-place update
rides_table = dynamodb.create_table(
    TableName='rides',
    KeySchema=[
        {'AttributeName': 'ride_id', 'KeyType': 'HASH'},
        {'AttributeName': 'status', 'KeyType': 'RANGE'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'ride_id', 'AttributeType': 'S'},
        {'AttributeName': 'status', 'AttributeType': 'S'},
        {'AttributeName': 'driver_id', 'AttributeType': 'S'},
        {'AttributeName': 'rider_id', 'AttributeType': 'S'},
        {'AttributeName': 'created_at', 'AttributeType': 'S'}
    ],
    GlobalSecondaryIndexes=[
        {
            'IndexName': 'driver-rides-index',
            'KeySchema': [
                {'AttributeName': 'driver_id', 'KeyType': 'HASH'},
                {'AttributeName': 'created_at', 'KeyType': 'RANGE'}
            ],
            'Projection': {'ProjectionType': 'ALL'}
        },
        {
            'IndexName': 'rider-rides-index',
            'KeySchema': [
                {'AttributeName': 'rider_id', 'KeyType': 'HASH'},
                {'AttributeName': 'created_at', 'KeyType': 'RANGE'}
            ],
            'Projection': {'ProjectionType': 'ALL'}
        }
    ],
    BillingMode='PAY_PER_REQUEST'
)

# Driver location (hot writes, use write sharding)
drivers_table = dynamodb.create_table(
    TableName='drivers',
    KeySchema=[
        {'AttributeName': 'driver_id#shard', 'KeyType': 'HASH'},
        {'AttributeName': 'timestamp', 'KeyType': 'RANGE'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'driver_id#shard', 'AttributeType': 'S'},
        {'AttributeName': 'timestamp', 'AttributeType': 'N'}
    ],
    BillingMode='PAY_PER_REQUEST'
)

# Trip history (archived)
history_table = dynamodb.create_table(
    TableName='trip_history',
    KeySchema=[
        {'AttributeName': 'user_id', 'KeyType': 'HASH'},
        {'AttributeName': 'trip_date', 'KeyType': 'RANGE'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'user_id', 'AttributeType': 'S'},
        {'AttributeName': 'trip_date', 'AttributeType': 'S'}
    ],
    BillingMode='PAY_PER_REQUEST'
)

Write path:

import time
import random
from decimal import Decimal

# Update ride location (keyed on the current ride_id + status)
rides_table.update_item(
    Key={'ride_id': 'ride-123', 'status': 'ACTIVE'},
    UpdateExpression='SET #loc = :loc, #ts = :ts',
    ExpressionAttributeNames={'#loc': 'location', '#ts': 'updated_at'},
    ExpressionAttributeValues={
        # The resource layer rejects floats; use Decimal for coordinates
        ':loc': {'lat': Decimal('40.7128'), 'lng': Decimal('-74.0060')},
        ':ts': int(time.time())
    }
)

# Update driver location (write sharding for the hot partition)
shard_id = random.randint(0, 99)
drivers_table.put_item(
    Item={
        'driver_id#shard': f'driver-123#{shard_id}',
        'timestamp': int(time.time()),
        'location': {'lat': Decimal('40.7128'), 'lng': Decimal('-74.0060')},
        'status': 'available'
    }
)

Read path:

# Get active ride
ride = rides_table.get_item(
    Key={'ride_id': 'ride-123', 'status': 'ACTIVE'},
    ConsistentRead=True
)

# Get driver's rides (via GSI)
driver_rides = rides_table.query(
    IndexName='driver-rides-index',
    KeyConditionExpression='driver_id = :id AND created_at > :date',
    ExpressionAttributeValues={
        ':id': 'driver-123',
        ':date': '2024-01-05T00:00:00Z'
    }
)

# Get latest driver location (query all shards; the '#' in the attribute
# name must be aliased inside expressions)
locations = []
for shard in range(100):
    response = drivers_table.query(
        KeyConditionExpression='#pk = :id',
        ExpressionAttributeNames={'#pk': 'driver_id#shard'},
        ExpressionAttributeValues={':id': f'driver-123#{shard}'},
        ScanIndexForward=False,  # newest first by the timestamp sort key
        Limit=1
    )
    if response['Items']:
        locations.append(response['Items'][0])

Capacity planning:

Active rides: 100K
Read/write load:
  - Ride updates: 100K writes/sec (status changes)
  - Location updates: 1M writes/sec (every second)
  - Trip history reads: 100K reads/sec (riders checking history)

Writes:
  Rides: 100K × 1KB = 100K WCU
  Location: 1M × 1 WCU (0.5KB rounds up to 1KB) = 1M WCU (use write sharding)
  
Total: ~1.1M WCU equivalent (on-demand pricing)

Q2: Hot partition bottleneck. How to scale writes?

Answer:

Diagnosis:

# Monitor write throttling
aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name WriteThrottleEvents \
  --dimensions Name=TableName,Value=users \
  --statistics Sum \
  --period 60 \
  --start-time 2024-01-05T00:00:00Z \
  --end-time 2024-01-05T01:00:00Z

Solution 1: Write Sharding

import random

# Before (hot key)
partition_key = 'celebrity'  # All writes here

# After (distribute across 100 shards)
num_shards = 100
# Pick a random shard (or hash a per-item attribute such as timestamp);
# hashing the user_id alone would still map the hot user to ONE shard
shard_id = random.randint(0, num_shards - 1)
partition_key = f'celebrity#{shard_id:03d}'

# Writes distributed across 100 partitions
table.put_item(Item={'user_id': partition_key, 'action': 'view'})

# Read from all shards
results = []
for shard in range(num_shards):
    response = table.query(
        KeyConditionExpression='user_id = :id',
        ExpressionAttributeValues={':id': f'celebrity#{shard:03d}'}
    )
    results.extend(response['Items'])

Solution 2: DynamoDB Accelerator (DAX)

from amazondax import AmazonDaxClient

cluster_endpoint = 'dax-cluster.xxxxx.dax.amazonaws.com:8111'
dax = AmazonDaxClient.resource(endpoint_url=cluster_endpoint)
table = dax.Table('users')

# Writes pass through DAX to DynamoDB
# Cache hits are served from memory in microseconds (up to ~10x faster reads)

Solution 3: Multiple Tables with Sharding

# Instead of sharding within table
tables = [
    'users-shard-0',
    'users-shard-1',
    ...
    'users-shard-99'
]

shard_id = hash(user_id) % 100
table = dynamodb.Table(f'users-shard-{shard_id}')
table.put_item(Item=item)

# Each table has separate throughput allocation

Key takeaway: "Use write sharding for hot partitions. Distribute writes across N shards (typically 100), query all shards on read."


Q3: Transaction across 3 tables fails midway. How to ensure consistency?

Answer:

Problem: Multi-item transaction needs ACID guarantees

# Scenario: Transfer money between accounts + record the transaction
# If any step fails, nothing is applied

import boto3
from uuid import uuid4

# transact_write_items is a low-level client API (not on the resource),
# so keys and values use the typed format ({'S': ...}, {'N': ...})
client = boto3.client('dynamodb')

def transfer_money(from_id, to_id, amount):
    try:
        client.transact_write_items(
            TransactItems=[
                {
                    'Update': {
                        'TableName': 'accounts',
                        'Key': {'account_id': {'S': from_id}},
                        'UpdateExpression': 'SET balance = balance - :amt',
                        'ConditionExpression': 'balance >= :amt',
                        'ExpressionAttributeValues': {':amt': {'N': str(amount)}}
                    }
                },
                {
                    'Update': {
                        'TableName': 'accounts',
                        'Key': {'account_id': {'S': to_id}},
                        'UpdateExpression': 'SET balance = balance + :amt',
                        'ExpressionAttributeValues': {':amt': {'N': str(amount)}}
                    }
                },
                {
                    'Put': {
                        'TableName': 'transactions',
                        'Item': {
                            'transaction_id': {'S': str(uuid4())},
                            'from': {'S': from_id},
                            'to': {'S': to_id},
                            'amount': {'N': str(amount)},
                            'status': {'S': 'completed'}
                        }
                    }
                }
            ]
        )
        return True
    except Exception as e:
        # Transaction failed → nothing was applied (atomic)
        print(f"Transaction failed: {e}")
        return False

DynamoDB Transactions Guarantees:

✓ ACID (Atomicity, Consistency, Isolation, Durability)
✓ Atomicity: All or nothing (no partial updates)
✓ Consistency: Condition expressions enforce invariants (e.g. balance >= 0)
✓ Isolation: Serializable, via optimistic concurrency (conflicting transactions are rejected, not locked)
✓ Durability: Persisted and replicated before returning

✗ Limitations:
  - Max 100 items per transaction (raised from 25 in 2022)
  - Max 4MB total size
  - No nested transactions
  - No automatic retry on conflict
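When a workload exceeds the per-transaction item cap (100 actions as of 2022, previously 25), the usual workaround is to split it into transaction-sized chunks; a minimal sketch, with the caveat that atomicity then holds only within each chunk, not across chunks:

```python
def chunk_transact_items(items, max_per_txn=100):
    """Split a list of TransactItems into transaction-sized chunks.

    DynamoDB caps a single transaction, so larger workloads must be
    split. Only do this when cross-chunk atomicity is not required --
    each chunk commits (or fails) independently.
    """
    return [items[i:i + max_per_txn]
            for i in range(0, len(items), max_per_txn)]

# 250 hypothetical actions → three transactions of 100, 100, and 50
batches = chunk_transact_items(list(range(250)))
batch_sizes = [len(b) for b in batches]  # [100, 100, 50]
```

Each chunk would then be passed to a separate `transact_write_items` call (with retry handling, as below).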

Manual retry strategy:

import time
import random
import boto3
from botocore.exceptions import ClientError

client = boto3.client('dynamodb')  # transact APIs live on the client

def transfer_with_retry(from_id, to_id, amount, max_retries=3):
    for attempt in range(max_retries):
        try:
            client.transact_write_items(TransactItems=[...])
            return True
        except ClientError as e:
            if e.response['Error']['Code'] == 'TransactionCanceledException':
                # A failed condition (e.g. balance < amount) appears in the
                # cancellation reasons and should not be retried
                if 'ConditionalCheckFailed' in e.response['Error']['Message']:
                    return False
                # Conflict with a concurrent transaction → exponential backoff
                wait_time = (2 ** attempt) + random.uniform(0, 1)
                time.sleep(wait_time)
            else:
                raise
    
    raise Exception("Transaction failed after max retries")

Key takeaway: "DynamoDB transactions provide ACID guarantees across items. Use condition expressions to enforce business rules. Retry on conflict with exponential backoff."


Q4: Designing for 10 billion items. How to manage?

Answer:

Challenge: DynamoDB table size

10 billion items × 1KB average = 10TB storage

Single table:
  10B items / 10GB per partition = ~1M partitions
  Point lookups stay fast (hash(key) → partition → seek),
  but scans, backups, and index backfills grow with table size

Solution: Table Sharding by Date

from uuid import uuid4

# Time-series data: one table per month, e.g. 'events-2024-01'

# Write to current month's table
current_month = '2024-01'
table = dynamodb.Table(f'events-{current_month}')
table.put_item(Item={
    'event_id': str(uuid4()),
    'event_type': 'purchase',  # partition key of the monthly tables
    'timestamp': '2024-01-05T10:00:00Z',
    'data': {...}
})

# Query a specific month
response = dynamodb.Table('events-2024-01').query(
    KeyConditionExpression='event_type = :type AND #ts > :date',
    ExpressionAttributeNames={'#ts': 'timestamp'},
    ExpressionAttributeValues={
        ':type': 'purchase',
        ':date': '2024-01-01T00:00:00Z'
    }
)

# Query across months (start_date/end_date are datetime.date values)
def query_events(event_type, start_date, end_date):
    all_items = []
    current = start_date.replace(day=1)
    
    while current <= end_date:
        table_name = f'events-{current:%Y-%m}'
        try:
            response = dynamodb.Table(table_name).query(
                KeyConditionExpression='event_type = :type AND #ts >= :start',
                ExpressionAttributeNames={'#ts': 'timestamp'},
                ExpressionAttributeValues={
                    ':type': event_type,
                    ':start': current.isoformat()
                }
            )
            all_items.extend(response['Items'])
        except dynamodb.meta.client.exceptions.ResourceNotFoundException:
            pass  # No table for this month
        # timedelta has no 'months' argument; advance the month by hand
        if current.month == 12:
            current = current.replace(year=current.year + 1, month=1)
        else:
            current = current.replace(month=current.month + 1)
    
    return all_items

TTL for automatic cleanup:

import time
from uuid import uuid4

# Items past their 'ttl' epoch are deleted automatically
# (typically within ~48 hours of expiry)
table.put_item(
    Item={
        'event_id': str(uuid4()),
        'timestamp': '2024-01-05T10:00:00Z',
        'ttl': int(time.time()) + (90 * 24 * 3600)  # Expire after 90 days
    }
)

# Enable TTL on the table (client-level API)
client = boto3.client('dynamodb')
client.update_time_to_live(
    TableName='events-2024-01',
    TimeToLiveSpecification={
        'AttributeName': 'ttl',
        'Enabled': True
    }
)

Archive strategy:

Hot data (current month):
  - Full throughput
  - On-Demand billing
  
Warm data (last 3 months):
  - Reduced throughput
  - Provisioned billing
  
Cold data (older):
  - Archive to S3
  - Use Athena for queries
  - Restore if needed

Q5: Global table has latency spike in EU. Diagnose and fix.

Answer:

Monitoring:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch', region_name='eu-west-1')

# Get read latency for the EU replica
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/DynamoDB',
    MetricName='SuccessfulRequestLatency',
    Dimensions=[
        {'Name': 'TableName', 'Value': 'users'},
        {'Name': 'Operation', 'Value': 'GetItem'}
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=60,
    Statistics=['Average', 'Maximum']
)

Diagnosis checklist:

1. Check replication lag
   - Primary region writes → replica region lag
   - Normal: < 1 second
   - Issue: > 10 seconds → network/replication problem

2. Check throttling
   - EU table hitting capacity limit
   - Check ConsumedWriteCapacityUnits and WriteThrottleEvents metrics

3. Check cross-region network
   - High latency between regions
   - Replication runs over the AWS backbone; check the AWS Health Dashboard for network events

4. Check item size
   - Large items take longer to replicate
   - Check average item size in metrics

5. Check hot partitions
   - Single key receiving all traffic
   - Use CloudWatch Contributor Insights to identify hot keys

Solutions:

1. Increase capacity in EU region:

# If using provisioned capacity, raise it via a client in the replica's region
eu_client = boto3.client('dynamodb', region_name='eu-west-1')
eu_client.update_table(
    TableName='users',
    ProvisionedThroughput={
        'ReadCapacityUnits': 500,
        'WriteCapacityUnits': 500
    }
)

# If using on-demand, nothing needed (scales automatically)

2. Add a local cache in the EU region:

import json

# Instead of hitting DynamoDB on every read
response = table.query(
    KeyConditionExpression='user_id = :id'
)

# Cache hot items in an EU-local Redis (redis_cluster_eu: a redis-py client)
cached = redis_cluster_eu.get(f'user:{user_id}')
if not cached:
    item = eu_dynamodb_table.get_item(Key={'user_id': user_id})['Item']
    redis_cluster_eu.set(f'user:{user_id}', json.dumps(item, default=str), ex=3600)

3. Monitor replication lag:

# Check replication lag metric
aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name ReplicationLatency \
  --dimensions Name=TableName,Value=users \
               Name=ReceivingRegion,Value=eu-west-1 \
  --start-time 2024-01-05T00:00:00Z \
  --end-time 2024-01-05T01:00:00Z \
  --period 60 \
  --statistics Maximum,Average

4. Optimize write patterns:

Click to view code (python)
# Batch writes to reduce round trips (batch_writer automatically
# flushes in groups of 25, DynamoDB's BatchWriteItem limit)
with eu_table.batch_writer() as batch:
    for item in items:
        batch.put_item(Item=item)

# Use eventually consistent reads where possible
eu_table.get_item(
    Key={'user_id': 'user123'},
    ConsistentRead=False  # Eventually consistent (faster)
)

Key takeaway: "Monitor replication lag and capacity metrics. Add caching for frequently accessed data. Use eventually consistent reads to reduce latency."


DynamoDB vs Alternatives

| System | Throughput | Latency | Best For | Trade-off |
|---|---|---|---|---|
| DynamoDB | 100K+/sec | 5-10ms | Managed NoSQL, high scale | Eventual consistency, cost at scale |
| Cassandra | 1M+/sec | 10-20ms | High write volume, distributed | Operational complexity |
| MongoDB | 100K+/sec | 5-20ms | Document flexibility, ACID | Self-managed, operational overhead |
| RDS (PostgreSQL) | 10K/sec | 1-5ms | Relational, transactions | Scaling limitations |
| Redis | 1M+/sec | 1-5ms | Cache, fast access | Volatile by default (persistence optional) |

Best Practices

Design Best Practices

✓ Design tables around queries (not entities)
✓ Use partition and sort keys wisely (avoid hot partitions)
✓ Prefer read-heavy designs (use secondary indexes)
✓ Normalize strategically (some denormalization OK)
✓ Plan for growth (estimate capacity accurately)
✓ Use TTL for auto-cleanup (time-series data)
✓ Separate hot/cold data (different tables/sharding)

Operational Best Practices

✓ Use on-demand for unpredictable traffic (simplifies planning)
✓ Monitor auto-scaling (set reasonable max capacity)
✓ Enable point-in-time recovery (disaster recovery)
✓ Use VPC endpoints (security, performance)
✓ Enable DynamoDB Streams (for change capture)
✓ Implement exponential backoff (for retries)
✓ Use DAX for caching (microsecond reads)

Cost Optimization

✓ On-demand for bursty traffic (pay per request)
✓ Provisioned for predictable traffic (reserved capacity)
✓ Compress large items (reduce WCU usage)
✓ Delete old data via TTL (reduce storage costs)
✓ Use projection expressions (avoid fetching unneeded attributes)
✓ Batch operations (reduce API calls)
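To make "compress large items" concrete, here is a minimal sketch of the WCU savings from gzipping a large text attribute before storing it. The payload is synthetic and `wcu_for_size` is a helper defined here, not a DynamoDB API; standard writes cost 1 WCU per 1 KB, rounded up.

```python
import gzip
import json
import math

def wcu_for_size(size_bytes: int) -> int:
    """Standard writes consume 1 WCU per 1 KB of item size, rounded up."""
    return max(1, math.ceil(size_bytes / 1024))

# Hypothetical large, highly compressible JSON attribute (~8 KB)
payload = json.dumps({"log": "x" * 8000}).encode()
compressed = gzip.compress(payload)

print("uncompressed WCU:", wcu_for_size(len(payload)))
print("compressed WCU:  ", wcu_for_size(len(compressed)))
```

Real-world text compresses less dramatically than this repeated string, but JSON blobs commonly shrink 5-10x, which translates directly into fewer WCUs per write.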


Disaster Recovery & Backup

Backup Options

Click to view code (python)
# Automated backups (point-in-time recovery)
dynamodb.update_continuous_backups(
    TableName='users',
    PointInTimeRecoverySpecification={
        'PointInTimeRecoveryEnabled': True
    }
)

# Manual snapshot
dynamodb.create_backup(
    TableName='users',
    BackupName='users-backup-2024-01-05'
)

# Restore from snapshot
dynamodb.restore_table_from_backup(
    TargetTableName='users-restored',
    BackupArn='arn:aws:dynamodb:...'
)

Multi-Region Strategy

Click to view code
Primary Region (US-East):
  ✓ Handles all writes
  ✓ Low latency for US users
  
Replica Regions (EU, AP):
  ✓ Eventually consistent reads
  ✓ Low latency for local users
  ✓ Automatic failover on primary failure
  
Failover:
  1. Monitor primary region health
  2. Detect failure (CloudWatch alarm)
  3. Switch writes to replica region
  4. Client redirects to replica
  5. RPO: < 1 second (replication lag)

Summary & Key Takeaways

DynamoDB excels at:

  • ✓ Fully managed database (no ops overhead)
  • ✓ High throughput (100K+/sec per table)
  • ✓ Low latency (single-digit milliseconds)
  • ✓ Auto-scaling (handle traffic spikes)
  • ✓ Multi-region (global tables)
  • ✓ ACID transactions (single/multi-item)

Key challenges:

  • ✗ Eventual consistency (not strong by default)
  • ✗ Cost at massive scale (pay per request)
  • ✗ Limited query flexibility (design tables per query)
  • ✗ Hot partition bottlenecks (need write sharding)
  • ✗ Vendor lock-in (AWS-specific)

Critical design questions:

  1. What's my throughput requirement (ops/sec)?
  2. Do I need strong consistency or eventually consistent is OK?
  3. What's my query pattern (design tables around it)?
  4. Will I have hot partitions (plan sharding)?
  5. Do I need multi-region (global tables)?
  6. What's my data retention (TTL auto-cleanup)?
  7. Cost: provisioned vs on-demand?

DynamoDB — Practical Guide & Deep Dive

DynamoDB is a fully-managed, highly scalable, key-value service provided by AWS.

  • Fully-Managed — AWS handles hardware provisioning, configuration, patching, and scaling
  • Highly Scalable — automatically scales up/down without downtime
  • Key-value — NoSQL database with flexible data storage and retrieval

DynamoDB supports transactions (neutralizing a past criticism), has just about everything you'd need from a database for system design interviews, and is incredibly easy to use.

"Can I use DynamoDB in an interview?" — Simply ask your interviewer. Many say yes; others prefer open-source alternatives to avoid vendor lock-in.


The Data Model

| Concept | Description |
|---|---|
| Tables | Top-level structure, defined by a mandatory primary key. Support secondary indexes |
| Items | Rows in the table. Must have the primary key. Up to 400 KB including all attributes |
| Attributes | Key-value pairs. Scalar types (strings, numbers, booleans), set types, nested objects |

DynamoDB is schema-less — items in the same table can have different sets of attributes. New attributes can be added at any point without affecting existing items.

{
  "PersonID": 101,
  "LastName": "Smith",
  "FirstName": "Fred",
  "Phone": "555-4321"
},
{
  "PersonID": 102,
  "LastName": "Jones",
  "FirstName": "Mary",
  "Address": { "Street": "123 Main", "City": "Anytown", "State": "OH", "ZIPCode": 12345 }
},
{
  "PersonID": 103,
  "LastName": "Stephens",
  "FirstName": "Howard",
  "FavoriteColor": "Blue"
}

Notice how FavoriteColor exists only on one item — DynamoDB's flexibility in action.

Partition Key and Sort Key

| Component | Description |
|---|---|
| Partition Key | Hashed to determine physical storage location. Required |
| Sort Key (optional) | Combined with partition key for composite primary key. Enables range queries and sorting within a partition |

Example: Group chat app → chat_id as partition key, message_id as sort key. Efficiently query all messages for a chat, sorted chronologically.

Use monotonically increasing IDs (Snowflake, UUID v7, ULID) rather than timestamps for sort keys — timestamps don't guarantee uniqueness.

Under the hood:

  1. Hash partitioning — partition key is hashed; a partition metadata service maps it to the correct storage node
  2. B-trees for sort keys — within each partition, items are organized in a B-tree indexed by sort key
  3. Composite key operations — find the node via partition key hash, then traverse B-tree via sort key
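The three steps above can be sketched as a toy in-memory model, assuming a simple hash-mod routing scheme (DynamoDB's internal hash and metadata service are not public; `md5` and four fixed partitions here are purely illustrative):

```python
import bisect
import hashlib

NUM_PARTITIONS = 4
partitions = [[] for _ in range(NUM_PARTITIONS)]  # each list kept sorted by key

def route(partition_key: str) -> int:
    # Step 1: hash the partition key to locate the storage node
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

def put(pk: str, sk: str, value: dict) -> None:
    # Step 2: within a partition, items stay ordered by sort key
    # (a sorted list stands in for DynamoDB's B-tree)
    part = partitions[route(pk)]
    bisect.insort(part, (pk, sk, tuple(sorted(value.items()))))

def query(pk: str, sk_prefix: str):
    # Step 3: hash to find the partition, then range-scan by sort key
    part = partitions[route(pk)]
    return [(p, s) for (p, s, _) in part if p == pk and s.startswith(sk_prefix)]

put("chat_1", "msg#001", {"text": "hi"})
put("chat_1", "msg#002", {"text": "hello"})
put("chat_2", "msg#001", {"text": "hey"})
print(query("chat_1", "msg#"))
```

Because all items for `chat_1` land on one partition and are sorted by `msg#...`, the range query never touches other nodes.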

Secondary Indexes

| Feature | Global Secondary Index (GSI) | Local Secondary Index (LSI) |
|---|---|---|
| Definition | Different partition key than main table | Same partition key, different sort key |
| When to use | Query on non-primary-key attributes | Additional sort keys within same partition |
| Size limit | No restrictions | 10 GB per partition key |
| Throughput | Separate read/write capacity | Shares base table capacity |
| Consistency | Eventually consistent only | Supports strong consistency |
| Creation | Can be added/removed anytime | Must be defined at table creation |
| Max count | 20 per table | 5 per table |

GSI example: Chat table with chat_id partition key. Need "show all messages a user sent across all chats" → create GSI with user_id as partition key, message_id as sort key.

LSI example: Within a chat, find messages with most attachments → create LSI on num_attachments.

Under the hood:

  • GSIs are separate internal tables with their own partition scheme, updated asynchronously
  • LSIs are co-located with the base table, maintaining a separate B-tree within each partition, updated synchronously
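As a concrete sketch, the GSI from the chat example could be added to an existing table with an `update_table` request shaped like this (table, index, and attribute names are illustrative; new key attributes must appear in `AttributeDefinitions`):

```python
# Request shape for adding a GSI to an existing table.
# Pass to boto3: client.update_table(**add_gsi_request)
add_gsi_request = {
    "TableName": "chat_messages",
    "AttributeDefinitions": [
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "message_id", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexUpdates": [
        {
            "Create": {
                "IndexName": "user-messages-index",
                "KeySchema": [
                    {"AttributeName": "user_id", "KeyType": "HASH"},
                    {"AttributeName": "message_id", "KeyType": "RANGE"},
                ],
                # ALL copies every attribute into the index; KEYS_ONLY or
                # INCLUDE reduce index storage if you need fewer attributes
                "Projection": {"ProjectionType": "ALL"},
            }
        }
    ],
}
print(add_gsi_request["GlobalSecondaryIndexUpdates"][0]["Create"]["IndexName"])
```

An LSI, by contrast, cannot be added this way; it must be declared in the original `create_table` call.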

Accessing Data

| Operation | Description | When to Use |
|---|---|---|
| Query | Retrieves items by primary key or secondary index. Efficient | Always prefer this |
| Scan | Reads every item in a table. Paginated | Avoid if possible (expensive) |
// Query (efficient)
const params = {
  TableName: 'users',
  KeyConditionExpression: 'user_id = :id',
  ExpressionAttributeValues: { ':id': 101 }
};
dynamodb.query(params);

// Scan (avoid for large tables)
dynamodb.scan({ TableName: 'users' });

Important: DynamoDB reads the entire item from storage even with ProjectionExpression — you're charged full RCU based on item size. Normalize data appropriately to avoid reading more than necessary.

Example: Designing Yelp — don't store reviews inside the business item. Separate reviews table with business_id as partition key. Otherwise, every business read pulls all reviews.


CAP Theorem & Consistency

DynamoDB supports per-request consistency choice (not table-level):

| Mode | Behavior | Cost | Supported On |
|---|---|---|---|
| Eventually Consistent (default) | Might not see most recent write | 0.5 RCU per 4KB | Base table, GSI, LSI |
| Strongly Consistent (ConsistentRead=true) | Reflects all prior writes | 1 RCU per 4KB | Base table, LSI only |

DynamoDB also supports ACID transactions via TransactWriteItems and TransactGetItems — serializable isolation across up to 100 items spanning multiple tables.

Under the hood:

  • Each partition has 3 replicas (1 leader + 2 followers) using Multi-Paxos consensus
  • Writes go through leader → WAL entry → quorum (2 of 3) → acknowledged
  • Strongly consistent reads are routed to the leader
  • Eventually consistent reads can be served by any replica

Architecture and Scalability

Auto-sharding: When a partition reaches capacity (size or throughput), DynamoDB automatically splits and redistributes data.

Partition limits:

  • Up to 3,000 RCU and 1,000 WCU per partition
  • 3,000 RCU = 12 MB/sec reads; 1,000 WCU = 1 MB/sec writes

Global Tables — real-time replication across AWS regions for local read/write operations worldwide. Simply mention this in your interview for cross-region replication.

Fault tolerance: Data automatically replicated across 3 Availability Zones within a region (not user-configurable). Encryption at rest by default, TLS enforced for all API calls.


Pricing Model

| Unit | Cost | Details |
|---|---|---|
| RCU | ~$1.12 per million reads (4KB each) | 1 strongly consistent or 2 eventually consistent reads/sec |
| WCU | ~$5.62 per million writes (1KB each) | 1 write/sec for items up to 1KB |

Back-of-envelope example: YouTube views at 10M writes/sec:

  • Each write ≥ 1 WCU (minimum 1KB)
  • Need ~10,000 partitions (1,000 WCU each)
  • Provisioned: ~$156,000/day. On-demand: significantly higher.
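The arithmetic behind those bullets, assuming roughly $0.00065 per provisioned WCU-hour (us-east-1 list pricing at time of writing; verify against current AWS pricing):

```python
# Back-of-envelope cost for sustaining 10M writes/sec on provisioned capacity
writes_per_sec = 10_000_000
wcu_needed = writes_per_sec            # each write of <= 1 KB costs 1 WCU
partitions = wcu_needed // 1_000       # each partition sustains 1,000 WCU

price_per_wcu_hour = 0.00065           # assumed provisioned pricing
cost_per_day = wcu_needed * price_per_wcu_hour * 24

print(f"{partitions:,} partitions, ~${cost_per_day:,.0f}/day")
```

Running the numbers like this early in a design discussion quickly reveals whether DynamoDB is cost-viable for a write-heavy workload.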

Understanding pricing helps gut-check whether your design is cost-feasible.


Advanced Features

DAX (DynamoDB Accelerator)

Purpose-built in-memory cache — microsecond response times for read-heavy workloads. No need for separate Redis/Memcached.

  • Read-through and write-through cache
  • Two caches: item cache (GetItem/BatchGetItem) and query cache (Query/Scan)
  • Does not cache strongly consistent reads (passes through to DynamoDB)
  • Only auto-invalidates for writes that go through DAX itself

Caveat: Direct DynamoDB writes bypassing DAX leave stale cache entries until TTL/eviction.

DynamoDB Streams (CDC)

Built-in Change Data Capture — captures inserts, updates, deletes as stream records.

| Use Case | How |
|---|---|
| Elasticsearch sync | Stream changes to keep search index consistent |
| Real-time analytics | Enable Kinesis Data Streams → Firehose → S3/Redshift |
| Change notifications | Trigger Lambda functions on data changes |
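A minimal Lambda handler for a Streams trigger might look like the sketch below. The event shape (`Records`, `eventName`, `dynamodb.Keys`/`NewImage`) is what DynamoDB Streams delivers; the routing logic and synthetic test event are illustrative.

```python
def handler(event, context):
    """Lambda handler for a DynamoDB Streams trigger (NEW_AND_OLD_IMAGES view)."""
    changes = []
    for record in event["Records"]:
        name = record["eventName"]            # INSERT | MODIFY | REMOVE
        keys = record["dynamodb"]["Keys"]
        if name == "REMOVE":
            changes.append(("delete", keys))
        else:
            new_image = record["dynamodb"].get("NewImage", {})
            changes.append(("upsert", keys, new_image))
    return changes

# Synthetic event in the shape DynamoDB Streams delivers
event = {"Records": [
    {"eventName": "INSERT",
     "dynamodb": {"Keys": {"user_id": {"S": "U1"}},
                  "NewImage": {"user_id": {"S": "U1"}, "name": {"S": "Fred"}}}},
    {"eventName": "REMOVE",
     "dynamodb": {"Keys": {"user_id": {"S": "U2"}}}},
]}
print(handler(event, None))
```

In a real deployment the `changes` list would be forwarded to a search indexer, notification service, or downstream queue instead of returned.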

DynamoDB in an Interview

When to use:

  • Almost any persistence layer (highly scalable, durable, supports transactions)
  • Single-digit millisecond latencies (microsecond with DAX)
  • Simple key-value or document storage patterns
  • When your interviewer allows AWS services

When NOT to use:

  • Cost — high-volume workloads (hundreds of thousands of writes/sec) get expensive fast
  • Complex queries — no JOINs, limited ad-hoc aggregation
  • Data modeling constraints — if you need many GSIs/LSIs, consider PostgreSQL
  • Vendor lock-in — some interviewers require vendor-neutral solutions

Additional Interview Questions & Answers

Q6: Explain the difference between partition key and sort key. How do they affect data distribution and query patterns?

Answer:

Partition key:

  • Hashed to determine which physical partition stores the item
  • All items with the same partition key are stored together on the same node
  • Must be specified in every query — you cannot query without the partition key (except via scan)

Sort key:

  • Orders items within a partition (B-tree index)
  • Enables range queries: begins_with, between, >, <, >=, <=
  • Combined with partition key forms a composite primary key

Data distribution impact:

  • High-cardinality partition keys → even distribution (good)
  • Low-cardinality partition keys → hot partitions (bad)
  • Sort key doesn't affect distribution — it only affects ordering within a partition

Query pattern examples:

# Partition key only: exact lookup
PK = "user_123"

# Composite key: range query
PK = "user_123" AND SK begins_with("order#2026")

# Composite key: between query
PK = "chat_456" AND SK BETWEEN "msg#1000" AND "msg#2000"

Single-table design pattern: Use prefixed sort keys to store multiple entity types in one table:

  • PK=USER#123, SK=PROFILE → user profile
  • PK=USER#123, SK=ORDER#001 → user's order
  • PK=USER#123, SK=ORDER#002 → another order
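Querying that single-table layout for "all of user 123's 2026 orders" maps to a low-level `Query` request like this (table name and key values are illustrative):

```python
# Request shape for: PK = "USER#123" AND SK begins_with("ORDER#2026")
# Pass to boto3: client.query(**query_request)
query_request = {
    "TableName": "app_data",
    "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
    "ExpressionAttributeValues": {
        ":pk": {"S": "USER#123"},
        ":prefix": {"S": "ORDER#2026"},
    },
}
print(query_request["KeyConditionExpression"])
```

Dropping the `begins_with` clause returns the profile item plus every order in one request, which is the single-table pattern's main payoff.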

Q7: What are DynamoDB's item size limits and how do you work around them?

Answer:

Hard limit: 400 KB per item (including attribute names, values, and overhead).

Strategies when items might exceed 400 KB:

| Strategy | Description |
|---|---|
| Store large data in S3 | Keep a pointer (S3 URL) in DynamoDB; fetch from S3 when needed |
| Compress attributes | Gzip large text/JSON before storing as binary attribute |
| Split into multiple items | Store metadata in one item, chunks in others (e.g., PK=doc#1, SK=chunk#0) |
| Normalize data | Move large nested collections (reviews, comments) to separate items/tables |
| Use document DB | If items are frequently > 100 KB with complex nesting, consider MongoDB |

Practical example — Yelp reviews:

  • Don't embed all reviews in the business item (could exceed 400 KB, wastes RCU)
  • Separate reviews table: PK=business_id, SK=review_id
  • Business item stays small and cheap to read

Q8: How does DynamoDB handle hot partitions? What strategies prevent them?

Answer:

What causes hot partitions:

  • Uneven access patterns (one partition key gets far more traffic than others)
  • Low-cardinality partition keys (e.g., status with only 3 values)
  • Time-based keys where all current writes go to the same partition

DynamoDB's built-in mitigation:

  • Adaptive capacity — DynamoDB automatically reallocates unused capacity from cold partitions to hot ones
  • Burst capacity — partitions can temporarily exceed their allocation using reserved burst credits

Design-level strategies:

| Strategy | Example |
|---|---|
| Write sharding | Append random suffix: PK = "popular_item#" + random(0, N). Read with N parallel queries and merge |
| Composite keys | Add high-cardinality attribute to partition key: (date, user_id) instead of just (date) |
| Caching (DAX) | Cache hot reads to reduce DynamoDB load |
| Time-bucketing | For time-series: PK = sensor_id + "#" + date instead of just sensor_id |
| Separate tables | Move hot entities to their own table with dedicated capacity |

Back-of-envelope: Single partition = 1,000 WCU = 1,000 writes/sec. If one key gets 10,000 writes/sec, you need to shard into at least 10 logical partitions.
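The write-sharding strategy from the table above can be sketched in a few lines. A plain dict stands in for the table; `NUM_SHARDS` and the key format are illustrative choices, not DynamoDB conventions:

```python
import random

NUM_SHARDS = 10

def sharded_pk(base_key: str) -> str:
    # Writes: spread one hot logical key across N physical partition keys
    return f"{base_key}#{random.randrange(NUM_SHARDS)}"

def all_shard_keys(base_key: str):
    # Reads: scatter-gather across every shard, then merge client-side
    return [f"{base_key}#{i}" for i in range(NUM_SHARDS)]

# Simulated table: each write lands on a random shard
table = {}
for i in range(100):
    table.setdefault(sharded_pk("popular_item"), []).append(i)

# Read path issues one query per shard and merges the results
merged = [v for key in all_shard_keys("popular_item") for v in table.get(key, [])]
print(len(merged), "items across", len(table), "shards")
```

The trade-off is explicit: writes now spread across up to 10 partitions, but every read fans out into N queries whose results must be merged.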


Q9: Compare DynamoDB Streams vs Kafka for event-driven architectures.

Answer:

| Feature | DynamoDB Streams | Kafka |
|---|---|---|
| Source | CDC from DynamoDB tables only | Any producer |
| Ordering | Per-partition-key ordered | Per-partition ordered |
| Retention | 24 hours (fixed) | Configurable (hours to forever) |
| Consumer model | Lambda triggers or Kinesis adapter | Consumer groups (pull-based) |
| Throughput | Tied to DynamoDB table throughput | Independent, very high (1M+ msg/sec) |
| Replay | Limited (24-hour window) | Full replay from any offset |
| Multi-consumer | Via Kinesis Data Streams adapter | Native (multiple consumer groups) |
| Operational overhead | Zero (fully managed) | High (cluster management) |

Choose DynamoDB Streams when:

  • You need CDC from DynamoDB specifically
  • Simple event processing (trigger Lambda)
  • Low operational overhead is priority

Choose Kafka when:

  • Events come from multiple sources
  • You need long retention / replay
  • High-throughput stream processing
  • Multiple independent consumer groups

Common pattern: DynamoDB Streams → Lambda → Kafka (bridge DynamoDB CDC into a broader event-driven architecture).


Q10: How do DynamoDB transactions work? What are the limitations?

Answer:

DynamoDB supports ACID transactions via two APIs:

// TransactWriteItems — atomic writes across tables (AWS SDK v2 DocumentClient style)
await docClient.transactWrite({
  TransactItems: [
    { Put: { TableName: 'orders', Item: { order_id: '001', status: 'confirmed' } } },
    { Update: { TableName: 'inventory', Key: { item_id: 'A1' },
                UpdateExpression: 'SET stock = stock - :qty',
                ExpressionAttributeValues: { ':qty': 1 } } },
    { Delete: { TableName: 'cart', Key: { user_id: 'U1', item_id: 'A1' } } }
  ]
}).promise();

Properties:

  • Serializable isolation — transactions appear to execute sequentially
  • All-or-nothing — all operations succeed or all are rolled back
  • Span multiple tables (up to 100 items per transaction)

Limitations:

| Limitation | Detail |
|---|---|
| 100 items max | Per transaction (combined reads + writes) |
| 4 MB max | Total size of all items in transaction |
| 2x capacity cost | Each transactional read/write consumes double the normal RCU/WCU |
| No cross-region | Transactions are single-region only |
| No cross-account | Cannot span multiple AWS accounts |
| Conflict handling | Transaction fails if any item is modified concurrently (optimistic locking) |

When to use:

  • Order placement (decrement inventory + create order + clear cart)
  • Account transfers (debit one account + credit another)
  • Conditional multi-item updates

When NOT to use:

  • High-contention scenarios (frequent conflicts → retries → cost)
  • Simple single-item updates (use condition expressions instead)
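For the single-item case, a condition expression gives you optimistic locking without transaction overhead. A sketch of the `UpdateItem` request shape (table, attributes, and the version-number scheme are illustrative):

```python
# Optimistic-lock update: decrement stock only if there is enough stock AND
# the item hasn't changed since we read version 7.
# Pass to boto3: client.update_item(**update_request)
update_request = {
    "TableName": "inventory",
    "Key": {"item_id": {"S": "A1"}},
    "UpdateExpression": "SET stock = stock - :qty, version = version + :one",
    "ConditionExpression": "stock >= :qty AND version = :expected",
    "ExpressionAttributeValues": {
        ":qty": {"N": "1"},
        ":one": {"N": "1"},
        ":expected": {"N": "7"},  # version observed when the item was read
    },
}
print(update_request["ConditionExpression"])
```

If another writer bumped the version first, the call fails with `ConditionalCheckFailedException` and the caller re-reads and retries, the same optimistic pattern transactions use, at half the capacity cost.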

Q11: Explain DynamoDB's single-table design pattern. When is it appropriate?

Answer:

Single-table design stores multiple entity types in one DynamoDB table using carefully designed partition and sort keys.

Example — E-commerce:

| PK | SK | Attributes |
|---|---|---|
| USER#123 | PROFILE | name, email, created_at |
| USER#123 | ORDER#2026-001 | total, status, items |
| USER#123 | ORDER#2026-002 | total, status, items |
| ORDER#2026-001 | ITEM#A1 | product_name, qty, price |
| PRODUCT#A1 | METADATA | name, category, price |
| PRODUCT#A1 | REVIEW#R1 | rating, comment, user_id |

Benefits:

  • Fetch related data in one query (user + their orders in a single request)
  • Reduces table count and GSI overhead
  • Mimics JOINs via careful key design

Drawbacks:

  • Complex to design and maintain
  • Harder to evolve as access patterns change
  • GSI overloading can be confusing
  • Not suitable when entities have very different access patterns or scaling needs

When appropriate:

  • Well-understood, stable access patterns
  • Need to minimize read operations (cost/latency)
  • Microservice with bounded context (few entity types)

When to avoid:

  • Rapidly evolving query patterns
  • Team unfamiliar with DynamoDB data modeling
  • Many entity types with independent scaling needs

Q12: How would you design a URL shortener (like bit.ly) using DynamoDB?

Answer:

Requirements: Create short URLs, redirect to original URL, track click analytics.

Table design:

Table: urls
PK: short_code (string)    — e.g., "abc123"
Attributes: long_url, created_at, user_id, click_count, ttl

GSI: user-urls-index
PK: user_id, SK: created_at

Operations:

| Operation | DynamoDB Call |
|---|---|
| Create short URL | PutItem with condition attribute_not_exists(short_code) to prevent overwrites |
| Redirect | GetItem(short_code) → return long_url (strongly consistent for accuracy) |
| List user's URLs | Query on GSI user-urls-index with user_id |
| Track clicks | UpdateItem with ADD click_count :1 (atomic increment) |
| Auto-expire | Set ttl attribute → DynamoDB TTL auto-deletes expired items |
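The two most interesting operations, conditional create and atomic click counting, map to these request shapes (table name, short code, and URL are illustrative):

```python
# Create: fail instead of silently overwriting if the code is taken.
# Pass to boto3: client.put_item(**create_request)
create_request = {
    "TableName": "urls",
    "Item": {
        "short_code": {"S": "abc123"},
        "long_url": {"S": "https://example.com/some/very/long/path"},
    },
    "ConditionExpression": "attribute_not_exists(short_code)",
}

# Track click: ADD performs a server-side atomic increment, so concurrent
# clicks never lose updates. Pass to boto3: client.update_item(**click_request)
click_request = {
    "TableName": "urls",
    "Key": {"short_code": {"S": "abc123"}},
    "UpdateExpression": "ADD click_count :one",
    "ExpressionAttributeValues": {":one": {"N": "1"}},
}
print(create_request["ConditionExpression"])
```

The condition expression turns a hash collision into an explicit `ConditionalCheckFailedException` the application can handle by generating a new code.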

Scaling considerations:

  • Short codes are high-cardinality → excellent partition distribution
  • Hot URLs (viral links): use DAX to cache popular redirects (microsecond latency)
  • Click analytics at high volume: write to DynamoDB Streams → Lambda → analytics pipeline (avoid hot partition on popular URLs)

Cost estimate:

  • 10M redirects/day = ~116 reads/sec average (burst much higher)
  • With DAX: most reads served from cache, minimal RCU needed
  • Storage: 1KB per URL × 100M URLs = ~100 GB

Q13: What is TTL in DynamoDB and how does it work under the hood?

Answer:

TTL (Time to Live) automatically deletes expired items from a table at no additional cost (no WCU consumed).

How to use:

  1. Enable TTL on a table, specifying which attribute holds the expiration timestamp
  2. Set the attribute to a Unix epoch timestamp (seconds) on each item
  3. DynamoDB periodically scans and deletes items past their TTL
// Set item with TTL (expires in 30 days)
const item = {
  user_id: 'U123',
  session_token: 'abc...',
  ttl: Math.floor(Date.now() / 1000) + (30 * 86400) // 30 days from now
};

Under the hood:

  • Background process scans for expired items (not instant — can take up to 48 hours after expiration)
  • Expired items may still appear in queries/scans until actually deleted
  • Deletions are captured in DynamoDB Streams (with eventName: "REMOVE")
  • No WCU cost for TTL deletions

Use cases:

  • Session management (expire inactive sessions)
  • Temporary data (OTPs, verification codes)
  • Log/event data retention
  • Shopping cart expiration

Gotcha: Don't rely on TTL for time-critical deletions. If you need items gone immediately at expiration, implement application-level filtering (WHERE ttl > current_time) in queries.
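That application-level filtering can be expressed as a filter expression on the query (table and attribute names are illustrative; `#exp` aliases the `ttl` attribute to be safe around reserved words):

```python
import time

# Query that hides logically-expired items the TTL sweeper hasn't
# physically deleted yet. Pass to boto3: client.query(**query_request)
now = int(time.time())
query_request = {
    "TableName": "sessions",
    "KeyConditionExpression": "user_id = :uid",
    "FilterExpression": "#exp > :now",
    "ExpressionAttributeNames": {"#exp": "ttl"},
    "ExpressionAttributeValues": {
        ":uid": {"S": "U123"},
        ":now": {"N": str(now)},
    },
}
print(query_request["FilterExpression"])
```

Note that filter expressions run after items are read, so expired-but-undeleted items still consume RCU; the filter only keeps them out of the results.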


Q14: How do Global Tables work and what are the consistency implications?

Answer:

Global Tables provide multi-region, fully replicated DynamoDB tables with active-active configuration.

How it works:

  • Create a table in one region, then add replicas in other regions
  • DynamoDB automatically replicates writes to all regions (typically < 1 second)
  • Each region can serve both reads AND writes independently

Consistency implications:

ScenarioBehavior
Write in us-east-1, read in us-east-1Strongly consistent read available
Write in us-east-1, read in eu-west-1Eventually consistent only (replication lag)
Concurrent writes to same item in different regionsLast writer wins (based on timestamp)

Conflict resolution: DynamoDB uses last-writer-wins reconciliation based on the item's timestamp. There's no built-in conflict detection or merge logic — the most recent write (by wall clock) overwrites previous versions.

When to use:

  • Global user base needing low-latency reads/writes
  • Disaster recovery (automatic failover)
  • Read-heavy workloads that benefit from local replicas

Pitfalls:

  • Cross-region write conflicts can silently overwrite data
  • Strongly consistent reads are region-local only — can't guarantee cross-region consistency
  • Cost: pay for replicated WCU in each region
  • Transactions are single-region only — cannot span global table replicas

Q15: Design a leaderboard system using DynamoDB. How does it compare to using Redis?

Answer:

Challenge: DynamoDB doesn't have a native sorted set like Redis. You need to design around this.

Approach 1 — GSI with score as sort key:

Table: leaderboard
PK: game_id
SK: user_id
Attributes: score, username, updated_at

GSI: game-score-index
PK: game_id
SK: score
  • Top 10: Query GSI WHERE game_id = X ORDER BY score DESC LIMIT 10
  • Update score: UpdateItem SET score = :new_score
  • User's rank: Not efficiently queryable (must scan all items with higher scores)

Approach 2 — Write sharding for high-write games:

PK: game_id#shard_N (N = hash(user_id) % 10)
SK: score#user_id
  • Parallel query 10 shards, merge top results client-side
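The client-side merge step can be sketched as follows. The per-shard results are synthetic stand-ins for what each parallel Query against `game_id#shard_N` would return:

```python
import heapq

# Hypothetical per-shard top results, one list per Query against a shard
shard_results = [
    [("alice", 980), ("bob", 850)],
    [("carol", 990), ("dave", 700)],
    [("erin", 910)],
]

def global_top_n(shards, n):
    """Merge per-shard (user, score) results into a global top-N list."""
    all_entries = [entry for shard in shards for entry in shard]
    return heapq.nlargest(n, all_entries, key=lambda entry: entry[1])

print(global_top_n(shard_results, 3))
```

An optimization: if each shard query already returns its own top N (sorted descending via `ScanIndexForward=False`), the merge only ever touches N × num_shards entries regardless of leaderboard size.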

DynamoDB vs Redis for leaderboards:

| Aspect | DynamoDB | Redis (Sorted Sets) |
|---|---|---|
| Top-N query | GSI query (ms latency) | ZREVRANGE (sub-ms) |
| Get rank | Expensive (scan/count) | ZREVRANK O(log N) |
| Update score | UpdateItem + GSI async update | ZADD atomic O(log N) |
| Durability | Built-in (3 AZ replication) | Requires persistence config |
| Scale | Automatic partitioning | Manual sharding for large sets |
| Cost | Pay per operation | Pay for memory |

Recommendation:

  • Redis for real-time leaderboards needing rank lookups and sub-millisecond latency
  • DynamoDB when leaderboard is part of a larger DynamoDB-based system and rank queries are infrequent
  • Hybrid: Redis for hot leaderboard data, DynamoDB for persistence and historical records