Apache Cassandra - Complete Deep Dive
What is Cassandra?
Apache Cassandra is a distributed NoSQL database designed for high availability and horizontal scalability that enables:
- Linear scalability: Add nodes to increase throughput
- No single point of failure: Peer-to-peer architecture (no master)
- High write throughput: Millions of writes/second
- Eventual consistency: Trade strong consistency for availability
- Multi-datacenter support: Built-in geographic distribution
- Tunable consistency: Choose per query (consistency level)
Core Characteristics
| Aspect | Benefit |
|---|---|
| Distributed | Peer-to-peer, all nodes equal |
| Fault-tolerant | Lose nodes, keep operating (RF=3) |
| Scalable | Linear throughput increase per node |
| Durable | Write-ahead logging + compaction |
| Multi-DC | Replicate across datacenters natively |
| Eventually consistent | High availability over strong consistency |
Cassandra Architecture Overview
Client (via CQL driver)
↓
Cassandra Cluster
├─ Node 1 (Replicates: A-H)
├─ Node 2 (Replicates: I-P)
├─ Node 3 (Replicates: Q-Z)
├─ Node 4 (Replicates: A-H, I-P)
└─ ...
↓
[Token Ring] → [Hash Partitioning]
↓
[Commit Log + Memtable → flushed to SSTables]
Key layers:
- Cluster: Collection of nodes sharing same cluster name
- Node: Single Cassandra instance
- Keyspace: Database-like container (schema definition)
- Table: Similar to relational database table
- Partition Key: Determines which node stores data
- Token Ring: Maps keys to nodes via consistent hashing
- Replica: Copy of partition on multiple nodes (replication factor)
Core Components
1. Consistent Hashing & Token Ring
Problem: How to distribute data across nodes?
Solution: Consistent hashing with token ring
Token Ring (0 to 2^63-1; toy tokens shown):
          Node 1 (Token: 100)
         /                   \
Node 4                        Node 2
(Token: 400)                  (Token: 200)
         \                   /
          Node 3 (Token: 300)
Each key hashes to a token and is stored on the first node walking clockwise whose token is >= the key's token (so each node owns the range ending at its own token):
Key "alice" → Hash → Token 150
Belongs to: Node 2 (owns 101-200)
Key "bob" → Hash → Token 250
Belongs to: Node 3 (owns 201-300)
Key "charlie" → Hash → Token 350
Belongs to: Node 4 (owns 301-400)
Benefits:
- Adding/removing nodes = minimal rebalancing
- Load balanced: Each node owns ~1/N of token space
- Replication: RF=3 means data on 3 consecutive nodes
Replication placement:
Replication Factor = 3
Key "user:123" → Token 150
Primary (Node 2): first node clockwise from token 150
Replica 2 (Node 3): next node clockwise
Replica 3 (Node 4): next node clockwise
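The clockwise lookup above can be sketched in a few lines of plain Python (a toy ring and a stand-in hash, not the real Murmur3 partitioner):

```python
import bisect
from hashlib import md5

# Toy token ring mirroring the 4-node example above.
RING = sorted([(100, "Node1"), (200, "Node2"), (300, "Node3"), (400, "Node4")])
TOKENS = [t for t, _ in RING]
RING_SIZE = 400  # toy token space; real Cassandra uses 2^63

def key_token(key: str) -> int:
    # Stand-in hash; Cassandra uses the Murmur3 partitioner
    return int(md5(key.encode()).hexdigest(), 16) % RING_SIZE

def replicas_for(token: int, rf: int = 3) -> list[str]:
    # Primary = first node clockwise whose token >= key token,
    # replicas = the next rf-1 nodes clockwise (wrapping around the ring)
    start = bisect.bisect_left(TOKENS, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]

print(replicas_for(150))         # ['Node2', 'Node3', 'Node4']
print(replicas_for(350, rf=2))   # ['Node4', 'Node1'] — wraps around
```

Adding a node only splits one existing range, which is why rebalancing stays minimal.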
2. Keyspaces and Tables
Keyspace: Schema definition (similar to database)
CREATE KEYSPACE user_keyspace
WITH replication = {
'class': 'NetworkTopologyStrategy',
'us-east-1': 3,
'eu-west-1': 2
}
AND durable_writes = true;
Table: Data structure (similar to table)
CREATE TABLE users (
user_id UUID PRIMARY KEY,
email TEXT,
name TEXT,
created_at TIMESTAMP,
updated_at TIMESTAMP
);
3. Data Model: Primary Key
Simple primary key (single column):
PRIMARY KEY (user_id)
→ Partition key: user_id
→ Clustering key: (none)
// All data with same user_id on same partition
Composite primary key (partition key + clustering key):
CREATE TABLE user_activity (
  user_id UUID,
  timestamp TIMESTAMP,
  action_type TEXT,
  action_data TEXT,
  PRIMARY KEY (user_id, timestamp)
);
-- Efficient range query within a single partition:
SELECT * FROM user_activity
WHERE user_id = ? AND timestamp >= ? AND timestamp < ?;
How it's stored:
Partition key (user_id): Determines node
Clustering key (timestamp): Sort order within partition
Data on node:
Partition: user_id=alice
├─ timestamp=2024-01-05 10:00:00 → "login"
├─ timestamp=2024-01-05 10:15:30 → "view_product"
└─ timestamp=2024-01-05 10:30:45 → "logout"
Partition: user_id=bob
├─ timestamp=2024-01-05 10:02:00 → "login"
└─ timestamp=2024-01-05 10:05:15 → "purchase"
Advantages:
- Range queries: Get activity between timestamps efficiently
- Sorted results: Clustering key order preserved
- Efficient pagination: Skip to timestamp and fetch next N rows
4. Write Path
Write operation flow:
Client sends INSERT/UPDATE
↓
1. Write to commitlog (durability)
↓
2. Write to memtable (in-memory, sorted)
↓
3. Return success to client (write durability achieved)
↓
4. Periodically flush memtable → SSTable (on disk)
↓
5. Compaction merges SSTables
Write acknowledgment flow:
Client sends write with consistency_level = QUORUM
↓
Coordinator node:
- Sends the write to all RF replicas
- Waits for QUORUM acks (RF/2 + 1)
- Returns success
↓
With RF=3: QUORUM = 2 acks
If 1 node is slow/down → write still succeeds
Trade-off: the lagging replica catches up later (eventual consistency)
5. Read Path
Read operation flow:
Client sends SELECT with consistency_level = QUORUM
↓
Coordinator node:
- Queries enough replicas to satisfy the consistency level
- Compares timestamps across responses (triggers read repair on mismatch)
- Returns the latest version
↓
Read consistency guarantees:
ONE: Fastest (1 replica)
QUORUM: Balanced (RF/2 + 1 replicas)
ALL: Slowest but strongest (all RF replicas)
Read repair:
Scenario: Write reaches replica A but times out on replica B
Write to A (timestamp 100): SUCCESS
Write to B (timestamp 100): TIMEOUT → B keeps its older value (timestamp 90)
Later, a QUORUM read hits A and B:
A: timestamp 100
B: timestamp 90 (stale)
Read repair:
Coordinator sees A's timestamp > B's
Sends the newer value to B to repair the stale replica
Returns A's value to the client
Consistency & Availability Trade-offs
Consistency Levels
| Level | Read from | Write to | Availability | Consistency |
|---|---|---|---|---|
| ONE | 1 node | 1 node | Highest | Lowest (stale) |
| QUORUM | RF/2 + 1 | RF/2 + 1 | Medium | Medium |
| LOCAL_QUORUM | Local DC quorum | Local DC quorum | Medium | Medium |
| ALL | All replicas | All replicas | Lowest (1 failure = fail) | Highest |
Write consistency example (RF=3):
Consistency ONE:
Write to node A → Return success
Risk: Nodes B,C haven't written yet (data loss if A fails)
Consistency QUORUM:
Write to nodes A,B,C → Wait for 2 acks → Return success
Risk: 1 node can fail safely
Consistency ALL:
Write to nodes A,B,C → Wait for all 3 acks → Return success
Risk: 1 node down = write fails (low availability)
Hybrid consistency (write QUORUM, read ONE):
Write QUORUM:
Ensure at least 2 replicas have data
Read ONE:
Read from fastest replica (very fast)
Risk: Might be stale
Mitigation: Read repair fixes stale data in background
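The quorum arithmetic behind these trade-offs is easy to verify directly; a small illustrative sketch in plain Python:

```python
def quorum(rf: int) -> int:
    # QUORUM = floor(RF/2) + 1, i.e. a strict majority of replicas
    return rf // 2 + 1

def is_strongly_consistent(rf: int, write_cl: int, read_cl: int) -> bool:
    # Reads are guaranteed to overlap the latest write on at least
    # one replica whenever R + W > RF
    return read_cl + write_cl > rf

print(quorum(3))                                         # 2
print(is_strongly_consistent(3, quorum(3), quorum(3)))   # True: QUORUM/QUORUM
print(is_strongly_consistent(3, quorum(3), 1))           # False: write QUORUM, read ONE may be stale
```

This is why write QUORUM + read ONE is fast but only eventually consistent, while QUORUM on both sides gives read-your-writes.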
Replication & Distribution
Replication Factor (RF)
RF=1: Single copy (data loss on node failure)
RF=2: 2 copies (lose 1 node, data survives)
RF=3: 3 copies (lose 2 nodes, data survives)
RF=5: 5 copies (lose 4 nodes, data survives)
Production: Use RF=3 minimum
Multi-DC: Distribute replicas across DCs, e.g.
us-east: 2 replicas
eu-west: 1 replica
Multi-Datacenter Replication
CREATE KEYSPACE events
WITH replication = {
'class': 'NetworkTopologyStrategy',
'us-east-1': 3, // 3 replicas in US
'eu-west-1': 2 // 2 replicas in EU
};
Data flow:
Write in US-East:
1. Write to primary + replicas (US-East)
2. Replicates asynchronously to EU-West
3. EU replicas eventually consistent
Benefits:
- Local reads (low latency)
- Local writes (high throughput)
- Automatic cross-DC replication
Trade-off:
- Eventual consistency (EU lags behind US)
- Cross-DC bandwidth usage
Compaction Strategy
Problem: Many writes create many SSTables (slow reads)
Solution: Compaction merges SSTables
Initial state (4 SSTables):
SSTable1: Keys A, M, Z
SSTable2: Keys A, G, M
SSTable3: Keys B, K, X
SSTable4: Keys C, D, L
Read request for key M:
Check SSTable1 (found)
Check SSTable2 (might have newer)
Check SSTable3 (no)
Check SSTable4 (no)
→ Query multiple SSTables (slow)
After compaction:
SSTable-merged: Keys A, B, C, D, G, K, L, M, X, Z (sorted)
→ Single SSTable (fast)
Compaction strategies:
| Strategy | Use Case | Trade-off |
|---|---|---|
| Size-tiered | Default, write-heavy | More read overhead |
| Leveled | Read-heavy | More write overhead |
| Time-window | Time-series data | Good balance for events |
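The merge step compaction performs can be sketched minimally — every entry carries a write timestamp and the newest version of each key wins (toy dicts stand in for sorted, immutable SSTable files):

```python
def compact(sstables):
    """Merge SSTables, keeping only the latest version of each key.

    Each toy SSTable maps key -> (timestamp, value); last write wins.
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # Emit one SSTable, sorted by key (as real SSTables are)
    return dict(sorted(merged.items()))

sstables = [
    {"A": (5, "a-new"), "M": (7, "m-new")},
    {"A": (1, "a-old"), "G": (2, "g")},
    {"B": (3, "b"), "M": (4, "m-old")},
]
print(compact(sstables))
# {'A': (5, 'a-new'), 'B': (3, 'b'), 'G': (2, 'g'), 'M': (7, 'm-new')}
```

After compaction a read for key M touches one file instead of scanning several, which is exactly the read-amplification win described above.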
Performance Optimization
Write Optimization
-- Logged batch: keeps related tables in sync atomically.
-- Note: batches are NOT a throughput optimization; multi-partition
-- logged batches add coordination overhead.
BEGIN BATCH
INSERT INTO users (user_id, email) VALUES (?, ?);
INSERT INTO user_index (email, user_id) VALUES (?, ?);
APPLY BATCH;
-- Relaxed consistency for fire-and-forget style writes (driver settings)
consistency_level = ONE
timeout = 100ms
-- Tune commit log settings (cassandra.yaml)
commitlog_sync = batch
commitlog_sync_batch_window_in_ms = 10
Read Optimization
-- Secondary index for filtering (use sparingly; best on moderate-cardinality columns)
CREATE INDEX idx_email ON users (email);
-- Projection: select only needed columns
SELECT user_id, email FROM users WHERE user_id = ?;
-- Not: SELECT * FROM users WHERE user_id = ?;
-- Pagination
SELECT * FROM users WHERE user_id = ? LIMIT 100;
-- Fetch the next page using the last key (or the driver's paging state)
-- Row caching is configured per table, not per query:
ALTER TABLE users WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};
Hardware Configuration
# Memory allocation
-Xms8g -Xmx8g
# Data directory (fast SSD)
data_file_directories: ["/mnt/data/cassandra"]
commitlog_directory: "/mnt/commitlog"
# Hinted handoff window (3 hours)
max_hint_window_in_ms: 10800000
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
- seeds: "192.168.1.1,192.168.1.2"
# Concurrency
concurrent_reads: 32
concurrent_writes: 32
Scalability & High Availability
Cluster Sizing
Small cluster (3-5 nodes):
Handles: 100K-1M ops/sec
Per node: SSD 500GB, 16GB RAM
Use case: Development, testing
Medium cluster (6-20 nodes):
Handles: 1M-10M ops/sec
Per node: SSD 2TB, 32GB RAM
Use case: Production, single DC
Large cluster (20+ nodes):
Handles: 10M+ ops/sec
Per node: SSD 4TB+, 64GB RAM
Use case: High-scale, multi-DC
High Availability Strategy
Single node failure:
✓ No impact (RF=3 has 2 other replicas)
✓ Automatic failover (read from other replicas)
Multiple node failure:
✓ Still operational (if RF > number of failures)
✗ Reduced throughput (fewer replicas)
✗ Possible data loss (if failures > RF-1)
Entire DC failure:
✓ Application fails over to other DC
✓ Data preserved (multi-DC replication)
✗ Possible write losses (async replication)
Repair Mechanism
Repair ensures replicas are consistent:
Command:
nodetool repair -pr user_keyspace
Process:
1. Merkle tree of local node
2. Compare with replica nodes
3. Stream missing/stale data
4. Replicas become consistent
Schedule:
Weekly or after RF-1 nodes fail
Run during low-traffic window
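The Merkle-tree comparison can be sketched with per-range hashes: replicas exchange bucket hashes and stream only the mismatching ranges. This is a simplification (one leaf level, no tree hierarchy), illustrative only:

```python
from hashlib import sha256

def hash_bucket(key: str, buckets: int) -> int:
    # Assign each key to a token-range bucket (stand-in for token ranges)
    return int(sha256(key.encode()).hexdigest(), 16) % buckets

def range_hashes(rows: dict, buckets: int = 4) -> list[str]:
    """Hash each bucket of a replica's data — the leaf level of a Merkle tree."""
    parts = [[] for _ in range(buckets)]
    for key in sorted(rows):
        parts[hash_bucket(key, buckets)].append(f"{key}={rows[key]}")
    return [sha256("|".join(p).encode()).hexdigest() for p in parts]

def diff_ranges(a: dict, b: dict, buckets: int = 4) -> list[int]:
    """Compare bucket hashes; only mismatching ranges need streaming."""
    ha, hb = range_hashes(a, buckets), range_hashes(b, buckets)
    return [i for i in range(buckets) if ha[i] != hb[i]]

replica_a = {"alice": 1, "bob": 2, "carol": 3}
replica_b = {"alice": 1, "bob": 9, "carol": 3}  # bob diverged
print(diff_ranges(replica_a, replica_b))  # only bob's bucket mismatches
```

The point of the tree structure in real repair is that matching subtree hashes let nodes skip whole ranges without comparing individual rows.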
Use Cases
1. Time-Series Data (Metrics, Logs)
CREATE TABLE metrics (
  metric_name TEXT,
  timestamp TIMESTAMP,
  host TEXT,
  value FLOAT,
  PRIMARY KEY ((metric_name, host), timestamp)
);
-- Insert millions/sec
INSERT INTO metrics (metric_name, host, timestamp, value)
VALUES ('cpu', 'server1', toTimestamp(now()), 85.5);
-- Range query (efficient with clustering)
SELECT * FROM metrics
WHERE metric_name = 'cpu'
AND host = 'server1'
AND timestamp >= ? AND timestamp < ?;
2. User Profiles & Activity
CREATE TABLE users (
user_id UUID PRIMARY KEY,
email TEXT,
name TEXT,
bio TEXT
);
CREATE TABLE user_activity (
user_id UUID,
timestamp TIMESTAMP,
action TEXT,
PRIMARY KEY (user_id, timestamp)
);
-- Fast user lookup + activity range query
SELECT * FROM users WHERE user_id = ?;
SELECT * FROM user_activity
WHERE user_id = ? AND timestamp >= ? AND timestamp < ?;
3. Message Queue (Kafka-like)
CREATE TABLE messages (
  topic TEXT,
  partition INT,
  offset BIGINT,
  timestamp TIMESTAMP,
  payload BLOB,
  PRIMARY KEY ((topic, partition), offset)
);
-- Write millions/sec (high throughput)
INSERT INTO messages (topic, partition, offset, timestamp, payload)
VALUES ('events', 0, 1000, toTimestamp(now()), ?);
-- Consumer reads offset range
SELECT * FROM messages
WHERE topic = 'events' AND partition = 0
AND offset >= ? AND offset <= ?
LIMIT 1000;
4. Real-time Analytics
CREATE TABLE events (
  event_type TEXT,
  day TEXT,
  hour TEXT,
  timestamp TIMESTAMP,
  user_id UUID,
  price DECIMAL,
  data MAP<TEXT, TEXT>,
  PRIMARY KEY ((event_type, day, hour), timestamp)
);
-- Aggregate within a single partition (efficient)
-- Note: CQL GROUP BY is restricted to primary-key columns, so aggregate
-- per partition rather than across arbitrary columns
SELECT COUNT(*), AVG(price) FROM events
WHERE event_type = 'purchase' AND day = '2024-01-05' AND hour = '10';
Interview Questions & Answers
Q1: Design a real-time analytics system for 1M events/sec
Requirements:
- Ingest 1M events/sec
- Aggregate by event type, time windows (hourly, daily)
- Query: "Count purchases in last hour"
- 99.99% uptime
Solution:
CREATE KEYSPACE analytics
WITH replication = {
'class': 'NetworkTopologyStrategy',
'us-east': 3,
'eu-west': 2
};
-- Time-series partitioning (efficient range queries)
CREATE TABLE events (
event_type TEXT,
bucket TEXT, -- Partition by hour (2024-01-05-10)
timestamp TIMESTAMP,
event_id UUID,
user_id UUID,
data TEXT,
PRIMARY KEY ((event_type, bucket), timestamp, event_id)
);
-- Aggregation table (pre-computed)
CREATE TABLE event_counts (
event_type TEXT,
bucket_hour TEXT, -- Hour bucket
count_value BIGINT,
PRIMARY KEY (event_type, bucket_hour)
);
Write path:
# High-throughput insert
from uuid import uuid4
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster

cluster = Cluster(['node1', 'node2', 'node3'])
session = cluster.connect('analytics')

# Prepared statement: parsed once, reused for every insert
prepared = session.prepare('''
    INSERT INTO events (
        event_type, bucket, timestamp, event_id, user_id, data
    ) VALUES (?, ?, ?, ?, ?, ?)
''')
# Consistency is set on the statement, not as an execute() keyword
prepared.consistency_level = ConsistencyLevel.QUORUM

for event in events_stream:  # events_stream: the incoming event source
    bucket = event['timestamp'].strftime('%Y-%m-%d-%H')
    session.execute(prepared, [
        event['type'],
        bucket,
        event['timestamp'],
        uuid4(),
        event['user_id'],
        event['data'],
    ])
Read path (queries):
-- Count events in the last hour (single partition scan)
SELECT COUNT(*) FROM events
WHERE event_type = 'purchase'
AND bucket = '2024-01-05-10'
AND timestamp >= ? AND timestamp < ?;
-- Top products: CQL only allows GROUP BY on primary-key columns, so
-- per-product counts must come from a pre-aggregated table maintained
-- on write (like event_counts above) or an external engine (e.g. Spark)
Cluster setup:
3 nodes per DC × 2 DCs = 6 nodes
Per node:
- 4 CPU
- 32GB RAM
- 2TB SSD (write-optimized)
Throughput: 1M events/sec ÷ 6 nodes = 167K/sec per node
(Cassandra handles 100K+/sec per node)
Q2: Node fails. How to recover without data loss?
Answer:
Failure scenarios with RF=3:
Scenario 1: 1 node fails
✓ No impact (2 replicas still have the data)
✓ Automatic failover (read from other replicas)
✓ No data loss
Scenario 2: 2 nodes fail simultaneously
⚠️ Data still survives on the remaining replica
- QUORUM reads/writes fail for ranges where both failed nodes were replicas
- With NetworkTopologyStrategy, replicas span racks/DCs, making correlated failures less likely to hit the same ranges
Scenario 3: Node recovers after being down
1. Node rejoins cluster
2. Detects lag (missed writes while down)
3. Hints replay and repair fill in the missing data
Recovery process:
# 1. Node comes back online
# Cassandra detects missed writes (hints)
# 2. Run repair (forces consistency)
nodetool repair -pr user_keyspace
# 3. Monitor repair progress
nodetool netstats
# 4. Verify data consistency
nodetool ring
Hint system (prevents write loss):
Write to A, B (C is down):
A: success
B: success
C: down → coordinator stores a hint locally
When C recovers:
Hints replay → C catches up on the missed writes
Trade-offs:
- Hints are stored locally on the coordinator
- If the coordinator dies before replay → the hint is lost (repair fixes it later)
- Hints expire after max_hint_window_in_ms (default 3 hours); longer outages need repair
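A toy model of the store-and-replay cycle (plain Python, illustrative only; real hints live in a local system table with an expiry window):

```python
class Coordinator:
    """Toy hinted handoff: writes to a down replica are kept as hints
    and replayed once the replica rejoins."""

    def __init__(self, replicas):
        self.replicas = replicas   # name -> dict store (None = node down)
        self.hints = []            # pending (target, key, value) triples

    def write(self, key, value):
        acks = 0
        for name, store in self.replicas.items():
            if store is None:                      # replica down: keep a hint
                self.hints.append((name, key, value))
            else:
                store[key] = value
                acks += 1
        return acks                                # caller checks CL vs acks

    def replay_hints(self):
        remaining = []
        for name, key, value in self.hints:
            store = self.replicas[name]
            if store is None:
                remaining.append((name, key, value))  # still down
            else:
                store[key] = value                    # catch the replica up
        self.hints = remaining

a, b = {}, {}
coord = Coordinator({"A": a, "B": b, "C": None})   # C is down
coord.write("user:1", "v1")                        # 2 acks + 1 stored hint
coord.replicas["C"] = c = {}                       # C comes back online
coord.replay_hints()
print(c)  # {'user:1': 'v1'} — C caught up via hint replay
```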
Key takeaway: "With RF=3, losing 1 node has no impact. Losing 2 nodes breaks QUORUM for the affected ranges; losing all 3 replicas of a range means data loss. Run nodetool repair regularly (e.g., weekly)."
Q3: Strong consistency requirement (financial transactions). How to achieve?
Answer:
Challenge: Cassandra is eventually consistent by default
Solution: Hybrid consistency
-- Consistency is a protocol-level setting chosen by the client (a driver
-- option, or cqlsh's CONSISTENCY command) rather than part of the statement:
CONSISTENCY QUORUM;
-- Writes at QUORUM
INSERT INTO accounts (account_id, balance) VALUES (?, ?);
-- Reads at QUORUM
SELECT balance FROM accounts WHERE account_id = ?;
-- With RF=3: QUORUM = 2 nodes
-- Write quorum (2) + read quorum (2) > RF (3) → every read overlaps the
-- latest acknowledged write on at least one replica (strong consistency)
Trade-offs:
QUORUM consistency:
✓ Strong consistency (within same node set)
✓ Withstand 1 node failure (RF=3)
✗ Slower than ONE (need to reach 2 nodes)
✗ Fails if 2+ nodes down (can't get quorum)
Availability impact:
ONE: Always available (unless all replicas down)
QUORUM: Available if > RF/2 replicas up
RF=3, need 2 alive: Can lose 1 node
RF=5, need 3 alive: Can lose 2 nodes
Read-before-write pattern:
-- For balance updates (prevent race conditions): read at QUORUM first
SELECT balance FROM accounts WHERE account_id = ?;
-- Compute the new balance in application code:
--   new_balance = balance - withdrawal
-- Then update with a conditional write (lightweight transaction)
UPDATE accounts SET balance = ?
WHERE account_id = ?
IF balance = ?; -- Only applies if the balance is unchanged
Lightweight transactions (LWT):
-- Atomic compare-and-set
UPDATE accounts SET balance = 950
WHERE account_id = 'user1'
IF balance = 1000; -- Conditional
-- Under the hood (Paxos-based consensus):
-- 1. Replicas run prepare/propose rounds to agree on the current value
-- 2. If the condition holds, the new value is committed to a quorum
-- 3. The result row reports [applied] = true/false
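The read-compute-CAS retry loop that applications typically build on top of LWTs can be sketched like this (a threading.Lock stands in for the Paxos round; illustrative only):

```python
import threading

class Account:
    """Toy compare-and-set mirroring the IF balance = ? semantics above."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()  # stand-in for the Paxos consensus round

    def cas_update(self, expected, new):
        with self._lock:
            if self.balance != expected:
                return False           # [applied] = false: a concurrent writer won
            self.balance = new
            return True                # [applied] = true

def withdraw(account, amount, max_retries=5):
    # Read, compute, then CAS; retry if another writer changed the balance
    for _ in range(max_retries):
        current = account.balance
        if current < amount:
            return False               # insufficient funds
        if account.cas_update(current, current - amount):
            return True
    return False                       # gave up after repeated contention

acct = Account(1000)
print(withdraw(acct, 50))    # True
print(acct.balance)          # 950
print(withdraw(acct, 2000))  # False (insufficient funds)
```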
Key takeaway: "Use QUORUM consistency + lightweight transactions for strong consistency guarantees, but accept higher latency."
Q4: Query latency spike. How to diagnose?
Answer:
Diagnosis checklist:
# 1. Check cluster health
nodetool status
# 2. Check node load
nodetool info
# 3. Monitor GC pauses
nodetool gcstats
# 4. Check compaction queue
nodetool compactionstats
# 5. Monitor disk I/O
iostat -x 1
# 6. Sample query traces (use a small probability in production;
#    1.0 traces every query and adds significant overhead)
nodetool settraceprobability 0.001
# 7. Analyze a trace (in cqlsh)
SELECT * FROM system_traces.sessions WHERE session_id = ?;
Common causes and solutions:
| Problem | Cause | Fix |
|---|---|---|
| High read latency | Bloom filter miss (SSTable scan) | Reduce number of SSTables (compact) |
| High write latency | Too many SSTables | Run compaction |
| GC pauses | Large heap allocations | Reduce batch size, increase heap |
| Disk I/O bottleneck | Random reads from disk | Ensure hot data in cache |
| Unbalanced cluster | Uneven token distribution | Rebalance tokens |
| Hot partition | Single key receiving all traffic | Use partition sharding (add dimension) |
Hot partition mitigation:
Problem: User 'celebrity' has 1M followers
Everyone writes to that user's followers partition
Before (hot):
CREATE TABLE followers (
  user_id UUID,
  follower_id UUID,
  PRIMARY KEY (user_id, follower_id)
);
-- All writes land on the celebrity's single partition (bottleneck)
After (sharded):
CREATE TABLE followers (
  user_id UUID,
  shard_id INT,
  follower_id UUID,
  PRIMARY KEY ((user_id, shard_id), follower_id)
);
-- Distribute across 10 shards, e.g. shard_id = hash(follower_id) % 10
INSERT INTO followers (user_id, shard_id, follower_id) VALUES (?, 0, ?);
INSERT INTO followers (user_id, shard_id, follower_id) VALUES (?, 1, ?);
-- Writes now spread across 10 partitions
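Picking the shard deterministically from the follower keeps writes idempotent while spreading load; a sketch (the hash choice here is an assumption — any stable hash works):

```python
from hashlib import md5

NUM_SHARDS = 10  # matches the 10-shard example above

def shard_for(follower_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Deterministic: the same follower always maps to the same shard,
    # so retried/duplicate writes stay idempotent
    return int(md5(follower_id.encode()).hexdigest(), 16) % num_shards

def read_all_followers(fetch_partition, user_id, num_shards=NUM_SHARDS):
    # Reads fan out across the shards and are merged client-side
    rows = []
    for shard in range(num_shards):
        rows.extend(fetch_partition(user_id, shard))
    return rows

shards = {shard_for(f"user{i}") for i in range(1000)}
print(sorted(shards))  # traffic spreads across the shard space
```

The trade-off is that a full follower list now costs one query per shard, which is why the shard count stays small.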
Q5: Design a distributed cache (like Redis) with Cassandra durability
Answer:
Architecture:
Cache layer + Cassandra durability:
- Memcached/Redis layer (fast, in-memory)
- Cassandra layer (durable, persistent)
Write flow:
Client → Memcached (confirm) → Write to Cassandra (background)
Read flow:
Client → Memcached (cache hit, fast)
→ Cassandra (cache miss, slower)
→ Memcached (populate cache)
Implementation:
CREATE TABLE cache (
  cache_key TEXT PRIMARY KEY,
  value BLOB,
  created_at TIMESTAMP
);
-- Insert with TTL (row auto-expires; TTL is set per write, not as a column)
INSERT INTO cache (cache_key, value, created_at)
VALUES (?, ?, toTimestamp(now()))
USING TTL 3600; -- Auto-delete after 1 hour
-- Read at consistency ONE (fast; set via the driver or cqlsh's CONSISTENCY command)
SELECT value FROM cache WHERE cache_key = ?;
Cache invalidation strategy:
# Cache-aside (lazy loading) pattern; memcached/cassandra here are
# placeholder client objects
def get_user(user_id):
    # Check Memcached first
    cached = memcached.get(f"user:{user_id}")
    if cached:
        return cached
    # Cache miss: read from Cassandra
    user = cassandra.select("user_id = ?", user_id)
    # Populate cache for subsequent reads
    memcached.set(f"user:{user_id}", user, ttl=3600)  # 1 hour
    return user

def update_user(user_id, data):
    # Update Cassandra (the source of truth)
    cassandra.update("user_id = ?", user_id, data)
    # Invalidate cache so the next read repopulates it
    memcached.delete(f"user:{user_id}")
Durability guarantees:
Write QUORUM + async Cassandra write:
1. Write to Memcached (in-memory, fast)
2. Write to Cassandra QUORUM (durable)
3. Memcached loss ≠ data loss (Cassandra has it)
If Memcached fails:
Cassandra still has data
Repopulate Memcached on read (cache miss)
Cassandra vs Alternatives
| System | Throughput | Latency | Best For | Trade-off |
|---|---|---|---|---|
| Cassandra | 1M+/sec | 5-20ms | High-write, distributed | Eventual consistency |
| DynamoDB | 100K+/sec | 5-10ms | Managed, serverless | AWS vendor lock-in |
| HBase | 100K+/sec | 10-50ms | Hadoop ecosystem | Operational complexity |
| MongoDB | 100K+/sec | 5-20ms | Document flexibility | ACID trade-offs |
| Redis | 1M+/sec | 1-5ms | In-memory cache | Persistence optional (RDB/AOF) |
Best Practices
Data Modeling
- ✓ Denormalize (opposite of relational databases)
- ✓ Design tables around queries (not entities)
- ✓ Avoid hot partitions (use sharding if needed)
- ✓ Keep partition size < 100MB (large partitions hurt reads, repair, and compaction)
- ✓ Use composite keys (partition key + clustering key)
- ✓ Plan for time-series (bucket by hour/day)
Operational Best Practices
- ✓ Always use RF ≥ 3 (production minimum)
- ✓ Monitor compaction (queue shouldn't grow unbounded)
- ✓ Run repairs weekly (prevents data divergence)
- ✓ Monitor GC pauses (tune heap if > 100ms)
- ✓ Use LOCAL_QUORUM for multi-DC (avoids cross-DC latency on every request)
- ✓ Plan capacity (monitor disk growth)
- ✓ Enable client-side caching (reduce read load)
Query Optimization
- ✓ Partition key must be in WHERE clause (required)
- ✓ Use clustering key for range queries (efficient)
- ✓ Avoid large result sets (use LIMIT or driver paging)
- ✓ Batch only within a partition (multi-partition batches hurt throughput)
- ✓ Use prepared statements (avoid re-parsing)
- ✓ Fetch only needed columns (reduce network I/O)
Disaster Recovery Strategy
Single-Datacenter Failure
| Failure Mode | RF=3 | Impact | Recovery |
|---|---|---|---|
| 1 node down | ✓ Safe | No impact | Auto (replicas serve) |
| 2 nodes down | ⚠️ Degraded | QUORUM fails for affected ranges; data survives on the remaining replica | Repair after nodes return |
| 3+ nodes down | ✗ Risky | Data loss for ranges whose replicas are all lost | Restore from backup |
Multi-Datacenter Failover
Primary DC (US-East): 3 nodes, RF=3
Secondary DC (EU-West): 2 nodes
Failure: Entire US-East goes down
- EU still has RF=2 replicas
- All data preserved
- Failover time: seconds (client retries)
- Trade-off: Possible data loss if RF < DC failures
Solution: Use RF=5, distribute replicas
US-East: 3 replicas
EU-West: 2 replicas
→ Lose entire DC, data survives
Backup and Recovery
# Snapshot (consistent backup)
nodetool snapshot user_keyspace
# Upload to S3
tar -czf snapshot.tar.gz /var/lib/cassandra/snapshots/
aws s3 cp snapshot.tar.gz s3://cassandra-backups/
# Restore from snapshot
# 1. Restore SSTables to node
# 2. Run repair to sync
nodetool repair -pr user_keyspace
Summary & Key Takeaways
Cassandra excels at:
- ✓ High write throughput (1M+ writes/sec)
- ✓ Linear scalability (add nodes, increase capacity)
- ✓ High availability (distributed, no single point of failure)
- ✓ Multi-datacenter support (native geo-replication)
- ✓ Time-series data (efficient range queries)
Key challenges:
- ✗ Eventual consistency (not strong by default)
- ✗ Operational complexity (cluster management, repairs)
- ✗ Query flexibility (must design tables per query)
- ✗ Learning curve (CQL, data modeling, tuning)
- ✗ Limited transactions (lightweight transactions and single-partition batches only; no multi-partition ACID)
Critical design questions:
- What's my write throughput requirement (ops/sec)?
- Do I need strong consistency or eventual is OK?
- How long must I retain data?
- What's my acceptable data loss (RPO)?
- Do I need multi-datacenter replication?
- What queries must I support (design tables)?
- Can I handle operational complexity?
Cassandra — Practical Guide & Deep Dive
Apache Cassandra is an open-source, distributed NoSQL database originally built by Facebook for inbox search. It implements a partitioned wide-column storage model with eventually consistent semantics. Cassandra combines elements of Dynamo (partitioning, replication) and Bigtable (column-oriented storage) to handle massive data footprints, query volume, and flexible storage requirements.
Used by Discord, Netflix, Apple, and Bloomberg — Cassandra is here to stay.
Cassandra Basics
Data Model
| Concept | Description |
|---|---|
| Keyspace | Top-level unit (like a "database"). Defines replication strategies and owns UDTs |
| Table | Lives within a keyspace. Organizes data into rows with a defined schema |
| Row | Single record identified by a primary key |
| Column | Data storage unit with name, type, value, and timestamp metadata. Columns can vary per row (wide-column) |
At a basic level, Cassandra's data looks like nested JSON:
{
"keyspace1": {
"table1": {
"row1": { "col1": 1, "col2": "2" },
"row2": { "col1": 10, "col3": 3.0 },
"row3": { "col4": { "company": "Hello Interview", "city": "Seattle" } }
}
}
}
Wide-column flexibility: Unlike relational databases that require an entry for every column per row (even NULL), Cassandra allows columns to vary per row.
Conflict resolution: Every column has a timestamp. Write conflicts between replicas are resolved via "last write wins".
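That column-level last-write-wins reconciliation can be sketched with toy rows as dicts of column → (timestamp, value):

```python
def merge_rows(replica_a, replica_b):
    """Last-write-wins per column: each column carries a write timestamp,
    and the reconciled row takes the newest value column by column."""
    merged = {}
    for col in replica_a.keys() | replica_b.keys():
        candidates = [r[col] for r in (replica_a, replica_b) if col in r]
        merged[col] = max(candidates, key=lambda tv: tv[0])  # (timestamp, value)
    return merged

# Two replicas saw different partial updates of the same row
a = {"name": (100, "Alice"), "city": (105, "Seattle")}
b = {"name": (110, "Alicia"), "email": (90, "a@x.com")}
print(merge_rows(a, b))
# name comes from b (ts 110 is newest); city survives from a; email from b
```

Note the granularity: because timestamps are per column, a row can end up as a mix of both replicas' writes — which is correct under last-write-wins but can surprise people expecting whole-row atomicity.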
Primary Key
Every row is uniquely identified by a primary key consisting of:
| Component | Purpose |
|---|---|
| Partition Key | One or more columns determining which partition stores the row |
| Clustering Key | Zero or more columns determining sorted order within a partition |
-- Partition key: a, no clustering keys
CREATE TABLE t (a text, b text, c text, PRIMARY KEY (a));
-- Partition key: a, clustering key: b ascending
CREATE TABLE t (a text, b text, c text, PRIMARY KEY ((a), b))
WITH CLUSTERING ORDER BY (b ASC);
-- Composite partition key: a+b, clustering key: c
CREATE TABLE t (a text, b text, c text, d text, PRIMARY KEY ((a, b), c));
-- Partition key: a, clustering keys: b+c
CREATE TABLE t (a text, b text, c text, d text, PRIMARY KEY ((a), b, c));
Key Concepts
Partitioning — Consistent Hashing
Cassandra achieves horizontal scalability by partitioning data across nodes using consistent hashing.
Traditional hashing problem: hash(value) % num_nodes causes massive data remapping when nodes are added/removed, and can produce uneven distribution.
Consistent hashing solution:
- Values are hashed to positions on a ring of integers
- Walking clockwise from the hash position finds the first node → that node stores the data
- Adding/removing a node only affects the adjacent node's data
Virtual nodes (vnodes): To address uneven load, Cassandra maps multiple virtual nodes on the ring to physical nodes. Benefits:
- More even distribution across the cluster
- Larger physical machines can own more vnodes
- Adding a node redistributes load from many existing nodes (not just one)
Replication
Keyspaces specify replication configuration. Cassandra replicates by scanning clockwise from the assigned vnode to find additional replica nodes (skipping vnodes on the same physical node).
| Strategy | Description |
|---|---|
| SimpleStrategy | Scans clockwise for replicas. Good for testing/simple deployments |
| NetworkTopologyStrategy | Data-center and rack aware. Recommended for production. Ensures replicas span physical failure domains |
-- SimpleStrategy: 3 replicas
ALTER KEYSPACE hello_interview
WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 3 };
-- NetworkTopologyStrategy: 3 replicas in dc1, 2 in dc2
ALTER KEYSPACE hello_interview
WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2 };
Consistency
Cassandra gives users flexibility over consistency levels for reads and writes — a tunable consistency vs. availability trade-off.
No ACID guarantees. Cassandra only supports atomic and isolated writes at the row level within a partition.
| Consistency Level | Behavior |
|---|---|
| ONE | Single replica must respond |
| QUORUM | Majority (n/2 + 1) of replicas must respond |
| ALL | All replicas must respond |
QUORUM on both reads and writes guarantees that writes are visible to reads because at least one overlapping node participates in both operations. With 3 replicas: 3/2 + 1 = 2 nodes must participate in each operation.
Cassandra aims for eventual consistency — all replicas converge to the latest data given enough time.
Query Routing
Any node can be a coordinator. Cassandra nodes know about other alive nodes via gossip protocol. When a client issues a query:
- Client selects a node → it becomes the coordinator
- Coordinator determines which nodes store the data (consistent hashing + replication strategy)
- Coordinator issues queries to relevant replica nodes
- Responses are returned to the client
Storage Model — LSM Tree
Cassandra uses a Log Structured Merge Tree (LSM tree) instead of a B-tree, favoring write speed over read speed.
Key insight: Every create/update/delete is a new entry. Cassandra uses ordering of entries to determine row state. Deletes write a tombstone marker.
Three core constructs:
| Construct | Description |
|---|---|
| Commit Log | Write-ahead log for durability |
| Memtable | In-memory sorted structure (sorted by primary key) |
| SSTable | Immutable on-disk file flushed from Memtable |
Write path:
- Write is issued to a node
- Written to commit log (durability guarantee)
- Written to Memtable (in-memory)
- When Memtable reaches threshold → flushed to disk as immutable SSTable
- Corresponding commit log entries are removed
Read path:
- Check Memtable first (latest data)
- If not found, use bloom filter to identify candidate SSTables
- Read SSTables newest-to-oldest to find latest data (sorted by primary key)
Compaction: Periodically merges SSTables to consolidate updates and remove tombstones, keeping read performance healthy.
SSTable Indexing: Files mapping keys to byte offsets in SSTables for fast on-disk retrieval.
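The write and read paths above can be modeled with a toy LSM store (no commit log, tombstones, or bloom filters; illustrative only):

```python
import bisect

class LSMStore:
    """Toy LSM tree: a memtable plus flushed immutable SSTables, newest first."""

    def __init__(self, flush_threshold=3):
        self.memtable = {}        # in practice a sorted in-memory structure
        self.sstables = []        # list of sorted [(key, value)] runs, newest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # (a commit-log append would precede this for durability)
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        run = sorted(self.memtable.items())   # SSTables are sorted by key
        self.sstables.insert(0, run)          # newest SSTable first
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:              # 1. memtable holds the latest data
            return self.memtable[key]
        for run in self.sstables:             # 2. scan SSTables newest-to-oldest
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]              # first (newest) hit wins
        return None

db = LSMStore()
for k, v in [("a", 1), ("b", 2), ("c", 3), ("a", 99)]:
    db.put(k, v)
print(db.get("a"))  # 99 — the newer write shadows the flushed one
print(db.get("b"))  # 2 — found via binary search in the flushed SSTable
```

Compaction in this model would simply merge the runs in `sstables`, keeping the newest entry per key.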
Gossip Protocol
Cassandra nodes communicate cluster information via peer-to-peer gossip:
- Nodes track generation (bootstrap timestamp) and version (logical clock) for each known node
- These form a vector clock — nodes ignore stale state information
- Seed nodes act as guaranteed gossip hotspots, preventing cluster fragmentation
- Every node being aware of the cluster eliminates single points of failure
Fault Tolerance
Failure detection: Phi Accrual Failure Detector — each node independently decides if another node is available. Nodes aren't considered truly "down" unless an admin decommissions them.
Hinted handoffs: When a coordinator can't reach a replica node:
- Coordinator temporarily stores the write data as a hint
- Write proceeds successfully
- When the offline node comes back, hints are forwarded to it
Hinted handoffs are short-term. Long-offline nodes require read repairs or rebuilds.
Data Modeling for Cassandra
Cassandra data modeling is query-driven (not entity-relationship-driven like relational DBs).
Key principles:
- No JOINs, no foreign keys, no referential integrity
- Design tables around access patterns first
- Denormalize data across tables as needed
- Consider: partition key choice, partition size limits, clustering key ordering
Example: Discord Messages
Discord uses Cassandra for message storage. Access pattern: users query recent messages for a channel.
First iteration:
CREATE TABLE messages (
channel_id bigint,
message_id bigint,
author_id bigint,
content text,
PRIMARY KEY (channel_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
- Partition key: channel_id → single partition per channel, efficient reads
- Clustering key: message_id DESC → reverse chronological order
- Uses Snowflake IDs (chronologically sortable 64-bit IDs) instead of timestamps to avoid collisions
Problem: Busy channels create huge partitions → performance degradation. Partitions grow monotonically.
Solution — Time bucketing:
CREATE TABLE messages (
channel_id bigint,
bucket int,
message_id bigint,
author_id bigint,
content text,
PRIMARY KEY ((channel_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
- `bucket` represents 10-day windows aligned to Discord's epoch
- Limits partition size even for the busiest channels
- Most queries hit a single partition (latest bucket)
- New buckets are created automatically as time progresses
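The bucket derivation can be sketched from Discord's published Snowflake layout: the bits above bit 22 hold milliseconds since Discord's epoch (2015-01-01 UTC). The function names and the exact window constant are illustrative assumptions matching the 10-day window described above.

```python
# Sketch: deriving a time bucket from a Snowflake message ID.
DISCORD_EPOCH_MS = 1420070400000          # 2015-01-01T00:00:00Z
BUCKET_MS = 10 * 24 * 60 * 60 * 1000      # 10-day windows

def make_bucket(snowflake_id):
    timestamp_ms = snowflake_id >> 22     # ms since Discord's epoch
    return timestamp_ms // BUCKET_MS

def bucket_for_unix_ms(unix_ms):
    """Same bucket, computed from a wall-clock Unix timestamp."""
    return (unix_ms - DISCORD_EPOCH_MS) // BUCKET_MS

# Two messages more than 10 days apart land in different partitions:
early = 0 << 22                           # the epoch itself -> bucket 0
later = (BUCKET_MS + 1) << 22             # just past one window -> bucket 1
```

Because the bucket is derivable from either the message ID or the current time, readers can compute which partition holds "the latest messages" without any lookup.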
Example: Ticketmaster Ticket Browsing
Access pattern: users view event venue sections, then drill into individual seats.
First iteration (all seats per event):
CREATE TABLE tickets (
event_id bigint,
seat_id bigint,
price bigint,
PRIMARY KEY (event_id, seat_id)
);
Problem: Events with 10k+ tickets → large partitions, expensive aggregations for section summaries.
Better design — partition by section:
CREATE TABLE tickets (
event_id bigint,
section_id bigint,
seat_id bigint,
price bigint,
PRIMARY KEY ((event_id, section_id), seat_id)
);
Denormalized summary table for venue overview:
CREATE TABLE event_sections (
event_id bigint,
section_id bigint,
num_tickets bigint,
price_floor bigint,
PRIMARY KEY (event_id, section_id)
);
- Venue overview → query `event_sections` (single partition, < 100 sections)
- Section detail → query `tickets` (single partition per section)
- Section stats don't need to be exact (Ticketmaster shows "100+" rather than exact counts)
Advanced Features
| Feature | Description |
|---|---|
| Storage Attached Indexes (SAI) | Secondary indexes on arbitrary columns. Flexible querying with good (not optimal) performance. Avoids excess denormalization for infrequent queries |
| Materialized Views | Cassandra automatically materializes denormalized tables from a source table. Reduces application-level complexity for multi-table writes |
| Search Indexing | Wire up to Elasticsearch or Apache Solr via plugins (e.g., Stratio Lucene Index) |
Cassandra in an Interview
When to use:
- Systems prioritizing availability over consistency
- High write throughput needs (LSM tree optimized writes)
- High scalability requirements (linear horizontal scaling)
- Flexible/sparse schemas (wide-column model)
- Clear, well-defined access patterns the schema can revolve around
When NOT to use:
- Strict consistency requirements (financial transactions, inventory)
- Complex query patterns (multi-table JOINs, ad-hoc aggregations)
- Small datasets that don't need distributed scaling
- Rapidly evolving query patterns (each new query may need a new table)
Additional Interview Questions & Answers
Q6: Explain consistent hashing and virtual nodes. Why does Cassandra use them?
Answer:
Consistent hashing maps both data and nodes onto a ring of integers. Data is stored on the first node encountered when walking clockwise from the data's hash position.
Why not traditional hashing (hash % num_nodes)?
- Adding/removing a node remaps ~all data (catastrophic in a distributed DB)
- Consistent hashing: adding/removing a node only affects adjacent nodes' data
Virtual nodes (vnodes):
- Each physical node owns multiple positions on the ring (256 vnodes per node by default historically; reduced to 16 in Cassandra 4.0+)
- Benefits:
  - Even distribution — more ring positions = more uniform data spread
  - Heterogeneous hardware — larger machines can own more vnodes
  - Faster rebalancing — adding a node takes small slices from many nodes instead of a large chunk from one
  - Faster rebuilds — data streams from many nodes in parallel
Interview tip: Consistent hashing is useful beyond Cassandra — apply it to cache distribution, load balancing, or any sharding problem.
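The clockwise ring walk can be sketched in a few lines. This is a didactic model: it uses an md5-based token function for simplicity, whereas Cassandra's default partitioner is Murmur3.

```python
# Sketch of a consistent-hash ring with virtual nodes.
import bisect
import hashlib

def token(value):
    """Map a string onto the ring (md5 here; Cassandra uses Murmur3)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, vnodes=8):
        self.vnodes = vnodes
        self.tokens = []          # sorted vnode tokens
        self.owner = {}           # token -> physical node

    def add_node(self, node):
        # Each physical node claims several positions on the ring.
        for i in range(self.vnodes):
            t = token(f"{node}-vnode-{i}")
            bisect.insort(self.tokens, t)
            self.owner[t] = node

    def node_for(self, key):
        """First vnode clockwise from the key's token (wrapping)."""
        idx = bisect.bisect(self.tokens, token(key)) % len(self.tokens)
        return self.owner[self.tokens[idx]]

ring = Ring()
for n in ("node1", "node2", "node3"):
    ring.add_node(n)
ring.node_for("alice")   # deterministic owner among the three nodes
```

Adding a fourth node only claims 8 new ring positions, each taking a small slice from a different existing vnode, which is exactly the rebalancing benefit listed above.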
Q7: How does Cassandra's write path differ from a traditional relational database? Why is Cassandra faster for writes?
Answer:
| Aspect | Relational DB (B-tree) | Cassandra (LSM tree) |
|---|---|---|
| Write pattern | Random I/O (find and update in-place) | Sequential I/O (append to commit log + memtable) |
| On-disk structure | Mutable B-tree pages | Immutable SSTables |
| Write amplification | Low per write, but random seeks are slow | Higher (compaction), but writes themselves are fast |
| Read performance | Optimized (B-tree lookups) | Requires checking memtable + multiple SSTables |
Why Cassandra is faster for writes:
- Commit log is append-only → sequential disk I/O (fastest possible disk operation)
- Memtable is in-memory → no disk I/O for the actual data write
- No need to read-before-write (no in-place updates)
- SSTable flushes are sequential bulk writes
- No locking or transaction overhead
Tradeoff: Reads may need to check multiple SSTables → bloom filters and compaction mitigate this.
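The write path in the table can be sketched as a toy LSM store. Assumed simplifications: a dict stands in for the sorted memtable, and "disk" structures are plain lists.

```python
# Sketch of the LSM write path: every write is an append to a commit
# log plus an in-memory memtable insert; full memtables flush to
# immutable sorted files (SSTables).

class LSMStore:
    def __init__(self, memtable_limit=2):
        self.commit_log = []       # append-only -> sequential I/O
        self.memtable = {}         # in-memory map, no read-before-write
        self.sstables = []         # immutable, sorted snapshots
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # durability first
        self.memtable[key] = value             # then the in-memory write
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Sequential bulk write of the memtable to a new SSTable."""
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:               # newest data first
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # then newest SSTable
            if key in sstable:
                return sstable[key]
        return None

db = LSMStore()
db.write("a", 1)
db.write("b", 2)   # hits the limit, triggers a flush
db.write("a", 3)   # newer value shadows the flushed one
```

Note how `read` may probe several structures; that is the read-side cost the tradeoff line describes, which bloom filters and compaction reduce in practice.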
Q8: What are tombstones in Cassandra? Why can they be problematic?
Answer:
Tombstones are markers that Cassandra writes when data is deleted. Since SSTables are immutable, Cassandra can't remove data in-place — instead it writes a tombstone that says "this data is deleted."
How they work:
- Delete issued → tombstone written to memtable → flushed to SSTable
- During reads, tombstones override earlier versions of the data
- During compaction, tombstoned data is permanently removed (after `gc_grace_seconds`, default 10 days)
Why they're problematic:
- Read performance degradation — reads must scan through tombstones before finding live data
- Memory pressure — tombstones consume memory during reads
- "Tombstone storm" — deleting many rows at once creates massive numbers of tombstones
- If `gc_grace_seconds` is too short, deleted data can resurrect via repair from a replica that missed the delete
Best practices:
- Avoid frequent deletes — use TTLs for time-expiring data instead
- Use appropriate compaction strategies (e.g., `TimeWindowCompactionStrategy` for time-series data with TTLs)
- Monitor tombstone counts per read (`TombstoneScannedHistogram`)
- Don't set `gc_grace_seconds` lower than your repair cycle
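The read and compaction behavior can be sketched with one simplifying assumption: each cell is a `(timestamp, value)` pair, and `value=None` plays the role of the tombstone marker.

```python
# Sketch of tombstone semantics during reads and compaction.
TOMBSTONE = None

def merge_read(versions):
    """Reads take the highest-timestamped cell; a tombstone wins over
    older live data, so the key reads as deleted."""
    ts, value = max(versions)               # latest write wins
    return value                            # None means "deleted"

def compact(versions, now, gc_grace_seconds):
    """Compaction drops a tombstone (and everything it shadows) once
    it is older than gc_grace_seconds."""
    ts, value = max(versions)
    if value is TOMBSTONE and now - ts > gc_grace_seconds:
        return []                           # purged for good
    return [(ts, value)]

cells = [(100, "v1"), (200, TOMBSTONE)]     # write, then delete
merge_read(cells)                           # reads as deleted
compact(cells, now=250, gc_grace_seconds=100)   # tombstone kept
compact(cells, now=500, gc_grace_seconds=100)   # tombstone purged
```

The grace period exists so every replica has a chance to learn about the delete (via repair) before the tombstone disappears; purge too early and a stale replica resurrects the data.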
Q9: How would you model a time-series IoT sensor system in Cassandra?
Answer:
Requirements: Millions of sensors, each reporting every few seconds. Query patterns: latest readings per sensor, readings in a time range.
Schema:
CREATE TABLE sensor_readings (
sensor_id uuid,
day date,
reading_time timestamp,
temperature double,
humidity double,
pressure double,
PRIMARY KEY ((sensor_id, day), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC)
AND default_time_to_live = 2592000; -- 30-day TTL
Design decisions:
| Decision | Reasoning |
|---|---|
| Partition key: `(sensor_id, day)` | Bounds partition size to one day per sensor; prevents unbounded growth |
| Clustering key: `reading_time DESC` | Latest readings returned first (most common query pattern) |
| TTL: 30 days | Auto-expiration avoids tombstone accumulation from manual deletes |
| Compaction: `TimeWindowCompactionStrategy` | Optimized for time-series data where entire windows expire together |
Query examples:
-- Latest readings for sensor today
SELECT * FROM sensor_readings
WHERE sensor_id = ? AND day = '2026-03-30'
LIMIT 10;
-- Readings in a time range
SELECT * FROM sensor_readings
WHERE sensor_id = ? AND day = '2026-03-30'
AND reading_time >= '2026-03-30 10:00:00'
AND reading_time <= '2026-03-30 12:00:00';
Scaling: With 1M sensors reporting every 5 seconds = 200k writes/sec. At ~100 bytes per reading, daily storage ≈ 1.7 TB. Cassandra handles this comfortably with a well-sized cluster (10-15 nodes).
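The back-of-envelope numbers above, spelled out (the 100-byte row size is the same assumption as in the text):

```python
# Capacity estimate for the IoT sensor schema.
sensors = 1_000_000
interval_s = 5                     # each sensor reports every 5 seconds
bytes_per_reading = 100            # assumed average row size

writes_per_sec = sensors / interval_s                    # 200,000 writes/sec
daily_bytes = writes_per_sec * 86_400 * bytes_per_reading
daily_tb = daily_bytes / 1e12                            # ~1.7 TB/day
```

With replication factor 3, raw storage is roughly triple that before compression, which is worth stating explicitly in an interview.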
Q10: Compare Cassandra vs DynamoDB. When would you choose each?
Answer:
| Feature | Cassandra | DynamoDB |
|---|---|---|
| Deployment | Self-managed or managed (Astra DB) | Fully managed (AWS) |
| Data model | Wide-column (flexible schema) | Key-value + document |
| Primary key | Partition key + clustering key | Partition key + sort key |
| Consistency | Tunable (ONE to ALL) | Eventually consistent or strongly consistent |
| Scaling | Manual (add nodes, rebalance) | Automatic (on-demand or provisioned) |
| Multi-region | NetworkTopologyStrategy | Global Tables |
| Secondary indexes | SAI, materialized views | GSI, LSI |
| Cost model | Infrastructure + ops team | Pay-per-request or provisioned capacity |
| Vendor lock-in | None (open source) | AWS only |
| Throughput | Very high (you control hardware) | High (AWS manages limits) |
Choose Cassandra when:
- You need vendor independence
- You want full control over tuning and infrastructure
- Your team has operational expertise
- You need extreme write throughput with custom hardware
Choose DynamoDB when:
- You want zero operational overhead
- You're already in the AWS ecosystem
- You need automatic scaling without capacity planning
- Your team is small and can't dedicate resources to DB operations
Q11: Explain Cassandra's gossip protocol. How does it prevent split-brain?
Answer:
Gossip is Cassandra's peer-to-peer protocol for sharing cluster state:
- Every second, each node randomly selects 1-3 other nodes to gossip with
- Nodes exchange heartbeat state — generation (bootstrap timestamp) + version (incrementing counter)
- These form a vector clock — newer information supersedes older
- Nodes probabilistically bias gossip toward seed nodes
Why seed nodes matter:
- Seed nodes are guaranteed gossip targets that prevent cluster fragmentation
- Without them, sub-clusters could form where groups of nodes only know about each other
- New nodes use seed nodes for initial cluster discovery
Split-brain prevention:
- Cassandra doesn't use leader election (no single point of failure)
- Every node can serve requests independently
- Phi Accrual Failure Detector provides probabilistic failure detection (not binary up/down)
- Nodes don't make irreversible decisions about cluster membership — a "convicted" node simply re-enters the cluster when it starts heartbeating again
- Hinted handoffs ensure writes aren't lost during temporary network partitions
Key difference from consensus-based systems (Raft/Paxos): Cassandra favors availability — it doesn't require quorum agreement for cluster membership changes. This makes it more resilient to network partitions but means it relies on eventual consistency.
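The "probabilistic, not binary" point about the Phi Accrual detector can be sketched with one simplifying assumption: heartbeat intervals are exponentially distributed (real implementations typically fit a distribution over a sliding window of observed intervals).

```python
# Sketch of a phi accrual failure detector.
import math

def phi(time_since_last_heartbeat, mean_interval):
    """Suspicion level that grows continuously with silence, instead
    of a binary up/down verdict. phi = 1 roughly means a 10% chance
    the node is still alive, phi = 2 means 1%, and so on."""
    p_still_coming = math.exp(-time_since_last_heartbeat / mean_interval)
    return -math.log10(p_still_coming)

mean = 1.0                  # this node usually heartbeats every second
phi(0.5, mean)              # brief silence -> low suspicion
phi(8.0, mean)              # long silence -> high suspicion
# Each node applies its own threshold and reverses its verdict the
# moment heartbeats resume -- no irreversible membership decision.
```

Because each node computes phi independently from its own observations, a partitioned node is "suspected" locally without any cluster-wide agreement, which is the availability-first behavior described above.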
Q12: How would you handle a "hot partition" in Cassandra?
Answer:
Hot partition: A single partition receives disproportionate read/write traffic, overloading the node(s) hosting it.
Detection:
- Monitor `org.apache.cassandra.metrics:type=Table` metrics per table
- Look for uneven load across nodes (`nodetool tablehistograms`)
- Client-side latency spikes for specific partition keys
Solutions:
| Strategy | How It Works | Tradeoff |
|---|---|---|
| Add bucketing to partition key | Split `(user_id)` into `(user_id, bucket)` where `bucket = hash(timestamp) % N` | Reads must query N buckets and merge |
| Application-level caching | Cache hot partition data in Redis/Memcached | Added infrastructure, cache invalidation complexity |
| Scatter-gather with salt | Add random salt to partition key, fan-out reads | Higher read latency, more complex application logic |
| Redesign data model | Re-examine access patterns; maybe the partition key choice is wrong | May require migration |
| Rate limiting | Throttle writes/reads to hot keys at application layer | May degrade user experience |
Prevention (at design time):
- Avoid low-cardinality partition keys (e.g., `country` for a global app)
- Estimate max partition size and access frequency during data modeling
- Use compound partition keys to distribute load (like Discord's `(channel_id, bucket)`)
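The bucketing strategy's write/read tradeoff can be sketched directly. The bucket count and function names are illustrative assumptions.

```python
# Sketch: salting a hot key across N buckets. Writes spread out;
# reads must fan out to every bucket and merge client-side.
N_BUCKETS = 8

def write_partition(user_id, timestamp):
    """Each write lands in one of N partitions instead of one hot one."""
    bucket = hash(timestamp) % N_BUCKETS
    return (user_id, bucket)

def read_partitions(user_id):
    """Each read must query all N partitions and merge the results."""
    return [(user_id, b) for b in range(N_BUCKETS)]

# A burst of writes for one user now spans up to N_BUCKETS partitions:
partitions = {write_partition(42, ts) for ts in range(1000)}
```

Picking `N` is the knob: larger N spreads write load further but multiplies read fan-out, so size it to the observed hot-key throughput rather than defaulting to a big number.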
Q13: What compaction strategies does Cassandra offer and when should you use each?
Answer:
| Strategy | Best For | How It Works |
|---|---|---|
| SizeTieredCompactionStrategy (STCS) | Write-heavy workloads | Merges similarly-sized SSTables. Simple but can cause temporary space amplification (needs 2x disk) |
| LeveledCompactionStrategy (LCS) | Read-heavy workloads | Organizes SSTables into levels of increasing size. Guarantees at most 1 SSTable per partition per level. Better read performance, higher write amplification |
| TimeWindowCompactionStrategy (TWCS) | Time-series data with TTL | Groups SSTables by time window. Entire windows drop when all data expires. Very efficient for TTL-based expiration |
| UnifiedCompactionStrategy (UCS) | General purpose (Cassandra 5.0+) | Adaptive strategy that combines benefits of STCS and LCS based on workload characteristics |
Decision guide:
Write-heavy, rarely read → STCS
Read-heavy, frequent updates → LCS
Time-series with TTL → TWCS
Not sure / mixed workload → UCS (if available)
Interview tip: Mentioning compaction strategy shows deep understanding. The key insight is that compaction is the price Cassandra pays for fast writes — it defers the "cleanup" work to background processes.
Q14: How does Cassandra compare to MongoDB for a system design interview?
Answer:
| Feature | Cassandra | MongoDB |
|---|---|---|
| Data model | Wide-column (rows with varying columns) | Document (JSON/BSON) |
| Query language | CQL (SQL-like, limited) | MQL (rich query language, aggregation pipeline) |
| Schema | Defined per table (flexible columns) | Schema-less (flexible documents) |
| Consistency | Tunable (eventual by default) | Strong by default (single-document ACID) |
| Scaling | Peer-to-peer (no single point of failure) | Primary-secondary (primary handles writes) |
| Write performance | Excellent (LSM tree, sequential I/O) | Good (WiredTiger B-tree) |
| JOINs | Not supported | Not native (use $lookup for basic joins) |
| Secondary indexes | SAI (limited) | Rich secondary indexes |
| Multi-region | Native (NetworkTopologyStrategy) | Atlas Global Clusters |
Choose Cassandra when:
- Availability > consistency
- Extreme write throughput
- Well-defined, predictable access patterns
- Multi-datacenter replication is critical
Choose MongoDB when:
- Flexible/evolving query patterns
- Rich querying and aggregation needed
- Single-document transactions suffice
- Developer productivity is a priority (richer query language)
Q15: Design a Cassandra schema for a social media feed (like Twitter/X timeline).
Answer:
Access patterns:
- View a user's own posts (profile page)
- View a user's home timeline (posts from people they follow)
- Post a new message
Schema:
-- User's own posts (profile view)
CREATE TABLE user_posts (
user_id bigint,
post_id bigint, -- Snowflake ID (chronologically sortable)
content text,
media_urls list<text>,
like_count counter, -- Separate counter table in practice
PRIMARY KEY (user_id, post_id)
) WITH CLUSTERING ORDER BY (post_id DESC);
-- Home timeline (fan-out on write)
CREATE TABLE home_timeline (
user_id bigint,
post_id bigint,
author_id bigint,
content text,
author_name text, -- Denormalized for read efficiency
PRIMARY KEY (user_id, post_id)
) WITH CLUSTERING ORDER BY (post_id DESC);
Fan-out on write strategy:
- User posts a message → write to `user_posts`
- Look up all followers of the user
- For each follower, write a copy to their `home_timeline`
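The fan-out steps, including the celebrity cutoff, can be sketched with in-memory dicts standing in for the `user_posts` and `home_timeline` tables (the threshold constant is an assumption matching the 1M-follower figure used later in this answer).

```python
# Sketch of fan-out-on-write with a celebrity escape hatch.
CELEBRITY_THRESHOLD = 1_000_000

user_posts = {}       # author_id -> list of posts, newest first
home_timeline = {}    # user_id -> list of posts, newest first

def publish(author_id, post, followers):
    # 1) always write to the author's own partition
    user_posts.setdefault(author_id, []).insert(0, post)
    # 2) fan out only for non-celebrities; celebrity posts are merged
    #    into followers' timelines at read time instead
    if len(followers) >= CELEBRITY_THRESHOLD:
        return "fan-out-on-read"
    for follower_id in followers:
        home_timeline.setdefault(follower_id, []).insert(0, post)
    return "fan-out-on-write"

publish(1, {"post_id": 100, "content": "hi"}, followers=[2, 3])
# followers 2 and 3 now see the post in their home_timeline partitions
```

In production the fan-out loop runs asynchronously (queue + workers), since a single post can trigger thousands of timeline writes.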
Design considerations:
| Concern | Solution |
|---|---|
| Celebrity fan-out | Hybrid approach — fan-out on write for normal users, fan-out on read for celebrities (> 1M followers) |
| Partition size | Use TTL or bucketing (user_id, bucket) to prevent unbounded timeline growth |
| Stale denormalized data | Author name changes are infrequent; eventual consistency is acceptable |
| Timeline truncation | Only keep latest 1000 posts per timeline; older posts fetched on-demand |
Query:
-- Home timeline (latest 20 posts)
SELECT * FROM home_timeline
WHERE user_id = ?
LIMIT 20;
-- Pagination: continue from the last seen post_id (clustering-key cursor)
SELECT * FROM home_timeline
WHERE user_id = ? AND post_id < ?
LIMIT 20;