Caching Strategies

Cache Aside vs Write-Through vs Write-Behind

| Strategy | How It Works | Pros | Cons | When to Use |
|---|---|---|---|---|
| Cache Aside (Lazy Loading) | App checks cache first; on miss, loads from DB and populates cache | Simple; cache contains only requested data; resilient to cache failures | Initial request is slow (cache miss); stale data possible | Read-heavy with unpredictable access patterns (user profiles, product pages) |
| Write-Through | App writes to cache and DB synchronously | Cache always consistent with DB; read hits are fast | Write latency (two writes); wasted cache space for unread data | Read-heavy with critical consistency (inventory, pricing) |
| Write-Behind (Write-Back) | App writes to cache; cache asynchronously writes to DB | Fast writes; reduced DB load; batch writes possible | Risk of data loss if cache fails; eventual consistency | Write-heavy with acceptable data loss risk (analytics events, logs) |
| Refresh-Ahead | Cache proactively refreshes before expiration | Reduced latency; no cache miss penalty | Wasted resources if data not accessed; complex to implement | Predictable access patterns (homepage, trending content) |

Why Choose Each:

  • Cache Aside: Most flexible and common pattern; works when you don't know what to cache upfront
  • Write-Through: Need strong consistency between cache and DB
  • Write-Behind: Extremely high write throughput needed, can tolerate data loss
  • Refresh-Ahead: Predictable access patterns, zero cache miss tolerance

Tradeoff Summary:

  • Cache Aside: Flexibility + Resilience ↔ Cache misses, stale data
  • Write-Through: Consistency ↔ Write latency
  • Write-Behind: Write performance ↔ Data loss risk, complexity

Caching Strategies Deep Dive

Cache Aside (Lazy Loading) Pattern

GET request → Check Cache → Hit → Return
                         ↓
                       Miss → Query DB → Store in Cache → Return

Pros:

  • Simplest to implement; no special cache setup
  • Cache only contains accessed data (no waste)
  • Resilient: DB failures don't block (stale cache still works)
  • Great for unpredictable access patterns

Cons:

  • First request always slow (cache miss)
  • Stale data possible (no cache coherency)
  • "Cache Stampede": many concurrent requests for the same missing key → many simultaneous DB hits

Best for: User profiles, product pages, search results

Implementation (Redis + Python):

def get_user(user_id):
    # Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    
    # Cache miss: load from DB (parameterized query avoids SQL injection)
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    
    # Store in cache (1-hour TTL)
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    
    return user

Write-Through Pattern

SET request → Write to Cache → Write to DB → Return
              (synchronous)

Pros:

  • Cache always consistent with DB
  • No stale data
  • Read hits are ultra-fast (cache warmed)

Cons:

  • Write latency doubled (two writes sequentially)
  • Cache fills with unread data (memory waste)
  • If the cache is unavailable, writes fail unless the app falls back to DB-only writes

Best for: Inventory, pricing, financial data (consistency > latency)

Implementation (Redis + Python):

def update_product_price(product_id, new_price):
    # Write to cache first
    redis.set(f"product:{product_id}:price", new_price)
    
    # Then write to DB (parameterized query avoids SQL injection)
    db.execute("UPDATE products SET price = ? WHERE id = ?", (new_price, product_id))
    
    return {"status": "success"}

Tradeoff: Users wait ~10-50ms extra per write, but reads are instant.


Write-Behind (Write-Back) Pattern

SET request → Write to Cache → Return (asynchronously write to DB)
                              ↓
                     Background job syncs to DB

Pros:

  • Extremely fast writes (cache write only)
  • Reduced DB load (batch writes, deduplication)
  • Can reorder/optimize writes

Cons:

  • Cache failure = data loss
  • Consistency lag (DB lags cache by seconds/minutes)
  • Requires reliable queue/persistence (RDB, AOF)

Best for: Analytics events, logs, non-critical metrics

Implementation (Redis + async job):

# Fast write to cache
def log_event(user_id, event_type):
    redis.lpush(f"events:{user_id}", json.dumps({
        "type": event_type,
        "timestamp": time.time()
    }))
    return {"status": "queued"}

# Background worker (runs every 10 seconds)
def flush_events_to_db():
    for key in redis.keys("events:*"):  # full key names, e.g. b"events:123"
        events = [json.loads(e) for e in redis.lrange(key, 0, -1)]
        db.batch_insert("events", events)
        redis.delete(key)  # note: events pushed after the lrange are lost

Risk: If Redis crashes before flush, events are lost. Use RDB/AOF to mitigate.


Refresh-Ahead Pattern

User requests data → Cache always has fresh copy (proactively refreshed)
                   ↓
            Background job refreshes before expiry

Pros:

  • Zero cache miss latency (always fresh data)
  • Predictable performance
  • Ideal for high-traffic pages

Cons:

  • Wasted computation if data not accessed
  • Complex implementation (need scheduler)
  • Hard to predict optimal refresh interval

Best for: Homepage, trending content, dashboards

Implementation:

# Refresh-Ahead for trending articles
def refresh_trending_articles():
    articles = db.query("SELECT * FROM articles ORDER BY views DESC LIMIT 10")
    redis.setex("trending:articles", 300, json.dumps(articles))  # 5-minute TTL

# Schedule every 270 seconds (refresh before the 5-minute expiry)
schedule.every(270).seconds.do(refresh_trending_articles)

# On read: always fast
def get_trending():
    return redis.get("trending:articles")  # hit as long as the refresh job keeps running

Cache Invalidation Strategies

TTL-Based (Simple)

SET key value EX 3600  # Expires in 1 hour

Pro: Simple, no coordination
Con: Stale data for up to the TTL duration

Event-Based (Accurate)

On data update:
  1. Update database
  2. Publish "user:123:updated" event
  3. Cache subscribers delete key

Pro: Instant invalidation, no stale data
Con: Complex, requires pub/sub
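
A minimal sketch of the event-based flow above, with the clients passed in explicitly. The channel name "invalidations" and the function names are assumptions for illustration, not part of the original:

```python
# Writer side: update the database first, then publish an invalidation
# event naming the cache key that is now stale.
def update_user_name(db, cache, user_id, new_name):
    db.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
    cache.publish("invalidations", f"user:{user_id}")

# Subscriber side: each cache node feeds incoming messages here and
# deletes whatever key the message names.
def handle_invalidation(cache, message):
    if message.get("type") != "message":
        return  # ignore subscribe/unsubscribe confirmations
    key = message["data"]
    if isinstance(key, bytes):
        key = key.decode()
    cache.delete(key)
```

With redis-py, a worker would call `pubsub = r.pubsub()`, `pubsub.subscribe("invalidations")`, and pass each message from `pubsub.listen()` into `handle_invalidation`.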

Hybrid (Best Practice)

TTL + Event-based:
  1. Set 1-hour TTL (safety net)
  2. On write, publish invalidation event (immediate)
  3. If event missed, TTL catches it eventually
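
The hybrid write path above could look like this (the `db`/`cache` handles, key format, and channel name are assumptions):

```python
import json

SAFETY_TTL = 3600  # 1-hour TTL safety net: catches any missed invalidation event

def update_user_hybrid(db, cache, user_id, user):
    # 1. Update the database (source of truth)
    db.execute("UPDATE users SET data = ? WHERE id = ?",
               (json.dumps(user), user_id))
    key = f"user:{user_id}"
    # 2. Publish the invalidation event so other nodes drop their copy immediately
    cache.publish("invalidations", key)
    # 3. Re-populate with the TTL attached; if a node misses the event,
    #    its stale copy still expires within SAFETY_TTL
    cache.setex(key, SAFETY_TTL, json.dumps(user))
```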

Interview Questions & Answers

Q1: Design a caching strategy for Uber's surge pricing. What pattern would you use?

Answer: Use Write-Through because:

  • Consistency critical: incorrect pricing loses money
  • Write frequency: moderate (updated every 5-10 minutes by algorithm)
  • Read frequency: extremely high (every ride request)

Implementation:

Surge pricing update:
  1. Algorithm calculates new price_multiplier for zone
  2. Write to cache: SET surge:zone:123 1.5x
  3. Write to database: UPDATE surge_pricing SET multiplier=1.5
  4. Return confirmation

Rider requests ride:
  1. GET surge:zone:123 → Returns 1.5x instantly (cache hit)
  2. Zero latency, always consistent with DB

Why not Write-Behind? If cache loses surge pricing, system would quote wrong prices, losing revenue.
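
The surge flow above as a minimal Python sketch; the zone key format and the `db`/`cache` handles are assumptions:

```python
def update_surge(db, cache, zone_id, multiplier):
    # Write-through: cache and DB are written synchronously, in that order
    cache.set(f"surge:zone:{zone_id}", multiplier)
    db.execute("UPDATE surge_pricing SET multiplier = ? WHERE zone_id = ?",
               (multiplier, zone_id))
    return {"status": "success"}

def get_surge(cache, zone_id, default=1.0):
    # Ride-request path: a cache hit returns the multiplier instantly;
    # fall back to no surge if the zone has never been priced
    value = cache.get(f"surge:zone:{zone_id}")
    return float(value) if value is not None else default
```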


Q2: You're designing a Twitter-like feed cache. The feed is read-heavy but updates aren't critical. Which pattern?

Answer: Use Cache Aside because:

  • Unpredictable access: users request different feeds (friends vary)
  • Stale data acceptable: tweets 5 minutes old are fine
  • Simple implementation: no write coordination needed

Implementation:

def get_home_feed(user_id):
    cache_key = f"feed:{user_id}"
    
    # Check cache first
    feed = redis.get(cache_key)
    if feed:
        return json.loads(feed)
    
    # Cache miss: query DB
    feed = db.query("""
        SELECT * FROM tweets
        WHERE author_id IN (SELECT following_id FROM follows WHERE follower_id = ?)
        ORDER BY created_at DESC LIMIT 50
    """, user_id)
    
    # Store for 10 minutes
    redis.setex(cache_key, 600, json.dumps(feed))
    return feed

Cache Stampede Protection:

def get_feed_with_lock(user_id):
    cache_key = f"feed:{user_id}"
    feed = redis.get(cache_key)
    if feed:
        return json.loads(feed)
    
    # Use a lock so only one request queries the DB
    lock_key = f"feed:{user_id}:lock"
    if redis.set(lock_key, "1", nx=True, ex=5):
        # Got the lock: fetch from DB and populate the cache
        feed = json.dumps(fetch_from_db(user_id))
        redis.setex(cache_key, 600, feed)
    else:
        # Wait briefly for the lock holder to populate the cache
        time.sleep(0.1)
        feed = redis.get(cache_key)
        if feed is None:
            return fetch_from_db(user_id)  # lock holder not done yet
    
    return json.loads(feed)

Q3: Your system writes 1 million events/second. Which caching strategy handles this?

Answer: Use Write-Behind because:

  • Write latency matters: at 1M events/sec you can't afford a synchronous DB write per event (as Write-Through requires)
  • Eventual consistency acceptable: logs/events can lag
  • Batch efficiency: Collect 100K events, write once = 10x throughput

Architecture:

Events → Redis queue → Batch processor → PostgreSQL
  ↓
  1M writes/sec (Cache side only, instant)
  ↓
  Background job every 100ms
  ↓
  10K-100K events/batch → PostgreSQL (optimized bulk insert)

Implementation:

# Fast write (cache only)
def log_event(event):
    redis.lpush("events:pending", json.dumps(event))
    # Optional: if queue > threshold, trigger flush
    if redis.llen("events:pending") > 50000:
        flush_to_db()

# Batch flush (runs every 100ms)
def flush_to_db():
    raw = redis.lrange("events:pending", 0, -1)
    if not raw:
        return
    
    events = [json.loads(e) for e in raw]
    
    # Batch insert
    db.executemany("""
        INSERT INTO events (user_id, type, timestamp)
        VALUES (?, ?, ?)
    """, [(e['user_id'], e['type'], e['ts']) for e in events])
    
    # Events pushed between lrange and delete are lost (see tradeoff note)
    redis.delete("events:pending")

Tradeoff: If Redis crashes, lose ~100ms of events. Mitigate with RDB persistence (fsync every 100ms).


Q4: When would you NOT use caching? When is it counterproductive?

Answer: Don't cache when:

  1. Data changes constantly (> 50% update:read ratio)
     - Example: Real-time stock prices, live auction bids
     - Cache becomes stale immediately
     - Better: Stream data directly

  2. Data is tiny (< 1KB)
     - Example: Single boolean flags
     - Cache overhead > benefit
     - Better: Store in application state or database

  3. First access is critical
     - Example: Medical records (must be current)
     - Cache lag is dangerous
     - Better: No cache, strong consistency

  4. Memory is constrained
     - Example: IoT devices with 512MB RAM
     - Cache eviction thrashing
     - Better: Optimize database queries instead

  5. Data has complex dependencies
     - Example: Calculated fields depending on 5+ tables
     - Invalidation is complex and error-prone
     - Better: Compute on-demand or use materialized views

Example scenario:

User updates profile name → cascades to:
- Comment author names
- Post author names  
- Mention notifications
- Search indexes

Caching this = invalidation nightmare.
Better: Store only in DB, compute names on-demand

Q5: How would you choose a TTL (time-to-live) for cached data?

Answer: Balance two forces: the TTL must not exceed the staleness your use case can tolerate, while expensive-to-recompute data earns a longer TTL. Start from acceptable staleness and lengthen only as far as the recomputation cost justifies.

Guidelines by use case:

| Use Case | TTL | Reason |
|---|---|---|
| Homepage | 1-5 minutes | Content changes frequently, load is predictable |
| User profile | 10-30 minutes | Rarely changes, acceptable to be 10min stale |
| Product catalog | 1 hour | Changes during business hours, not real-time |
| Leaderboard | 5 seconds | Competitions require near-real-time accuracy |
| Static assets | 24 hours | Never change after upload |
| Auth tokens | Token lifetime | Security critical, don't extend arbitrarily |
| DB query result | Variable | Depends on data freshness requirement |

Dynamic TTL example:

def get_product(product_id):
    if is_flash_sale_active():
        ttl = 10  # 10 seconds (prices changing fast)
    else:
        ttl = 3600  # 1 hour (normal case)
    
    cache_key = f"product:{product_id}"
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)
    
    product = db.query("SELECT * FROM products WHERE id = ?", product_id)
    redis.setex(cache_key, ttl, json.dumps(product))
    return product

Rule of thumb:

  • If data updates hourly → TTL = 15 minutes (data is at most 15 minutes stale)
  • If data updates daily → TTL = 4 hours
  • If data updates on-demand → TTL = 0 (cache-aside with event invalidation)