Caching Systems (Part 1): Layers, Strategies, and Failure Modes

Caching is one of the highest-leverage tools in backend engineering: it can turn expensive work into cheap work, flatten latency spikes, and reduce infrastructure cost. It can also quietly introduce correctness bugs, inconsistent user experiences, and operational incidents.

This article is not a “use Redis” tutorial. It is a practical deep dive into how caching works as a system—across layers, time, and failure modes—so you can design caches you can actually trust.


What Caching Really Is

A cache is a time-bounded copy of data or computation that trades one resource for another:

  • Trade CPU / database I/O for memory
  • Trade network round trips for locality
  • Trade strict freshness for speed

Every cache makes an implicit promise:

“This value is probably good enough until time T or until event E.”

You are not just choosing a technology. You are choosing a consistency model.


The Caching Stack: Where Caches Live

Most production systems have multiple caches stacked on top of each other. Understanding them as a pipeline helps you reason about behavior.

Client
  ↓
[ Browser Cache ]
  ↓
[ CDN / Edge Cache ]
  ↓
[ Reverse Proxy Cache ]   (Nginx / Varnish)
  ↓
[ Application Server ]
      ├─ Request-scope Cache
      ├─ In-process Cache
      └─ Shared Server Cache (Redis / Memcached)
  ↓
[ Database ]
      ├─ Query / Plan Cache
      └─ Buffer / Page Cache

Think of caching as progressively closer copies of data and computation.


1) Request-scope cache (per-request)

This cache lives only for the lifetime of a single request.

Examples:

  • Memoization inside a request handler
  • Django cached_property
  • ORM identity maps
  • GraphQL DataLoader-style batching

Why it matters:

  • Eliminates duplicate work inside one request
  • Collapses N+1 query patterns

Properties:

  • No invalidation needed
  • Zero consistency risk
  • Often the highest ROI cache
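
As a concrete illustration, here is a minimal sketch of request-scope memoization. It is framework-agnostic: the Request class and load_user function below are stand-ins for your own request object and data-access layer.

# Minimal sketch of request-scope memoization (framework-agnostic;
# "Request" and "load_user" are stand-ins for your own request object
# and data-access function).

class Request:
    def __init__(self):
        self.memo = {}          # dies with the request: no invalidation needed

def load_user(user_id):
    print(f"DB hit for {user_id}")      # imagine a real query here
    return {"id": user_id}

def get_user(request, user_id):
    key = ("user", user_id)
    if key not in request.memo:
        request.memo[key] = load_user(user_id)   # at most one DB hit per request
    return request.memo[key]

req = Request()
get_user(req, 42)   # DB hit
get_user(req, 42)   # served from the per-request memo

Because the memo dict dies with the request, there is nothing to invalidate; the worst case is one extra DB hit on the next request.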

2) In-process server cache

Cached in application memory.

Examples:

  • LRU caches
  • Module-level dictionaries
  • Framework-level local caches

Pros:

  • Extremely fast (no network)
  • Simple mental model

Cons:

  • Not shared across instances
  • Evicted on deploy or restart

Best used for:

  • Configuration
  • Feature flags
  • Pure, deterministic computations
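
A minimal sketch of an in-process cache using Python's standard-library LRU cache. The parse_feature_flags function is a hypothetical pure, deterministic computation chosen for illustration; the result lives in this process's memory and vanishes on deploy or restart.

from functools import lru_cache

@lru_cache(maxsize=1024)
def parse_feature_flags(raw_config: str) -> frozenset:
    # Deterministic: the same input string always yields the same flags.
    return frozenset(part.strip() for part in raw_config.split(",") if part.strip())

flags = parse_feature_flags("new_checkout, dark_mode")
print("dark_mode" in flags)                 # True
print(parse_feature_flags.cache_info())     # hits / misses / size for observability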

3) Shared server-side cache (Redis / Memcached)

This is the classic backend cache layer.

Characteristics:

  • Shared across instances
  • Network hop required
  • Centralized invalidation point

Common responsibilities:

  • Data caching (objects, query results)
  • Computation caching
  • Coordination primitives (locks, rate limits)

Design requirement:

  • Cache failure must degrade gracefully
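
The graceful-degradation requirement is easier to see in code. The sketch below assumes the redis-py client; load_product and the key shape are placeholders. Any cache error falls through to the source of truth, so the cache stays an optimization rather than a hard dependency.

import json
import redis

# Short socket timeout so a slow cache cannot stall requests.
r = redis.Redis(host="localhost", port=6379, socket_timeout=0.05)

def load_product(product_id):
    return {"id": product_id, "name": "example"}   # stand-in for a DB query

def get_product(product_id):
    key = f"product:{product_id}"
    try:
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
    except redis.RedisError:
        pass                                  # cache down: fall through to the DB
    value = load_product(product_id)
    try:
        r.setex(key, 60, json.dumps(value))   # short TTL; failure here is non-fatal
    except redis.RedisError:
        pass
    return value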

4) Reverse proxy cache

Sits in front of application servers.

Effective when:

  • Responses are identical across users
  • Authentication is handled upstream
  • You want to absorb traffic spikes

Trade-offs:

  • Limited visibility into per-user context
  • Invalidation can be coarse-grained

5) Database-internal caches

Always present, often forgotten.

Includes:

  • Buffer pools
  • Page cache
  • Query / execution plan caches

Important implication:

You may already benefit from caching even without Redis.

In many cases, fixing queries outperforms adding a new cache layer.

Key idea: you cannot reason about caching by looking at Redis alone. Your system already caches in multiple places.


What Should Be Cached?

A useful way to decide is to categorize what you want to cache:

A) Content caching (responses)

Cache the full HTTP response.

  • Fastest end-to-end
  • Best for read-heavy endpoints
  • Requires careful consideration of auth and personalization

B) Data caching (objects, query results)

Cache the underlying data representation.

  • More flexible
  • Can be shared across multiple endpoints
  • Usually requires more invalidation logic

C) Computation caching (derived results)

Cache expensive computations.

  • Ranking, aggregation, recommendations, reports
  • Often benefits most from “stale-while-revalidate” patterns

D) Negative caching

Cache “nothing here.”

  • 404s, empty results, permission-denied checks
  • Prevents repeated expensive misses
  • Must be used carefully (avoid caching temporary failures)
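
A minimal sketch of negative caching with a sentinel value and a deliberately short TTL. It uses an in-memory dict to stay self-contained; the same idea applies to a shared cache. Only a confirmed "not found" is cached; exceptions from the loader propagate instead of being stored as misses.

import time

_MISSING = object()                 # sentinel: "we checked, and it's not there"
_cache = {}                         # key -> (value, expires_at)

def get_user(user_id, load_from_db):
    entry = _cache.get(user_id)
    if entry and entry[1] > time.time():
        value = entry[0]
        return None if value is _MISSING else value

    value = load_from_db(user_id)   # may legitimately return None
    if value is None:
        _cache[user_id] = (_MISSING, time.time() + 30)    # short negative TTL
    else:
        _cache[user_id] = (value, time.time() + 300)
    return value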

TTL Is Not Invalidation

TTL (time-to-live) is the most common caching tool because it’s easy:

  • Put value in cache for 60 seconds
  • Recompute after it expires

But TTL is not invalidation. It is a bounded staleness guarantee.

Use TTL when:

  • Data changes frequently, and you can tolerate being “a bit behind”
  • Exact freshness is not critical (feeds, counts, trending lists)

Avoid TTL-only caches when:

  • Users expect strong correctness (billing, permissions, inventory)
  • A single stale value can cause harm

Invalidation Strategies That Scale

The “two hard things” joke exists for a reason: invalidation is where systems fail.

1) Write-through

Update cache at the same time you write to the source of truth.

  • Pros: reads are fast and fresh
  • Cons: writes become slower, and failures are tricky (what if cache update fails?)

2) Write-back (rare)

Write to cache first, flush to DB later.

  • Pros: extremely fast writes
  • Cons: complexity and data-loss risk; usually not worth it outside specialized systems

3) Cache-aside (lazy loading)

On read: check cache; if missing, load from DB and store. On write: update DB, then invalidate cache.

  • Pros: simple and popular
  • Cons: invalidation is your job; race conditions exist
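
A cache-aside read/write pair might look like the sketch below (assumes the redis-py client; db_read_account and db_write_account are placeholders for your data layer).

import json
import redis

r = redis.Redis()

def read_account(account_id, db_read_account):
    key = f"account:{account_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)               # cache hit
    value = db_read_account(account_id)         # miss: load from the source of truth
    r.setex(key, 300, json.dumps(value))
    return value

def write_account(account_id, new_value, db_write_account):
    db_write_account(account_id, new_value)     # 1) write to the source of truth
    r.delete(f"account:{account_id}")           # 2) invalidate, don't update, the cache

Note the order on the write path: update the source of truth first, then delete the key. Even so, a reader that loaded the old row just before the write can repopulate the cache after the delete; that is the classic cache-aside race, bounded in practice by the TTL.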

4) Event-driven invalidation

Publish events on writes; consumers invalidate relevant keys.

  • Pros: good decoupling and scale
  • Cons: requires a reliable event pipeline and careful key design
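
As one possible shape, here is a sketch of both sides using Redis pub/sub for brevity (assumes the redis-py client and a hypothetical entity-updates channel; production systems often prefer a durable log such as Kafka, since fire-and-forget pub/sub can drop invalidations).

import json
import redis

r = redis.Redis()

# Producer side: publish an event alongside each write.
def on_product_write(product_id):
    r.publish("entity-updates", json.dumps({"type": "product", "id": product_id}))

# Consumer side: translate events into key deletions.
def run_invalidator():
    pubsub = r.pubsub()
    pubsub.subscribe("entity-updates")
    for message in pubsub.listen():
        if message["type"] != "message":
            continue
        event = json.loads(message["data"])
        r.delete(f"product:{event['id']}")      # key design must mirror the writers'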

Correctness Pitfalls and How to Avoid Them

Cache key design errors

Bad cache keys cause incorrect data to leak across users.

Rules of thumb:

  • Include all dimensions that affect the result (user id, locale, permissions, feature flags)
  • Normalize inputs (lowercase, stable sorting, canonical query params)
  • Version your keys (e.g. a v1: prefix) so a format or schema change can invalidate a whole generation of keys at once

Example key shape:

v2:product:{id}:locale:{locale}:currency:{currency}
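
Centralizing that shape in one helper makes normalization and versioning hard to skip. A minimal sketch, with illustrative dimension names:

def product_cache_key(product_id: int, locale: str, currency: str) -> str:
    locale = locale.strip().lower()           # normalize: "en-US " -> "en-us"
    currency = currency.strip().upper()       # normalize: "usd" -> "USD"
    return f"v2:product:{product_id}:locale:{locale}:currency:{currency}"

print(product_cache_key(123, "en-US ", "usd"))
# v2:product:123:locale:en-us:currency:USD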

The thundering herd (cache stampede)

When a hot key expires, many requests recompute at once.

Mitigations:

  • Probabilistic early refresh: let some requests refresh the key before its TTL hits zero, so expiry does not hit every request at once
  • Soft TTL + hard TTL: serve stale value briefly while refreshing in background
  • Single-flight locking: only one worker recomputes; others wait or serve stale
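
As one example, single-flight locking can be sketched with a short-lived Redis lock (assumes the redis-py client; recompute is a placeholder for the expensive work). One worker recomputes; the rest wait briefly and retry, falling back to recomputing if the cache is still empty.

import json
import time
import redis

r = redis.Redis()

def get_with_single_flight(key, recompute, ttl=60):
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    lock_key = f"lock:{key}"
    got_lock = r.set(lock_key, "1", nx=True, ex=10)   # lock auto-expires after 10s
    if got_lock:
        try:
            value = recompute()
            r.setex(key, ttl, json.dumps(value))
            return value
        finally:
            # Sketch only: a production lock would hold a token so it never
            # deletes a lock that has already expired and been re-acquired.
            r.delete(lock_key)

    time.sleep(0.1)                                   # someone else is recomputing
    cached = r.get(key)
    return json.loads(cached) if cached is not None else recompute()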

Caching errors and timeouts

Caching failures should degrade gracefully.

  • If cache is down, your app must still work (maybe slower)
  • Treat the cache as an optimization, not a dependency—unless you’ve designed it to be one

Caching authorization and permissions

A classic incident pattern:

  1. Cache a response for user A
  2. Serve it to user B by accident

If your response depends on auth context, never share keys across users unless you have a robust permission-aware key strategy.


Stale-While-Revalidate: The Most Practical “Advanced” Pattern

Many teams aim for perfect freshness and end up with brittle systems. A better pattern for many endpoints is:

  • Serve cached value immediately (even if slightly stale)
  • Refresh asynchronously when it approaches expiration

This pattern smooths load spikes and improves tail latency.

A simple mental model:

  • Hard TTL: maximum staleness you will ever serve
  • Soft TTL: when you start refreshing

When soft TTL is hit:

  • If cache has value: serve it
  • Trigger a background refresh

When hard TTL is hit:

  • Block and recompute (or return fallback)

This turns cache expiry from a cliff into a ramp.
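
A minimal sketch of the soft/hard TTL pattern, assuming the redis-py client. The hard TTL is enforced by Redis key expiry; the soft TTL travels inside the stored value. The background refresh uses a plain thread for brevity; a task queue works just as well, and combining this with single-flight locking keeps several workers from refreshing the same key at once.

import json
import threading
import time
import redis

r = redis.Redis()

def get_with_swr(key, recompute, soft_ttl=60, hard_ttl=300):
    cached = r.get(key)
    if cached is not None:
        entry = json.loads(cached)
        if entry["soft_expires_at"] < time.time():
            # Soft TTL passed: refresh in the background, serve the stale value now.
            threading.Thread(
                target=_refresh, args=(key, recompute, soft_ttl, hard_ttl)
            ).start()
        return entry["value"]

    # Hard TTL exceeded (or never cached): block and recompute.
    return _refresh(key, recompute, soft_ttl, hard_ttl)

def _refresh(key, recompute, soft_ttl, hard_ttl):
    value = recompute()
    entry = {"value": value, "soft_expires_at": time.time() + soft_ttl}
    r.setex(key, hard_ttl, json.dumps(entry))   # hard TTL = maximum staleness served
    return value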


Observability: If You Can’t Measure It, You Can’t Trust It

You should be able to answer these questions from dashboards:

  • Cache hit ratio (overall and per key group)
  • Miss latency and backend load during misses
  • Evictions (memory pressure or policy)
  • Hot keys (top N keys by QPS)
  • Stampede symptoms (miss spikes, lock contention)

Be cautious with “hit ratio” as a vanity metric: a high hit ratio can still hide correctness bugs or stampedes.
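
Even a crude per-key-group counter answers the first two questions. The sketch below keeps counts in memory for illustration; in production you would emit them to your metrics system instead.

from collections import defaultdict

stats = defaultdict(lambda: {"hits": 0, "misses": 0})

def record(key_group: str, hit: bool):
    stats[key_group]["hits" if hit else "misses"] += 1

def hit_ratio(key_group: str) -> float:
    s = stats[key_group]
    total = s["hits"] + s["misses"]
    return s["hits"] / total if total else 0.0

record("product", hit=True)
record("product", hit=False)
print(hit_ratio("product"))   # 0.5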


When Not to Cache

Caching is not free. Don’t cache when:

  • Data is highly personalized and low-reuse
  • Results must be correct in real time (payments, stock levels, permissions)
  • You can achieve the performance goal by optimizing queries, indexes, or batching
  • The operational complexity outweighs the savings

A good heuristic:

Cache after you understand the bottleneck, not before.


A Practical Checklist

Before adding a cache, confirm:

  • What are you caching? response / data / computation / negative
  • What is the acceptable staleness? seconds? minutes? must be exact?
  • What triggers invalidation? TTL only? writes? events?
  • What is the key shape? versioned, canonical, permission-aware
  • How do you prevent stampedes? soft/hard TTL, single-flight, jitter
  • What happens when cache fails? fallback path and timeouts
  • How will you measure success? p95 latency, DB QPS, error rate

Final Takeaway

Caching is not about making things fast. It is about deciding where you are willing to be stale, and for how long.

A concrete conclusion you can apply:

  • Start with request-scope caching to eliminate duplicate work safely.
  • Add shared caches (Redis/CDN) only after you understand reuse patterns.
  • Use TTL for tolerance, not correctness.
  • Treat invalidation as a design requirement, not an afterthought.
  • Prefer stale-while-revalidate over hard expiration for user-facing reads.
  • Design cache keys and observability before you ship the cache.

If you cannot clearly answer what can be stale, for how long, and what breaks if it is, you are not ready to add a cache.

Caching works best when it is deliberate, bounded, and measurable—not when it is added reactively in response to slow queries.
