Caching Systems (Part 1): Layers, Strategies, and Failure Modes
Caching is one of the highest-leverage tools in backend engineering: it can turn expensive work into cheap work, flatten latency spikes, and reduce infrastructure cost. It can also quietly introduce correctness bugs, inconsistent user experiences, and operational incidents.
This article is not a “use Redis” tutorial. It is a practical deep dive into how caching works as a system—across layers, time, and failure modes—so you can design caches you can actually trust.
What Caching Really Is
A cache is a time-bounded copy of data or computation that trades one resource for another:
- Trade CPU / database I/O for memory
- Trade network round trips for locality
- Trade strict freshness for speed
Every cache makes an implicit promise:
“This value is probably good enough until time T or until event E.”
You are not just choosing a technology. You are choosing a consistency model.
The Caching Stack: Where Caches Live
Most production systems have multiple caches stacked on top of each other. Understanding them as a pipeline helps you reason about behavior.
Client
│
▼
[ Browser Cache ]
│
▼
[ CDN / Edge Cache ]
│
▼
[ Reverse Proxy Cache ] (Nginx / Varnish)
│
▼
[ Application Server ]
│ ├─ Request-scope Cache
│ ├─ In-process Cache
│ └─ Shared Server Cache (Redis / Memcached)
│
▼
[ Database ]
│ ├─ Query / Plan Cache
│ └─ Buffer / Page Cache
Think of these layers as progressively closer copies of data and computation.
1) Request-scope cache (per-request)
This cache lives only for the lifetime of a single request.
Examples:
- Memoization inside a request handler
- Django's cached_property
- ORM identity maps
- GraphQL DataLoader-style batching
Why it matters:
- Eliminates duplicate work inside one request
- Collapses N+1 query patterns
Properties:
- No invalidation needed
- Zero consistency risk
- Often the highest ROI cache
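A minimal, framework-agnostic sketch of request-scope memoization (the request object and expensive_settings_lookup helper are hypothetical stand-ins for whatever your framework and data layer provide):

```python
import functools

def request_memoize(func):
    """Cache results on the current request object; the cache dies with the request."""
    @functools.wraps(func)
    def wrapper(request, *args, **kwargs):
        store = getattr(request, "_memo", None)
        if store is None:
            store = {}
            request._memo = store
        key = (func.__name__, args, tuple(sorted(kwargs.items())))
        if key not in store:
            store[key] = func(request, *args, **kwargs)
        return store[key]
    return wrapper

@request_memoize
def get_user_settings(request, user_id):
    # Imagine an expensive DB or API call here; it now runs at most once per request.
    return expensive_settings_lookup(user_id)  # hypothetical helper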
2) In-process server cache
Cached in application memory.
Examples:
- LRU caches
- Module-level dictionaries
- Framework-level local caches
Pros:
- Extremely fast (no network)
- Simple mental model
Cons:
- Not shared across instances
- Evicted on deploy or restart
Best used for:
- Configuration
- Feature flags
- Pure, deterministic computations
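A minimal sketch of an in-process cache for a pure, deterministic computation, using the standard library's functools.lru_cache:

```python
import functools

@functools.lru_cache(maxsize=1024)
def price_with_tax(amount_cents: int, tax_rate_bp: int) -> int:
    """Pure and deterministic, so caching it in-process is safe indefinitely."""
    return amount_cents + (amount_cents * tax_rate_bp) // 10_000
```

Note that lru_cache has no TTL, which is why it suits pure computations; for configuration or feature flags you typically want a local cache with expiry (for example, a TTLCache from the cachetools library).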
3) Shared server-side cache (Redis / Memcached)
This is the classic backend cache layer.
Characteristics:
- Shared across instances
- Network hop required
- Centralized invalidation point
Common responsibilities:
- Data caching (objects, query results)
- Computation caching
- Coordination primitives (locks, rate limits)
Design requirement:
- Cache failure must degrade gracefully
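One way to meet the "degrade gracefully" requirement is to wrap every cache call so that a failure looks like a miss. A minimal sketch using the redis-py client (host, timeouts, and TTLs are illustrative values):

```python
import json
import redis

# Short timeouts so a sick cache cannot stall the request path.
r = redis.Redis(host="cache.internal", port=6379,
                socket_timeout=0.05, socket_connect_timeout=0.05)

def cache_get(key):
    """Return the cached value, or None on a miss or a cache failure."""
    try:
        raw = r.get(key)
        return json.loads(raw) if raw is not None else None
    except redis.RedisError:
        return None  # Degrade to a miss; the caller falls back to the source of truth.

def cache_set(key, value, ttl_seconds=60):
    """Best-effort write: a failed cache write is not an application error."""
    try:
        r.setex(key, ttl_seconds, json.dumps(value))
    except redis.RedisError:
        pass
```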
4) Reverse proxy cache
Sits in front of application servers.
Effective when:
- Responses are identical across users
- Authentication is handled upstream
- You want to absorb traffic spikes
Trade-offs:
- Limited visibility into per-user context
- Invalidation can be coarse-grained
5) Database-internal caches
Always present, often forgotten.
Includes:
- Buffer pools
- Page cache
- Query / execution plan caches
Important implication:
You may already benefit from caching even without Redis.
In many cases, fixing queries outperforms adding a new cache layer.
Key idea: you cannot reason about caching by looking at Redis alone. Your system already caches in multiple places.
What Should Be Cached?
A useful way to decide is to categorize what you want to cache:
A) Content caching (responses)
Cache the full HTTP response.
- Fastest end-to-end
- Best for read-heavy endpoints
- Requires careful consideration of auth and personalization
B) Data caching (objects, query results)
Cache the underlying data representation.
- More flexible
- Can be shared across multiple endpoints
- Usually requires more invalidation logic
C) Computation caching (derived results)
Cache expensive computations.
- Ranking, aggregation, recommendations, reports
- Often benefits most from “stale-while-revalidate” patterns
D) Negative caching
Cache “nothing here.”
- 404s, empty results, permission-denied checks
- Prevents repeated expensive misses
- Must be used carefully (avoid caching temporary failures)
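A minimal negative-caching sketch, reusing the cache_get / cache_set helpers above (db_fetch_product is a hypothetical database helper). The sentinel distinguishes "we know there is nothing" from an ordinary miss, and the short TTL limits how long a temporary absence can stick:

```python
NEGATIVE_SENTINEL = "__NOT_FOUND__"  # Distinct marker: "cached nothing" is not a cache miss.

def get_product(product_id):
    key = f"v1:product:{product_id}"
    cached = cache_get(key)
    if cached == NEGATIVE_SENTINEL:
        return None                   # Known-missing: skip the database entirely.
    if cached is not None:
        return cached
    row = db_fetch_product(product_id)                       # hypothetical DB helper
    if row is None:
        cache_set(key, NEGATIVE_SENTINEL, ttl_seconds=30)    # short TTL for negatives
        return None
    cache_set(key, row, ttl_seconds=300)
    return row
```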
TTL Is Not Invalidation
TTL (time-to-live) is the most common caching tool because it’s easy:
- Put value in cache for 60 seconds
- Recompute after it expires
But TTL is not invalidation. It is a bounded staleness guarantee.
Use TTL when:
- Data changes frequently, and you can tolerate being “a bit behind”
- Exact freshness is not critical (feeds, counts, trending lists)
Avoid TTL-only caches when:
- Users expect strong correctness (billing, permissions, inventory)
- A single stale value can cause harm
Invalidation Strategies That Scale
The “two hard things” joke exists for a reason: invalidation is where systems fail.
1) Write-through
Update cache at the same time you write to the source of truth.
- Pros: reads are fast and fresh
- Cons: writes become slower, and failures are tricky (what if cache update fails?)
2) Write-back (rare)
Write to cache first, flush to DB later.
- Pros: extremely fast writes
- Cons: complexity and data-loss risk; usually not worth it outside specialized systems
3) Cache-aside (lazy loading)
On read: check cache; if missing, load from DB and store. On write: update DB, then invalidate cache.
- Pros: simple and popular
- Cons: invalidation is your job; race conditions exist
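A minimal cache-aside sketch, again reusing the helpers above (db_fetch_user and db_update_user are hypothetical database functions). On write, deleting the key is usually safer than re-setting it, because a delete cannot store a stale value:

```python
def get_user(user_id):
    key = f"v1:user:{user_id}"
    value = cache_get(key)
    if value is None:                        # Miss (or cache failure): hit the source of truth.
        value = db_fetch_user(user_id)       # hypothetical DB helper
        if value is not None:
            cache_set(key, value, ttl_seconds=300)
    return value

def update_user(user_id, fields):
    db_update_user(user_id, fields)          # 1. Write the source of truth first.
    try:
        r.delete(f"v1:user:{user_id}")       # 2. Invalidate; the TTL bounds the damage if this fails.
    except redis.RedisError:
        pass
```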
4) Event-driven invalidation
Publish events on writes; consumers invalidate relevant keys.
- Pros: good decoupling and scale
- Cons: requires a reliable event pipeline and careful key design
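To make the shape concrete, here is a sketch using Redis pub/sub as the event channel. Redis pub/sub is fire-and-forget, so a production system would usually put a durable pipeline (an outbox table, Kafka, or similar) behind the same idea; the channel and key names are illustrative:

```python
import json

def publish_invalidation(entity, entity_id):
    """Called from the write path after the database commit succeeds."""
    r.publish("cache-invalidation", json.dumps({"entity": entity, "id": entity_id}))

def run_invalidation_consumer():
    """A worker that turns write events into cache deletes."""
    pubsub = r.pubsub()
    pubsub.subscribe("cache-invalidation")
    for message in pubsub.listen():
        if message["type"] != "message":
            continue
        event = json.loads(message["data"])
        r.delete(f"v1:{event['entity']}:{event['id']}")
```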
Correctness Pitfalls and How to Avoid Them
Cache key design errors
Bad cache keys cause incorrect data to leak across users.
Rules of thumb:
- Include all dimensions that affect the result (user id, locale, permissions, feature flags)
- Normalize inputs (lowercase, stable sorting, canonical query params)
- Version your keys (for example, a v1: prefix) so a format change or migration can invalidate the whole namespace at once
Example key shape:
v2:product:{id}:locale:{locale}:currency:{currency}
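A small key-builder sketch that applies these rules (the dimensions are illustrative; the point is that normalization and versioning live in one place):

```python
def product_cache_key(product_id, locale, currency, version="v2"):
    """Canonical, versioned key: every dimension that affects the result is explicit."""
    locale = locale.strip().lower()      # "en-US " and "en-us" must map to the same key
    currency = currency.strip().upper()
    return f"{version}:product:{product_id}:locale:{locale}:currency:{currency}"
```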
The thundering herd (cache stampede)
When a hot key expires, many requests recompute at once.
Mitigations:
- Probabilistic early refresh: refresh before TTL hits zero
- Soft TTL + hard TTL: serve stale value briefly while refreshing in background
- Single-flight locking: only one worker recomputes; others wait or serve stale
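A sketch of the single-flight approach using a Redis lock (SET with NX and EX is atomic). A production lock would typically carry a unique token so only the owner can release it, or use an existing locking library; this version keeps the idea visible:

```python
import time

def get_with_single_flight(key, recompute, ttl_seconds=300, lock_ttl_seconds=10):
    value = cache_get(key)
    if value is not None:
        return value
    lock_key = f"lock:{key}"
    if r.set(lock_key, "1", nx=True, ex=lock_ttl_seconds):   # Only one worker wins.
        try:
            value = recompute()
            cache_set(key, value, ttl_seconds=ttl_seconds)
            return value
        finally:
            r.delete(lock_key)
    # Everyone else waits briefly for the winner, then recomputes as a last resort.
    for _ in range(20):
        time.sleep(0.05)
        value = cache_get(key)
        if value is not None:
            return value
    return recompute()
```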
Caching errors and timeouts
Caching failures should degrade gracefully.
- If cache is down, your app must still work (maybe slower)
- Treat the cache as an optimization, not a dependency—unless you’ve designed it to be one
Caching authorization and permissions
A classic incident pattern:
- Cache a response for user A
- Serve it to user B by accident
If your response depends on auth context, never share keys across users unless you have a robust permission-aware key strategy.
Stale-While-Revalidate: The Most Practical “Advanced” Pattern
Many teams aim for perfect freshness and end up with brittle systems. A better pattern for many endpoints is:
- Serve cached value immediately (even if slightly stale)
- Refresh asynchronously when it approaches expiration
This pattern smooths load spikes and improves tail latency.
A simple mental model:
- Hard TTL: maximum staleness you will ever serve
- Soft TTL: when you start refreshing
When the soft TTL is hit:
- If the cache still has a value, serve it
- Trigger a background refresh
When the hard TTL is hit:
- Block and recompute (or return a fallback)
This turns cache expiry from a cliff into a ramp.
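One way to encode the ramp is to store the soft-TTL deadline next to the value and let the key's real expiry be the hard TTL. In this sketch, enqueue_refresh stands in for whatever background task runner you already have:

```python
import time

def swr_get(key, recompute, soft_ttl=60, hard_ttl=300, enqueue_refresh=None):
    entry = cache_get(key)                 # entry shape: {"value": ..., "soft_expiry": ...}
    now = time.time()
    if entry is not None:
        if now >= entry["soft_expiry"] and enqueue_refresh is not None:
            enqueue_refresh(key)           # Refresh in the background; keep serving the stale value.
        return entry["value"]              # Still inside the hard TTL: serve immediately.
    # Past the hard TTL (the key has expired): block and recompute.
    value = recompute()
    cache_set(key, {"value": value, "soft_expiry": now + soft_ttl}, ttl_seconds=hard_ttl)
    return value
```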
Observability: If You Can’t Measure It, You Can’t Trust It
You should be able to answer these questions from dashboards:
- Cache hit ratio (overall and per key group)
- Miss latency and backend load during misses
- Evictions (memory pressure or policy)
- Hot keys (top N keys by QPS)
- Stampede symptoms (miss spikes, lock contention)
Be cautious with “hit ratio” as a vanity metric: a high hit ratio can still hide correctness bugs or stampedes.
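Most of these questions can be answered by instrumenting the cache wrapper itself. A sketch using the prometheus_client library (metric and label names are illustrative):

```python
from prometheus_client import Counter, Histogram

CACHE_REQUESTS = Counter("cache_requests_total", "Cache lookups", ["key_group", "result"])
MISS_LATENCY = Histogram("cache_miss_seconds", "Time spent recomputing on a miss", ["key_group"])

def instrumented_get(key_group, key, recompute, ttl_seconds=300):
    value = cache_get(key)
    if value is not None:
        CACHE_REQUESTS.labels(key_group=key_group, result="hit").inc()
        return value
    CACHE_REQUESTS.labels(key_group=key_group, result="miss").inc()
    with MISS_LATENCY.labels(key_group=key_group).time():
        value = recompute()
    cache_set(key, value, ttl_seconds=ttl_seconds)
    return value
```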
When Not to Cache
Caching is not free. Don’t cache when:
- Data is highly personalized and low-reuse
- Results must be correct in real time (payments, stock levels, permissions)
- You can achieve the performance goal by optimizing queries, indexes, or batching
- The operational complexity outweighs the savings
A good heuristic:
Cache after you understand the bottleneck, not before.
A Practical Checklist
Before adding a cache, confirm:
- What are you caching? response / data / computation / negative
- What is the acceptable staleness? seconds? minutes? must be exact?
- What triggers invalidation? TTL only? writes? events?
- What is the key shape? versioned, canonical, permission-aware
- How do you prevent stampedes? soft/hard TTL, single-flight, jitter
- What happens when cache fails? fallback path and timeouts
- How will you measure success? p95 latency, DB QPS, error rate
Final Takeaway
Caching is not about making things fast. It is about deciding where you are willing to be stale, and for how long.
A concrete conclusion you can apply:
- Start with request-scope caching to eliminate duplicate work safely.
- Add shared caches (Redis/CDN) only after you understand reuse patterns.
- Use TTL for tolerance, not correctness.
- Treat invalidation as a design requirement, not an afterthought.
- Prefer stale-while-revalidate over hard expiration for user-facing reads.
- Design cache keys and observability before you ship the cache.
If you cannot clearly answer what can be stale, for how long, and what breaks if it is, you are not ready to add a cache.
Caching works best when it is deliberate, bounded, and measurable—not when it is added reactively in response to slow queries.