Caching Systems (Part 1): Layers, Strategies, and Failure Modes
Caching is one of the highest-leverage tools in backend engineering: it can turn expensive work into cheap work, flatten latency spikes, and reduce infrastructure cost. It can also quietly introduce correctness bugs, inconsistent user experiences, and operational incidents.
This article is not a “use Redis” tutorial. It is a practical deep dive into how caching works as a system—across layers, time, and failure modes—so you can design caches you can actually trust.
What Caching Really Is
A cache is a time-bounded copy of data or computation that trades one resource for another:
- Trade CPU / database I/O for memory
- Trade network round trips for locality
- Trade strict freshness for speed
Every cache makes an implicit promise:
“This value is probably good enough until time T or until event E.”
You are not just choosing a technology. You are choosing a consistency model.
The Caching Stack: Where Caches Live
Most production systems have multiple caches stacked on top of each other. Understanding them as a pipeline helps you reason about behavior.
Client
│
▼
[ Browser Cache ]
│
▼
[ CDN / Edge Cache ]
│
▼
[ Reverse Proxy Cache ] (Nginx / Varnish)
│
▼
[ Application Server ]
│ ├─ Request-scope Cache
│ ├─ In-process Cache
│ └─ Shared Server Cache (Redis / Memcached)
│
▼
[ Database ]
│ ├─ Query / Plan Cache
│ └─ Buffer / Page Cache
Think of these layers as progressively closer copies of data and computation.
1) Request-scope cache (per-request)
This cache lives only for the lifetime of a single request.
Examples:
- Memoization inside a request handler
- Django's cached_property
- ORM identity maps
- GraphQL DataLoader-style batching
Why it matters:
- Eliminates duplicate work inside one request
- Collapses N+1 query patterns
Properties:
- No invalidation needed
- Zero consistency risk
- Often the highest ROI cache
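A minimal, framework-agnostic sketch of request-scope memoization (the request object and expensive_settings_lookup helper are hypothetical stand-ins for whatever your framework and data layer provide):

```python
import functools

def request_memoize(func):
    """Cache results on the current request object; the cache dies with the request."""
    @functools.wraps(func)
    def wrapper(request, *args, **kwargs):
        store = getattr(request, "_memo", None)
        if store is None:
            store = {}
            request._memo = store
        key = (func.__name__, args, tuple(sorted(kwargs.items())))
        if key not in store:
            store[key] = func(request, *args, **kwargs)
        return store[key]
    return wrapper

@request_memoize
def get_user_settings(request, user_id):
    # Imagine an expensive DB or API call here; it now runs at most once per request.
    return expensive_settings_lookup(user_id)  # hypothetical helper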
2) In-process server cache
Cached in application memory.
Examples:
- LRU caches
- Module-level dictionaries
- Framework-level local caches
Pros:
- Extremely fast (no network)
- Simple mental model
Cons:
- Not shared across instances
- Evicted on deploy or restart
Best used for:
- Configuration
- Feature flags
- Pure, deterministic computations
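A minimal sketch of an in-process cache for a pure, deterministic computation, using the standard library's functools.lru_cache:

```python
import functools

@functools.lru_cache(maxsize=1024)
def price_with_tax(amount_cents: int, tax_rate_bp: int) -> int:
    """Pure and deterministic, so caching it in-process is safe indefinitely."""
    return amount_cents + (amount_cents * tax_rate_bp) // 10_000
```

Note that lru_cache has no TTL, which is why it suits pure computations; for configuration or feature flags you typically want a local cache with expiry (for example, a TTLCache from the cachetools library).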
3) Shared server-side cache (Redis / Memcached)
This is the classic backend cache layer.
Characteristics:
- Shared across instances
- Network hop required
- Centralized invalidation point
Common responsibilities:
- Data caching (objects, query results)
- Computation caching
- Coordination primitives (locks, rate limits)
Design requirement:
- Cache failure must degrade gracefully
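One way to meet the "degrade gracefully" requirement is to wrap every cache call so that a failure looks like a miss. A minimal sketch using the redis-py client (host, timeouts, and TTLs are illustrative values):

```python
import json
import redis

# Short timeouts so a sick cache cannot stall the request path.
r = redis.Redis(host="cache.internal", port=6379,
                socket_timeout=0.05, socket_connect_timeout=0.05)

def cache_get(key):
    """Return the cached value, or None on a miss or a cache failure."""
    try:
        raw = r.get(key)
        return json.loads(raw) if raw is not None else None
    except redis.RedisError:
        return None  # Degrade to a miss; the caller falls back to the source of truth.

def cache_set(key, value, ttl_seconds=60):
    """Best-effort write: a failed cache write is not an application error."""
    try:
        r.setex(key, ttl_seconds, json.dumps(value))
    except redis.RedisError:
        pass
```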
4) Reverse proxy cache
Sits in front of application servers.
Effective when:
- Responses are identical across users
- Authentication is handled upstream
- You want to absorb traffic spikes
Trade-offs:
- Limited visibility into per-user context
- Invalidation can be coarse-grained
5) Database-internal caches
Always present, often forgotten.
Includes:
- Buffer pools
- Page cache
- Query / execution plan caches
Important implication:
You may already benefit from caching even without Redis.
In many cases, fixing queries outperforms adding a new cache layer.
Key idea: you cannot reason about caching by looking at Redis alone. Your system already caches in multiple places.
What Should Be Cached?
A useful way to decide is to categorize what you want to cache:
A) Content caching (responses)
Cache the full HTTP response.
- Fastest end-to-end
- Best for read-heavy endpoints
- Requires careful consideration of auth and personalization
B) Data caching (objects, query results)
Cache the underlying data representation.
- More flexible
- Can be shared across multiple endpoints
- Usually requires more invalidation logic
C) Computation caching (derived results)
Cache expensive computations.
- Ranking, aggregation, recommendations, reports
- Often benefits most from “stale-while-revalidate” patterns
D) Negative caching
Cache “nothing here.”
- 404s, empty results, permission-denied checks
- Prevents repeated expensive misses
- Must be used carefully (avoid caching temporary failures)
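A minimal negative-caching sketch, reusing the cache_get / cache_set helpers above (db_fetch_product is a hypothetical database helper). The sentinel distinguishes "we know there is nothing" from an ordinary miss, and the short TTL limits how long a temporary absence can stick:

```python
NEGATIVE_SENTINEL = "__NOT_FOUND__"  # Distinct marker: "cached nothing" is not a cache miss.

def get_product(product_id):
    key = f"v1:product:{product_id}"
    cached = cache_get(key)
    if cached == NEGATIVE_SENTINEL:
        return None                   # Known-missing: skip the database entirely.
    if cached is not None:
        return cached
    row = db_fetch_product(product_id)                       # hypothetical DB helper
    if row is None:
        cache_set(key, NEGATIVE_SENTINEL, ttl_seconds=30)    # short TTL for negatives
        return None
    cache_set(key, row, ttl_seconds=300)
    return row
```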
TTL Is Not Invalidation
TTL (time-to-live) is the most common caching tool because it’s easy:
- Put value in cache for 60 seconds
- Recompute after it expires
But TTL is not invalidation. It is a bounded staleness guarantee.
Use TTL when:
- Data changes frequently, and you can tolerate being “a bit behind”
- Exact freshness is not critical (feeds, counts, trending lists)
Avoid TTL-only caches when:
- Users expect strong correctness (billing, permissions, inventory)
- A single stale value can cause harm
Invalidation Strategies That Scale
The “two hard things” joke exists for a reason: invalidation is where systems fail.
1) Write-through
Update cache at the same time you write to the source of truth.
- Pros: reads are fast and fresh
- Cons: writes become slower, and failures are tricky (what if cache update fails?)
2) Write-back (rare)
Write to cache first, flush to DB later.
- Pros: extremely fast writes
- Cons: complexity and data-loss risk; usually not worth it outside specialized systems
3) Cache-aside (lazy loading)
On read: check cache; if missing, load from DB and store. On write: update DB, then invalidate cache.
- Pros: simple and popular
- Cons: invalidation is your job; race conditions exist
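A minimal cache-aside sketch, again reusing the helpers above (db_fetch_user and db_update_user are hypothetical database functions). On write, deleting the key is usually safer than re-setting it, because a delete cannot store a stale value:

```python
def get_user(user_id):
    key = f"v1:user:{user_id}"
    value = cache_get(key)
    if value is None:                        # Miss (or cache failure): hit the source of truth.
        value = db_fetch_user(user_id)       # hypothetical DB helper
        if value is not None:
            cache_set(key, value, ttl_seconds=300)
    return value

def update_user(user_id, fields):
    db_update_user(user_id, fields)          # 1. Write the source of truth first.
    try:
        r.delete(f"v1:user:{user_id}")       # 2. Invalidate; the TTL bounds the damage if this fails.
    except redis.RedisError:
        pass
```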
4) Event-driven invalidation
Publish events on writes; consumers invalidate relevant keys.
- Pros: good decoupling and scale
- Cons: requires a reliable event pipeline and careful key design
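To make the shape concrete, here is a sketch using Redis pub/sub as the event channel. Redis pub/sub is fire-and-forget, so a production system would usually put a durable pipeline (an outbox table, Kafka, or similar) behind the same idea; the channel and key names are illustrative:

```python
import json

def publish_invalidation(entity, entity_id):
    """Called from the write path after the database commit succeeds."""
    r.publish("cache-invalidation", json.dumps({"entity": entity, "id": entity_id}))

def run_invalidation_consumer():
    """A worker that turns write events into cache deletes."""
    pubsub = r.pubsub()
    pubsub.subscribe("cache-invalidation")
    for message in pubsub.listen():
        if message["type"] != "message":
            continue
        event = json.loads(message["data"])
        r.delete(f"v1:{event['entity']}:{event['id']}")
```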
Correctness Pitfalls and How to Avoid Them
Cache key design errors
Bad cache keys cause incorrect data to leak across users.
Rules of thumb:
- Include all dimensions that affect the result (user id, locale, permissions, feature flags)
- Normalize inputs (lowercase, stable sorting, canonical query params)
- Version your keys (for example, a v1: prefix) so a format change or migration can invalidate the whole namespace at once
Example key shape:
v2:product:{id}:locale:{locale}:currency:{currency}
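A small key-builder sketch that applies these rules (the dimensions are illustrative; the point is that normalization and versioning live in one place):

```python
def product_cache_key(product_id, locale, currency, version="v2"):
    """Canonical, versioned key: every dimension that affects the result is explicit."""
    locale = locale.strip().lower()      # "en-US " and "en-us" must map to the same key
    currency = currency.strip().upper()
    return f"{version}:product:{product_id}:locale:{locale}:currency:{currency}"
```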
The thundering herd (cache stampede)
When a hot key expires, many requests recompute at once.
Mitigations:
- Probabilistic early refresh: refresh before TTL hits zero
- Soft TTL + hard TTL: serve stale value briefly while refreshing in background
- Single-flight locking: only one worker recomputes; others wait or serve stale
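A sketch of the single-flight approach using a Redis lock (SET with NX and EX is atomic). A production lock would typically carry a unique token so only the owner can release it, or use an existing locking library; this version keeps the idea visible:

```python
import time

def get_with_single_flight(key, recompute, ttl_seconds=300, lock_ttl_seconds=10):
    value = cache_get(key)
    if value is not None:
        return value
    lock_key = f"lock:{key}"
    if r.set(lock_key, "1", nx=True, ex=lock_ttl_seconds):   # Only one worker wins.
        try:
            value = recompute()
            cache_set(key, value, ttl_seconds=ttl_seconds)
            return value
        finally:
            r.delete(lock_key)
    # Everyone else waits briefly for the winner, then recomputes as a last resort.
    for _ in range(20):
        time.sleep(0.05)
        value = cache_get(key)
        if value is not None:
            return value
    return recompute()
```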
Caching errors and timeouts
Caching failures should degrade gracefully.
- If cache is down, your app must still work (maybe slower)
- Treat the cache as an optimization, not a dependency—unless you’ve designed it to be one
Caching authorization and permissions
A classic incident pattern:
- Cache a response for user A
- Serve it to user B by accident
If your response depends on auth context, never share keys across users unless you have a robust permission-aware key strategy.
Stale-While-Revalidate: The Most Practical “Advanced” Pattern
Many teams aim for perfect freshness and end up with brittle systems. A better pattern for many endpoints is:
- Serve cached value immediately (even if slightly stale)
- Refresh asynchronously when it approaches expiration
This pattern smooths load spikes and improves tail latency.
A simple mental model:
- Hard TTL: maximum staleness you will ever serve
- Soft TTL: when you start refreshing
When the soft TTL is hit:
- If the cache still has a value, serve it
- Trigger a background refresh
When the hard TTL is hit:
- Block and recompute (or return a fallback)
This turns cache expiry from a cliff into a ramp.
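One way to encode the ramp is to store the soft-TTL deadline next to the value and let the key's real expiry be the hard TTL. In this sketch, enqueue_refresh stands in for whatever background task runner you already have:

```python
import time

def swr_get(key, recompute, soft_ttl=60, hard_ttl=300, enqueue_refresh=None):
    entry = cache_get(key)                 # entry shape: {"value": ..., "soft_expiry": ...}
    now = time.time()
    if entry is not None:
        if now >= entry["soft_expiry"] and enqueue_refresh is not None:
            enqueue_refresh(key)           # Refresh in the background; keep serving the stale value.
        return entry["value"]              # Still inside the hard TTL: serve immediately.
    # Past the hard TTL (the key has expired): block and recompute.
    value = recompute()
    cache_set(key, {"value": value, "soft_expiry": now + soft_ttl}, ttl_seconds=hard_ttl)
    return value
```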
Observability: If You Can’t Measure It, You Can’t Trust It
You should be able to answer these questions from dashboards:
- Cache hit ratio (overall and per key group)
- Miss latency and backend load during misses
- Evictions (memory pressure or policy)
- Hot keys (top N keys by QPS)
- Stampede symptoms (miss spikes, lock contention)
Be cautious with “hit ratio” as a vanity metric: a high hit ratio can still hide correctness bugs or stampedes.
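Most of these questions can be answered by instrumenting the cache wrapper itself. A sketch using the prometheus_client library (metric and label names are illustrative):

```python
from prometheus_client import Counter, Histogram

CACHE_REQUESTS = Counter("cache_requests_total", "Cache lookups", ["key_group", "result"])
MISS_LATENCY = Histogram("cache_miss_seconds", "Time spent recomputing on a miss", ["key_group"])

def instrumented_get(key_group, key, recompute, ttl_seconds=300):
    value = cache_get(key)
    if value is not None:
        CACHE_REQUESTS.labels(key_group=key_group, result="hit").inc()
        return value
    CACHE_REQUESTS.labels(key_group=key_group, result="miss").inc()
    with MISS_LATENCY.labels(key_group=key_group).time():
        value = recompute()
    cache_set(key, value, ttl_seconds=ttl_seconds)
    return value
```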
When Not to Cache
Caching is not free. Don’t cache when:
- Data is highly personalized and low-reuse
- Results must be correct in real time (payments, stock levels, permissions)
- You can achieve the performance goal by optimizing queries, indexes, or batching
- The operational complexity outweighs the savings
A good heuristic:
Cache after you understand the bottleneck, not before.
A Practical Checklist
Before adding a cache, confirm:
- What are you caching? response / data / computation / negative
- What is the acceptable staleness? seconds? minutes? must be exact?
- What triggers invalidation? TTL only? writes? events?
- What is the key shape? versioned, canonical, permission-aware
- How do you prevent stampedes? soft/hard TTL, single-flight, jitter
- What happens when cache fails? fallback path and timeouts
- How will you measure success? p95 latency, DB QPS, error rate
Final Takeaway
Caching is not about making things fast. It is about deciding where you are willing to be stale, and for how long.
A concrete conclusion you can apply:
- Start with request-scope caching to eliminate duplicate work safely.
- Add shared caches (Redis/CDN) only after you understand reuse patterns.
- Use TTL for tolerance, not correctness.
- Treat invalidation as a design requirement, not an afterthought.
- Prefer stale-while-revalidate over hard expiration for user-facing reads.
- Design cache keys and observability before you ship the cache.
If you cannot clearly answer what can be stale, for how long, and what breaks if it is, you are not ready to add a cache.
Caching works best when it is deliberate, bounded, and measurable—not when it is added reactively in response to slow queries.