
How I Cut API Response Time by 73% With a Redis Strategy Nobody Talks About

Cache hit rate is a vanity metric. Here is the response-level caching strategy that actually cut our p95 from 380ms to 102ms — and the 5 anti-patterns most teams miss.

Md. Rony Ahmed · 9 min read
Everyone knows Redis caches data. Nobody tells you the caching strategy that makes your API actually fast.

I spent six months watching our cache hit rate sit at 94% while our p95 response time barely moved. The data was cached. The API was still slow. The problem was not Redis — it was how we used it.

Here is the strategy that cut our API response time from 380ms to 102ms — and the anti-patterns that were silently killing performance.

The Problem: Cache Hit Rate Is a Vanity Metric



We had a typical setup:

- API receives request
- Check Redis → hit? return data
- Miss? Query Postgres, write to Redis, return data
- TTL: 5 minutes

Cache hit rate: 94%. It looked great on dashboards. But our p95 latency? 380ms. Even the "fast" cache hits were taking 80-120ms. Something was wrong.

The hidden issue: We were caching database query results, not API responses. Every cache hit still required JSON serialization, object mapping, and response construction. The cache saved us the Postgres round-trip — but all the CPU work remained.

The Real Fix: Response-Level Caching with Stale-While-Revalidate



We flipped the approach. Instead of caching raw query results, we cache the final rendered API response — complete JSON payload, ready to serve.

Before (Data-Level Cache)



// Cache stores raw SQL results
const userData = await redis.get(`user:${id}`);
if (userData) {
  // Still need to: parse, filter fields, serialize JSON
  return formatResponse(JSON.parse(userData));
}
const result = await db.query('SELECT * FROM users WHERE id = $1', [id]);
await redis.setex(`user:${id}`, 300, JSON.stringify(result));
return formatResponse(result);


After (Response-Level Cache)



// Cache stores FINAL API response
const cached = await redis.get(`api:user:${id}`);
if (cached) {
  // Direct response — zero processing
  res.setHeader('Content-Type', 'application/json');
  return res.send(cached); // Already stringified JSON
}

const result = await db.query('SELECT * FROM users WHERE id = $1', [id]);
const response = JSON.stringify(formatResponse(result));

// Write-through: cache the final response
await redis.setex(`api:user:${id}`, 300, response);
res.send(response);


Result: Cache hits dropped from 80ms to 8ms. The difference? We eliminated JSON parsing, object mapping, and response formatting on every single request.
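
To see where those milliseconds go, a quick micro-benchmark makes the overhead visible. This is a hypothetical sketch (the payload shape and the formatResponse stand-in are assumptions, and absolute numbers vary by machine), but the ratio is the point:

// Hypothetical micro-benchmark: the payload shape and formatResponse
// stand-in are assumptions; absolute numbers vary by machine.
const payload = JSON.stringify({
  id: 1,
  name: 'Ada',
  items: Array.from({ length: 1000 }, (_, i) => ({ id: i, score: i * 1.5 }))
});
const formatResponse = (data) => ({ ok: true, data }); // stand-in mapper

console.time('data-level hit: parse + map + stringify');
for (let i = 0; i < 1000; i++) {
  JSON.stringify(formatResponse(JSON.parse(payload)));
}
console.timeEnd('data-level hit: parse + map + stringify');

console.time('response-level hit: serve the cached string as-is');
let bytes = 0;
for (let i = 0; i < 1000; i++) {
  bytes += payload.length; // nothing to parse, map, or serialize
}
console.timeEnd('response-level hit: serve the cached string as-is');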

The Staleness Budget: How We Handle Cache Invalidation



"Just invalidate the cache when data changes" sounds simple until you have 47 cache keys referencing the same user across different endpoints.

We implemented a staleness budget instead of chasing perfect invalidation:

const STALENESS_BUDGET = {
  'user:profile': 30,      // 30 seconds max staleness
  'user:dashboard': 120,   // 2 minutes — less critical
  'user:analytics': 600    // 10 minutes — historical data
};

// Write with context-aware TTL
async function cacheResponse(key, response, context) {
  const ttl = STALENESS_BUDGET[context] || 60;
  await redis.setex(key, ttl, response);
}


Why this works: Instead of complex invalidation chains, we accept bounded staleness. A user profile might be 30 seconds behind reality. Their analytics dashboard? Up to 10 minutes. Every context gets a tolerance budget.

Trade-off accepted: Slightly stale data for massive performance gains. We document these budgets. Users (internal teams) know the freshness guarantee per endpoint.

The 73% Improvement: Real Production Numbers



Here is what happened when we rolled this out:

Before (Data-Level Cache)
- p50: 85ms
- p95: 380ms
- p99: 890ms
- Cache hit rate: 94%

After (Response-Level + Stale-While-Revalidate)
- p50: 12ms
- p95: 102ms
- p99: 245ms
- Effective cache hit rate: 97% (includes stale-while-revalidate serves)

The 73% p95 improvement came from three changes working together:

1. Response-level caching (biggest impact): Eliminated per-request processing
2. Stale-while-revalidate: Background refresh prevented stampede effects
3. Connection pooling: Persistent Redis connections (not createClient per request)

Implementation: The Complete Pattern



// ioredis client, created once at module scope and reused by every request
const Redis = require('ioredis');

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  maxRetriesPerRequest: 3,
  // CRITICAL: one persistent connection reused across requests,
  // not a new client per request
  lazyConnect: false
});

async function getCachedResponse(cacheKey, fetchFn, context = 'default') {
  // 1. Try cache
  const cached = await redis.get(cacheKey);
  if (cached) {
    // Check staleness budget
    const ttl = await redis.ttl(cacheKey);
    const budget = STALENESS_BUDGET[context] || 60;
    
    // Plenty of budget left: serve immediately
    if (ttl > budget * 0.1) { // more than 10% of the budget remains
      return { data: cached, cached: true };
    }
    
    // Near expiry: serve stale, trigger background refresh
    // Do not await — let it happen in background
    refreshCache(cacheKey, fetchFn, context);
    return { data: cached, cached: true, stale: true };
  }
  
  // 2. Cache miss: fetch and cache
  const fresh = await fetchFn();
  const response = JSON.stringify(fresh);
  await cacheResponse(cacheKey, response, context);
  
  return { data: response, cached: false };
}

// Background refresh — no await, fire-and-forget
function refreshCache(cacheKey, fetchFn, context) {
  fetchFn().then(data => {
    const response = JSON.stringify(data);
    return cacheResponse(cacheKey, response, context);
  }).catch(err => {
    console.error('Background cache refresh failed:', err);
  });
}
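
For context, here is roughly how a route would consume this helper. The /users/:id endpoint, db.getUser, and formatResponse are illustrative stand-ins, not our production handlers:

// Hypothetical route wiring for getCachedResponse. The /users/:id
// endpoint, db.getUser, and formatResponse are illustrative stand-ins.
const express = require('express');
const app = express();

app.get('/users/:id', async (req, res) => {
  const { data, stale } = await getCachedResponse(
    `api:user:${req.params.id}`,
    async () => formatResponse(await db.getUser(req.params.id)),
    'user:profile' // context: 30-second staleness budget
  );
  res.setHeader('Content-Type', 'application/json');
  if (stale) res.setHeader('X-Cache-Stale', 'true'); // surface staleness to callers
  res.send(data); // already-stringified JSON, zero per-request serialization
});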


5 Caching Anti-Patterns That Cost You Performance



1. Caching database rows instead of responses
- Saves network round-trip but keeps CPU work
- Fix: Cache the final serialized response

2. Using the same TTL everywhere
- User profile (changes often) and analytics (changes rarely) get same expiry
- Fix: Context-aware TTL budgets per endpoint

3. Cache stampede on expiry
- 50 requests hit at once when key expires, all query database
- Fix: Stale-while-revalidate with background refresh (see the miss-coalescing sketch after this list)

4. Creating new Redis connections per request
- Connection overhead: 5-15ms per request
- Fix: Persistent connection pool, reuse across requests

5. Not monitoring cache efficiency
- Cache hit rate is a vanity metric. Monitor the response time distribution for cached vs. uncached requests
- Fix: Tag metrics by cached: true/false and stale: true/false
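
One caveat on anti-pattern 3: stale-while-revalidate covers keys that are about to expire, but a cold key can still let concurrent misses all reach the database. A minimal in-process guard, sketched below under the assumption of a single Node process, coalesces those misses into one fetch:

// Sketch: coalesce concurrent misses for one key into a single fetch.
// In-process only; across multiple instances you would need a
// distributed lock (e.g. Redis SET key token NX PX) instead.
const inFlight = new Map();

function fetchOnce(cacheKey, fetchFn) {
  if (inFlight.has(cacheKey)) return inFlight.get(cacheKey);
  const promise = fetchFn().finally(() => inFlight.delete(cacheKey));
  inFlight.set(cacheKey, promise);
  return promise;
}

// Usage: on the miss path of getCachedResponse, call
// fetchOnce(cacheKey, fetchFn) instead of fetchFn().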

When NOT to Use This Pattern



- Frequently mutating data: If data changes every second, caching adds complexity without benefit
- Large payloads (>1MB): Redis is single-threaded; large values block other operations (see the size-guard sketch below)
- Strict consistency requirements: Financial transactions, real-time bidding — accept the database cost
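
For the large-payload case, a cheap guard is to measure the serialized response and skip Redis above a threshold. A minimal sketch, assuming a 100KB cutoff (tune it for your workload):

// Sketch: skip Redis for oversized responses. MAX_CACHEABLE_BYTES is
// an assumed threshold, not a universal rule; tune it per workload.
const MAX_CACHEABLE_BYTES = 100 * 1024;

async function cacheResponseIfSmall(key, response, context) {
  if (Buffer.byteLength(response, 'utf8') > MAX_CACHEABLE_BYTES) {
    return; // serve uncached rather than block Redis with a huge value
  }
  await cacheResponse(key, response, context);
}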

Monitoring What Actually Matters



We stopped watching cache hit rate and started tracking:

- Cached response time (target: <15ms)
- Stale serve rate (target: <5% of total requests)
- Background refresh failure rate (target: <0.1%)
- Redis memory fragmentation (large values = fragmentation)

// Prometheus metrics example (prom-client uses { name, help } options)
const { Histogram, Counter } = require('prom-client');

const cacheMetrics = {
  hitDuration: new Histogram({ name: 'cache_hit_seconds', help: 'Response time for cache hits' }),
  staleServes: new Counter({ name: 'cache_stale_serves_total', help: 'Served stale while refreshing' }),
  refreshFailures: new Counter({ name: 'cache_refresh_failures_total', help: 'Background refresh failures' })
};
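
Wiring these up is a thin wrapper away. A sketch, assuming the prom-client metrics above and the cached/stale flags that getCachedResponse already returns:

// Sketch: record the series above around each lookup. Assumes the
// prom-client cacheMetrics object and getCachedResponse's return shape.
async function getCachedResponseWithMetrics(key, fetchFn, context) {
  const stopTimer = cacheMetrics.hitDuration.startTimer(); // observes on call
  const result = await getCachedResponse(key, fetchFn, context);
  if (result.cached) stopTimer(); // only record durations for cache hits
  if (result.stale) cacheMetrics.staleServes.inc();
  return result;
}

// cacheMetrics.refreshFailures.inc() belongs in refreshCache's .catch().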


The Bottom Line



Redis is fast. But most implementations only use 20% of its potential. The gap between "we use Redis" and "Redis makes our API fast" is in what you cache and how you invalidate.

Cache the final response, not the raw data. Accept bounded staleness. Monitor response times, not hit rates. That is the 73% difference.