p99 Latency Tuning in Redis-Based Applications
by Parminder Singh
Lately, I have been working on a high-throughput API where most reads are served from Redis. Redis is already fast, but my goal was to understand how low I could push p99 latency, and more importantly, how stable I could make the tail. This post is about end-to-end p99 tuning for a Redis-backed API. Some optimizations target Redis directly. Others target the client, request shape, and network behavior. While I use Redis here, these optimizations targeting the network stack, request shape, and client-side caching apply to almost any distributed system.

p99
p99 (99th percentile) means that 99% of requests complete faster than this number. The remaining 1% are the slowest requests, also called the tail. At 10,000 requests per second, that 1% translates to 100 slow requests every second.
Tracking p99 helps understand consistency. It shows how often users hit delays and how unpredictable the system feels. Average latency hides this behavior and can look healthy even when a subset of users is experiencing slow responses.
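To make this concrete, here is a naive percentile calculation over a latency sample (illustrative only; real load-testing tools like autocannon use streaming histograms instead of sorting raw samples):

```javascript
// Naive percentile: sort the sample, then index at the requested rank.
function percentile(latenciesMs, p) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

// 100 samples: 98 fast requests at 1 ms and two slow outliers at 100 ms.
const sample = Array.from({ length: 98 }, () => 1).concat([100, 100]);

console.log(percentile(sample, 50)); // 1: the median looks great
console.log(percentile(sample, 99)); // 100: the tail is 100x slower

const avg = sample.reduce((a, b) => a + b, 0) / sample.length;
console.log(avg); // 2.98: the average still looks healthy
```

The average here (2.98 ms) looks perfectly healthy, yet 2% of requests take 100 ms, which is exactly the behavior p99 surfaces and the average hides.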
Baseline
The API server is written in Node.js. Redis and the API run in separate containers.
Please note that all latency numbers below are end-to-end API latency and not Redis command execution times. Redis GET itself is sub-millisecond. The tail comes from scheduling, networking, client behavior, and request shape.
Docker setup
version: '3.8'
services:
  redis-p99:
    image: redis:8-alpine
    container_name: redis-p99
    ports:
      - "9899:6379"
    command: ["redis-server", "--loglevel", "warning"]
  api-server:
    build: .
    container_name: api-p99
    environment:
      # Inside the compose network the API reaches Redis on the container
      # port (6379); 9899 is only the host-side mapping.
      - REDIS_URL=redis://redis-p99:6379
      - NODE_ENV=production
    depends_on:
      - redis-p99
    ports:
      - "3000:3000"
API handler
const express = require('express');
const Redis = require('ioredis');

const app = express();
const redis = new Redis(process.env.REDIS_URL);

app.get('/api/data', async (req, res) => {
  const userId = req.query.userId;
  const user = await redis.get(`user:${userId}`);
  const userPosts = await redis.get(`user:${userId}:posts`);
  res.json({ user, userPosts });
});

app.listen(3000);
Benchmark script
const autocannon = require('autocannon');

function runTest(name) {
  const instance = autocannon({
    url: 'http://localhost:3000/api/data?userId=1',
    connections: 100,
    duration: 30,
    pipelining: 1,
    title: name
  }, (err, result) => {
    if (err) {
      console.error(err);
      return;
    }
    console.log(`>> Results for: ${name}`);
    console.log(`Avg Latency: ${result.latency.average} ms`);
    console.log(`p99 Latency: ${result.latency.p99} ms`);
    console.log(`Total Requests: ${result.requests.total}`);
    console.log(`Requests/Sec: ${result.requests.average}`);
  });
  autocannon.track(instance, { renderProgressBar: true });
}
Baseline results
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼────────┤
│ Latency │ 1 ms │ 1 ms │ 3 ms │ 3 ms │ 1.33 ms │ 2.28 ms │ 107 ms │
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴────────┘
That 107 ms maximum is not just Redis; it includes end-to-end scheduling noise, GC pauses, and network jitter. The goal from here is to bring the max down and stabilize the tail, which directly improves consistency in user experience.
Network path optimization (if co-location is possible)
Because Redis and the API run on the same host, the first thing I tested was switching from TCP to Unix Domain Sockets (UDS).
UDS avoids TCP/IP overhead and reduces jitter introduced by parts of the network stack. It does not bypass the kernel, but it removes several sources of variance.
Code change
const redis = new Redis({
  path: process.env.REDIS_SOCKET_PATH
});
Redis config
services:
  redis-p99:
    image: redis:8-alpine
    user: "999:1000"
    ...
    # 770 (not 700) so the api-server user, which shares group 1000,
    # can read and write the socket.
    command: ["redis-server", "--unixsocket", "/dev/shm/redis.sock", "--unixsocketperm", "770"]
    ...
    volumes:
      - /dev/shm:/dev/shm
  api-server:
    user: "1000:1000"
    ...
    environment:
      - REDIS_URL=redis://redis-p99:6379
      - REDIS_SOCKET_PATH=/dev/shm/redis.sock
      - NODE_ENV=production
    ...
    volumes:
      - /dev/shm:/dev/shm
Using /dev/shm is just convenient; the main win here comes from UDS itself, not from the socket living in shared memory.
The key changes:
- Shared volume for the socket file
- Compatible user permissions between containers
Results
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬───────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼───────┤
│ Latency │ 1 ms │ 1 ms │ 3 ms │ 3 ms │ 1.33 ms │ 2.08 ms │ 99 ms │
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴───────┘
The standard deviation dropped from 2.28 ms to 2.08 ms (more predictability), and the absolute ceiling (Max) fell below 100 ms.
I understand that co-location is not always acceptable, as it couples failure domains. In those cases, reducing and controlling network variance is key.
Some strategies:
- Avoid cross-AZ traffic
- Reduce round trips aggressively
- Use stable, long-lived connections
- Bound retries and timeouts
- Pack data to reduce chattiness
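Bounded retries and timeouts in particular can be expressed directly as client options. A sketch with ioredis (the option names are ioredis's; the specific values here are assumptions to tune against your own SLOs):

```javascript
// Bounded exponential backoff: give up after a fixed number of attempts
// and cap the delay, so a flapping connection cannot stall requests forever.
function retryStrategy(times) {
  if (times > 5) return null;               // stop retrying after 5 attempts
  return Math.min(50 * 2 ** times, 2000);   // 100, 200, ... capped at 2000 ms
}

// Options for `new Redis(...)` (ioredis), shown here without connecting.
const redisOptions = {
  connectTimeout: 1000,     // fail fast when the server is unreachable
  commandTimeout: 250,      // bound per-command wait instead of hanging
  maxRetriesPerRequest: 1,  // a single request never retries indefinitely
  retryStrategy,
};
```

Returning `null` from `retryStrategy` tells ioredis to stop reconnecting, which turns an unbounded tail into a fast, explicit failure the caller can handle.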
Pipelining
Pipelining helps in two ways:
- In the earlier code, I had two sequential await calls. Each await redis.get() yields back to the Node.js event loop. If the loop is busy with other tasks, the next Redis call is delayed. With pipelining, we only await once.
Old code:
const user = await redis.get(`user:${userId}`);
const posts = await redis.get(`user:${userId}:posts`);
New code:
// exec() resolves to an array of [error, result] pairs, one per command
const [[, user], [, posts]] = await redis.pipeline()
  .get(`user:${userId}`)
  .get(`user:${userId}:posts`)
  .exec();
- Pipelining collapses multiple commands into a single RTT (Round Trip Time), meaning only one network interaction is required and the GETs are batched together.
Pipelining doesn't just save RTTs, it also reduces interrupt frequency. Processing one large buffer is more CPU-efficient than processing multiple smaller ones. This lowers the load on the Node.js event loop, preventing the loop from being blocked while the next request is waiting.
RESP3 and Client-Side Caching
The fastest network call is the one you never make. Server-Assisted Client-Side Caching (via RESP3) allows the client to keep a local copy of the data while Redis tracks which keys the client has "subscribed" to. When a key changes, Redis pushes an invalidation message to the client. This allows for near-zero latency reads without the classic stale cache problem.
If the connection between Redis and the API drops, invalidation messages may be lost. This can be mitigated by implementing a max age/TTL for local entries. Also memory use has to be efficient (LRU or TTL-based eviction).
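A minimal sketch of such a bounded local cache (a hypothetical helper, not part of any library): each entry carries a max age, so a missed invalidation can only serve stale data for a bounded window.

```javascript
// Local cache with a per-entry max age. Even if an invalidation push is
// lost, an entry can never be served beyond `maxAgeMs`.
class BoundedLocalCache {
  constructor(maxAgeMs) {
    this.maxAgeMs = maxAgeMs;
    this.entries = new Map();
  }
  set(key, value) {
    this.entries.set(key, { value, storedAt: Date.now() });
  }
  has(key) {
    const entry = this.entries.get(key);
    if (!entry) return false;
    if (Date.now() - entry.storedAt > this.maxAgeMs) {
      this.entries.delete(key); // expired: treat as a miss
      return false;
    }
    return true;
  }
  get(key) {
    return this.has(key) ? this.entries.get(key).value : undefined;
  }
  delete(key) {
    this.entries.delete(key); // called from the invalidation handler
  }
}

const localCache = new BoundedLocalCache(5000); // stale for at most 5 s
```

The `delete` method is what the invalidation handler calls; the TTL is only the safety net for lost pushes. For memory bounds, an LRU cap on `entries.size` could be layered on top of this.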
To use RESP3, configure the client:
const redis = new Redis({
  path: process.env.REDIS_SOCKET_PATH,
  protocol: 3
});
Also, RESP3 returns native data types (like Maps), which means the client process spends fewer CPU cycles parsing data.
Enabling tracking
await redis.client('TRACKING', 'on');
Handling invalidations
redis.on('push', (message) => {
  if (message.type === 'invalidate') {
    message.data.forEach(key => localCache.delete(key));
  }
});
Updated request flow
const userKey = `user:${userId}`;
const postsKey = `user:${userId}:posts`;

if (localCache.has(userKey) && localCache.has(postsKey)) {
  return res.json({
    user: localCache.get(userKey),
    posts: localCache.get(postsKey),
    source: 'local-memory'
  });
}

const results = await redis.pipeline()
  .get(userKey)
  .get(postsKey)
  .exec();

localCache.set(userKey, results[0][1]);
localCache.set(postsKey, results[1][1]);
Results
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬───────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼───────┤
│ Latency │ 1 ms │ 1 ms │ 3 ms │ 3 ms │ 1.27 ms │ 1.51 ms │ 41 ms │
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴───────┘
The system is now significantly more deterministic. By removing the network from the critical path for repeated reads, the Max latency dropped by over 60% (from 107 ms to 41 ms), and the standard deviation reached a much healthier 1.51 ms.
At a high level
- Reduce round trips and batch network calls
- Reduce interrupt frequency and event loop blocking
- Use client-side caching
- Control network variance