Lately, I have been working on a high-throughput API where most reads are served from Redis. Redis is already fast, but my goal was to understand how low I could push **p99 latency** and, more importantly, how stable I could make the tail. This post walks through end-to-end p99 tuning for a Redis-backed API: some optimizations target Redis directly, while others target the client, the request shape, and network behavior. Although I use Redis here, the techniques that touch the network stack, request shape, and client-side caching apply to almost any distributed system.