There are few things more damaging to a business than an application that becomes unresponsive exactly when traffic peaks. Not during quiet hours, not in the middle of the night — but during a product launch, a marketing campaign, or the busiest trading period of the year. The timing feels cruel, but it’s not a coincidence. Peak load is precisely when hidden architectural problems surface.
This is the story of one such system, and what was actually causing it to fail.
The architecture that looked fine on paper
The infrastructure in question was built on Azure and, by most measures, looked solid. Redis was in place as a caching layer to reduce database pressure. The database itself was properly indexed. Servers had comfortable headroom on both CPU and memory — no resource exhaustion, no obvious bottlenecks. On a quiet afternoon, everything worked beautifully. Response times sat around 200ms, well within the acceptable range.
Then peak traffic would arrive. Response times would climb from 200ms to 2 seconds, then 5, then 10. Eventually the application would stop responding altogether. And then, as traffic subsided, it would recover on its own — as if nothing had happened.
This pattern is particularly disorienting for engineering teams. The system heals itself, so there’s no crash to investigate, no error log with a clear smoking gun. Just a recurring window of failure that’s hard to reproduce and even harder to explain.
Why the obvious suspects weren’t guilty
The natural instinct when an application slows under load is to look at the most visible resources: CPU, memory, database query times. All of them were fine. Redis, the caching layer specifically designed to handle this kind of load, was responding in microseconds. The database wasn’t under unusual pressure. The servers weren’t breaking a sweat.
This is where many investigations stall. If the database is fine, Redis is fine, and the servers have headroom — what’s left?
The answer was in a place most teams don’t think to look: thread pool metrics and connection pool utilization.
The actual problem: threads waiting for nothing
Modern web applications handle concurrent requests using thread pools — a fixed set of worker threads that process incoming requests. When a request comes in, it gets assigned to an available thread. If all threads are busy, the request waits in a queue.
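To make the mechanics concrete, here is a minimal sketch of that model in Java. The pool and queue sizes are illustrative, not the application's real settings: a fixed set of workers, a bounded queue in front of them, and nothing else.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RequestPoolSketch {
    public static void main(String[] args) throws InterruptedException {
        // A fixed set of worker threads, plus a bounded queue for requests that
        // arrive while every worker is busy. All sizes here are illustrative.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                8, 8,                           // core and max pool size: the ceiling on concurrent work
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100)); // requests wait here when no thread is free

        for (int i = 0; i < 50; i++) {
            final int requestId = i;
            pool.execute(() -> handleRequest(requestId)); // queued if all 8 workers are occupied
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    static void handleRequest(int id) {
        try {
            Thread.sleep(50); // stand-in for the time a request holds its thread, waits included
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.println("handled request " + id);
    }
}
```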
The application was running with default thread pool settings. Those defaults are reasonable for low-to-moderate traffic, but they set a relatively low ceiling on how many threads are available at any given moment. Under normal load, there were always enough threads to go around. Under peak load, every thread was occupied — not doing heavy work, but waiting. Waiting to make a call to Redis. Waiting for a Redis response that would arrive in microseconds.
Here’s the paradox: Redis was fast. The problem wasn’t Redis performance. The problem was that the threads making Redis calls were blocking while they waited, even for those microseconds, and there weren’t enough of them to keep up with the incoming request volume. Requests piled up in the queue. Response times climbed. Eventually the queue filled and the application became unresponsive.
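A rough back-of-the-envelope calculation shows why the ceiling matters. The numbers below are purely illustrative, not the incident's actual figures, but the shape of the arithmetic is the same: it's Little's law in disguise.

```java
public class PoolSaturationMath {
    public static void main(String[] args) {
        // Little's law: threads needed ≈ arrival rate × time each request holds a thread.
        // Illustrative numbers only; the real system's figures aren't reproduced here.
        double requestsPerSecond = 1_000;  // hypothetical peak arrival rate
        double secondsPerRequest = 0.05;   // 50 ms per request, blocking waits included

        double threadsNeeded = requestsPerSecond * secondsPerRequest;
        System.out.printf("Threads needed to keep pace: ~%.0f%n", threadsNeeded);
        // If the pool's ceiling sits well below this, the queue grows on every tick,
        // latency climbs, and eventually the queue itself fills up.
    }
}
```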
The infrastructure was fine. The bottleneck was a configuration default that nobody had revisited since the system was first deployed.
What the fix looked like
The solution involved three targeted changes, none of which required re-architecting the system.
Thread pool reconfiguration. We analyzed the expected concurrent load and pre-allocated a sufficient number of worker threads to handle peak traffic without queuing. This meant the application could process many more simultaneous requests before the pool saturated and new arrivals started to queue.
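What that change looks like depends on the runtime and framework, which aren't pinned down here. As a sketch of the idea in Java, with placeholder sizes: the pool is sized from the measured peak concurrency, and its workers are created up front rather than lazily once the spike has already arrived.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TunedRequestPool {
    // Sized from the expected peak concurrency rather than left at a default.
    // 200 and 2_000 are placeholders; the real values come from load analysis.
    static final int PEAK_WORKERS = 200;
    static final int QUEUE_CAPACITY = 2_000;

    static ThreadPoolExecutor build() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                PEAK_WORKERS, PEAK_WORKERS,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(QUEUE_CAPACITY));
        pool.prestartAllCoreThreads(); // allocate workers up front instead of lazily under load
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = build();
        System.out.println("workers ready before traffic arrives: " + pool.getPoolSize());
        pool.shutdown();
    }
}
```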
Proper connection pooling for Redis. Related to the thread problem was how connections to Redis were being managed. Without a proper connection pool, the application was creating and tearing down Redis connections more frequently than necessary, adding latency and overhead to every cache interaction. A well-configured connection pool meant connections were reused efficiently, and Redis calls became as fast as they should have been all along.
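The client library involved isn't named here, so purely as an illustration: this is roughly what a long-lived connection pool looks like with Jedis, a common Redis client for Java. The pool sizes are assumptions, not the project's actual values; the point is that connections are borrowed and returned, never created per request.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class RedisPoolSketch {
    public static void main(String[] args) {
        JedisPoolConfig config = new JedisPoolConfig();
        config.setMaxTotal(200); // upper bound on concurrent Redis connections (illustrative)
        config.setMaxIdle(50);   // connections kept warm between bursts
        config.setMinIdle(10);   // never drop below this, so bursts don't pay connection setup cost

        // One pool for the lifetime of the application; connections are borrowed and returned.
        try (JedisPool pool = new JedisPool(config, "localhost", 6379)) {
            try (Jedis jedis = pool.getResource()) { // borrow a pooled connection
                jedis.set("greeting", "hello");
                System.out.println(jedis.get("greeting"));
            }                                        // returned to the pool here, not torn down
        }
    }
}
```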
Monitoring for thread pool utilization. Just as important as the immediate fix, we added visibility into thread pool metrics going forward. CPU and memory graphs are standard in most monitoring setups. Thread pool saturation almost never is — which is exactly why this problem had gone undetected for so long. If thread pool utilization starts climbing toward its ceiling, the team now knows before users feel it.
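The monitoring stack will differ from team to team. As a minimal sketch, the pool's utilization and queue depth can be sampled on a schedule and forwarded to whatever dashboard is already in place; the stdout logging below is just a stand-in for that pipeline.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolMetricsSampler {

    // Samples how close the request pool is to its ceiling and how deep its queue is.
    static ScheduledExecutorService startSampling(ThreadPoolExecutor pool) {
        ScheduledExecutorService sampler = Executors.newSingleThreadScheduledExecutor();
        sampler.scheduleAtFixedRate(() -> {
            int active = pool.getActiveCount();   // threads currently holding a request
            int max = pool.getMaximumPoolSize();  // the ceiling
            int queued = pool.getQueue().size();  // requests waiting for a free thread
            double utilization = 100.0 * active / max;
            // Stand-in for the real metrics pipeline: in practice this feeds a dashboard or alert.
            System.out.printf("thread pool: %.0f%% utilized (%d/%d active), %d queued%n",
                    utilization, active, max, queued);
        }, 0, 10, TimeUnit.SECONDS);
        return sampler;
    }

    public static void main(String[] args) throws InterruptedException {
        // Tiny demo pool so the sampler has something to observe.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        ScheduledExecutorService sampler = startSampling(pool);
        Thread.sleep(25_000); // let a couple of samples print, then stop
        sampler.shutdownNow();
        pool.shutdown();
    }
}
```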
The results
Response times stabilized at under 300ms even during peak traffic periods. The infrastructure was able to handle five times the previous concurrent load without degradation. The underlying hardware, the database, and Redis itself didn’t change. Only the configuration did.
What this means for your team
If your application behaves well under normal conditions but degrades or fails under peak load, the problem is almost certainly not the thing you’re measuring most. CPU and memory are easy to monitor, so teams watch them closely. Thread pools, connection pools, and queue depths are harder to instrument, so they go unmonitored — and that’s precisely where these failure modes hide.
Before reaching for more servers or a bigger cache layer, it’s worth asking: do we actually know what’s happening inside our application at the thread level during peak load? In most cases, the answer is no. And in many cases, that’s where the answer is.
Infrastructure problems are rarely about infrastructure. They’re about configuration, visibility, and knowing where to look.
At Bitgloss, we help engineering teams find the real cause of performance failures — not just the obvious suspects. If your application is struggling under load, get in touch.