Identifying Bottlenecks

Here is the situation. You're running a load test. Virtual users are experiencing slow response times. You check the server metrics: low-to-moderate CPU utilization, plenty of memory, low disk activity. Everything looks good. But as you apply more load, response times degrade a little. Then a lot. Soon they've blown past acceptable thresholds and kept going.

What's going on? The server metrics "clearly show" your servers aren't busy, yet your site is slow. We encounter this scenario frequently, and the answer is almost always the same: you need to look deeper to find the hidden bottleneck.

This guide explains how to identify bottlenecks using systematic investigation and why different bottleneck types occur, so you can diagnose root causes and take the right corrective actions.


Why Averages Hide Bottlenecks

If you've been using average page load data to evaluate your website's performance, you could be missing important bottlenecks that reduce conversion rates and frustrate customers.

Averages are easy to understand, which is precisely why they're dangerous. If the average load time of a page is 2 seconds, many users are seeing load times slower than that, but the average tells you nothing about how much slower.
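To see the effect concretely, here is a small illustration with made-up numbers: ninety fast requests and ten slow ones average out to under 2 seconds, while the 95th percentile tells the real story.

```python
import statistics

# Hypothetical response times (seconds): mostly fast, with a slow tail
samples = [1.2] * 90 + [9.0] * 10

mean = statistics.mean(samples)                  # ~1.98s: the average says "fast"
p95 = sorted(samples)[int(0.95 * len(samples))]  # 9.0s: what the tail users actually see

print(f"mean={mean:.2f}s  p95={p95:.2f}s")
```

One in ten users here waits over four times the average, and the mean alone gives no hint of it.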

The Scatter Plot Advantage

Here's a real example. A high-end retailer showed excellent average page load times. No visible slowdown even at 2,250 concurrent users.

What the average hid was a much wider range of response times than the summary figure suggested. That's why scatter plots are essential: they plot every single response time in the test, revealing the distribution that averages cover up.

The average said "fast." The scatter plot showed that the all-important checkout and shopping cart process had a substantial number of responses in the 5-15 second range, even at low load levels.

Don't Rely on Averages Alone

If you had relied on just average load times, you would have missed that critical detail, one that could be driving down sales conversion rates and costing the company money.

Use the Embedded Analytics Dashboard to view scatter plots and response time distributions. Look beyond the average to see the full picture.


Common Bottleneck Patterns

Understanding what bottlenecks look like helps you recognize them quickly.

Pattern 1: Servers Aren't Busy But Site Is Slow

Symptoms:

  • Response times degrade significantly under load
  • CPU utilization is low to moderate (40-60%)
  • Memory usage is fine, disk activity is low
  • As load increases, CPU actually decreases
  • Errors start appearing (500 Internal Server Error)

Example from a real test:

As load increased from 25 to 200 users, CPU utilization climbed to ~55%. But as load continued to increase, CPU utilization did not keep climbing. In fact, the trend was downward.

Why this happens: The database (or application server) has a concurrency limit. Requests are being queued, waiting their turn, so CPU goes idle even though the system is overloaded.

Where to look next: Almost always, look at the database.

Pattern 2: Vertical Stack Pattern in Scatter Plots

Symptoms:

  • Scattered load times bunch up in recognizable vertical lines
  • Multiple requests complete at exactly the same time
  • Response times span 10-30+ seconds
  • Server CPU is moderate (40-60%) when blocks occur

Why this happens: This is an artifact of how web page response times are measured. When a page is blocked (by a locked database table, for example):

  1. First user requests the blocked page → starts waiting
  2. Second user requests the same page → also waits
  3. More users pile up, all waiting
  4. After 35 seconds (for example), the block clears
  5. All waiting requests complete simultaneously → vertical stack pattern

The first user to request the blocked page waits the longest, the second a little less, and the most recent user may not wait much at all. But no matter how long they waited, they all saw the page update at approximately the same time.
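This timing effect can be sketched in a few lines (the arrival times and block duration below are illustrative, not from a real test):

```python
# Sketch of the vertical-stack effect: requests that arrive at different
# times all complete the moment the block clears.
arrivals = [0, 5, 10, 15]   # seconds at which each user requested the blocked page
block_clears_at = 35        # the lock is released here

completions = [max(t, block_clears_at) for t in arrivals]
waits = [done - t for t, done in zip(arrivals, completions)]

print(completions)  # [35, 35, 35, 35] -> one vertical line on the scatter plot
print(waits)        # [35, 30, 25, 20] -> the earliest requester waited longest
```

Identical completion timestamps with very different wait times are exactly what a vertical stack in the scatter plot represents.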

Typical causes of blocking:

  • Locked database tables (most common)
  • Load balancer sending too much load to one server while others are lightly loaded
  • One server in a cluster has crashed
  • Dynamic server scaling taking too long
  • Severe network congestion or failure

AI Pattern Recognition

Ask the AI to identify blocking patterns:

  • "Show me scatter plot patterns indicating database locks"
  • "Identify vertical stack patterns in response times"
  • "Why are requests completing in synchronized bursts?"
  • "Explain the blocking pattern at 300 users"

Pattern 3: Gradual Degradation with Spare Capacity

Symptoms:

  • Response times slowly increase as load increases
  • Server resources show plenty of spare capacity
  • No obvious spikes or errors
  • Throughput plateaus even though CPU/memory are not maxed

Why this happens: Parallelism limits. The system can only process N operations simultaneously, even on a 16-core server. If each operation takes 20ms and you can only run one at a time, your maximum throughput is 50 ops/sec regardless of CPU cores.

Where to look next: Database parallelism settings, connection pool sizes, thread pool limits.


Systematic Bottleneck Investigation

Follow this three-step workflow to identify bottlenecks methodically.

Step 1: Long-Running Queries

Start with the database. Finding long-running queries is DBA 101. That's always the first place to look.

What to check:

  • Are there any queries that are running slowly?
  • Are they used frequently in the workflows being tested?
  • Do slow queries correlate with response time spikes?

How to prioritize:

  • High priority: Queries used frequently in tested workflows
  • Lower priority: Slow queries not in critical paths
  • Optimize high-priority queries first, then re-test

Typical slow query causes:

  • Missing database indexes
  • Full table scans instead of indexed lookups
  • Complex joins on large tables
  • Inefficient WHERE clauses
  • N+1 query problems (repeated queries in loops)
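One way to apply the prioritization rule above is to rank queries by total time impact (calls multiplied by mean time) rather than by per-call slowness. The query stats below are made up; in practice they would come from your database's statistics views (for example, pg_stat_statements in PostgreSQL).

```python
# Hypothetical query stats: rank by total impact, not per-call duration.
queries = [
    {"sql": "SELECT ... FROM cart ...",    "calls": 5000, "mean_ms": 40},
    {"sql": "SELECT ... FROM reports ...", "calls": 2,    "mean_ms": 9000},
    {"sql": "SELECT ... FROM items ...",   "calls": 8000, "mean_ms": 5},
]

for q in queries:
    q["impact_ms"] = q["calls"] * q["mean_ms"]

by_impact = sorted(queries, key=lambda q: q["impact_ms"], reverse=True)
for q in by_impact:
    print(f'{q["impact_ms"]:>9} ms total  {q["sql"]}')
```

Note the outcome: the 9-second report query looks scariest per call, but the 40ms cart query called 5,000 times dominates total time and should be optimized first.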

Work with Your DBA

Have a DBA present during testing so they can inspect the system while it's under load and capture data needed for later analysis.

After optimizing, re-test: Did it help? Are there more long-running queries to be optimized? Repeat until you're satisfied that no more gains can be made here.

AI Query Analysis

Ask the AI to correlate slow queries with performance:

  • "Which database queries correlate with slow response times?"
  • "Show me the top 5 slowest queries and their impact"
  • "Compare query performance at 100 vs 300 users"
  • "Explain why this query is slow under load"

Step 2: Locking and Blocking

Locking prevents simultaneous access to database data to ensure consistent results. Blocking is what happens when one user locks data and a second user needs the same data. The second user waits. They are blocked.

Why blocking doesn't show up in small tests:

  • If you don't have simultaneous access, you won't have blocking
  • Even with a handful of users, blocking is highly unlikely to impact performance
  • Blocking frequently doesn't show up until the system is under load from many simultaneous users

How to detect blocking:

  • Look for vertical stack patterns in scatter plots
  • Check database monitoring for lock waits
  • Review database wait statistics (e.g., LCK_M_X and other LCK_M_* lock waits in SQL Server)
  • Examine query execution plans for blocking indicators

Common causes of blocking:

  • Read locks held too long: Transactions that read data and hold locks while doing other work
  • Write locks on hot tables: High-concurrency updates to the same rows (e.g., inventory counts, sequence generators)
  • Isolation level too strict: SERIALIZABLE isolation when READ COMMITTED would suffice
  • Long-running transactions: Transactions that span multiple user interactions

Approaches to reduce blocking:

  • Shorten transaction duration (commit faster)
  • Use optimistic locking instead of pessimistic locking
  • Lower isolation levels where consistency allows
  • Partition hot tables to spread contention
  • Use read replicas for read-heavy workloads
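The optimistic-locking approach in the list above can be sketched with a version column: instead of holding a lock while the user works, read a version number and only write if it hasn't changed since. The in-memory "row" here is a stand-in for a real database row.

```python
# Minimal optimistic-locking sketch (illustrative, not a real ORM pattern).
row = {"qty": 10, "version": 1}

def update_qty(row, expected_version, new_qty):
    """Apply the write only if nobody else modified the row in the meantime."""
    if row["version"] != expected_version:
        return False  # conflict: caller should re-read and retry
    row["qty"] = new_qty
    row["version"] += 1
    return True

v = row["version"]
assert update_qty(row, v, 9)      # first writer succeeds
assert not update_qty(row, v, 8)  # stale writer is rejected and retries, instead of blocking
```

The tradeoff is that conflicting writers must retry, but no one holds a lock while waiting, so readers and other writers are never blocked.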

After reducing blocking, re-test and iterate as long as opportunities to reduce blocking persist.

AI Blocking Analysis

Ask the AI to identify locking issues:

  • "Are there database blocking issues in this test?"
  • "Show me lock wait times correlated with response time spikes"
  • "Which tables are causing blocking contention?"
  • "Recommend solutions for the observed blocking patterns"

Step 3: Parallelism Limits

Sometimes you get to this point: the queries are fast, blocking has been minimized, but the system is still slow under load and the database hardware does not appear stressed. CPU and disk I/O both have capacity to spare. How can that be?

The answer is parallelism: specifically, the number of operations the database can perform simultaneously.

Example scenario:

  • Queries are fast (average 20ms completion time)
  • Without parallelization, when incoming rate exceeds 50/sec, the system will start to slow
  • Why? At 20ms per operation, 50 operations will take a full second
  • When load increases to 100/sec, those 100 operations will take 2 seconds to complete
  • Some users will be waiting a second longer than they did at 50 ops/sec
  • With each passing second, the system falls further behind and response times continue to degrade

The system is now overloaded, and if it's running on a 16-core server, CPU utilization may be as little as 7%. Fifteen cores sitting idle while one does all the work.

Allowing 16 operations to run in parallel essentially multiplies system capacity by a factor of 16, assuming no other limits are reached (disk I/O, memory, bandwidth, or blocking).
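The arithmetic behind that multiplier is simple enough to write down:

```python
# Back-of-the-envelope throughput from the example above: 20ms operations,
# processed either one at a time or 16 in parallel.
def max_throughput(op_ms, parallel_ops):
    """Operations per second, assuming no other resource limit is hit."""
    return parallel_ops * (1000 / op_ms)

print(max_throughput(20, 1))   # 50.0 ops/sec -> the system saturates at 50 req/sec
print(max_throughput(20, 16))  # 800.0 ops/sec -> 16x the capacity
```

The same formula also explains the low CPU reading: one busy core out of sixteen is 1/16, roughly the 7% utilization described above.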

Parallelism in databases is complex:

  • Many queries can run in parallel on a single core (while one waits for disk, another can execute)
  • Queries may be split into multiple tasks that run in parallel
  • Modern databases have parallelization turned on with default settings
  • Defaults are chosen to be safe, not optimal

Tuning parallelism requires understanding:

  • Max degree of parallelism (MAXDOP) settings
  • Cost threshold for parallelism
  • Whether individual queries benefit from parallelism
  • Interaction with other settings (memory grants, thread pools)

Parallelism Is Not a Silver Bullet

Just changing the amount of parallelism, without understanding the underlying causes, could result in no improvement, or even worse performance. Some queries don't benefit from parallelism and may perform worse with it enabled.

AI Parallelism Analysis

Ask the AI to diagnose parallelism issues:

  • "Is this a parallelism bottleneck?"
  • "Why are servers not busy despite slow response times?"
  • "Explain the relationship between query throughput and CPU usage"
  • "Recommend parallelism tuning for this database workload"

Bottleneck Types by Resource

Different resources bottleneck in different ways. Here's how to identify each type.

CPU Bottlenecks

Symptoms:

  • CPU utilization at or near 100%
  • Response times increase linearly with CPU usage
  • Throughput plateaus as CPU maxes out
  • Load average (Linux) or processor queue length (Windows) is high

Causes:

  • Application CPU-intensive operations: Complex business logic, encryption, compression
  • Inefficient algorithms: O(n²) loops, regex processing, XML/JSON parsing
  • Web server overhead: Static file serving, SSL/TLS handshakes
  • Lack of caching: Recomputing results instead of caching

Solutions:

  • Profile application to find CPU-intensive code paths
  • Optimize algorithms (better data structures, caching)
  • Add CPU cores (vertical scaling)
  • Add web servers (horizontal scaling)
  • Use CDN for static content
  • Enable caching at multiple levels

AI CPU Analysis

  • "Analyze CPU bottlenecks in this test"
  • "Which pages are CPU-intensive?"
  • "Compare CPU usage across user levels"

Memory Bottlenecks

Symptoms:

  • High memory usage (>90% in use)
  • Disk paging/swapping activity increases
  • Response times degrade as memory fills
  • Out of memory errors or crashes

Causes:

  • Memory leaks: Application fails to release memory
  • Session state bloat: Too much data stored in user sessions
  • Large result sets: Queries returning millions of rows
  • Inefficient caching: Cache grows unbounded without eviction

Solutions:

  • Fix memory leaks (profile to find allocations)
  • Reduce session state size
  • Paginate large result sets
  • Configure cache eviction policies
  • Add more RAM
  • Use external cache (Redis, Memcached)

AI Memory Analysis

  • "Is memory a bottleneck?"
  • "Show memory trends correlated with errors"
  • "Identify memory leaks in the application"

Disk I/O Bottlenecks

Symptoms:

  • High disk queue length
  • Disk utilization at 100%
  • Database response times increase
  • Disk wait time is significant portion of response time

Causes:

  • Slow disks: Spinning HDDs instead of SSDs
  • Missing indexes: Queries doing full table scans
  • High write volume: Logging, temp tables, sort operations
  • Insufficient IOPS: Storage system can't handle request rate

Solutions:

  • Add database indexes to reduce scans
  • Upgrade to SSDs or faster storage
  • Increase IOPS capacity (cloud storage tiers)
  • Tune database buffer pool/cache
  • Use read replicas to distribute load
  • Partition large tables

AI Disk Analysis

  • "Identify disk I/O bottlenecks"
  • "Show correlation between disk waits and response times"
  • "Which queries are causing high disk I/O?"

Network Bottlenecks

Symptoms:

  • High network utilization (>80% of capacity)
  • Packet loss or retransmissions
  • Response times increase with payload size
  • Latency increases under load

Causes:

  • Bandwidth saturation: Traffic exceeds link capacity
  • Chatty protocols: Too many round trips (N+1 API calls)
  • Large payloads: Uncompressed responses, inefficient serialization
  • Network congestion: Shared infrastructure, routing issues

Solutions:

  • Enable compression (gzip, Brotli)
  • Reduce payload sizes (pagination, field filtering)
  • Batch API calls to reduce round trips
  • Use CDN for static content
  • Upgrade network capacity
  • Optimize routing
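The compression suggestion above is easy to demonstrate: repetitive API payloads (like a list of similar records) shrink dramatically under gzip, cutting bandwidth per response. The payload here is synthetic.

```python
import gzip
import json

# A repetitive JSON payload, typical of list-style API responses
payload = json.dumps([
    {"sku": i, "name": "widget", "in_stock": True} for i in range(1000)
]).encode()

compressed = gzip.compress(payload)
print(len(payload), "->", len(compressed), "bytes")
assert len(compressed) < len(payload) / 5  # repetitive data compresses very well
```

In practice the web server or framework handles this transparently once gzip or Brotli is enabled; the point is the size of the win on typical payloads.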

AI Network Analysis

  • "Are there network bottlenecks?"
  • "Show bandwidth usage correlated with response times"
  • "Identify pages with large payloads"

Database-Specific Bottlenecks

Beyond general resource bottlenecks, databases have specific limitations:

Connection pool exhaustion:

  • Symptom: Errors like "max connections reached"
  • Cause: Too few connections in pool for concurrent load
  • Solution: Increase connection pool size (but watch for other limits)
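Connection pool exhaustion can be sketched with a toy pool that hands out at most N connections at once; the seventh caller gets the "max connections reached" symptom described above instead of a connection. Real pools typically queue or time out rather than fail instantly, but the limit behaves the same way.

```python
class ConnectionPool:
    """Toy pool: hands out at most max_size connections at once."""
    def __init__(self, max_size):
        self.max_size = max_size
        self.in_use = 0

    def acquire(self):
        if self.in_use >= self.max_size:
            raise RuntimeError("max connections reached")  # the symptom above
        self.in_use += 1

    def release(self):
        self.in_use -= 1

pool = ConnectionPool(max_size=5)
for _ in range(5):
    pool.acquire()      # five concurrent requests hold every connection
try:
    pool.acquire()      # the sixth request fails
except RuntimeError as e:
    print(e)
```

Under load, every request held past the pool size turns into an error or a wait, even though the database behind the pool may be nearly idle.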

Query execution plan issues:

  • Symptom: Queries slow despite available CPU/disk
  • Cause: Bad execution plans (table scans, missing statistics)
  • Solution: Update statistics, rebuild indexes, hint optimizer

Tempdb contention (SQL Server):

  • Symptom: PAGELATCH_UP waits on tempdb
  • Cause: High concurrency on temp tables/sort operations
  • Solution: Add tempdb files, reduce temp table usage

Undo/redo log contention:

  • Symptom: Log file I/O is bottleneck
  • Cause: High transaction volume, large transactions
  • Solution: Batch commits, faster log disk, tune log size

AI Database-Specific Analysis

  • "Identify database-specific bottlenecks"
  • "Show connection pool statistics"
  • "Analyze query execution plans"
  • "Diagnose tempdb contention"

Using the Dashboard to Find Bottlenecks

The Embedded Analytics Dashboard provides powerful correlation capabilities for bottleneck identification.

Correlation Workflow

1. Start with the Metrics tab:

  • Identify when response times degraded
  • Note the user level or time period

2. Switch to the Servers tab:

  • Look at server metrics for the same time period
  • Check for resources at or near capacity

3. Ask: "What's maxed out?"

  • CPU at 100%? → CPU bottleneck
  • Memory at 100% with paging? → Memory bottleneck
  • Disk queue length high? → Disk I/O bottleneck
  • Network utilization high? → Network bottleneck
  • Nothing maxed out? → Database concurrency/locking/parallelism issue

4. Drill down to Pages tab:

  • Identify which pages are slowest
  • Check if specific pages correlate with resource spikes

5. Examine Errors tab (if errors exist):

  • Look for error patterns (timeouts, 500 errors, connection refused)
  • Correlate errors with resource exhaustion

Visual Correlation

The dashboard overlays response times on server metrics so you can see cause-and-effect relationships:

  • Response times spike → What server resource spiked at same moment?
  • Database CPU hits 98% → Response times jump from 1.2s to 4.5s at same time
  • Strong correlation = likely bottleneck
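The same cause-and-effect check can be done numerically with a Pearson correlation between a server metric and response times. The samples below are made up for illustration; a coefficient near 1.0 suggests the resource is tied to the slowdown.

```python
# Pearson correlation between database CPU and response times (synthetic data).
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

cpu_pct   = [20, 35, 50, 70, 85, 98]        # database CPU at each load step
resp_secs = [1.1, 1.2, 1.3, 1.8, 2.9, 4.5]  # response times at the same steps

r = pearson(cpu_pct, resp_secs)
print(f"r = {r:.2f}")
```

A strong positive coefficient like this one supports the "database CPU is the bottleneck" hypothesis; a weak one says to keep looking.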

AI Dashboard Correlation

Ask the AI to analyze correlations automatically:

  • "What's the bottleneck in this test?"
  • "Correlate server metrics with response time degradation"
  • "Identify which resource limits capacity"
  • "Compare bottlenecks at different user levels"
  • "Explain why response times increased at 350 users"

Bottleneck Investigation Checklist

Use this systematic checklist when investigating performance problems:

Initial Assessment

  • [ ] Review response time trends (time-based and user-level views)
  • [ ] Identify when degradation started (user level or time)
  • [ ] Check for errors (type, frequency, timing)
  • [ ] Note server resource utilization at degradation point

Server Resource Check

  • [ ] CPU: Is any server at >90% CPU?
  • [ ] Memory: Is any server at >90% memory with paging?
  • [ ] Disk: Is disk queue length high? Disk utilization at 100%?
  • [ ] Network: Is network bandwidth >80% in use?

Database Investigation

  • [ ] Long-running queries: Are any queries taking >1s under load?
  • [ ] Locking/blocking: Check for vertical stack patterns, lock waits
  • [ ] Parallelism: Is CPU low despite degradation? Check parallelism settings
  • [ ] Connection pool: Check for connection exhaustion errors
  • [ ] Execution plans: Review query plans for inefficiencies

Application Investigation

  • [ ] Slow pages: Which pages have highest response times?
  • [ ] Component breakdown: Is wait time, receive time, or processing time highest?
  • [ ] Error patterns: Do errors correlate with resource exhaustion?
  • [ ] Caching: Is caching effective? Cache hit rates?

Next Steps

  • [ ] Prioritize bottlenecks by business impact
  • [ ] Test fixes one at a time (so you know what worked)
  • [ ] Re-run load test after each fix
  • [ ] Document findings and improvements

Common Mistakes in Bottleneck Analysis

Mistake 1: Trusting Averages

Problem: Averages hide the distribution. Some users experience 10s response times while the average is 2s.

Solution: Always check percentiles (95th, 99th) and scatter plots.

Mistake 2: Ignoring "Low" Resource Utilization

Problem: Seeing 40% CPU and concluding "server has capacity" when it's actually a concurrency bottleneck.

Solution: When servers aren't busy but site is slow, look at database locking, blocking, and parallelism.

Mistake 3: Changing Multiple Things at Once

Problem: Making several optimizations simultaneously, then not knowing which one helped.

Solution: Test one fix at a time. Re-run load test after each change.

Mistake 4: Optimizing the Wrong Thing

Problem: Spending days optimizing a query that's only called once per session when the real problem is a query called 50 times per page.

Solution: Use profiling data and load test results to prioritize. Optimize high-frequency operations first.

Mistake 5: Not Re-Testing

Problem: Assuming the fix worked without measuring the actual improvement.

Solution: Always re-run load tests after changes. Measure the improvement (or regression).


What to Do After Finding the Bottleneck

Once you've identified the bottleneck, you need to decide: optimize or scale?

Optimize First

Before adding more hardware, optimize what you have:

  • Fix slow queries (add indexes, rewrite logic)
  • Reduce locking/blocking (shorten transactions, lower isolation levels)
  • Tune parallelism (MAXDOP, connection pools)
  • Enable caching (application, database, CDN)
  • Optimize algorithms (better data structures, fewer loops)

Benefits: Often free or low cost, improves efficiency at all scales.

Scale When Optimization Plateaus

After optimizing, if you still need more capacity:

Vertical scaling (bigger servers):

  • More CPU cores, more RAM, faster disks
  • Simple but limited by maximum instance size
  • Good for: CPU, memory, disk-bound workloads

Horizontal scaling (more servers):

  • Add web servers, app servers, database read replicas
  • Requires load balancing and distributed architecture
  • Good for: Read-heavy workloads, stateless applications

Database-specific scaling:

  • Read replicas for read-heavy workloads
  • Sharding for write scalability
  • Caching layers (Redis, Memcached) to reduce database load

Re-Test and Iterate

After each change, run the same load test scenario. Compare results to baseline. Verify the bottleneck is resolved (or at least reduced). Then check for new bottlenecks that emerge at higher capacity.

Performance optimization is iterative: fix one bottleneck, test, find the next one, repeat. You keep going until you meet your capacity goals or run out of things to optimize.