Identifying Bottlenecks

Here is the situation. You're running a load test. Virtual users are experiencing slow response times. You check the server metrics: low-to-moderate CPU utilization, plenty of memory, low disk activity. Everything looks good. But as you apply more load, response times degrade a little. Then a lot. Soon they've blown past acceptable thresholds and kept going.

What's going on? The server metrics "clearly show" your servers aren't busy, yet your site is slow. We encounter this scenario frequently, and the answer is almost always the same: you need to look deeper to find the hidden bottleneck.

This guide explains how to identify bottlenecks using systematic investigation and why different bottleneck types occur, so you can diagnose root causes and take the right corrective actions.


Why Averages Hide Bottlenecks

If you've been using average page load data to evaluate your website's performance, you could be missing important bottlenecks that reduce conversion rates and frustrate customers.

Averages are easy to understand, which is precisely why they're dangerous. If the average load time of a page is 2 seconds, many users are seeing load times slower than that, but the average tells you nothing about how much slower.
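To see the effect concretely, here is a small illustration with made-up numbers: ninety fast requests and ten slow ones average out to under 2 seconds, while the 95th percentile tells the real story.

```python
import statistics

# Hypothetical response times (seconds): mostly fast, with a slow tail
samples = [1.2] * 90 + [9.0] * 10

mean = statistics.mean(samples)                  # ~1.98s: the average says "fast"
p95 = sorted(samples)[int(0.95 * len(samples))]  # 9.0s: what the tail users actually see

print(f"mean={mean:.2f}s  p95={p95:.2f}s")
```

One in ten users here waits over four times the average, and the mean alone gives no hint of it.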

The Scatter Plot Advantage

Here's a real example. A high-end retailer showed excellent average page load times. No visible slowdown even at 2,250 concurrent users.

What the average hid was a much wider range of response times than the summary figure suggested. That's why scatter plots are essential: they plot every single response time in the test, revealing the distribution that averages cover up.

The average said "fast." The scatter plot showed that the all-important checkout and shopping cart process had a substantial number of responses in the 5-15 second range, even at low load levels.

Don't Rely on Averages Alone

If you had relied on just average load times, you would have missed that critical detail, one that could be driving down sales conversion rates and costing the company money.

Use the Embedded Analytics Dashboard to view scatter plots and response time distributions. Look beyond the average to see the full picture.


Common Bottleneck Patterns

Understanding what bottlenecks look like helps you recognize them quickly.

Pattern 1: Servers Aren't Busy But Site Is Slow

Symptoms:

  • Response times degrade significantly under load
  • CPU utilization is low to moderate (40-60%)
  • Memory usage is fine, disk activity is low
  • As load increases, CPU actually decreases
  • Errors start appearing (500 Internal Server Error)

Example from a real test:

As load increased from 25 to 200 users, CPU utilization climbed to ~55%. But as load continued to increase, CPU utilization did not keep climbing. In fact, the trend was downward.

Why this happens: The database (or application server) has a concurrency limit. Requests are being queued, waiting their turn, so CPU goes idle even though the system is overloaded.

Where to look next: Almost always, look at the database.

Pattern 2: Vertical Stack Pattern in Scatter Plots

Symptoms:

  • Scattered load times bunch up in recognizable vertical lines
  • Multiple requests complete at exactly the same time
  • Response times span 10-30+ seconds
  • Server CPU is moderate (40-60%) when blocks occur

Why this happens: This is an artifact of how web page response times are measured. When a page is blocked (by a locked database table, for example):

  1. First user requests the blocked page → starts waiting
  2. Second user requests the same page → also waits
  3. More users pile up, all waiting
  4. After 35 seconds (for example), the block clears
  5. All waiting requests complete simultaneously → vertical stack pattern

The first user to request the blocked page waits the longest, the second a little less, and the most recent user may not wait much at all. But no matter how long they waited, they all saw the page update at approximately the same time.
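This timing effect can be sketched in a few lines (the arrival times and block duration below are illustrative, not from a real test):

```python
# Sketch of the vertical-stack effect: requests that arrive at different
# times all complete the moment the block clears.
arrivals = [0, 5, 10, 15]   # seconds at which each user requested the blocked page
block_clears_at = 35        # the lock is released here

completions = [max(t, block_clears_at) for t in arrivals]
waits = [done - t for t, done in zip(arrivals, completions)]

print(completions)  # [35, 35, 35, 35] -> one vertical line on the scatter plot
print(waits)        # [35, 30, 25, 20] -> the earliest requester waited longest
```

Identical completion timestamps with very different wait times are exactly what a vertical stack in the scatter plot represents.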

Typical causes of blocking:

  • Locked database tables (most common)
  • Load balancer sending too much load to one server while others are lightly loaded
  • One server in a cluster has crashed
  • Dynamic server scaling taking too long
  • Severe network congestion or failure

AI Pattern Recognition

Ask the AI to identify blocking patterns:

  • "Show me scatter plot patterns indicating database locks"
  • "Identify vertical stack patterns in response times"
  • "Why are requests completing in synchronized bursts?"
  • "Explain the blocking pattern at 300 users"

Pattern 3: Gradual Degradation with Spare Capacity

Symptoms:

  • Response times slowly increase as load increases
  • Server resources show plenty of spare capacity
  • No obvious spikes or errors
  • Throughput plateaus even though CPU/memory are not maxed

Why this happens: Parallelism limits. The system can only process N operations simultaneously, even on a 16-core server. If each operation takes 20ms and you can only run one at a time, your maximum throughput is 50 ops/sec regardless of CPU cores.

Where to look next: Database parallelism settings, connection pool sizes, thread pool limits.


Systematic Bottleneck Investigation

Follow this three-step workflow to identify bottlenecks methodically.

Step 1: Long-Running Queries

Start with the database. Finding long-running queries is DBA 101. That's always the first place to look.

What to check:

  • Are there any queries that are running slowly?
  • Are they used frequently in the workflows being tested?
  • Do slow queries correlate with response time spikes?

How to prioritize:

  • High priority: Queries used frequently in tested workflows
  • Lower priority: Slow queries not in critical paths
  • Optimize high-priority queries first, then re-test

Typical slow query causes:

  • Missing database indexes
  • Full table scans instead of indexed lookups
  • Complex joins on large tables
  • Inefficient WHERE clauses
  • N+1 query problems (repeated queries in loops)
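One way to apply the prioritization rule above is to rank queries by total time impact (calls multiplied by mean time) rather than by per-call slowness. The query stats below are made up; in practice they would come from your database's statistics views (for example, pg_stat_statements in PostgreSQL).

```python
# Hypothetical query stats: rank by total impact, not per-call duration.
queries = [
    {"sql": "SELECT ... FROM cart ...",    "calls": 5000, "mean_ms": 40},
    {"sql": "SELECT ... FROM reports ...", "calls": 2,    "mean_ms": 9000},
    {"sql": "SELECT ... FROM items ...",   "calls": 8000, "mean_ms": 5},
]

for q in queries:
    q["impact_ms"] = q["calls"] * q["mean_ms"]

by_impact = sorted(queries, key=lambda q: q["impact_ms"], reverse=True)
for q in by_impact:
    print(f'{q["impact_ms"]:>9} ms total  {q["sql"]}')
```

Note the outcome: the 9-second report query looks scariest per call, but the 40ms cart query called 5,000 times dominates total time and should be optimized first.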

Work with Your DBA

Have a DBA present during testing so they can inspect the system while it's under load and capture data needed for later analysis.

After optimizing, re-test: Did it help? Are there more long-running queries to be optimized? Repeat until you're satisfied that no more gains can be made here.

AI Query Analysis

Ask the AI to correlate slow queries with performance:

  • "Which database queries correlate with slow response times?"
  • "Show me the top 5 slowest queries and their impact"
  • "Compare query performance at 100 vs 300 users"
  • "Explain why this query is slow under load"

Step 2: Locking and Blocking

Locking prevents simultaneous access to database data to ensure consistent results. Blocking is what happens when one user locks data and a second user needs the same data. The second user waits. They are blocked.

Why blocking doesn't show up in small tests:

  • If you don't have simultaneous access, you won't have blocking
  • Even with a handful of users, blocking is highly unlikely to impact performance
  • Blocking frequently doesn't show up until the system is under load from many simultaneous users

How to detect blocking:

  • Look for vertical stack patterns in scatter plots
  • Check database monitoring for lock waits
  • Review database wait statistics (e.g., LCK_M_X and other LCK_M_* lock waits in SQL Server)
  • Examine query execution plans for blocking indicators

Common causes of blocking:

  • Read locks held too long: Transactions that read data and hold locks while doing other work
  • Write locks on hot tables: High-concurrency updates to the same rows (e.g., inventory counts, sequence generators)
  • Isolation level too strict: SERIALIZABLE isolation when READ COMMITTED would suffice
  • Long-running transactions: Transactions that span multiple user interactions

Approaches to reduce blocking:

  • Shorten transaction duration (commit faster)
  • Use optimistic locking instead of pessimistic locking
  • Lower isolation levels where consistency allows
  • Partition hot tables to spread contention
  • Use read replicas for read-heavy workloads
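The optimistic-locking approach in the list above can be sketched with a version column: instead of holding a lock while the user works, read a version number and only write if it hasn't changed since. The in-memory "row" here is a stand-in for a real database row.

```python
# Minimal optimistic-locking sketch (illustrative, not a real ORM pattern).
row = {"qty": 10, "version": 1}

def update_qty(row, expected_version, new_qty):
    """Apply the write only if nobody else modified the row in the meantime."""
    if row["version"] != expected_version:
        return False  # conflict: caller should re-read and retry
    row["qty"] = new_qty
    row["version"] += 1
    return True

v = row["version"]
assert update_qty(row, v, 9)      # first writer succeeds
assert not update_qty(row, v, 8)  # stale writer is rejected and retries, instead of blocking
```

The tradeoff is that conflicting writers must retry, but no one holds a lock while waiting, so readers and other writers are never blocked.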

After reducing blocking, re-test and iterate as long as opportunities to reduce blocking persist.

AI Blocking Analysis

Ask the AI to identify locking issues:

  • "Are there database blocking issues in this test?"
  • "Show me lock wait times correlated with response time spikes"
  • "Which tables are causing blocking contention?"
  • "Recommend solutions for the observed blocking patterns"

Step 3: Parallelism Limits

Sometimes you get to this point: the queries are fast, blocking has been minimized, but the system is still slow under load and the database hardware does not appear stressed. CPU and disk I/O both have capacity to spare. How can that be?

The answer is parallelism: specifically, the number of operations the database can perform simultaneously.

Example scenario:

  • Queries are fast (average 20ms completion time)
  • Without parallelization, when incoming rate exceeds 50/sec, the system will start to slow
  • Why? At 20ms per operation, 50 operations will take a full second
  • When load increases to 100/sec, those 100 operations will take 2 seconds to complete
  • Some users will be waiting a second longer than they did at 50 ops/sec
  • With each passing second, the system falls further behind and response times continue to degrade

The system is now overloaded, and if it's running on a 16-core server, CPU utilization may be as little as 7%. Fifteen cores sitting idle while one does all the work.

Allowing 16 operations to run in parallel essentially multiplies system capacity by a factor of 16, assuming no other limits are reached (disk I/O, memory, bandwidth, or blocking).
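The arithmetic behind that multiplier is simple enough to write down:

```python
# Back-of-the-envelope throughput from the example above: 20ms operations,
# processed either one at a time or 16 in parallel.
def max_throughput(op_ms, parallel_ops):
    """Operations per second, assuming no other resource limit is hit."""
    return parallel_ops * (1000 / op_ms)

print(max_throughput(20, 1))   # 50.0 ops/sec -> the system saturates at 50 req/sec
print(max_throughput(20, 16))  # 800.0 ops/sec -> 16x the capacity
```

The same formula also explains the low CPU reading: one busy core out of sixteen is 1/16, roughly the 7% utilization described above.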

Parallelism in databases is complex:

  • Many queries can run in parallel on a single core (while one waits for disk, another can execute)
  • Queries may be split into multiple tasks that run in parallel
  • Modern databases have parallelization turned on with default settings
  • Defaults are chosen to be safe, not optimal

Tuning parallelism requires understanding:

  • Max degree of parallelism (MAXDOP) settings
  • Cost threshold for parallelism
  • Whether individual queries benefit from parallelism
  • Interaction with other settings (memory grants, thread pools)

Parallelism Is Not a Silver Bullet

Just changing the amount of parallelism, without understanding the underlying causes, could result in no improvement, or even worse performance. Some queries don't benefit from parallelism and may perform worse with it enabled.

AI Parallelism Analysis

Ask the AI to diagnose parallelism issues:

  • "Is this a parallelism bottleneck?"
  • "Why are servers not busy despite slow response times?"
  • "Explain the relationship between query throughput and CPU usage"
  • "Recommend parallelism tuning for this database workload"

Bottleneck Types by Resource

Different resources bottleneck in different ways. Here's how to identify each type.

CPU Bottlenecks

Symptoms:

  • CPU utilization at or near 100%
  • Response times increase linearly with CPU usage
  • Throughput plateaus as CPU maxes out
  • Load average (Linux) or processor queue length (Windows) is high

Causes:

  • Application CPU-intensive operations: Complex business logic, encryption, compression
  • Inefficient algorithms: O(n²) loops, regex processing, XML/JSON parsing
  • Web server overhead: Static file serving, SSL/TLS handshakes
  • Lack of caching: Recomputing results instead of caching

Solutions:

  • Profile application to find CPU-intensive code paths
  • Optimize algorithms (better data structures, caching)
  • Add CPU cores (vertical scaling)
  • Add web servers (horizontal scaling)
  • Use CDN for static content
  • Enable caching at multiple levels

AI CPU Analysis

  • "Analyze CPU bottlenecks in this test"
  • "Which pages are CPU-intensive?"
  • "Compare CPU usage across user levels"

Memory Bottlenecks

Symptoms:

  • High memory usage (>90% in use)
  • Disk paging/swapping activity increases
  • Response times degrade as memory fills
  • Out of memory errors or crashes

Causes:

  • Memory leaks: Application fails to release memory
  • Session state bloat: Too much data stored in user sessions
  • Large result sets: Queries returning millions of rows
  • Inefficient caching: Cache grows unbounded without eviction

Solutions:

  • Fix memory leaks (profile to find allocations)
  • Reduce session state size
  • Paginate large result sets
  • Configure cache eviction policies
  • Add more RAM
  • Use external cache (Redis, Memcached)

AI Memory Analysis

  • "Is memory a bottleneck?"
  • "Show memory trends correlated with errors"
  • "Identify memory leaks in the application"

Disk I/O Bottlenecks

Symptoms:

  • High disk queue length
  • Disk utilization at 100%
  • Database response times increase
  • Disk wait time is significant portion of response time

Causes:

  • Slow disks: Spinning HDDs instead of SSDs
  • Missing indexes: Queries doing full table scans
  • High write volume: Logging, temp tables, sort operations
  • Insufficient IOPS: Storage system can't handle request rate

Solutions:

  • Add database indexes to reduce scans
  • Upgrade to SSDs or faster storage
  • Increase IOPS capacity (cloud storage tiers)
  • Tune database buffer pool/cache
  • Use read replicas to distribute load
  • Partition large tables

AI Disk Analysis

  • "Identify disk I/O bottlenecks"
  • "Show correlation between disk waits and response times"
  • "Which queries are causing high disk I/O?"

Network Bottlenecks

Symptoms:

  • High network utilization (>80% of capacity)
  • Packet loss or retransmissions
  • Response times increase with payload size
  • Latency increases under load

Causes:

  • Bandwidth saturation: Traffic exceeds link capacity
  • Chatty protocols: Too many round trips (N+1 API calls)
  • Large payloads: Uncompressed responses, inefficient serialization
  • Network congestion: Shared infrastructure, routing issues

Solutions:

  • Enable compression (gzip, Brotli)
  • Reduce payload sizes (pagination, field filtering)
  • Batch API calls to reduce round trips
  • Use CDN for static content
  • Upgrade network capacity
  • Optimize routing
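The compression suggestion above is easy to demonstrate: repetitive API payloads (like a list of similar records) shrink dramatically under gzip, cutting bandwidth per response. The payload here is synthetic.

```python
import gzip
import json

# A repetitive JSON payload, typical of list-style API responses
payload = json.dumps([
    {"sku": i, "name": "widget", "in_stock": True} for i in range(1000)
]).encode()

compressed = gzip.compress(payload)
print(len(payload), "->", len(compressed), "bytes")
assert len(compressed) < len(payload) / 5  # repetitive data compresses very well
```

In practice the web server or framework handles this transparently once gzip or Brotli is enabled; the point is the size of the win on typical payloads.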

AI Network Analysis

  • "Are there network bottlenecks?"
  • "Show bandwidth usage correlated with response times"
  • "Identify pages with large payloads"

Database-Specific Bottlenecks

Beyond general resource bottlenecks, databases have specific limitations:

Connection pool exhaustion:

  • Symptom: Errors like "max connections reached"
  • Cause: Too few connections in pool for concurrent load
  • Solution: Increase connection pool size (but watch for other limits)
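Connection pool exhaustion can be sketched with a toy pool that hands out at most N connections at once; the seventh caller gets the "max connections reached" symptom described above instead of a connection. Real pools typically queue or time out rather than fail instantly, but the limit behaves the same way.

```python
class ConnectionPool:
    """Toy pool: hands out at most max_size connections at once."""
    def __init__(self, max_size):
        self.max_size = max_size
        self.in_use = 0

    def acquire(self):
        if self.in_use >= self.max_size:
            raise RuntimeError("max connections reached")  # the symptom above
        self.in_use += 1

    def release(self):
        self.in_use -= 1

pool = ConnectionPool(max_size=5)
for _ in range(5):
    pool.acquire()      # five concurrent requests hold every connection
try:
    pool.acquire()      # the sixth request fails
except RuntimeError as e:
    print(e)
```

Under load, every request held past the pool size turns into an error or a wait, even though the database behind the pool may be nearly idle.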

Query execution plan issues:

  • Symptom: Queries slow despite available CPU/disk
  • Cause: Bad execution plans (table scans, missing statistics)
  • Solution: Update statistics, rebuild indexes, hint optimizer

Tempdb contention (SQL Server):

  • Symptom: PAGELATCH_UP waits on tempdb
  • Cause: High concurrency on temp tables/sort operations
  • Solution: Add tempdb files, reduce temp table usage

Undo/redo log contention:

  • Symptom: Log file I/O is bottleneck
  • Cause: High transaction volume, large transactions
  • Solution: Batch commits, faster log disk, tune log size

AI Database-Specific Analysis

  • "Identify database-specific bottlenecks"
  • "Show connection pool statistics"
  • "Analyze query execution plans"
  • "Diagnose tempdb contention"

Using the Dashboard to Find Bottlenecks

The Embedded Analytics Dashboard provides powerful correlation capabilities for bottleneck identification.

Correlation Workflow

1. Start with the Metrics tab:

  • Identify when response times degraded
  • Note the user level or time period

2. Switch to the Servers tab:

  • Look at server metrics for the same time period
  • Check for resources at or near capacity

3. Ask: "What's maxed out?"

  • CPU at 100%? → CPU bottleneck
  • Memory at 100% with paging? → Memory bottleneck
  • Disk queue length high? → Disk I/O bottleneck
  • Network utilization high? → Network bottleneck
  • Nothing maxed out? → Database concurrency/locking/parallelism issue

4. Drill down to Pages tab:

  • Identify which pages are slowest
  • Check if specific pages correlate with resource spikes

5. Examine Errors tab (if errors exist):

  • Look for error patterns (timeouts, 500 errors, connection refused)
  • Correlate errors with resource exhaustion

Visual Correlation

The dashboard overlays response times on server metrics so you can see cause-and-effect relationships:

  • Response times spike → What server resource spiked at same moment?
  • Database CPU hits 98% → Response times jump from 1.2s to 4.5s at same time
  • Strong correlation = likely bottleneck
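The same cause-and-effect check can be done numerically with a Pearson correlation between a server metric and response times. The samples below are made up for illustration; a coefficient near 1.0 suggests the resource is tied to the slowdown.

```python
# Pearson correlation between database CPU and response times (synthetic data).
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

cpu_pct   = [20, 35, 50, 70, 85, 98]        # database CPU at each load step
resp_secs = [1.1, 1.2, 1.3, 1.8, 2.9, 4.5]  # response times at the same steps

r = pearson(cpu_pct, resp_secs)
print(f"r = {r:.2f}")
```

A strong positive coefficient like this one supports the "database CPU is the bottleneck" hypothesis; a weak one says to keep looking.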

AI Dashboard Correlation

Ask the AI to analyze correlations automatically:

  • "What's the bottleneck in this test?"
  • "Correlate server metrics with response time degradation"
  • "Identify which resource limits capacity"
  • "Compare bottlenecks at different user levels"
  • "Explain why response times increased at 350 users"

Bottleneck Investigation Checklist

Use this systematic checklist when investigating performance problems:

Initial Assessment

  • [ ] Review response time trends (time-based and user-level views)
  • [ ] Identify when degradation started (user level or time)
  • [ ] Check for errors (type, frequency, timing)
  • [ ] Note server resource utilization at degradation point

Server Resource Check

  • [ ] CPU: Is any server at >90% CPU?
  • [ ] Memory: Is any server at >90% memory with paging?
  • [ ] Disk: Is disk queue length high? Disk utilization at 100%?
  • [ ] Network: Is network bandwidth >80% in use?

Database Investigation

  • [ ] Long-running queries: Are any queries taking >1s under load?
  • [ ] Locking/blocking: Check for vertical stack patterns, lock waits
  • [ ] Parallelism: Is CPU low despite degradation? Check parallelism settings
  • [ ] Connection pool: Check for connection exhaustion errors
  • [ ] Execution plans: Review query plans for inefficiencies

Application Investigation

  • [ ] Slow pages: Which pages have highest response times?
  • [ ] Component breakdown: Is wait time, receive time, or processing time highest?
  • [ ] Error patterns: Do errors correlate with resource exhaustion?
  • [ ] Caching: Is caching effective? Cache hit rates?

Next Steps

  • [ ] Prioritize bottlenecks by business impact
  • [ ] Test fixes one at a time (so you know what worked)
  • [ ] Re-run load test after each fix
  • [ ] Document findings and improvements

Common Mistakes in Bottleneck Analysis

Mistake 1: Trusting Averages

Problem: Averages hide the distribution. Some users experience 10s response times while the average is 2s.

Solution: Always check percentiles (95th, 99th) and scatter plots.

Mistake 2: Ignoring "Low" Resource Utilization

Problem: Seeing 40% CPU and concluding "server has capacity" when it's actually a concurrency bottleneck.

Solution: When servers aren't busy but site is slow, look at database locking, blocking, and parallelism.

Mistake 3: Changing Multiple Things at Once

Problem: Making several optimizations simultaneously, then not knowing which one helped.

Solution: Test one fix at a time. Re-run load test after each change.

Mistake 4: Optimizing the Wrong Thing

Problem: Spending days optimizing a query that's only called once per session when the real problem is a query called 50 times per page.

Solution: Use profiling data and load test results to prioritize. Optimize high-frequency operations first.

Mistake 5: Not Re-Testing

Problem: Assuming the fix worked without measuring the actual improvement.

Solution: Always re-run load tests after changes. Measure the improvement (or regression).


What to Do After Finding the Bottleneck

Once you've identified the bottleneck, you need to decide: optimize or scale?

Optimize First

Before adding more hardware, optimize what you have:

  • Fix slow queries (add indexes, rewrite logic)
  • Reduce locking/blocking (shorten transactions, lower isolation levels)
  • Tune parallelism (MAXDOP, connection pools)
  • Enable caching (application, database, CDN)
  • Optimize algorithms (better data structures, fewer loops)

Benefits: Often free or low cost, improves efficiency at all scales.

Scale When Optimization Plateaus

After optimizing, if you still need more capacity:

Vertical scaling (bigger servers):

  • More CPU cores, more RAM, faster disks
  • Simple but limited by maximum instance size
  • Good for: CPU, memory, disk-bound workloads

Horizontal scaling (more servers):

  • Add web servers, app servers, database read replicas
  • Requires load balancing and distributed architecture
  • Good for: Read-heavy workloads, stateless applications

Database-specific scaling:

  • Read replicas for read-heavy workloads
  • Sharding for write scalability
  • Caching layers (Redis, Memcached) to reduce database load

Re-Test and Iterate

After each change, run the same load test scenario. Compare results to baseline. Verify the bottleneck is resolved (or at least reduced). Then check for new bottlenecks that emerge at higher capacity.

Performance optimization is iterative: fix one bottleneck, test, find the next one, repeat. You keep going until you meet your capacity goals or run out of things to optimize.