Monitoring During a Load Test

Load test monitoring is detective work in real time. You're watching hundreds of virtual users stress your application, looking for clues about performance limits, bottlenecks, and failure modes. Response times spike at 300 VUs? That's a clue. Database CPU hits 100% at the same moment? That's the culprit.

Running a load test without monitoring is like driving blindfolded. You'll crash, but you won't know why. Monitoring tells you not just "the server failed at 500 VUs" but "the database connection pool exhausted at 500 VUs because only 100 connections were configured."

This guide explains:

  • Which metrics to watch during a load test
  • What each metric means and why it matters
  • How to correlate metrics to identify bottlenecks
  • Warning signs that indicate problems
  • Real-time degradation detection with AI assistance

Key Metrics Overview

Load testing produces dozens of metrics, but these seven are the ones to watch in real time:

| Metric | What It Measures | Why It Matters |
|---|---|---|
| Response Time (avg) | Time from request sent to response received | User experience (slow responses = frustrated users) |
| Hits/sec | HTTP requests per second across all VUs | Server throughput: how many requests/sec can it handle? |
| Bandwidth | Data transferred per second (download + upload) | Network capacity: are you bandwidth-limited? |
| Virtual Users | Number of concurrent VUs executing the test case | Load level: more VUs = more stress |
| Errors/sec | Failed transactions per second | Application health: errors indicate broken functionality |
| CPU % (server) | Server CPU utilization | Compute capacity: high CPU = compute-bound |
| Memory % (server) | Server memory utilization | Memory capacity: high memory = potential leak or cache issue |

These metrics tell a story: response times increase (the symptom) because CPU hits 100% (the cause). Monitoring reveals the narrative.


Response Time: The Primary Performance Metric

Response time is what users experience. Everything else is diagnostic. If response times are fast, users are happy. If response times are slow, users are frustrated, and it doesn't matter that your server CPU is only 30%.

What Response Time Measures

Response time = time from sending HTTP request to receiving complete response:

[VU sends request] → [network latency] → [server processes] →
[network latency] → [VU receives response] = Response Time

Components:

  • Network latency: Time for packets to travel (typically 10-100ms)
  • Server processing: Time for server to generate response (varies: 10ms for cached page, 1000ms for complex database query)
  • Network download time: Time to transfer response body (depends on response size and bandwidth)
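These components add up in a predictable way. As a back-of-the-envelope sketch (all numbers below are illustrative assumptions, not measurements from any particular system):

```python
def estimate_response_time_ms(latency_ms: float, processing_ms: float,
                              body_kb: float, bandwidth_mbps: float) -> float:
    """Round-trip network latency + server processing + body transfer time."""
    # body_kb * 8 = kilobits; bandwidth_mbps * 1000 = kilobits per second
    download_ms = body_kb * 8 / (bandwidth_mbps * 1000) * 1000
    return 2 * latency_ms + processing_ms + download_ms

# 50 ms latency each way, 100 ms server work, 500 KB page over 10 Mbps:
# 100 + 100 + 400 = 600 ms
print(estimate_response_time_ms(50, 100, 500, 10))
```

Note how the download term dominates for large bodies on modest bandwidth: trimming response sizes can matter more than shaving server processing time.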

What "good" response times look like:

| Page Type | Acceptable | Good | Excellent |
|---|---|---|---|
| Static content (images, CSS) | < 500ms | < 200ms | < 100ms |
| Dynamic pages (database queries) | < 2000ms | < 1000ms | < 500ms |
| API calls (simple) | < 500ms | < 200ms | < 100ms |
| API calls (complex) | < 2000ms | < 1000ms | < 500ms |

These are guidelines. Your application's acceptable response times depend on user expectations and business requirements.


Interpreting Response Time Patterns

Response time patterns reveal how the server behaves under load. Learn to read them.

Pattern 1: Flat Line (Ideal)

What it looks like:

Response Time (ms)
200 |████████████████████████████████
    |
  0 +--------------------------------
    0  100  200  300  400  500 (VUs)

What it means: Server handling load beautifully. Response times stay constant as VUs increase.

Why this happens: Server has capacity to spare (CPU 40%, memory 50%, database well-optimized).

What to do: Keep ramping VUs to find the capacity limit.


Pattern 2: Gradual Increase (Normal)

What it looks like:

Response Time (ms)
400 |                      ██████████
300 |              ███████████
200 |      ████████████
100 |██████████
    +----------------------------------
    0  100  200  300  400  500 (VUs)

What it means: Server handling load well, but performance degrades proportionally with load.

Why this happens: Server resource contention increases as VUs increase (more DB connections, more CPU threads, more memory usage).

What to do: Acceptable if degradation is linear and response times stay under acceptable thresholds (e.g., < 2000ms).


Pattern 3: Sharp Spike (Capacity Limit Reached)

What it looks like:

Response Time (ms)
8000|                    ████
2000|                ████
 500|        ████████
 100|████████
    +----------------------------
    0  100  200  300  400 (VUs)

What it means: Server hit a hard limit at around 300 VUs, with response times jumping from 500ms to 8,000ms in one load level.

Why this happens: Resource exhaustion, plain and simple. Database connection pool full, memory exhausted, CPU maxed, thread pool saturated. Something ran out.

What to do: Note the VU count when the spike occurred (capacity limit = 300 VUs). Check server metrics (CPU, memory, database connections) to identify which bottleneck you hit. Check the Errors View for specific error messages, which often reveal exactly what exhausted ("connection pool exhausted" being a common one).

This is valuable data. You found the breaking point.


Pattern 4: Erratic Spikes (Intermittent Issues)

What it looks like:

Response Time (ms)
5000|    ██       ██          ██
2000|    ██       ██      ████
 500|████████████████████████████
    +-------------------------------
    0  100  200  300  400  500 (VUs)

What it means: Intermittent performance issues, with occasional slow requests (outliers).

Why this happens: Garbage collection pauses in the JVM or .NET CLR. Database query timeouts where slow queries occasionally take 10x longer. Network hiccups (packet loss, retransmissions). Background jobs like cron tasks or scheduled processes competing for resources.

What to do: Check whether spikes correlate with time. If they happen every 5 minutes, that's a scheduled job. Review server logs during spike periods, paying attention to GC logs and slow query logs. If spikes are random and infrequent (under 5% of requests), they may be acceptable noise. If they're frequent (over 10%), investigate the root cause: GC tuning, query optimization.
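The four patterns above can be told apart mechanically from per-load-level averages. A rough heuristic sketch (the 1.5x, 4x, and reversal-count thresholds are illustrative choices, not part of any tool):

```python
def classify_pattern(times_ms):
    """Rough classifier for the response-time patterns above.
    `times_ms` holds the average response time at each load level."""
    base = times_ms[0]
    if max(times_ms) < 1.5 * base:
        return "flat"
    jumps = [b / a for a, b in zip(times_ms, times_ms[1:])]
    # Erratic: the level-to-level trend flips direction repeatedly.
    reversals = sum(1 for a, b in zip(jumps, jumps[1:])
                    if (a > 1.2) != (b > 1.2))
    if reversals >= 2:
        return "erratic"
    if max(jumps) > 4:
        return "sharp spike"
    return "gradual increase"

print(classify_pattern([100, 120, 500, 8000]))  # sharp spike
```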


Ask the AI to Interpret Response Time Patterns

If you see unusual response time patterns:

My response times are flat at 100ms until 250 VUs, then jump to 5000ms at 300 VUs.
CPU is at 60% and memory is at 50%. What's the bottleneck?

The AI can:

  • Analyze response time patterns to identify capacity limits
  • Correlate response times with server metrics (CPU, memory, database) to pinpoint bottlenecks
  • Distinguish between normal degradation vs. hard limits vs. intermittent issues
  • Recommend immediate actions (stop test, add resources, investigate specific components)
  • Suggest long-term fixes (optimize queries, increase connection pools, add caching)

Hits/Sec: Server Throughput

Hits/sec measures how many HTTP requests your server processes per second. Raw throughput capacity.

What Hits/Sec Tells You

Hits/sec should increase as VUs increase:

| VUs | Expected Hits/Sec (Typical Web App) | Why |
|---|---|---|
| 100 | ~500-1000 | Each VU generates ~5-10 hits/sec (each page load pulls multiple resources) |
| 200 | ~1000-2000 | Linear scaling (2x VUs = 2x hits/sec) |
| 500 | ~2500-5000 | Continues scaling |

If hits/sec stops increasing even though VUs keep ramping, the server is maxed out: it can't process more requests even though you're sending them. The VUs are waiting for slow responses, which is also why response times will be spiking.
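The expected throughput can be sketched with Little's Law: each VU completes one page every (think time + response time) seconds, and each page load fires several HTTP hits (the HTML plus images, CSS, scripts). The numbers below are illustrative assumptions:

```python
def expected_hits_per_sec(vus: int, think_time_s: float,
                          response_time_s: float, hits_per_page: int) -> float:
    """Little's Law sketch: pages completed per second, times hits per page."""
    pages_per_sec = vus / (think_time_s + response_time_s)
    return pages_per_sec * hits_per_page

# 100 VUs, 1 s think time, 0.1 s responses, ~10 resources per page:
print(round(expected_hits_per_sec(100, 1.0, 0.1, 10)))  # ~909
```

When measured hits/sec falls well below this estimate at a given VU count, the missing throughput is the server's queue: VUs are stuck waiting on slow responses.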

Example (problem):

| VUs | Hits/Sec | Response Time | What It Means |
|---|---|---|---|
| 100 | 1000 | 100ms | Good |
| 200 | 2000 | 150ms | Good (linear scaling) |
| 300 | 2500 | 500ms | Scaling slows |
| 400 | 2500 | 2000ms | Hits/sec plateaued, server can't handle more |

This tells you the server maxes out at around 2,500 hits/sec, regardless of how many more VUs you throw at it.


Hits/Sec vs. Response Time Correlation

The relationship between hits/sec and response time reveals server behavior.

| Hits/Sec | Response Time | What It Means |
|---|---|---|
| Increasing | Flat/Low | Server handling load easily (plenty of capacity) |
| Increasing | Gradually increasing | Server handling load but approaching limits |
| Plateaus | Spiking | Server maxed out, can't process more requests |
| Decreasing | Spiking | Server overloaded, actually processing FEWER requests because it's so slow |

Decreasing hits/sec is the red flag. The server is so overloaded it's actually processing fewer requests than before. It's going backward.
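A minimal sketch of this table as a check you could run between load levels (the 10%-of-linear plateau threshold is an illustrative choice):

```python
def throughput_health(vus, hits):
    """Classify the hits/sec trend across the last two load levels."""
    dv = vus[-1] - vus[-2]
    dh = hits[-1] - hits[-2]
    if dh < 0:
        return "overloaded"  # throughput going backward: the red flag
    expected_per_vu = hits[-2] / vus[-2]
    if dv > 0 and dh / dv < 0.1 * expected_per_vu:
        return "plateaued"   # far below linear scaling
    return "scaling"

print(throughput_health([400, 500], [2500, 2000]))  # overloaded
```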


Bandwidth: Network Throughput

Bandwidth measures data transferred per second (typically in Mbps or Gbps).

What Bandwidth Tells You

Bandwidth should increase as VUs increase (more users = more data transferred):

| VUs | Expected Bandwidth (Image-Heavy Site) | Expected Bandwidth (Text-Heavy Site) |
|---|---|---|
| 100 | ~50 Mbps | ~5 Mbps |
| 500 | ~250 Mbps | ~25 Mbps |
| 1000 | ~500 Mbps | ~50 Mbps |

If bandwidth plateaus (stops increasing even though VUs increase):

  • Network bottleneck: server's network interface maxed out (e.g., 1 Gbps NIC at capacity)
  • Engine bottleneck: load engines maxed out on bandwidth (e.g., cloud engines at 90 Mbps each)

Example (network bottleneck):

| VUs | Bandwidth | Response Time | What It Means |
|---|---|---|---|
| 100 | 200 Mbps | 100ms | Good |
| 500 | 900 Mbps | 150ms | Approaching 1 Gbps NIC limit |
| 1000 | 1000 Mbps | 5000ms | Network maxed out, server can't send more data |

This tells you the server's 1 Gbps network interface is the bottleneck. Not CPU, not database. The network.

Fix: Upgrade to a 10 Gbps NIC, or add a load balancer with multiple servers.


Engine Bandwidth Monitoring

Monitor engine bandwidth in Engines View to ensure engines aren't the bottleneck:

| Engine | Bandwidth | Status | What It Means |
|---|---|---|---|
| Engine 1 | 35 Mbps | OK | Plenty of headroom |
| Engine 2 | 89 Mbps | ⚠️ Warning | Near capacity (cloud engines max ~90 Mbps) |

If engine bandwidth exceeds 80 Mbps: Add more engines to distribute the bandwidth load.
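Sizing the engine pool for a target bandwidth is simple arithmetic. A sketch using the ~90 Mbps per-engine cap and 80% headroom figures from the guidance above:

```python
import math

def engines_needed(total_bandwidth_mbps: float,
                   per_engine_cap_mbps: float = 90.0,
                   headroom: float = 0.8) -> int:
    """Engines required to keep each below ~80% of its ~90 Mbps cap."""
    return math.ceil(total_bandwidth_mbps / (per_engine_cap_mbps * headroom))

print(engines_needed(500))  # 500 Mbps total -> 7 engines at ~72 Mbps each
```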

See: Cloud Load Testing for engine bandwidth expectations.


Virtual Users: Load Level

VU count shows the current load level. More VUs means more concurrent users.

VU Ramp Monitoring

VUs should increase according to load profile:

  • Stepped profile: VUs increase in discrete steps (e.g., 100 → 150 → 200 every 5 min)
  • Exponential profile: VUs increase by percentage (e.g., 100 → 125 → 156 → 195)
  • Constant profile: VUs stay constant (e.g., 100 for entire test)
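The stepped and exponential schedules above can be generated directly, which is handy for predicting what the VU chart should look like at any point in the test. A sketch (function names are illustrative):

```python
def stepped_profile(start: int, step: int, levels: int) -> list:
    """Discrete steps: e.g. 100 -> 150 -> 200."""
    return [start + step * i for i in range(levels)]

def exponential_profile(start: float, pct: float, levels: int) -> list:
    """Percentage growth per level: e.g. 100 -> 125 -> 156 -> 195 at +25%."""
    out, vus = [], start
    for _ in range(levels):
        out.append(round(vus))
        vus *= 1 + pct / 100
    return out

print(stepped_profile(100, 50, 3))      # [100, 150, 200]
print(exponential_profile(100, 25, 4))  # [100, 125, 156, 195]
```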

If VUs don't increase on schedule, one of three things happened. Engines detected overload (CPU > 90%) and self-regulated. Engine capacity was exceeded (you asked for 5,000 VUs but engine max is 3,000). Or the test duration was too short to complete all the ramps. Check the Engines View for warnings or "Overloaded" status.


VUs per Engine Distribution

VUs should distribute evenly across engines:

| Engine | VUs | Status | Good/Bad |
|---|---|---|---|
| Engine 1 | 167 | OK | ✅ Balanced |
| Engine 2 | 167 | OK | ✅ Balanced |
| Engine 3 | 166 | OK | ✅ Balanced |

Unbalanced distribution (problem):

| Engine | VUs | Status | Good/Bad |
|---|---|---|---|
| Engine 1 | 450 | Overloaded | ❌ Imbalanced |
| Engine 2 | 25 | OK | ❌ Imbalanced |
| Engine 3 | 25 | OK | ❌ Imbalanced |

This indicates an engine configuration issue or outright engine failure: Engine 1 didn't recognize the other engines and tried to carry the whole load itself.


Errors/Sec: Application Health

Errors/sec shows failed transactions: HTTP errors, timeouts, connection failures.

What Error Rate Means

| Errors/Sec | Error Rate | What It Means |
|---|---|---|
| 0 | 0% | Perfect, all transactions succeeding |
| < 5 | < 1% | Acceptable, occasional transient errors |
| 5-50 | 1-10% | Concerning, investigate root cause |
| > 50 | > 10% | Critical, application broken under load |
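These severity bands are easy to encode as an alerting rule. A minimal sketch:

```python
def error_severity(error_rate_pct: float) -> str:
    """Map an error rate to the severity bands in the table above."""
    if error_rate_pct == 0:
        return "perfect"
    if error_rate_pct < 1:
        return "acceptable"
    if error_rate_pct <= 10:
        return "concerning"
    return "critical"

print(error_severity(25))  # critical
```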

Common error types:

| HTTP Status | Error Type | Likely Cause |
|---|---|---|
| 401 Unauthorized | Authentication failure | Session expired, auth tokens invalid |
| 403 Forbidden | Permission denied | CSRF token missing, session security check failed |
| 404 Not Found | Resource not found | Dynamic URL correlation failed, resource deleted |
| 500 Internal Server Error | Server-side error | Application bug, database error, exception |
| 502 Bad Gateway | Proxy/load balancer error | Backend server down |
| 503 Service Unavailable | Server overloaded | Connection pool exhausted, server shutdown |
| 504 Gateway Timeout | Timeout | Backend server too slow |
| Connection refused | Network error | Server not listening, firewall blocking |
| Read timeout | Response timeout | Server processing took too long |

Error Rate During Load Ramp

When errors appear tells you what caused them:

| VU Level | Error Rate | Response Time | Diagnosis |
|---|---|---|---|
| 0-200 VUs | 0% | 100ms | Good |
| 300 VUs | 5% (503 errors) | 500ms | Connection pool exhaustion starting |
| 400 VUs | 25% (503 errors) | 5000ms | Server overloaded |
| 500 VUs | 50% (503 errors + timeouts) | Timeouts | Server critically overloaded |

This tells you the server's capacity limit sits somewhere between 200 and 300 VUs. Beyond that, the connection pool exhausts and errors start piling up.

What to do: Check error details in the Errors View for specific messages. Increase the connection pool on the server (say, from 100 to 500 database connections). Re-run the test to verify the fix.


Ask the AI to Diagnose Error Patterns

If you see errors during load testing:

I'm getting 503 errors starting at 300 VUs. Response times are 5000ms and
server CPU is only 40%. What's wrong?

The AI can:

  • Correlate error types with server metrics to identify root cause
  • Distinguish between application errors (bugs) vs. capacity errors (overload)
  • Explain why specific HTTP status codes appear under load (503 = service unavailable, likely connection pool)
  • Recommend configuration changes (increase connection pools, add caching, optimize queries)
  • Suggest whether errors are acceptable (< 1%) or critical (> 10%)

Server Metrics: Identifying Bottlenecks

Server-side metrics reveal WHY performance degrades. Response times tell you there's a problem. Server metrics tell you what the problem is.

CPU %: Compute Capacity

CPU utilization shows how much compute capacity is used:

| CPU % | What It Means | Action |
|---|---|---|
| < 50% | Plenty of capacity | Keep ramping load |
| 50-70% | Moderate usage | Watch for degradation |
| 70-90% | High usage | Approaching limit |
| > 90% | Critically high | CPU bottleneck: optimize code or add CPU |

Correlating CPU with response times:

| CPU % | Response Time | Diagnosis |
|---|---|---|
| 40% | 100ms | CPU not the bottleneck (plenty of capacity) |
| 70% | 200ms | CPU moderately loaded (normal degradation) |
| 95% | 5000ms | CPU is the bottleneck: server can't process requests fast enough |

If CPU hits 100% and response times spike: you're CPU-bound. Optimize application code, add CPU cores, or scale horizontally by adding servers.


Memory %: Memory Capacity

Memory utilization shows RAM usage:

| Memory % | What It Means | Action |
|---|---|---|
| < 70% | Healthy | Normal |
| 70-85% | Moderate | Watch for growth |
| 85-95% | High | Potential memory pressure |
| > 95% | Critical | Memory bottleneck or leak |

Memory leak pattern:

| Time | Memory % | Response Time | Diagnosis |
|---|---|---|---|
| 0 min | 30% | 100ms | Good |
| 30 min | 50% | 150ms | Growing (expected) |
| 60 min | 75% | 500ms | Concerning |
| 90 min | 95% | 5000ms | Memory leak: memory keeps growing |
| 120 min | 100% (OOM) | Crash | Server ran out of memory |

If memory keeps growing throughout the test, even at constant VU load, you have a memory leak. The application isn't releasing memory that it should be.

What to do: Profile the application with a memory profiler, identify the leak, fix the code. No shortcut.
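Before reaching for a profiler, you can confirm the trend numerically. A sketch that fits a least-squares slope to memory samples taken at a fixed interval (under *constant* VU load, a persistently positive slope suggests memory is never being released):

```python
def leak_slope(samples_pct, interval_min: float) -> float:
    """Least-squares slope, in % per minute, of periodic memory samples."""
    n = len(samples_pct)
    xs = [i * interval_min for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(samples_pct) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples_pct))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# The leak table above: 30% -> 50% -> 75% -> 95% at 30-minute intervals
print(round(leak_slope([30, 50, 75, 95], 30), 2))  # ~0.73 %/min
```

A slope near zero after warm-up is healthy; 0.73 %/min means the server hits 100% in well under three hours.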


Database Metrics

Database-specific metrics (if monitoring database server):

| Metric | What to Watch | Red Flag |
|---|---|---|
| DB CPU % | < 80% | > 90% = database compute-bound |
| DB Connections | < max pool size | = max pool size = connection pool exhausted |
| Query time (avg) | < 100ms | > 1000ms = slow queries |
| Lock wait time | < 10ms | > 100ms = database locking/deadlocks |
| Disk I/O % | < 70% | > 90% = disk bottleneck (slow storage) |

Example (database bottleneck):

| Metric | Value | Diagnosis |
|---|---|---|
| Web server CPU | 30% | Plenty of capacity |
| Web server memory | 40% | Plenty of capacity |
| Database CPU | 95% | Bottleneck |
| Database connections | 85 / 100 | Not maxed |
| Query time (avg) | 2000ms | Slow queries |

This tells you the database is the bottleneck, not the web server. Optimize queries, add indexes, or add database CPU capacity. The web server is sitting there waiting for the database to finish.


Correlating Metrics to Find Bottlenecks

The power of monitoring is correlation. Any single metric in isolation is ambiguous. Combined, they reveal root causes.

Correlation Pattern 1: CPU Bottleneck

| Response Time | Hits/Sec | Server CPU | Database CPU | Diagnosis |
|---|---|---|---|---|
| ⬆️ Spiking | ⬇️ Plateaus | ⬆️ 95% | 40% | Web server CPU bottleneck |

Fix: Optimize application code, add CPU cores, or add web servers.


Correlation Pattern 2: Database Bottleneck

| Response Time | Hits/Sec | Server CPU | Database CPU | Diagnosis |
|---|---|---|---|---|
| ⬆️ Spiking | ⬇️ Plateaus | 40% | ⬆️ 95% | Database CPU bottleneck |

Fix: Optimize queries, add indexes, add database CPU capacity, or add read replicas.


Correlation Pattern 3: Network Bottleneck

| Response Time | Bandwidth | Server CPU | Server Network | Diagnosis |
|---|---|---|---|---|
| ⬆️ Spiking | ⬆️ Maxed (1 Gbps) | 50% | ⬆️ 100% | Network bandwidth bottleneck |

Fix: Upgrade NIC to 10 Gbps, add CDN for static assets, or optimize response sizes.


Correlation Pattern 4: Connection Pool Exhaustion

| Response Time | Errors/Sec | Server CPU | DB Connections | Diagnosis |
|---|---|---|---|---|
| ⬆️ Spiking | ⬆️ 503 errors | 40% | ⬆️ 100/100 (maxed) | Connection pool exhausted |

Fix: Increase database connection pool size (e.g., 100 → 500 connections).


Correlation Pattern 5: Memory Leak

| Time | Response Time | Memory % | CPU % | Diagnosis |
|---|---|---|---|---|
| 0-30 min | 100ms | 30% → 50% | 60% | Normal |
| 30-60 min | 200ms | 50% → 75% | 60% | Memory growing (CPU constant) |
| 60-90 min | 1000ms | 75% → 95% | 60% | Memory leak |
| 90 min | Crash (OOM) | 100% | N/A | Out of memory |

Fix: Profile application, find leak, fix code.
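The five correlation patterns above amount to a decision tree. A sketch of that tree as code (the metric keys and thresholds are illustrative, not a real tool's API):

```python
def diagnose(m: dict) -> str:
    """Map the correlation patterns above to a likely bottleneck."""
    if m.get("db_connections_used", 0) >= m.get("db_connections_max", float("inf")):
        return "connection pool exhausted"
    if m.get("db_cpu", 0) > 90:
        return "database CPU bottleneck"
    if m.get("server_cpu", 0) > 90:
        return "web server CPU bottleneck"
    if m.get("network_util", 0) > 90:
        return "network bandwidth bottleneck"
    if m.get("memory_slope_pct_per_min", 0) > 0.3:
        return "possible memory leak"
    return "no clear infrastructure bottleneck -- check the application"

print(diagnose({"db_cpu": 95, "server_cpu": 40}))  # database CPU bottleneck
```

The ordering matters: connection pool exhaustion is checked first because it produces errors even while CPU looks healthy.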


Ask the AI to Correlate Metrics

If you're struggling to identify the bottleneck:

Response times are 5000ms at 300 VUs. Server CPU is 40%, memory is 50%, but
database CPU is 95%. What's the bottleneck and how do I fix it?

The AI can:

  • Analyze combinations of metrics to pinpoint the exact bottleneck
  • Distinguish between application bottlenecks (code) vs. infrastructure bottlenecks (CPU/memory/network)
  • Recommend immediate fixes (increase connection pools, optimize queries)
  • Suggest long-term architectural improvements (caching, read replicas, CDN)
  • Validate your diagnosis before you make expensive infrastructure changes

Real-Time Degradation Detection

Detecting performance degradation during the test lets you intervene before wasting hours on a broken test.

Automated Warning Signs

Load Tester monitors for these conditions automatically:

| Condition | Warning Level | What It Means |
|---|---|---|
| Engine CPU > 90% | ⚠️ Warning | Engine overloaded, may self-regulate |
| Engine bandwidth > 80 Mbps | ⚠️ Warning | Engine near bandwidth limit |
| Error rate > 10% | 🚨 Critical | Application broken under load |
| Response time > 30 seconds | 🚨 Critical | Server severely overloaded or timing out |
| VUs not ramping | ⚠️ Warning | Engine self-regulation or capacity limit |

When warnings appear, investigate immediately. Don't wait for the test to finish.


Manual Degradation Detection

Watch for these patterns during the test:

| Pattern | What to Watch | Action |
|---|---|---|
| Response time doubles | 100ms → 200ms | Note VU count, approaching capacity limit |
| Response time increases 10x | 100ms → 1000ms+ | Stop and investigate, something broke |
| Errors appear | 0% → 5%+ | Check Errors View for error types |
| Hits/sec plateaus | Increasing → flat | Server maxed out, note capacity limit |
| Memory keeps growing | 30% → 50% → 70% → ... | Potential memory leak, watch closely |
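This watch list translates directly into a per-interval check you could script against exported metrics. A minimal sketch (the 2x/10x multipliers are the examples from the table, not universal thresholds):

```python
def degradation_alerts(baseline_ms: float, current_ms: float,
                       error_rate_pct: float) -> list:
    """Return alert strings for the degradation patterns listed above."""
    alerts = []
    if current_ms >= 10 * baseline_ms:
        alerts.append("response time 10x baseline: stop and investigate")
    elif current_ms >= 2 * baseline_ms:
        alerts.append("response time doubled: approaching capacity limit")
    if error_rate_pct > 10:
        alerts.append("critical error rate: application broken under load")
    elif error_rate_pct > 0:
        alerts.append("errors appearing: check the Errors View")
    return alerts

print(degradation_alerts(100, 250, 5))
```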

Ask the AI for Real-Time Alerts

Configure the AI to monitor your test in real time:

Monitor my load test and alert me if response times increase 5x or if error
rate exceeds 5%. I'm ramping from 100 to 1000 VUs over 60 minutes.

The AI can:

  • Watch metrics in real time and alert you to degradation patterns
  • Detect capacity limits as they're reached (response times spike at X VUs)
  • Identify correlation breakdowns (hits/sec plateaus while VUs keep increasing)
  • Recommend stopping the test early if conditions are critical (50% error rate)
  • Suggest immediate actions during live tests (add engines, adjust ramp rates)

Next Steps

After monitoring your load test:

If you need to optimize: