Monitoring During a Load Test

Load test monitoring is detective work in real time. You're watching hundreds of virtual users stress your application, looking for clues about performance limits, bottlenecks, and failure modes. Response times spike at 300 VUs? That's a clue. Database CPU hits 100% at the same moment? That's the culprit.

Running a load test without monitoring is like driving blindfolded. You'll crash, but you won't know why. Monitoring tells you not just "the server failed at 500 VUs" but "the database connection pool exhausted at 500 VUs because only 100 connections were configured."

This guide explains:

  • Which metrics to watch during a load test
  • What each metric means and why it matters
  • How to correlate metrics to identify bottlenecks
  • Warning signs that indicate problems
  • Real-time degradation detection with AI assistance

Key Metrics Overview

Load testing produces dozens of metrics, but these seven are the ones to watch in real time:

| Metric | What It Measures | Why It Matters |
|---|---|---|
| Response Time (avg) | Time from request sent to response received | User experience (slow responses = frustrated users) |
| Hits/sec | HTTP requests per second across all VUs | Server throughput: how many requests/sec can it handle? |
| Bandwidth | Data transferred per second (download + upload) | Network capacity: are you bandwidth-limited? |
| Virtual Users | Number of concurrent VUs executing the test case | Load level: more VUs = more stress |
| Errors/sec | Failed transactions per second | Application health: errors indicate broken functionality |
| CPU % (server) | Server CPU utilization | Compute capacity: high CPU = compute-bound |
| Memory % (server) | Server memory utilization | Memory capacity: high memory = potential leak or cache issue |

These metrics tell a story: response times increase (the symptom) because CPU hits 100% (the cause). Monitoring reveals the narrative.


Response Time: The Primary Performance Metric

Response time is what users experience. Everything else is diagnostic. If response times are fast, users are happy. If response times are slow, users are frustrated, and it doesn't matter that your server CPU is only 30%.

What Response Time Measures

Response time = time from sending HTTP request to receiving complete response:

[VU sends request] → [network latency] → [server processes] →
[network latency] → [VU receives response] = Response Time

Components:

  • Network latency: Time for packets to travel (typically 10-100ms)
  • Server processing: Time for server to generate response (varies: 10ms for cached page, 1000ms for complex database query)
  • Network download time: Time to transfer response body (depends on response size and bandwidth)
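These components add up in a predictable way. As a back-of-the-envelope sketch (all numbers below are illustrative assumptions, not measurements from any particular system):

```python
def estimate_response_time_ms(latency_ms: float, processing_ms: float,
                              body_kb: float, bandwidth_mbps: float) -> float:
    """Round-trip network latency + server processing + body transfer time."""
    # body_kb * 8 = kilobits; bandwidth_mbps * 1000 = kilobits per second
    download_ms = body_kb * 8 / (bandwidth_mbps * 1000) * 1000
    return 2 * latency_ms + processing_ms + download_ms

# 50 ms latency each way, 100 ms server work, 500 KB page over 10 Mbps:
# 100 + 100 + 400 = 600 ms
print(estimate_response_time_ms(50, 100, 500, 10))
```

Note how the download term dominates for large bodies on modest bandwidth: trimming response sizes can matter more than shaving server processing time.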

What "good" response times look like:

| Page Type | Acceptable | Good | Excellent |
|---|---|---|---|
| Static content (images, CSS) | < 500ms | < 200ms | < 100ms |
| Dynamic pages (database queries) | < 2000ms | < 1000ms | < 500ms |
| API calls (simple) | < 500ms | < 200ms | < 100ms |
| API calls (complex) | < 2000ms | < 1000ms | < 500ms |

These are guidelines. Your application's acceptable response times depend on user expectations and business requirements.


Interpreting Response Time Patterns

Response time patterns reveal how the server behaves under load. Learn to read them.

Pattern 1: Flat Line (Ideal)

What it looks like:

Response Time (ms)
200 |████████████████████████████████
    |
  0 +--------------------------------
    0  100  200  300  400  500 (VUs)

What it means: Server handling load beautifully. Response times stay constant as VUs increase.

Why this happens: Server has capacity to spare (CPU 40%, memory 50%, database well-optimized).

What to do: Keep ramping VUs to find the capacity limit.


Pattern 2: Gradual Increase (Normal)

What it looks like:

Response Time (ms)
400 |                      ██████████
300 |              ███████████
200 |      ████████████
100 |██████████
    +----------------------------------
    0  100  200  300  400  500 (VUs)

What it means: Server handling load well, but performance degrades proportionally with load.

Why this happens: Server resource contention increases as VUs increase (more DB connections, more CPU threads, more memory usage).

What to do: Acceptable if degradation is linear and response times stay under acceptable thresholds (e.g., < 2000ms).


Pattern 3: Sharp Spike (Capacity Limit Reached)

What it looks like:

Response Time (ms)
8000|                    ████
2000|                ████
 500|        ████████
 100|████████
    +----------------------------
    0  100  200  300  400 (VUs)

What it means: Server hit a hard limit at around 300 VUs, with response times jumping from 500ms to 8,000ms in one load level.

Why this happens: Resource exhaustion, plain and simple. Database connection pool full, memory exhausted, CPU maxed, thread pool saturated. Something ran out.

What to do: Note the VU count when the spike occurred (capacity limit = 300 VUs). Check server metrics (CPU, memory, database connections) to identify which bottleneck you hit. Check the Errors View for specific error messages, which often reveal exactly what exhausted ("connection pool exhausted" being a common one).

This is valuable data. You found the breaking point.


Pattern 4: Erratic Spikes (Intermittent Issues)

What it looks like:

Response Time (ms)
5000|    ██       ██          ██
2000|    ██       ██      ████
 500|████████████████████████████
    +-------------------------------
    0  100  200  300  400  500 (VUs)

What it means: Intermittent performance issues, with occasional slow requests (outliers).

Why this happens: Garbage collection pauses in the JVM or .NET CLR. Database query timeouts where slow queries occasionally take 10x longer. Network hiccups (packet loss, retransmissions). Background jobs like cron tasks or scheduled processes competing for resources.

What to do: Check whether spikes correlate with time. If they happen every 5 minutes, that's a scheduled job. Review server logs during spike periods, paying attention to GC logs and slow query logs. If spikes are random and infrequent (under 5% of requests), they may be acceptable noise. If they're frequent (over 10%), investigate the root cause: GC tuning, query optimization.
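The four patterns above can be told apart mechanically from per-load-level averages. A rough heuristic sketch (the 1.5x, 4x, and reversal-count thresholds are illustrative choices, not part of any tool):

```python
def classify_pattern(times_ms):
    """Rough classifier for the response-time patterns above.
    `times_ms` holds the average response time at each load level."""
    base = times_ms[0]
    if max(times_ms) < 1.5 * base:
        return "flat"
    jumps = [b / a for a, b in zip(times_ms, times_ms[1:])]
    # Erratic: the level-to-level trend flips direction repeatedly.
    reversals = sum(1 for a, b in zip(jumps, jumps[1:])
                    if (a > 1.2) != (b > 1.2))
    if reversals >= 2:
        return "erratic"
    if max(jumps) > 4:
        return "sharp spike"
    return "gradual increase"

print(classify_pattern([100, 120, 500, 8000]))  # sharp spike
```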


Ask the AI to Interpret Response Time Patterns

If you see unusual response time patterns:

My response times are flat at 100ms until 250 VUs, then jump to 5000ms at 300 VUs.
CPU is at 60% and memory is at 50%. What's the bottleneck?

The AI can:

  • Analyze response time patterns to identify capacity limits
  • Correlate response times with server metrics (CPU, memory, database) to pinpoint bottlenecks
  • Distinguish between normal degradation vs. hard limits vs. intermittent issues
  • Recommend immediate actions (stop test, add resources, investigate specific components)
  • Suggest long-term fixes (optimize queries, increase connection pools, add caching)

Hits/Sec: Server Throughput

Hits/sec measures how many HTTP requests your server processes per second. Raw throughput capacity.

What Hits/Sec Tells You

Hits/sec should increase as VUs increase:

| VUs | Expected Hits/Sec (Typical Web App) | Why |
|---|---|---|
| 100 | ~500-1000 | Each VU generates ~5-10 hits/sec (each page load pulls multiple resources) |
| 200 | ~1000-2000 | Linear scaling (2x VUs = 2x hits/sec) |
| 500 | ~2500-5000 | Continues scaling |

If hits/sec stops increasing even though VUs keep ramping, the server is maxed out: it can't process more requests even though you're sending them. The VUs are waiting for slow responses, which is also why response times will be spiking.
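The expected throughput can be sketched with Little's Law: each VU completes one page every (think time + response time) seconds, and each page load fires several HTTP hits (the HTML plus images, CSS, scripts). The numbers below are illustrative assumptions:

```python
def expected_hits_per_sec(vus: int, think_time_s: float,
                          response_time_s: float, hits_per_page: int) -> float:
    """Little's Law sketch: pages completed per second, times hits per page."""
    pages_per_sec = vus / (think_time_s + response_time_s)
    return pages_per_sec * hits_per_page

# 100 VUs, 1 s think time, 0.1 s responses, ~10 resources per page:
print(round(expected_hits_per_sec(100, 1.0, 0.1, 10)))  # ~909
```

When measured hits/sec falls well below this estimate at a given VU count, the missing throughput is the server's queue: VUs are stuck waiting on slow responses.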

Example (problem):

| VUs | Hits/Sec | Response Time | What It Means |
|---|---|---|---|
| 100 | 1000 | 100ms | Good |
| 200 | 2000 | 150ms | Good (linear scaling) |
| 300 | 2500 | 500ms | Scaling slows |
| 400 | 2500 | 2000ms | Hits/sec plateaued, server can't handle more |

This tells you the server maxes out at around 2,500 hits/sec, regardless of how many more VUs you throw at it.


Hits/Sec vs. Response Time Correlation

The relationship between hits/sec and response time reveals server behavior.

| Hits/Sec | Response Time | What It Means |
|---|---|---|
| Increasing | Flat/Low | Server handling load easily (plenty of capacity) |
| Increasing | Gradually increasing | Server handling load but approaching limits |
| Plateaus | Spiking | Server maxed out, can't process more requests |
| Decreasing | Spiking | Server overloaded, actually processing FEWER requests because it's so slow |

Decreasing hits/sec is the red flag. The server is so overloaded it's actually processing fewer requests than before. It's going backward.
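A minimal sketch of this table as a check you could run between load levels (the 10%-of-linear plateau threshold is an illustrative choice):

```python
def throughput_health(vus, hits):
    """Classify the hits/sec trend across the last two load levels."""
    dv = vus[-1] - vus[-2]
    dh = hits[-1] - hits[-2]
    if dh < 0:
        return "overloaded"  # throughput going backward: the red flag
    expected_per_vu = hits[-2] / vus[-2]
    if dv > 0 and dh / dv < 0.1 * expected_per_vu:
        return "plateaued"   # far below linear scaling
    return "scaling"

print(throughput_health([400, 500], [2500, 2000]))  # overloaded
```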


Bandwidth: Network Throughput

Bandwidth measures data transferred per second (typically in Mbps or Gbps).

What Bandwidth Tells You

Bandwidth should increase as VUs increase (more users = more data transferred):

| VUs | Expected Bandwidth (Image-Heavy Site) | Expected Bandwidth (Text-Heavy Site) |
|---|---|---|
| 100 | ~50 Mbps | ~5 Mbps |
| 500 | ~250 Mbps | ~25 Mbps |
| 1000 | ~500 Mbps | ~50 Mbps |

If bandwidth plateaus (stops increasing even though VUs increase):

  • Network bottleneck: server's network interface maxed out (e.g., 1 Gbps NIC at capacity)
  • Engine bottleneck: load engines maxed out on bandwidth (e.g., cloud engines at 90 Mbps each)

Example (network bottleneck):

| VUs | Bandwidth | Response Time | What It Means |
|---|---|---|---|
| 100 | 200 Mbps | 100ms | Good |
| 500 | 900 Mbps | 150ms | Approaching 1 Gbps NIC limit |
| 1000 | 1000 Mbps | 5000ms | Network maxed out, server can't send more data |

This tells you the server's 1 Gbps network interface is the bottleneck. Not CPU, not database. The network.

Fix: Upgrade to a 10 Gbps NIC, or add a load balancer with multiple servers.


Engine Bandwidth Monitoring

Monitor engine bandwidth in Engines View to ensure engines aren't the bottleneck:

| Engine | Bandwidth | Status | What It Means |
|---|---|---|---|
| Engine 1 | 35 Mbps | OK | Plenty of headroom |
| Engine 2 | 89 Mbps | ⚠️ Warning | Near capacity (cloud engines max ~90 Mbps) |

If engine bandwidth exceeds 80 Mbps: Add more engines to distribute the bandwidth load.
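Sizing the engine pool for a target bandwidth is simple arithmetic. A sketch using the ~90 Mbps per-engine cap and 80% headroom figures from the guidance above:

```python
import math

def engines_needed(total_bandwidth_mbps: float,
                   per_engine_cap_mbps: float = 90.0,
                   headroom: float = 0.8) -> int:
    """Engines required to keep each below ~80% of its ~90 Mbps cap."""
    return math.ceil(total_bandwidth_mbps / (per_engine_cap_mbps * headroom))

print(engines_needed(500))  # 500 Mbps total -> 7 engines at ~72 Mbps each
```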

See: Cloud Load Testing for engine bandwidth expectations.


Virtual Users: Load Level

VU count shows the current load level. More VUs means more concurrent users.

VU Ramp Monitoring

VUs should increase according to load profile:

  • Stepped profile: VUs increase in discrete steps (e.g., 100 → 150 → 200 every 5 min)
  • Exponential profile: VUs increase by percentage (e.g., 100 → 125 → 156 → 195)
  • Constant profile: VUs stay constant (e.g., 100 for entire test)
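The stepped and exponential schedules above can be generated directly, which is handy for predicting what the VU chart should look like at any point in the test. A sketch (function names are illustrative):

```python
def stepped_profile(start: int, step: int, levels: int) -> list:
    """Discrete steps: e.g. 100 -> 150 -> 200."""
    return [start + step * i for i in range(levels)]

def exponential_profile(start: float, pct: float, levels: int) -> list:
    """Percentage growth per level: e.g. 100 -> 125 -> 156 -> 195 at +25%."""
    out, vus = [], start
    for _ in range(levels):
        out.append(round(vus))
        vus *= 1 + pct / 100
    return out

print(stepped_profile(100, 50, 3))      # [100, 150, 200]
print(exponential_profile(100, 25, 4))  # [100, 125, 156, 195]
```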

If VUs don't increase on schedule, one of three things happened. Engines detected overload (CPU > 90%) and self-regulated. Engine capacity was exceeded (you asked for 5,000 VUs but engine max is 3,000). Or the test duration was too short to complete all the ramps. Check the Engines View for warnings or "Overloaded" status.


VUs per Engine Distribution

VUs should distribute evenly across engines:

| Engine | VUs | Status | Good/Bad |
|---|---|---|---|
| Engine 1 | 167 | OK | ✅ Balanced |
| Engine 2 | 167 | OK | ✅ Balanced |
| Engine 3 | 166 | OK | ✅ Balanced |

Unbalanced distribution (problem):

| Engine | VUs | Status | Good/Bad |
|---|---|---|---|
| Engine 1 | 450 | Overloaded | ❌ Imbalanced |
| Engine 2 | 25 | OK | ❌ Imbalanced |
| Engine 3 | 25 | OK | ❌ Imbalanced |

This indicates an engine configuration issue or outright engine failure: Engine 1 didn't recognize the other engines and tried to carry the whole load itself.


Errors/Sec: Application Health

Errors/sec shows failed transactions: HTTP errors, timeouts, connection failures.

What Error Rate Means

| Errors/Sec | Error Rate | What It Means |
|---|---|---|
| 0 | 0% | Perfect, all transactions succeeding |
| < 5 | < 1% | Acceptable, occasional transient errors |
| 5-50 | 1-10% | Concerning, investigate root cause |
| > 50 | > 10% | Critical, application broken under load |
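These severity bands are easy to encode as an alerting rule. A minimal sketch:

```python
def error_severity(error_rate_pct: float) -> str:
    """Map an error rate to the severity bands in the table above."""
    if error_rate_pct == 0:
        return "perfect"
    if error_rate_pct < 1:
        return "acceptable"
    if error_rate_pct <= 10:
        return "concerning"
    return "critical"

print(error_severity(25))  # critical
```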

Common error types:

| HTTP Status | Error Type | Likely Cause |
|---|---|---|
| 401 Unauthorized | Authentication failure | Session expired, auth tokens invalid |
| 403 Forbidden | Permission denied | CSRF token missing, session security check failed |
| 404 Not Found | Resource not found | Dynamic URL correlation failed, resource deleted |
| 500 Internal Server Error | Server-side error | Application bug, database error, exception |
| 502 Bad Gateway | Proxy/load balancer error | Backend server down |
| 503 Service Unavailable | Server overloaded | Connection pool exhausted, server shutdown |
| 504 Gateway Timeout | Timeout | Backend server too slow |
| Connection refused | Network error | Server not listening, firewall blocking |
| Read timeout | Response timeout | Server processing took too long |

Error Rate During Load Ramp

When errors appear tells you what caused them:

| VU Level | Error Rate | Response Time | Diagnosis |
|---|---|---|---|
| 0-200 VUs | 0% | 100ms | Good |
| 300 VUs | 5% (503 errors) | 500ms | Connection pool exhaustion starting |
| 400 VUs | 25% (503 errors) | 5000ms | Server overloaded |
| 500 VUs | 50% (503 errors + timeouts) | Timeouts | Server critically overloaded |

This tells you the server's capacity limit sits somewhere between 200 and 300 VUs. Beyond that, the connection pool exhausts and errors start piling up.

What to do: Check error details in the Errors View for specific messages. Increase the connection pool on the server (say, from 100 to 500 database connections). Re-run the test to verify the fix.


Ask the AI to Diagnose Error Patterns

If you see errors during load testing:

I'm getting 503 errors starting at 300 VUs. Response times are 5000ms and
server CPU is only 40%. What's wrong?

The AI can:

  • Correlate error types with server metrics to identify root cause
  • Distinguish between application errors (bugs) vs. capacity errors (overload)
  • Explain why specific HTTP status codes appear under load (503 = service unavailable, likely connection pool)
  • Recommend configuration changes (increase connection pools, add caching, optimize queries)
  • Suggest whether errors are acceptable (< 1%) or critical (> 10%)

Server Metrics: Identifying Bottlenecks

Server-side metrics reveal WHY performance degrades. Response times tell you there's a problem. Server metrics tell you what the problem is.

CPU %: Compute Capacity

CPU utilization shows how much compute capacity is used:

| CPU % | What It Means | Action |
|---|---|---|
| < 50% | Plenty of capacity | Keep ramping load |
| 50-70% | Moderate usage | Watch for degradation |
| 70-90% | High usage | Approaching limit |
| > 90% | Critically high | CPU bottleneck: optimize code or add CPU |

Correlating CPU with response times:

| CPU % | Response Time | Diagnosis |
|---|---|---|
| 40% | 100ms | CPU not the bottleneck (plenty of capacity) |
| 70% | 200ms | CPU moderately loaded (normal degradation) |
| 95% | 5000ms | CPU is the bottleneck: server can't process requests fast enough |

If CPU hits 100% and response times spike: you're CPU-bound. Optimize application code, add CPU cores, or scale horizontally by adding servers.


Memory %: Memory Capacity

Memory utilization shows RAM usage:

| Memory % | What It Means | Action |
|---|---|---|
| < 70% | Healthy | Normal |
| 70-85% | Moderate | Watch for growth |
| 85-95% | High | Potential memory pressure |
| > 95% | Critical | Memory bottleneck or leak |

Memory leak pattern:

| Time | Memory % | Response Time | Diagnosis |
|---|---|---|---|
| 0 min | 30% | 100ms | Good |
| 30 min | 50% | 150ms | Growing (expected) |
| 60 min | 75% | 500ms | Concerning |
| 90 min | 95% | 5000ms | Memory leak: memory keeps growing |
| 120 min | 100% (OOM) | Crash | Server ran out of memory |

If memory keeps growing throughout the test, even at constant VU load, you have a memory leak. The application isn't releasing memory that it should be.

What to do: Profile the application with a memory profiler, identify the leak, fix the code. No shortcut.
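Before reaching for a profiler, you can confirm the trend numerically. A sketch that fits a least-squares slope to memory samples taken at a fixed interval (under *constant* VU load, a persistently positive slope suggests memory is never being released):

```python
def leak_slope(samples_pct, interval_min: float) -> float:
    """Least-squares slope, in % per minute, of periodic memory samples."""
    n = len(samples_pct)
    xs = [i * interval_min for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(samples_pct) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples_pct))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# The leak table above: 30% -> 50% -> 75% -> 95% at 30-minute intervals
print(round(leak_slope([30, 50, 75, 95], 30), 2))  # ~0.73 %/min
```

A slope near zero after warm-up is healthy; 0.73 %/min means the server hits 100% in well under three hours.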


Database Metrics

Database-specific metrics (if monitoring database server):

| Metric | What to Watch | Red Flag |
|---|---|---|
| DB CPU % | < 80% | > 90% = database compute-bound |
| DB Connections | < max pool size | = max pool size = connection pool exhausted |
| Query time (avg) | < 100ms | > 1000ms = slow queries |
| Lock wait time | < 10ms | > 100ms = database locking/deadlocks |
| Disk I/O % | < 70% | > 90% = disk bottleneck (slow storage) |

Example (database bottleneck):

| Metric | Value | Diagnosis |
|---|---|---|
| Web server CPU | 30% | Plenty of capacity |
| Web server memory | 40% | Plenty of capacity |
| Database CPU | 95% | Bottleneck |
| Database connections | 85 / 100 | Not maxed |
| Query time (avg) | 2000ms | Slow queries |

This tells you the database is the bottleneck, not the web server. Optimize queries, add indexes, or add database CPU capacity. The web server is sitting there waiting for the database to finish.


Correlating Metrics to Find Bottlenecks

The power of monitoring is correlation. Any single metric in isolation is ambiguous. Combined, they reveal root causes.

Correlation Pattern 1: CPU Bottleneck

| Response Time | Hits/Sec | Server CPU | Database CPU | Diagnosis |
|---|---|---|---|---|
| ⬆️ Spiking | ⬇️ Plateaus | ⬆️ 95% | 40% | Web server CPU bottleneck |

Fix: Optimize application code, add CPU cores, or add web servers.


Correlation Pattern 2: Database Bottleneck

| Response Time | Hits/Sec | Server CPU | Database CPU | Diagnosis |
|---|---|---|---|---|
| ⬆️ Spiking | ⬇️ Plateaus | 40% | ⬆️ 95% | Database CPU bottleneck |

Fix: Optimize queries, add indexes, add database CPU capacity, or add read replicas.


Correlation Pattern 3: Network Bottleneck

| Response Time | Bandwidth | Server CPU | Server Network | Diagnosis |
|---|---|---|---|---|
| ⬆️ Spiking | ⬆️ Maxed (1 Gbps) | 50% | ⬆️ 100% | Network bandwidth bottleneck |

Fix: Upgrade NIC to 10 Gbps, add CDN for static assets, or optimize response sizes.


Correlation Pattern 4: Connection Pool Exhaustion

| Response Time | Errors/Sec | Server CPU | DB Connections | Diagnosis |
|---|---|---|---|---|
| ⬆️ Spiking | ⬆️ 503 errors | 40% | ⬆️ 100/100 (maxed) | Connection pool exhausted |

Fix: Increase database connection pool size (e.g., 100 → 500 connections).


Correlation Pattern 5: Memory Leak

| Time | Response Time | Memory % | CPU % | Diagnosis |
|---|---|---|---|---|
| 0-30 min | 100ms | 30% → 50% | 60% | Normal |
| 30-60 min | 200ms | 50% → 75% | 60% | Memory growing (CPU constant) |
| 60-90 min | 1000ms | 75% → 95% | 60% | Memory leak |
| 90 min | Crash (OOM) | 100% | N/A | Out of memory |

Fix: Profile application, find leak, fix code.
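The five correlation patterns above amount to a decision tree. A sketch of that tree as code (the metric keys and thresholds are illustrative, not a real tool's API):

```python
def diagnose(m: dict) -> str:
    """Map the correlation patterns above to a likely bottleneck."""
    if m.get("db_connections_used", 0) >= m.get("db_connections_max", float("inf")):
        return "connection pool exhausted"
    if m.get("db_cpu", 0) > 90:
        return "database CPU bottleneck"
    if m.get("server_cpu", 0) > 90:
        return "web server CPU bottleneck"
    if m.get("network_util", 0) > 90:
        return "network bandwidth bottleneck"
    if m.get("memory_slope_pct_per_min", 0) > 0.3:
        return "possible memory leak"
    return "no clear infrastructure bottleneck -- check the application"

print(diagnose({"db_cpu": 95, "server_cpu": 40}))  # database CPU bottleneck
```

The ordering matters: connection pool exhaustion is checked first because it produces errors even while CPU looks healthy.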


Ask the AI to Correlate Metrics

If you're struggling to identify the bottleneck:

Response times are 5000ms at 300 VUs. Server CPU is 40%, memory is 50%, but
database CPU is 95%. What's the bottleneck and how do I fix it?

The AI can:

  • Analyze combinations of metrics to pinpoint the exact bottleneck
  • Distinguish between application bottlenecks (code) vs. infrastructure bottlenecks (CPU/memory/network)
  • Recommend immediate fixes (increase connection pools, optimize queries)
  • Suggest long-term architectural improvements (caching, read replicas, CDN)
  • Validate your diagnosis before you make expensive infrastructure changes

Real-Time Degradation Detection

Detecting performance degradation during the test lets you intervene before wasting hours on a broken test.

Automated Warning Signs

Load Tester monitors for these conditions automatically:

| Condition | Warning Level | What It Means |
|---|---|---|
| Engine CPU > 90% | ⚠️ Warning | Engine overloaded, may self-regulate |
| Engine bandwidth > 80 Mbps | ⚠️ Warning | Engine near bandwidth limit |
| Error rate > 10% | 🚨 Critical | Application broken under load |
| Response time > 30 seconds | 🚨 Critical | Server severely overloaded or timing out |
| VUs not ramping | ⚠️ Warning | Engine self-regulation or capacity limit |

When warnings appear, investigate immediately. Don't wait for the test to finish.


Manual Degradation Detection

Watch for these patterns during the test:

| Pattern | What to Watch | Action |
|---|---|---|
| Response time doubles | 100ms → 200ms | Note VU count, approaching capacity limit |
| Response time increases 10x | 100ms → 1000ms+ | Stop and investigate, something broke |
| Errors appear | 0% → 5%+ | Check Errors View for error types |
| Hits/sec plateaus | Increasing → flat | Server maxed out, note capacity limit |
| Memory keeps growing | 30% → 50% → 70% → ... | Potential memory leak, watch closely |
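This watch list translates directly into a per-interval check you could script against exported metrics. A minimal sketch (the 2x/10x multipliers are the examples from the table, not universal thresholds):

```python
def degradation_alerts(baseline_ms: float, current_ms: float,
                       error_rate_pct: float) -> list:
    """Return alert strings for the degradation patterns listed above."""
    alerts = []
    if current_ms >= 10 * baseline_ms:
        alerts.append("response time 10x baseline: stop and investigate")
    elif current_ms >= 2 * baseline_ms:
        alerts.append("response time doubled: approaching capacity limit")
    if error_rate_pct > 10:
        alerts.append("critical error rate: application broken under load")
    elif error_rate_pct > 0:
        alerts.append("errors appearing: check the Errors View")
    return alerts

print(degradation_alerts(100, 250, 5))
```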

Ask the AI for Real-Time Alerts

Configure the AI to monitor your test in real time:

Monitor my load test and alert me if response times increase 5x or if error
rate exceeds 5%. I'm ramping from 100 to 1000 VUs over 60 minutes.

The AI can:

  • Watch metrics in real time and alert you to degradation patterns
  • Detect capacity limits as they're reached (response times spike at X VUs)
  • Identify correlation breakdowns (hits/sec plateaus while VUs keep increasing)
  • Recommend stopping the test early if conditions are critical (50% error rate)
  • Suggest immediate actions during live tests (add engines, adjust ramp rates)

Next Steps

After monitoring your load test:

If you need to optimize: