Server Performance Checklist¶
Work through this checklist after every load test where you collected server metrics. Each category (CPU, Memory, Disk, Network) has recommended thresholds. When a metric exceeds its threshold, you've found something worth investigating.
How to use this checklist:
- Open your completed load test result in the Embedded Analytics Dashboard (double-click the result in the Navigator)
- Find the server metrics view in the Dashboard, where CPU / Memory / Disk / Network are plotted alongside response-time and throughput data from the test
- Work through each category below, checking your metrics against the thresholds
- When a metric exceeds the threshold, follow the guidance to investigate and resolve
The Dashboard is the right tool for post-test review because it correlates server metrics with the same test's response-time and throughput data, so you can see exactly when each server-side spike happened and what the users were experiencing at that moment. If you want to watch the same metrics in real time during a load test, the Servers View shows current values live.
For detailed metric definitions, see Server Metrics & Counters.
CPU Performance¶
Check these CPU metrics to identify processor bottlenecks:
☐ CPU % (Processor Time)¶
What to check:
- Peak CPU % during the load test
- Sustained CPU % at steady state
Thresholds:
| CPU % | Status | Action |
|---|---|---|
| < 70% | Good | No action needed |
| 70-85% | Warning | Monitor closely; capacity is adequate but limited headroom |
| 85-95% | Critical | Processor is bottleneck; optimize code or add capacity |
| > 95% | Severe | Processor is saturated; expect significant response time degradation |
What it means:
- CPU % should scale proportionally with load (double the VUs → roughly double the CPU %)
- If CPU hits 85%+ and response times spike, CPU is the bottleneck
- If CPU is < 70% but response times are still slow, look elsewhere (memory, disk, database)
Next steps if high:
- Identify inefficient code (profiling tools)
- Optimize database queries (check query execution plans)
- Add more CPU cores or scale horizontally
- Consider caching to reduce computation
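If you also want to spot-check CPU directly on the server, independent of the Dashboard, a short script can sample it and map the readings onto the table above. A minimal sketch using the third-party psutil library; the one-second interval and 60-sample window are arbitrary choices, not product settings:

```python
import psutil  # third-party: pip install psutil

def classify_cpu(pct: float) -> str:
    """Map a CPU % sample onto the threshold table above."""
    if pct < 70:
        return "Good"
    if pct < 85:
        return "Warning"
    if pct < 95:
        return "Critical"
    return "Severe"

# Sample CPU once per second for 60 seconds (arbitrary window).
samples = [psutil.cpu_percent(interval=1) for _ in range(60)]
peak, sustained = max(samples), sum(samples) / len(samples)
print(f"peak={peak:.0f}% ({classify_cpu(peak)}), "
      f"sustained={sustained:.0f}% ({classify_cpu(sustained)})")
```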
☐ Context Switches/sec¶
What to check:
- Rate of thread context switches during the test
Thresholds:
- Should scale proportionally with load
- Greater-than-linear increase suggests inefficient threading or lock contention
What it means:
- Context switches occur when the CPU switches between threads
- Excessive context switching wastes CPU cycles
- Often indicates thread pool size mismatch or lock contention
Next steps if high:
- Review thread pool configuration (too many threads → excessive switching)
- Check for lock contention in application code
- Consider thread affinity or worker thread optimization
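Context-switch counters are cumulative since boot, so the rate is the delta between two samples. A sketch with psutil (the five-second window is an arbitrary choice); record the rate at each load level and compare its growth against VU growth:

```python
import time
import psutil  # pip install psutil

def ctx_switch_rate(window_s: float = 5.0) -> float:
    """Context switches per second, averaged over a short window."""
    before = psutil.cpu_stats().ctx_switches  # cumulative since boot
    time.sleep(window_s)
    after = psutil.cpu_stats().ctx_switches
    return (after - before) / window_s

# Record this at each VU level; a greater-than-linear rise between
# levels points at lock contention or an oversized thread pool.
print(f"{ctx_switch_rate():,.0f} context switches/sec")
```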
☐ Process Queue Length¶
What to check:
- Number of threads waiting to be scheduled
Thresholds:
| Queue Length (per processor) | Status | Action |
|---|---|---|
| < 2 | Good | No action needed |
| 2-10 | Acceptable | Monitor; may indicate CPU pressure |
| > 10 | Critical | CPU cannot keep up with demand |
What it means:
- Threads waiting for CPU time are queued
- Sustained queue length > 10 per processor indicates CPU saturation
Next steps if high:
- Same as high CPU %: optimize code, add capacity
- Check if background processes are competing for CPU
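Windows reports Processor Queue Length as a native performance counter; on Unix-like servers the load average is the usual proxy. A rough per-processor check with psutil, noting that the Linux load average also counts threads blocked on disk, so read it alongside the disk section below:

```python
import psutil  # pip install psutil

# 1-minute load average per logical CPU as a rough proxy for
# run-queue pressure. Caveat: on Linux the load average also
# includes threads in uninterruptible (disk) sleep.
load_1m, _, _ = psutil.getloadavg()
per_cpu = load_1m / psutil.cpu_count(logical=True)

if per_cpu < 2:
    status = "Good"
elif per_cpu <= 10:
    status = "Acceptable"
else:
    status = "Critical"
print(f"runnable per CPU ~ {per_cpu:.1f} ({status})")
```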
Memory Performance¶
Check these memory metrics to identify RAM exhaustion or paging issues:
☐ % Memory (Memory Utilization)¶
What to check:
- Peak memory % during the test
- Memory growth over time (memory leak indicator)
Thresholds:
| Memory % | Status | Action |
|---|---|---|
| < 80% | Good | Adequate memory headroom |
| 80-90% | Warning | Monitor; risk of paging if usage grows |
| > 90% | Critical | Memory pressure; risk of swap/paging |
What it means:
- High memory % forces the OS to page memory to disk
- Paging causes severe performance degradation because disk is roughly 1000x slower than RAM
- Gradual memory growth over time indicates a memory leak
Next steps if high:
- Check for memory leaks (heap dumps, profiling)
- Increase physical RAM
- Optimize memory usage (object pooling, caching strategies)
- Review garbage collection settings (Java/.NET apps)
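One way to turn "gradual memory growth" into a number is to fit a straight line through memory samples taken across the test and look at the slope. A sketch using psutil; the sampling window and the 0.1%/minute cutoff are arbitrary assumptions, not product thresholds:

```python
import time
import psutil  # pip install psutil

def memory_trend(duration_s: int = 300, interval_s: int = 5):
    """Sample memory % and return (peak, slope in %/minute)."""
    samples = []
    for _ in range(duration_s // interval_s):
        samples.append(psutil.virtual_memory().percent)
        time.sleep(interval_s)
    # Least-squares slope over the sample index, converted to %/min.
    n = len(samples)
    mean_x, mean_y = (n - 1) / 2, sum(samples) / n
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in enumerate(samples))
             / sum((x - mean_x) ** 2 for x in range(n)))
    return max(samples), slope * (60 / interval_s)

peak, slope = memory_trend()
print(f"peak={peak:.0f}%, growth={slope:+.2f}%/min")
if slope > 0.1:  # arbitrary cutoff
    print("memory rises steadily under constant load -> suspect a leak")
```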
☐ Page Reads/sec and Page Writes/sec¶
What to check:
- Rate of page faults (disk reads to resolve memory access)
- Rate of page writes (memory flushed to disk)
Thresholds:
| Paging Rate | Status | Action |
|---|---|---|
| < 10/sec | Good | Minimal paging |
| 10-100/sec | Warning | Some paging; monitor for growth |
| > 100/sec | Critical | Excessive paging; severe performance impact |
What it means:
- Page reads occur when memory isn't in RAM and must be fetched from disk
- Page writes occur when RAM is full and pages must be evicted to disk
- Both cause massive slowdowns (nanosecond memory access → millisecond disk access)
Next steps if high:
- Increase physical RAM immediately
- Reduce memory consumption (optimize code, reduce cache sizes)
- Check for memory leaks
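On Linux, psutil exposes cumulative swap-in/swap-out byte counters (sin/sout), so an approximate paging rate falls out of a before/after delta. A Linux-only sketch that assumes the common 4 KiB page size:

```python
import time
import psutil  # pip install psutil; sin/sout are Linux counters

def paging_rate(window_s: float = 10.0):
    """Approximate pages swapped in/out per second (4 KiB pages)."""
    page = 4096
    before = psutil.swap_memory()
    time.sleep(window_s)
    after = psutil.swap_memory()
    reads = (after.sin - before.sin) / page / window_s
    writes = (after.sout - before.sout) / page / window_s
    return reads, writes

reads, writes = paging_rate()
print(f"page reads {reads:.0f}/s, page writes {writes:.0f}/s")
if reads + writes > 100:  # critical threshold from the table above
    print("excessive paging -> add RAM or cut memory use")
```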
☐ Cache Memory Allocation Ratio¶
What to check:
- Percentage of RAM reserved for OS cache
- Decreasing ratio indicates memory pressure
What it means:
- OS reduces cache allocation when memory is needed elsewhere
- Decreasing cache means less efficient file system access
Next steps if decreasing:
- Same as high memory %: add RAM or optimize usage
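On Linux, psutil's virtual_memory() reports the page-cache size directly (the cached field does not exist on Windows), so the ratio can be sampled at the start and end of a test:

```python
import psutil  # pip install psutil; .cached exists on Linux only

vm = psutil.virtual_memory()
cache_ratio = vm.cached / vm.total * 100
print(f"OS cache: {cache_ratio:.1f}% of RAM")
# Sample this at the start and end of the test: a falling ratio
# means the OS is reclaiming cache to satisfy application demand.
```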
Disk I/O Performance¶
Check these disk metrics to identify storage bottlenecks:
☐ % I/O Time Utilized¶
What to check:
- Percentage of time disk was busy with I/O
Thresholds:
| I/O Time % | Status | Action |
|---|---|---|
| < 80% | Good | Disk is keeping up |
| 80-95% | Warning | Disk is under heavy load |
| > 95% | Critical | Disk is saturated; I/O bottleneck |
What it means:
- Disk at 95%+ cannot handle more I/O requests
- I/O requests will queue, causing delays
Next steps if high:
- Move to faster storage (SSD vs. spinning disk)
- Reduce disk writes (optimize logging, caching)
- Separate logs/temp files to different physical disks
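Linux publishes a cumulative busy_time (in milliseconds) through psutil's disk_io_counters(), which makes % I/O time a simple delta over a window. A Linux-only sketch; note the system-wide counters are aggregated across all disks:

```python
import time
import psutil  # pip install psutil; busy_time is Linux-only

def disk_busy_percent(window_s: float = 10.0) -> float:
    """Percentage of the window the disk spent servicing I/O.
    Aggregated across disks; pass perdisk=True to isolate one device."""
    before = psutil.disk_io_counters().busy_time  # ms since boot
    time.sleep(window_s)
    after = psutil.disk_io_counters().busy_time
    return (after - before) / (window_s * 1000) * 100

busy = disk_busy_percent()
status = "Good" if busy < 80 else "Warning" if busy <= 95 else "Critical"
print(f"disk busy {busy:.0f}% ({status})")
```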
☐ Queue Length¶
What to check:
- Average number of I/O requests waiting for disk
Thresholds:
| Queue Length | Status | Action |
|---|---|---|
| < 2 | Good | No queuing |
| 2-5 | Acceptable | Disk under load but managing |
| > 5 | Critical | Requests are queueing; disk bottleneck |
What it means:
- Requests waiting in queue are delayed
- High queue length → slow disk response times
Next steps if high:
- Same as high I/O %: faster storage, reduce I/O
☐ Reads/sec and Writes/sec¶
What to check:
- Rate of disk read/write operations
- Plateaus indicate disk capacity limit
What it means:
- Disk I/O should scale with load
- Plateau (rate stops increasing despite more load) indicates disk saturation
Next steps if plateauing:
- Upgrade to faster storage (NVMe SSD)
- Optimize database queries to reduce disk reads
- Increase RAM to cache more data in memory
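A plateau is easiest to confirm by comparing throughput across load steps: if VUs rose by a step but the I/O rate barely moved, the disk has stopped scaling. A small library-free sketch over (VU count, ops/sec) pairs read off the Dashboard; the readings and the 10% tolerance are made up for illustration:

```python
def find_plateau(steps, tolerance=0.10):
    """steps: list of (vu_count, ops_per_sec) in increasing VU order.
    Returns the VU level where throughput stopped scaling, or None."""
    for (vu_a, rate_a), (vu_b, rate_b) in zip(steps, steps[1:]):
        load_growth = vu_b / vu_a - 1
        rate_growth = rate_b / rate_a - 1
        # Load rose, but throughput grew < 10% of that rise.
        if load_growth > 0 and rate_growth < load_growth * tolerance:
            return vu_a
    return None

# Hypothetical readings from four load steps:
steps = [(50, 800), (100, 1550), (200, 1600), (400, 1620)]
print(find_plateau(steps))  # -> 100: disk stopped scaling past 100 VUs
```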
Network Performance¶
Check these network metrics to identify bandwidth or packet loss issues:
☐ Bytes Received/sec and Bytes Sent/sec¶
What to check:
- Network throughput during the test
- Scaling behavior as load increases
Thresholds:
- Should scale proportionally with load
- Less-than-linear increase indicates network capacity limit
What it means:
- Bytes/sec measures actual data transfer rate
- Plateaus indicate network saturation (hit bandwidth limit)
Next steps if saturated:
- Upgrade network interface (1 GbE → 10 GbE)
- Check for network congestion (switch, firewall bottlenecks)
- Optimize data transfer (compression, reduce payload sizes)
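The same delta-over-a-window pattern yields network throughput from psutil's cumulative byte counters; record it at each load level and confirm it still rises in step with the load:

```python
import time
import psutil  # pip install psutil

def net_throughput(window_s: float = 10.0):
    """(bytes received/sec, bytes sent/sec) over a short window."""
    before = psutil.net_io_counters()
    time.sleep(window_s)
    after = psutil.net_io_counters()
    rx = (after.bytes_recv - before.bytes_recv) / window_s
    tx = (after.bytes_sent - before.bytes_sent) / window_s
    return rx, tx

rx, tx = net_throughput()
# ~125 MB/s is the ceiling of a 1 GbE link; a plateau near it
# means the NIC, not the application, is the limit.
print(f"rx {rx/1e6:.1f} MB/s, tx {tx/1e6:.1f} MB/s")
```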
☐ Packets Received Errors and Packets Sent Errors¶
What to check:
- Number of packets with errors
Thresholds:
| Error Rate | Status | Action |
|---|---|---|
| 0 | Good | No packet errors |
| > 0 | Critical | Network degradation; investigate immediately |
What it means:
- Packet errors indicate serious network problems
- Can be caused by bad cables, failing NICs, switch issues
Next steps if > 0:
- Check network cables and connections
- Replace faulty network hardware
- Review switch/router logs for errors
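psutil's interface counters include error and drop totals, so the zero-tolerance check above is a before/after comparison (the counters are cumulative since boot, so the absolute value alone tells you little):

```python
import psutil  # pip install psutil

def packet_errors() -> int:
    """Total errored and dropped packets, in and out, since boot."""
    c = psutil.net_io_counters()
    return c.errin + c.errout + c.dropin + c.dropout

# Snapshot before the test, compare after: any increase is a finding.
baseline = packet_errors()
# ... run the load test ...
delta = packet_errors() - baseline
print("OK" if delta == 0 else f"{delta} packet errors/drops during test")
```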
☐ Collisions/sec (Ethernet)¶
What to check:
- Rate of packet collisions on Ethernet
Thresholds:
| Collision Rate | Status | Action |
|---|---|---|
| < 5% of Packets Sent/sec | Good | Normal collision rate |
| > 5% of Packets Sent/sec | Critical | Network problem or capacity limit |
What it means:
- Collisions occur when two devices transmit simultaneously
- Excessive collisions indicate network congestion or misconfiguration
Next steps if high:
- Check for network congestion
- Upgrade to full-duplex Ethernet (eliminates collisions)
- Review network topology for bottlenecks
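There is no portable collision counter; on Linux the kernel publishes one per interface under /sys/class/net. A Linux-only sketch in which eth0 is a placeholder for your interface name; on a switched full-duplex network this value should simply stay at zero:

```python
from pathlib import Path

def collisions(iface: str = "eth0") -> int:  # iface is a placeholder
    """Cumulative collision count for one interface (Linux sysfs)."""
    return int(Path(f"/sys/class/net/{iface}/statistics/collisions")
               .read_text())

# Compare against packets sent over the same period; > 5% is the
# critical threshold from the table above.
print(collisions())
```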
☐ Connections Established¶
What to check:
- Number of active TCP connections
- Should scale proportionally with VU count
What it means:
- Each virtual user typically requires 1-6 TCP connections (HTTP/1.1 keep-alive)
- Plateaus indicate connection limit reached
Next steps if plateauing:
- Increase TCP connection limits (OS tuning)
- Check application server connection pool settings
- Check for TIME_WAIT socket exhaustion
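To watch connection scaling and spot TIME_WAIT buildup, you can count sockets by state with psutil (listing all sockets may require elevated privileges on some platforms):

```python
from collections import Counter
import psutil  # pip install psutil

states = Counter(c.status for c in psutil.net_connections(kind="tcp"))
established = states.get(psutil.CONN_ESTABLISHED, 0)
time_wait = states.get(psutil.CONN_TIME_WAIT, 0)

print(f"established={established}, time_wait={time_wait}")
# established should track the VU count; a large TIME_WAIT pile-up
# hints at port exhaustion from rapid connection churn.
```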
☐ Connection Failures¶
What to check:
- Number of failed TCP connection attempts
Thresholds:
| Failures | Status | Action |
|---|---|---|
| 0 | Good | No connection failures |
| > 0 | Critical | Investigate cause immediately |
What it means:
- Connection failures indicate the server is refusing connections
- Often caused by listen queue exhaustion or firewall rules
Next steps if > 0:
- Increase the listen queue backlog
- Check firewall rules
- Review server logs for refused connections
Systematic Diagnosis Process¶
If you're seeing performance problems, work through this process:
1. Response Times Slow?¶
- Yes → Continue to #2
- No → No server bottleneck; check client-side (network latency, think time)
2. CPU > 85%?¶
- Yes → CPU bottleneck. Optimize code, add capacity, or scale horizontally.
- No → Continue to #3
3. Memory > 90% or Paging > 100/sec?¶
- Yes → Memory bottleneck. Add RAM, fix memory leaks, optimize usage.
- No → Continue to #4
4. Disk I/O > 95% or Queue Length > 5?¶
- Yes → Disk bottleneck. Upgrade to SSD, optimize queries, cache more in RAM.
- No → Continue to #5
5. Network errors > 0 or Collisions > 5%?¶
- Yes → Network problem. Fix hardware, check cables, upgrade network.
- No → Continue to #6
6. All server metrics look good, but response times are still slow?¶
- Likely causes:
- Database locks (check for blocked queries)
- External service dependency (API calls, third-party services)
- Misconfigured load balancer (uneven distribution)
- Application-level bottleneck (thread pool exhaustion, lock contention)
- Next steps:
- Review application logs for errors
- Check database query performance
- Trace external service calls
- Profile the application under load
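Assuming step 1 has already established that response times are slow, steps 2-5 reduce to a first-match triage over the peak values you read off the Dashboard. A sketch; the dictionary keys are hypothetical names chosen for this example, not product fields:

```python
def triage(m: dict) -> str:
    """First-match bottleneck triage over peak metrics from a test.
    Expected keys (hypothetical): cpu_pct, mem_pct, paging_per_s,
    disk_io_pct, disk_queue, net_errors, collision_pct."""
    if m["cpu_pct"] > 85:
        return "CPU bottleneck: optimize code, add capacity, scale out"
    if m["mem_pct"] > 90 or m["paging_per_s"] > 100:
        return "Memory bottleneck: add RAM, fix leaks, optimize usage"
    if m["disk_io_pct"] > 95 or m["disk_queue"] > 5:
        return "Disk bottleneck: faster storage, query tuning, more cache"
    if m["net_errors"] > 0 or m["collision_pct"] > 5:
        return "Network problem: check hardware, cables, capacity"
    return ("No server-side ceiling hit: look at database locks, "
            "external services, the load balancer, or app-level limits")

print(triage({"cpu_pct": 62, "mem_pct": 93, "paging_per_s": 40,
              "disk_io_pct": 70, "disk_queue": 1,
              "net_errors": 0, "collision_pct": 0}))
# -> Memory bottleneck (93% > 90%), even though paging is still modest
```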
Related Topics¶
- Server Metrics & Counters - Detailed definitions of all metrics
- Server Monitoring Introduction - Why server monitoring matters
- Basic Server Monitoring - How to set up server monitoring
- Server Monitoring Agent - Installing and configuring the agent
Run through this checklist after every load test. The bottleneck is usually in the last place you'd think to look, which is exactly why a systematic approach matters.