Server Monitoring Introduction

Server monitoring is the bridge between seeing performance problems and understanding what causes them. A load test measures response times from the user's perspective. When those times degrade, you have to look at the server side to know why.

Why Monitor Your Servers?

The load test tells you what's happening on the client side. Server metrics tell you why it's happening on the backend.

Real example. A high-end retailer ran a load test that showed excellent average response times and no visible degradation even at 2,250 concurrent users. But scatter plots revealed a different story: certain pages (checkout, shopping cart) had response times spiking into the 5-15 second range, with occasional 35-second blocks where multiple requests stacked up vertically. That's the telltale sign of database locks or backend blocking.

The servers were only at 40% CPU when these blocks occurred. Without server monitoring, the team would have assumed the problem was capacity and added more servers. That wouldn't have fixed the real issue, which was locked database tables.

Server monitoring lets you correlate client-side symptoms with server-side causes:

| Client-Side Symptom | What Server Metrics Reveal |
| --- | --- |
| Response times spike at 500 VUs | CPU hits 95%; need more capacity or optimization |
| Response times spike but CPU is low | Database locks, network issues, or misconfigured load balancer |
| Gradual response time increase | Memory exhaustion, garbage collection pauses |
| Intermittent timeouts | One server in the cluster crashed, or dynamic scaling too slow |

Without server metrics, you're guessing. With them, you know exactly where to look.
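The symptom-to-cause mapping above can be sketched as a rough decision rule. This is an illustration only; the `diagnose` function and its thresholds are hypothetical, not part of Load Tester:

```python
def diagnose(resp_spike, cpu_pct, mem_rising=False, timeouts_intermittent=False):
    """Map client-side symptoms plus server metrics to a likely cause.

    Thresholds are illustrative, not Load Tester defaults.
    """
    if timeouts_intermittent:
        return "check for a crashed cluster node or slow dynamic scaling"
    if resp_spike and cpu_pct >= 90:
        return "CPU saturation: add capacity or optimize hot code paths"
    if resp_spike and cpu_pct < 50:
        return "look for database locks, network issues, or load balancer misconfiguration"
    if mem_rising:
        return "suspect memory exhaustion or garbage collection pauses"
    return "no obvious server-side cause; gather more metrics"

# Example: response times spiked but CPU stayed low (the retailer scenario above)
print(diagnose(resp_spike=True, cpu_pct=40))
```

The point is not the rules themselves but that each row of the table only becomes decidable once both the client-side symptom and the server-side metric are in hand.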

Why This Matters for AI Analysis

The AI assistant's analytical ceiling is set by the data it can see. Without server metrics, the AI can describe what happened to the user but can only speculate about why. "Response times spiked at 500 VUs" is as far as it gets. With server metrics in the same dataset, the AI can name the cause: "Response times spiked at 500 VUs because CPU saturated on web-server-3 at 14:23:15, two seconds before the slowdown surfaced in client-side response times."

This is the biggest argument for using the WPI Server Monitoring Agent or the CloudWatch integration rather than relying solely on a third-party APM tool. Both feed metrics directly into the load-test result, where the AI can correlate them automatically against response times, error rates, and throughput. The depth of the resulting analysis is qualitatively different from what's possible when the AI has to work from client-side symptoms alone.

Choosing Your Monitoring Approach

There are three viable paths. The choice depends on whether you already have an enterprise APM tool, whether you want AI-powered root-cause analysis, and how much depth you need beyond CPU and memory.

Path 1: WPI Server Monitoring Agent or CloudWatch Integration

Install the WPI Server Monitoring Agent on each target server, or wire up CloudWatch integration for AWS-hosted servers. Both feed metrics directly into the load-test result, which is what makes deep AI-driven root-cause analysis possible. The AI has both halves of the picture in one place: what the users experienced, and what the server was doing at the same instant. It can correlate the two automatically and tell you not just what slowed down but why.

What you get: CPU, memory, disk I/O, network throughput (plus optional JVM metrics via JMX). Integrated dashboards. Automatic correlation between response-time spikes and server-resource pressure. AI-powered bottleneck identification with the server data as evidence, not inference.

What you don't get: Performance profiling (method-level execution times, call stacks, query plans). Application-specific metrics beyond what JMX exposes.

This is the right default for most teams that don't already have a heavyweight APM solution. It's lightweight, secure, and Basic mode is available at every Load Tester license tier.

See Basic Server Monitoring for setup, Server Monitoring Agent for installation details, and CloudWatch Monitoring for the AWS path.

Path 2: Enterprise APM (Datadog / New Relic / DynaTrace / etc.)

If you already have Datadog, New Relic, DynaTrace, or another enterprise APM tool, use it. It will give you far more comprehensive metrics and performance profiling than Load Tester's built-in monitoring can: method-level execution times, call stacks, query plans, every metric your APM vendor knows how to collect.

The tradeoff is the one you'd expect: those metrics live in the APM tool's dashboard, not in the load-test result, which means Load Tester's AI assistant can't see them. The AI ends up doing client-side-only analysis ("response times spiked at 500 VUs") and you do the correlation against the APM data manually, lining up timelines and matching spikes. The AI's value-add drops substantially when half the relevant data is in another system. If AI-driven root-cause analysis matters to you, run Path 1 alongside Path 2 (see Path 3).

DynaTrace integration (optional purchase) bridges part of the gap. Enable it via right-click test case → Properties → DynaTrace tab. This tags load-test traffic with DynaTrace user-tracking metadata so individual virtual users can be identified in DynaTrace dashboards, which is useful for tying a specific virtual user's failed transactions back to specific DynaTrace traces. The metrics themselves still live in DynaTrace, though, so AI-driven analysis on the load-test side still works without them.

Path 3: Both

Use WPI Server Monitoring (or CloudWatch) for AI-driven correlation and high-level resource visibility, and use enterprise APM for deep profiling when you need it. Native CloudWatch integration makes this combination practical on AWS: configure CloudWatch once and WPI AI analyzes those metrics alongside the load test, while your APM tool keeps doing what it's already doing.

Windows Direct Monitoring (local networks only)

There's a fourth path worth mentioning briefly. Direct Windows monitoring uses Windows performance APIs directly, with no agent installed on the target server. The catch: it requires Windows authentication to be pre-established (something like browsing to \\servername\C$ and entering credentials), and that's almost never feasible in production environments. Useful for development and test environments on your local network where you have admin access; not a production monitoring path. See Basic Server Monitoring for setup.

How Server Monitoring Works During a Load Test

When you configure server monitoring and start a load test, Load Tester:

  1. Connects to each monitored server (via agent, direct API, or HTTP script)
  2. Samples metrics at regular intervals (typically every 5-10 seconds)
  3. Correlates metrics with client-side response times in real time
  4. Displays metrics in the Servers View alongside VU count, hits/sec, and response times
  5. Records all metrics for post-test analysis
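Steps 2 and 5 above amount to a timestamped sampling loop. A minimal sketch, where `read_cpu` and `read_mem` are stand-ins for whatever the agent or CloudWatch actually reads (the function and its signature are hypothetical):

```python
import time

def sample_metrics(read_cpu, read_mem, interval_s=5, samples=3):
    """Poll server metrics at a fixed interval, timestamping each sample
    so it can later be correlated with client-side response times.

    read_cpu / read_mem are callables returning current utilization;
    interval_s mirrors the 5-10 second sampling cadence described above.
    """
    history = []
    for _ in range(samples):
        history.append({
            "ts": time.time(),        # when the sample was taken
            "cpu_pct": read_cpu(),
            "mem_pct": read_mem(),
        })
        time.sleep(interval_s)
    return history
```

The timestamps are what make step 3 (correlation) possible: without them, a CPU spike cannot be lined up against the response-time spike it caused.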

After the test, you can overlay server metrics on response-time graphs to see exactly when server resources became constrained and how that impacted user experience.
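One way to line up server samples with client-side measurements is to match each response time to the metric sample taken at or just before it. Load Tester does this overlay for you; the sketch below only illustrates the idea, and the sample data is made up:

```python
from bisect import bisect_right

def nearest_sample(metric_samples, ts):
    """Return the server-metric sample taken at or just before timestamp ts.

    metric_samples is a list of (timestamp, cpu_pct) sorted by timestamp.
    """
    times = [t for t, _ in metric_samples]
    i = bisect_right(times, ts)
    return metric_samples[i - 1] if i else None

# Align each client-side response time with the server's CPU at that moment
cpu_samples = [(0, 35), (5, 40), (10, 95), (15, 92)]   # (seconds, cpu %)
responses   = [(3, 180), (8, 210), (12, 4800)]          # (seconds, response ms)
for ts, ms in responses:
    t, cpu = nearest_sample(cpu_samples, ts)
    print(f"t={ts}s  response={ms}ms  cpu={cpu}%")
```

In this toy data, the 4800 ms response lines up with the sample where CPU hit 95%, which is exactly the kind of pairing the overlaid graphs make visible.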

Firewall Considerations

Basic monitoring can work entirely through manual file collection, with no firewall changes needed: run the agent, collect the stats file after the test, load it into Load Tester.

Live-connection monitoring works best when ports are open. The agent uses:

  • Port 1099: Main agent communication
  • Port 1100: Auto-detection (multicast)

Each server needs its own port pair if you're monitoring multiple servers from one controller. That can get awkward in locked-down environments, and most customers in that situation default to manual stats-file collection instead. See Monitoring Through Firewalls for the configuration choices.
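Before falling back to manual stats-file collection, it can be worth confirming whether the main agent port is actually reachable from the controller. A minimal TCP probe (the hostname is a placeholder; port 1100 uses multicast auto-detection, so a plain TCP check does not apply to it):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Check the main agent port (1099) from the controller host.
if port_open("monitored-server.example.com", 1099):
    print("agent port 1099 reachable")
else:
    print("agent port 1099 blocked; fall back to manual stats-file collection")
```

A blocked probe here usually means a firewall rule is needed, or that manual stats-file collection is the simpler path in that environment.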

Next Steps

If you have an enterprise APM tool, use it. Run the load test in Load Tester, monitor the backend in your APM dashboard, and correlate timestamps manually. CloudWatch is the one exception: it integrates natively with Load Tester's AI analysis, so use CloudWatch Monitoring if your servers are on AWS.

If you don't have an enterprise APM tool, start with the Server Monitoring Agent in Basic mode. CPU and memory will tell you if the bottleneck is resource capacity. When Basic shows low utilization but response times still degrade, upgrade to Advanced mode for disk I/O, network, and JMX visibility, then use the Server Performance Checklist to systematically work through the metrics.

Server monitoring turns load testing from "we see a problem" into "we know exactly what's causing it."