Cloud & Engine Issues¶

Problems with cloud load engines? This page covers AWS EC2 engine issues, connectivity problems, and cloud configuration. Find the symptom that matches what you're seeing.

AWS Configuration & Credentials¶

"AWS credentials invalid"¶

Symptom: Configuring cloud engines fails with a credentials error.

Most common causes:
1. Access Key ID or Secret Access Key is incorrect
2. Access key has been deactivated, deleted, or rotated (or temporary credentials have expired)
3. IAM user lacks the required permissions

Fix:
1. Verify credentials - AWS IAM Console → Users → Security Credentials
2. Check IAM permissions - User must have EC2 permissions (launch, describe, and terminate instances)
3. Regenerate keys - Create a new Access Key if the old one is inactive or has been deleted
4. Test credentials - Run aws sts get-caller-identity with the same keys (launching an instance from the AWS Console tests the user's permissions, not the access keys)

Required IAM permissions:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ec2:RunInstances",
      "ec2:DescribeInstances",
      "ec2:TerminateInstances",
      "ec2:DescribeImages",
      "ec2:DescribeSecurityGroups",
      "ec2:DescribeSubnets",
      "ec2:DescribeVpcs",
      "ec2:CreateTags"
    ],
    "Resource": "*"
  }]
}
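
A quick way to compare an existing policy against the list above is to diff its allowed actions programmatically. The sketch below is an illustration, not part of Load Tester: the function name missing_actions is hypothetical, and it only looks at Allow statements and their Action patterns. It ignores Deny statements, NotAction, conditions, and resource restrictions, so treat the IAM policy simulator as the authoritative check.

```python
import json
from fnmatch import fnmatchcase

# Actions taken from the policy shown above.
REQUIRED_ACTIONS = {
    "ec2:RunInstances",
    "ec2:DescribeInstances",
    "ec2:TerminateInstances",
    "ec2:DescribeImages",
    "ec2:DescribeSecurityGroups",
    "ec2:DescribeSubnets",
    "ec2:DescribeVpcs",
    "ec2:CreateTags",
}


def missing_actions(policy_json):
    """Return the required actions that no Allow statement matches.

    Simplification: Deny, NotAction, Condition and Resource are not evaluated.
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a bare string
            actions = [actions]
        # IAM action names are case-insensitive and may contain * wildcards.
        patterns.extend(action.lower() for action in actions)

    return {
        required
        for required in REQUIRED_ACTIONS
        if not any(fnmatchcase(required.lower(), p) for p in patterns)
    }
```

An empty result means every required action is matched by at least one Allow pattern; anything returned is a candidate cause for the credentials error.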

See: Cloud Load Testing for AWS setup.

"Insufficient capacity in availability zone"¶

Symptom: Starting cloud engines fails with AWS capacity error.

What this means: AWS is temporarily out of capacity for that instance type in that availability zone (AZ).

Fix:
1. Try a different instance type - Switch from c5.xlarge to c5.2xlarge or c6i.xlarge
2. Try a different availability zone - Choose a different AZ in the same region
3. Try a different region - us-east-1 → us-west-2
4. Wait and retry - Capacity issues are often temporary (minutes to hours)
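
If you script engine launches yourself, the first two fixes can be turned into a fallback loop. This is a minimal sketch under stated assumptions: launch is a hypothetical callable you supply (for example, a wrapper around your own AWS tooling), and InsufficientCapacity is a stand-in exception for the AWS InsufficientInstanceCapacity error, not a class from any AWS library.

```python
from itertools import product


class InsufficientCapacity(Exception):
    """Hypothetical stand-in for the AWS InsufficientInstanceCapacity error."""


def launch_with_fallback(instance_types, zones, launch):
    """Try every instance type in the first zone, then move to the next zone.

    `launch(instance_type, zone)` is supplied by the caller and is expected to
    raise InsufficientCapacity when AWS has no capacity for that combination.
    """
    for zone, instance_type in product(zones, instance_types):
        try:
            return launch(instance_type, zone)
        except InsufficientCapacity:
            continue  # move on to the next candidate
    raise InsufficientCapacity("no capacity in any candidate type/zone")
```

Listing the preferred instance type and AZ first keeps the common case unchanged; the fallbacks are only touched when AWS reports a capacity error.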

Why this happens: AWS capacity is finite per availability zone. Popular instance types (c5.xlarge) in popular AZs (us-east-1a) can run out during peak usage. The cloud is somebody else's computers, and sometimes those computers are busy.

"Request limit exceeded"¶

Symptom: Starting many engines at once fails with AWS API rate limit error.

What this means: Your account exceeded the EC2 API rate limit (too many requests too quickly).

Fix:
1. Start engines in smaller batches - Launch 5-10 at a time instead of 50
2. Wait between batches - 30 second delay between launches
3. Request limit increase - AWS Support → request higher EC2 API limits
4. Use CloudFormation - Launch engines via CloudFormation template (better rate limit handling)

Why this happens: AWS throttles EC2 API requests per account and per region using a token-bucket scheme, and RunInstances has its own limits within that. The exact numbers vary and are listed in the EC2 API throttling documentation. Launching 50 engines with 50 separate calls in quick succession can drain the bucket.
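
The batching advice above can be sketched as a small helper. The names here are assumptions for illustration: launch_batch is a hypothetical callable that starts n engines and returns their identifiers, and the defaults mirror the 5-10 engines and 30 second delay suggested in the fix list.

```python
import time


def launch_in_batches(engine_count, launch_batch, batch_size=10,
                      delay_seconds=30.0, sleep=time.sleep):
    """Launch engines in batches, pausing between batches.

    `launch_batch(n)` is supplied by the caller and returns a list of
    identifiers for the n engines it started. `sleep` is injectable so the
    pacing can be tested without waiting.
    """
    launched = []
    remaining = engine_count
    while remaining > 0:
        size = min(batch_size, remaining)
        launched.extend(launch_batch(size))
        remaining -= size
        if remaining > 0:  # no need to wait after the final batch
            sleep(delay_seconds)
    return launched
```

With 50 engines and the defaults, this makes five launch calls with four pauses between them, rather than 50 calls at once.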

"Invalid AMI ID"¶

Symptom: Starting engines fails with "AMI ID not found" or similar error.

Most common causes:
1. AMI ID is region-specific (copied from a different region)
2. AMI no longer exists (deregistered)
3. AMI not shared with your AWS account

Fix:
1. Check region - Verify the AMI exists in the region you're launching in
2. Get correct AMI ID - Contact WPI support for the current AMI ID for your region
3. Verify AMI permissions - Ensure WPI has shared the AMI with your AWS account

See: Cloud Load Testing for AMI configuration.

Engine Connectivity & Communication¶

"Engine not responding"¶

Symptom: Engine shows as "Not Responding" in Engines View.

Most common causes:
1. Engine crashed or Java process died
2. Network connectivity lost
3. Engine overloaded (too many VUs)
4. Firewall blocking ports 1099/1100

Fix:
1. Check engine status - AWS console → verify EC2 instance is running
2. SSH to engine - Check if OS is responsive: ssh -i key.pem ubuntu@<engine-ip>
3. Check Java process - ps aux | grep java - is the engine process running?
4. Check firewall - Security groups must allow inbound TCP 1099/1100 from your IP
5. Restart engine - Terminate and launch a new instance if crashed

Diagnostics via SSH:

# Check if engine process is running
ps aux | grep java

# Check engine logs
tail -f /var/log/wpi-engine.log

# Check system resources
top
df -h
free -m

"Engines auto-detect on local network but not in cloud"¶

Symptom: Engines auto-detect when running locally, but not when running in AWS.

Why this happens: Auto-detection uses multicast, which doesn't work across the internet or through a VPN.

Fix: Manual configuration is required for cloud engines:
1. Don't rely on auto-detection for cloud engines
2. Manually add each engine - Engines View → Add → enter IP address
3. Use Elastic IPs - Assign Elastic IPs to engines so the public IP doesn't change when an instance is stopped and started

See: Cloud Load Testing for manual engine configuration.

"Can SSH to engine but Load Tester can't connect"¶

Symptom: SSH works but Load Tester shows "connection refused" on ports 1099/1100.

Most common causes:
1. Security group allows SSH (22) but not RMI ports (1099/1100)
2. Engine process not running
3. Engine firewall (iptables) blocking ports

Fix:
1. Check security group - Must allow inbound TCP 1099 and 1100 from your IP
2. Verify engine process running - SSH to engine, ps aux | grep java
3. Check port binding - netstat -an | grep 1099 - should show LISTEN
4. Test connectivity - From workstation: telnet <engine-ip> 1099
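
If telnet isn't installed on your workstation, the connectivity test can be scripted with Python's standard socket module. This is a sketch, not a Load Tester utility; check_ports is a hypothetical name, and the ports are the two RMI ports named above.

```python
import socket

ENGINE_PORTS = (1099, 1100)  # RMI ports named in this section


def check_ports(host, ports=ENGINE_PORTS, timeout=3.0):
    """Attempt a TCP connection to each port and report what happened."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = "open"
        except ConnectionRefusedError:
            results[port] = "refused"
        except socket.timeout:
            results[port] = "timeout"
        except OSError as exc:  # unreachable host, DNS failure, etc.
            results[port] = "error: %s" % exc
    return results
```

As a rough guide, "timeout" usually points at a security group or network firewall silently dropping packets, while "refused" usually means the host was reached but nothing is listening on that port (engine process not running) or a host firewall is rejecting the connection.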

Security group rules needed:

Type: Custom TCP
Port: 1099
Source: <your-workstation-ip>/32

Type: Custom TCP
Port: 1100
Source: <your-workstation-ip>/32

Type: SSH
Port: 22
Source: <your-workstation-ip>/32 (for diagnostics)

"Engines in multiple regions can't communicate"¶

Symptom: Load test fails when using engines in different AWS regions.

Why this happens: Engines need to communicate with each other for coordinated load generation. Cross-region communication has higher latency and may hit firewall issues.

Fix:
1. Use engines in the same region - Best practice for coordinated load tests
2. If multi-region is required - Ensure security groups allow cross-region communication
3. Use VPC peering - Connect VPCs across regions for better performance

Recommendation: Use multi-region engines only for testing geographically distributed applications. For most load tests, keep all engines in a single region.

Engine Performance & Capacity¶

"Engines running out of resources (CPU, memory, network)"¶

Symptom: Engine CPU at 100%, or memory exhausted, or network saturated.

Most common causes:
1. Instance type too small for VU count
2. Test case too complex (large responses, many transactions)
3. Too many VUs per engine

Fix:
1. Use larger instance types - Switch from t3.medium → c5.xlarge → c5.2xlarge
2. Reduce VUs per engine - Distribute load across more engines
3. Optimize test case - Remove unnecessary transactions, reduce response size
4. Monitor engine resources - SSH to engine, run top and iftop during load test

Instance type guidelines:
- t3.medium: 50-100 VUs (simple test cases only)
- c5.large: 200-400 VUs
- c5.xlarge: 500-800 VUs
- c5.2xlarge: 1000-1500 VUs

(Actual capacity depends on test case complexity)
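
To turn the guidelines above into an engine count, divide the target VU total by the per-engine figure and round up. The sketch below simply encodes the ranges from this page; engines_needed is a hypothetical helper, and planning from the low end of each range is an assumption chosen to leave headroom for complex test cases.

```python
import math

# (low, high) VUs per engine, copied from the guidelines above.
VUS_PER_ENGINE = {
    "t3.medium": (50, 100),
    "c5.large": (200, 400),
    "c5.xlarge": (500, 800),
    "c5.2xlarge": (1000, 1500),
}


def engines_needed(total_vus, instance_type, conservative=True):
    """Estimate how many engines of one type are needed for total_vus.

    conservative=True plans with the low end of the guideline range.
    """
    low, high = VUS_PER_ENGINE[instance_type]
    per_engine = low if conservative else high
    return math.ceil(total_vus / per_engine)
```

For a 5,000 VU test on c5.xlarge this gives 10 engines when planning conservatively and 7 at the optimistic end of the range; watching CPU and memory on a short trial run is the way to find out where your test case really lands.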

"Engines hitting network bandwidth limits"¶

Symptom: Load test plateaus, throughput stops increasing despite adding VUs.

Most common causes:
1. Instance network bandwidth limit reached
2. Test case involves large file downloads/uploads
3. Too many concurrent connections

Fix:
1. Use instance types with higher network performance - c5n.xlarge (up to 25 Gbps) vs. c5.xlarge (up to 10 Gbps)
2. Reduce payload sizes - Filter out large images/videos if not needed for the test
3. Distribute across more engines - Spread network load

Network performance by instance type:
- c5.xlarge: Up to 10 Gbps
- c5.2xlarge: Up to 10 Gbps
- c5n.xlarge: Up to 25 Gbps
- c5n.2xlarge: Up to 25 Gbps
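
A back-of-the-envelope estimate helps decide whether bandwidth is a plausible bottleneck before switching instance types. The function below is an illustrative sketch with hypothetical parameter names; it counts response payload only and ignores request bodies, protocol overhead, and retransmissions.

```python
def required_gbps(vus, requests_per_vu_per_sec, avg_response_bytes):
    """Approximate inbound bandwidth an engine needs, in gigabits per second."""
    bits_per_sec = vus * requests_per_vu_per_sec * avg_response_bytes * 8
    return bits_per_sec / 1e9
```

For example, 1,000 VUs each making 2 requests per second with 500 KB average responses works out to 8 Gbps, which is close to the ceiling of a c5.xlarge. Keep in mind that "Up to" figures are burst ceilings and sustained throughput on smaller sizes is lower, so spreading that load across several engines is the safer choice.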

Engine Lifecycle Issues¶

"Engines won't terminate after load test"¶

Symptom: Load test finishes but EC2 instances remain running.

Most common causes:
1. Auto-terminate disabled
2. Load Tester lost connection during termination
3. AWS API error during termination

Fix:
1. Manually terminate - AWS Console → EC2 → Instances → Terminate
2. Enable auto-terminate - Engines View → Properties → check "Automatically terminate after test"
3. Check AWS permissions - IAM user must have ec2:TerminateInstances permission
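
To spot engines that outlived a test without clicking through the console, you can filter the JSON printed by aws ec2 describe-instances. The sketch below is an assumption-laden illustration: long_running_instances is a hypothetical helper, it relies only on the Reservations → Instances → State/LaunchTime/InstanceId fields of that output, and it does not tell engines apart from your other instances (tagging engines would be one way to do that).

```python
import json
from datetime import datetime, timedelta, timezone


def long_running_instances(describe_output, max_age, now=None):
    """Return IDs of running instances launched more than max_age ago.

    describe_output is the JSON text printed by `aws ec2 describe-instances`.
    """
    now = now or datetime.now(timezone.utc)
    data = json.loads(describe_output)
    stale = []
    for reservation in data.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("State", {}).get("Name") != "running":
                continue
            # Older CLI versions print a trailing "Z" instead of "+00:00".
            launched = datetime.fromisoformat(
                instance["LaunchTime"].replace("Z", "+00:00")
            )
            if now - launched > max_age:
                stale.append(instance["InstanceId"])
    return stale
```

Calling it with max_age=timedelta(hours=2) after a one-hour test, for instance, lists anything still running well past the point where auto-terminate should have cleaned up.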

"Engines launch slowly or fail to become ready"¶

Symptom: Engines take >5 minutes to launch, or never reach "Ready" status.

Most common causes:
1. AMI has slow startup scripts
2. Instance type has slow boot time (e.g., t3.micro)
3. Network connectivity issue delaying engine registration
4. Engine process failing to start

Fix:
1. Wait longer - Engines can take 3-5 minutes to fully start
2. Use faster instance types - c5.xlarge boots faster than t3.medium
3. Check engine logs - SSH to engine, check /var/log/wpi-engine.log
4. Verify AMI is current - Old AMIs may have bugs or slow scripts

"Engines intermittently disconnect mid-test"¶

Symptom: Engines drop connection during load test, then reconnect.

Most common causes:
1. Network instability between workstation and AWS
2. Engine overloaded (CPU/memory exhaustion)
3. AWS network maintenance
4. VPN connection unstable

Fix:
1. Check network stability - Ping engine continuously during test
2. Monitor engine resources - SSH to engine, check CPU/memory with top
3. Use Elastic IPs - Keep each engine's public address stable if an instance is stopped and started
4. Increase timeout values - Tools → Preferences → Engines → Communication Timeout

Cost Management¶

"Unexpected AWS charges"¶

Symptom: AWS bill higher than expected for load testing.

Most common causes:
1. Engines left running after test (forgot to terminate)
2. Data transfer charges (cross-region, internet egress)
3. Using expensive instance types unnecessarily
4. Many short-duration tests (on-demand Linux instances are billed per second with a 60-second minimum, so every launch also pays for boot time before the engine is ready)

Prevention:
1. Enable auto-terminate - Engines View → Properties → check auto-terminate
2. Use same region as target - Avoid cross-region data transfer charges
3. Right-size instance types - Don't use c5.4xlarge when c5.xlarge suffices
4. Use Spot Instances - Often around 70% cheaper than on-demand, though the discount varies by instance type and AZ (and instances can be interrupted)
5. Monitor running instances - AWS Console → EC2 → check for orphaned instances

Cost optimization:
- Spot Instances: Often ~70% cheaper than on-demand; the discount varies
- Reserved Instances: Roughly 30-50% cheaper for frequent testing, in exchange for a 1- or 3-year commitment
- Same-region testing: Avoid cross-region data transfer (about $0.02/GB, depending on the region pair)
- Auto-terminate: Prevent engines running when idle
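
A rough cost estimate before a test makes surprises on the bill less likely. The sketch below covers compute only and rests on two assumptions: on-demand Linux instances billed per second with a 60-second minimum, and an hourly rate that you look up yourself on the EC2 pricing page. The function name and the rate in the example are illustrative, not quoted prices.

```python
def on_demand_cost(engine_count, seconds_running, hourly_rate_usd):
    """Estimate on-demand compute cost in USD for a group of engines.

    Assumes per-second billing with a 60-second minimum per instance.
    Data transfer and EBS storage are not included.
    """
    billed_seconds = max(seconds_running, 60)
    return engine_count * billed_seconds / 3600 * hourly_rate_usd
```

Ten engines running for 30 minutes at an assumed $0.17/hour come to roughly $0.85, while the same ten engines forgotten for a weekend (48 hours) come to roughly $81.60, which is why auto-terminate is the first item in the prevention list.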

Error Message Lookup¶

Looking for a specific error message? Use the searchable error catalog:

See: Common Error Messages - Use Ctrl+F to search for the exact error text you're seeing.

Still Stuck?¶

If none of these solutions work:

Check load testing issues: Load Testing Issues for test execution problems
Verify AWS setup: Cloud Load Testing Guide has complete AWS configuration instructions
Contact Support: See Getting Support for how to gather diagnostic logs (include AWS region, instance types, error messages)

Most cloud engine issues come down to firewall or security group configuration (ports 1099 and 1100 must be open) or AWS capacity and permission problems.