Datasets & Data-Driven Testing

A common problem when setting up a load testing configuration is figuring out how much test data you need. You know you need usernames and passwords, but how many? And why does using unique data for each user matter when you could just reuse the same username over and over?

The answer: real applications behave differently when multiple users share the same credentials simultaneously. Session conflicts, cache collisions, and server-side optimizations all produce unrealistic results when every virtual user logs in as the same person. Data-driven testing with datasets ensures you're measuring real-world performance, not artificial behavior caused by test artifacts.

This guide covers creating datasets, calculating how many rows you need, editing data efficiently, and using JavaScript data sources for dynamic generation.


Why Data-Driven Testing Matters

The Problem with Shared Data

When you record a test case, you authenticate once as yourself. When you run a load test with 1000 virtual users, you have three options:

| Approach | What Happens | Result |
| --- | --- | --- |
| All users share same credentials | 1000 concurrent sessions with username "jsmith" | Server may invalidate earlier sessions, rate-limit the account, or behave unpredictably. NOT realistic. |
| Users cycle through small dataset | 1000 users share 10 credentials (100 users per credential) | Better than option 1, but still artificial. Many applications don't handle 100 concurrent sessions for the same user gracefully. |
| Each user gets unique credentials | 1000 users use 1000 different credentials | Realistic simulation. Each virtual user behaves like an independent real user. Server behavior matches production. |

Use enough unique test data that each virtual user has its own identity during the test.


What Datasets Provide

A dataset is a collection of tabular data (like a spreadsheet) that Load Tester uses to dynamically change the actions of a test case.

Datasets contain:

  • Fields (columns): The types of data you need (e.g., username, password, email, product_id)
  • Rows: The actual values for each field (e.g., user1, P@ssw0rd1, user1@example.com, 12345)

Example dataset (usernames and passwords):

| username | password |
| --- | --- |
| alice_smith | AlicePass123 |
| bob_jones | BobSecret456 |
| carol_white | CarolKey789 |
| dave_brown | DaveAuth012 |
| eve_green | EveToken345 |

When you link this dataset to your test case's username/password fields, each virtual user gets a different row. User 1 logs in as alice_smith, User 2 as bob_jones, and so on.


Understanding Dataset Configuration

Before creating a dataset, you should understand three critical configuration options that determine how Load Tester uses your data.

Lifespan: How Long Does a Row Last?

Lifespan controls how long a virtual user sticks with the same row of data before fetching the next row.

| Lifespan | Behavior | When to Use |
| --- | --- | --- |
| Virtual User | Virtual user uses the same row for all test case iterations | User identity should persist across multiple test case iterations. Most common for username/password datasets. |
| Test Case | Virtual user gets a new row for each test case iteration | Each iteration should use different data (e.g., different product searches each iteration). |
| Web Page | Virtual user gets a new row for each web page | Multiple pages use dataset values, and each page should get fresh data. |
| URL | Virtual user gets a new row for each transaction (URL) | Every HTTP request should use different data (rare). |
| Single Use | Every dataset value used gets a new row | Extremely rare; usually overkill. |

Most common settings:

  • User credentials (username/password): Virtual User lifespan (user keeps same identity for entire test)
  • Form field data (product search, customer info): Test Case lifespan (new data each iteration) or Web Page lifespan (new data for each form)

Reusable: What Happens When Rows Run Out?

Reusable determines whether Load Tester can start over at the beginning of the dataset when all rows have been used.

| Setting | Behavior | Impact |
| --- | --- | --- |
| Reusable = ON (default) | When all rows are used, start over from row 1 | Virtual users will reuse data. Common for load testing. |
| Reusable = OFF | When all rows are used, the load test fails with an error | Ensures every virtual user gets unique data, but requires enough rows for the entire test. |

When to enable Reusable:

  • You have limited test data (e.g., 100 credentials) but need to test with 1000 users
  • Data reuse is acceptable for your application (most applications)

When to disable Reusable:

  • Compliance requirement: Each virtual user MUST have unique data (e.g., financial regulations)
  • Data gets consumed: Application permanently modifies or deletes data during the test (e.g., draining an inventory)
  • You have more than enough rows for your test

Load Test Will Fail If Rows Run Out

If Reusable = OFF and your dataset doesn't have enough rows, the load test will terminate with an error when the last row is used. Always calculate row requirements carefully (see How Many Rows Do You Need? below).


Sharable: Can Multiple Users Use the Same Row?

Sharable determines whether multiple virtual users can simultaneously use the same row from a dataset.

| Setting | Behavior | Impact |
| --- | --- | --- |
| Sharable = OFF (default) | Each row can only be used by one virtual user at a time | Ensures unique data per user. Dataset must have at least as many rows as concurrent virtual users. |
| Sharable = ON | Multiple virtual users can use the same row simultaneously | Allows smaller datasets, but defeats the purpose of data-driven testing if many users share the same credentials. |

When to enable Sharable:

  • Dataset contains non-identity data (e.g., product IDs, search terms) where multiple users searching for the same product is realistic
  • You're testing with more virtual users than available dataset rows and reuse is acceptable

When to disable Sharable (default):

  • Dataset contains user identity data (username/password, certificates)
  • You want to ensure each virtual user has unique data at any given moment

Sharable Requires Reusable

If a dataset is not reusable, it cannot be sharable. This makes sense: if you've disabled row reuse entirely, allowing simultaneous sharing would contradict that constraint.


Creating a Dataset

Load Tester provides two ways to create datasets: start with an empty dataset and fill it manually, or import existing data from an external file (CSV, text).

Method 1: Create Empty Dataset

When to use: Small datasets (<50 rows), or you'll use the Fill feature to generate data automatically.

Step 1: Open New Dataset dialog

  1. In the Navigator view, locate the Datasets folder in your repository
  2. Right-click on the Datasets folder (or any existing dataset)
  3. Select: New → Dataset

Step 2: Define dataset structure

  1. Enter dataset name: e.g., UserCredentials
  2. Click: Add button
  3. Enter field names (column names), one per line, pressing Enter after each:
     • username
     • password
  4. Click: OK

Load Tester creates the dataset with one row of sample data and opens the Dataset Editor.


Method 2: Import from External File

When to use: Large datasets (hundreds or thousands of rows), or data generated by external tools (database export, scripts).

Supported formats:

  • CSV (comma-separated values): Most common, works with Excel/Google Sheets
  • TSV (tab-separated values): Alternative to CSV
  • Custom delimiters: Any character can be used as field separator

Step 1: Prepare your data file

Create a CSV or text file with your data:

username,password,email
alice_smith,AlicePass123,alice@example.com
bob_jones,BobSecret456,bob@example.com
carol_white,CarolKey789,carol@example.com
dave_brown,DaveAuth012,dave@example.com
eve_green,EveToken345,eve@example.com

Step 2: Open Import Dataset dialog

  1. In the Navigator view, right-click on Datasets folder
  2. Select: Import

Step 3: Configure import settings

  1. Choose the file to import: Browse to your CSV/text file
  2. Import into: Select New dataset (or choose existing dataset to replace its data)
  3. Field separator: Choose comma for CSV files (or tab, space, semicolon, or custom)
  4. Trim whitespace: Leave enabled (default) to automatically remove leading/trailing spaces
  5. Use first row as field names: Enable if your file has a header row (recommended)
  6. Parse escaped characters: Enable if your data contains escaped newlines (\r\n) or special characters

Step 4: Preview and import

  • Preview section shows the first 10 rows with current settings
  • Verify the preview looks correct (columns align properly, field names are detected)
  • Click: OK to import

The dataset is created and the Dataset Editor opens with your imported data.

Preview Before Importing

The Preview section updates dynamically as you change import settings. Always verify the preview looks correct before clicking OK. If columns don't align properly, adjust the field separator or "Use first row" setting.


How Many Rows Do You Need?

Calculating the number of dataset rows you need comes down to three numbers:

  1. Test duration: How long the load test runs (e.g., 60 minutes)
  2. Test case duration: How long one iteration of the test case takes (e.g., 13 minutes)
  3. Concurrent users: Maximum number of virtual users during the test (e.g., 4)

You set the first and third in the load test configuration; Load Tester calculates the second for you.


Step 1: Find Test Duration

Test duration is set by you in the Load Test configuration:

  1. Open Load Test Editor (double-click load test in Navigator)
  2. Look for: Duration field (usually in minutes)
  3. Note the value: e.g., 60 minutes

Step 2: Find Test Case Duration

Load Tester calculates test case duration automatically in the Test Case Editor:

  1. Open Test Case Editor (double-click test case in Navigator)
  2. Look at the bottom of the editor: Test case duration is displayed
  3. Note the value: e.g., 13 minutes

Estimate Conservatively

As a rule of thumb, round to the nearest minute and round DOWN if possible. This number is not always accurate, as it does not include:

  • Test case looping (Restart Options in Test Case Properties)
  • Page processors that refresh a page repeatedly while waiting for a result
  • Think time variations

In general, it's a good idea to lowball the test case duration estimate, as the test will only fail if you come up short. Having too many dataset rows is never a problem for the test.


Step 3: Find Maximum Concurrent Users

Maximum users is set in the Load Test configuration:

  1. Open Load Test Editor
  2. Look for: Maximum Users field (calculated from user ramp settings)
  3. Note the value: e.g., 4 users

Calculation #1: The Simplest Case (No Ramping)

If all users in a test started at the very beginning and continued to the very end, the calculation is simple:

Rows Needed = Users × (Test Duration ÷ Test Case Duration)

Example:

  • Users: 4
  • Test duration: 60 minutes
  • Test case duration: 13 minutes

4 × (60 ÷ 13) = 4 × 5 (rounded up from 4.6) = 20 rows

Visual representation:

User 1: |████████████████████████████████████████████████| (60 min)
User 2: |████████████████████████████████████████████████|
User 3: |████████████████████████████████████████████████|
User 4: |████████████████████████████████████████████████|
        0                                              60 min

All four users run the entire 60 minutes, each completing ~5 iterations (60 ÷ 13 = 4.6, rounded up).

Total rows needed: 4 users × 5 iterations = 20 rows
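The rectangle calculation above is easy to script. A minimal sketch (JavaScript purely for illustration; the function name is our own):

```javascript
// Calculation #1: rows needed when all users run for the entire test.
// Both durations must be in the same unit (minutes here).
function rowsNoRamp(users, testMinutes, testCaseMinutes) {
  // Round iterations up: a partially completed iteration still consumes a row.
  const iterations = Math.ceil(testMinutes / testCaseMinutes); // 60 / 13 -> 5
  return users * iterations;
}
```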


Calculation #2: The Normal Case (Ramping)

Most load tests ramp up during the course of the test instead of starting all users at the beginning. A typical load test looks like this:

User 1: |████████████████████████████████████████████████| (60 min)
User 2:          |███████████████████████████████████████| (45 min)
User 3:                   |███████████████████████████████| (30 min)
User 4:                            |██████████████████████| (15 min)
        0                                              60 min

Users start at different times (ramping), but all run until the test ends.

Estimating rows in this case is just like calculating the area of a triangle in high school geometry: divide the previous calculation by two.

Rows Needed = (Users × (Test Duration ÷ Test Case Duration)) ÷ 2

Example (same numbers as before):

(4 × (60 ÷ 13)) ÷ 2 = (4 × 5) ÷ 2 = 20 ÷ 2 = 10 rows

As long as the user ramp is even and terminates within one test case duration of the end of the test, this estimate will be accurate.
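The triangle estimate can be sketched the same way: compute the rectangle, then halve it (function name is our own, for illustration only):

```javascript
// Calculation #2: rows needed for an even ramp that runs to the end of the
// test. The "triangle" is half the rectangle from Calculation #1.
function rowsWithRamp(users, testMinutes, testCaseMinutes) {
  const iterations = Math.ceil(testMinutes / testCaseMinutes);
  return Math.ceil((users * iterations) / 2); // round up to be safe
}
```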


Calculation #3: Ramp + Hold (Complex Load Profile)

What about a test where you ramp up to a certain number of users, then hold at that level for an extended period?

In such a case, split the test into two parts for the purposes of the estimate:

  1. Ramping period: Use Calculation #2 (triangle area)
  2. Holding period: Use Calculation #1 (rectangle area)

Visual representation:

User 1: |████████████████████████████████████████████████| (60 min)
User 2:    |█████████████████████████████████████████████| (55 min)
User 3:        |█████████████████████████████████████████| (50 min)
User 4:            |█████████████████████████████████████| (45 min)
        0      10                                     60 min
              Ramp         Hold at 4 users

Ramp period (0-10 minutes):

  • Users ramp from 0 to 4 over 10 minutes
  • Use Calculation #2 (triangle): (4 × (10 ÷ 13)) ÷ 2 = (4 × 1) ÷ 2 = 2 rows

Hold period (10-60 minutes):

  • All 4 users run for 50 minutes (60 - 10 = 50)
  • Use Calculation #1 (rectangle): 4 × (50 ÷ 13) = 4 × 4 = 16 rows

Total: 2 + 16 = 18 rows

You can use this technique to estimate for any test, even one that ramps unevenly: simply divide up the test sections into ramping periods and load periods, estimate those, and then add them all together.
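Combining the two pieces into one helper makes the split explicit (a sketch with our own function name; triangle for the ramp, rectangle for the hold):

```javascript
// Calculation #3: ramp + hold. Triangle estimate for the ramping period,
// rectangle estimate for the holding period, summed.
function rowsRampPlusHold(users, rampMinutes, holdMinutes, testCaseMinutes) {
  const rampRows = Math.ceil((users * Math.ceil(rampMinutes / testCaseMinutes)) / 2);
  const holdRows = users * Math.ceil(holdMinutes / testCaseMinutes);
  return rampRows + holdRows;
}
```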


Padding: Protecting Against Load Engine Imbalance

Finally, add padding to protect against load engine imbalance.

When one load engine has significantly more virtual users than the others and the datasets are divided evenly among the engines, that engine risks running out of data before the others. The fix is simple: add at least 25% to your estimate.

Example:

  • Calculation #2 estimate: 10 rows
  • With 25% padding: 10 × 1.25 = 12.5 → round up to 13 rows

Padding Recommendation

We generally recommend adding at least 25% to your dataset row estimate. This provides a safety margin for:

  • Load engine imbalance (one engine gets more users than others)
  • Test case duration variations (some iterations take longer than average)
  • Ramp timing inaccuracies (users don't start exactly when you expect)

Having too many rows is never a problem. It's much better to have 50% extra than to have the load test fail halfway through because you ran out of data.
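The padding step is a one-liner worth keeping next to the estimate (sketch; the default of 25% matches the recommendation above):

```javascript
// Add safety padding (default 25%) to a row estimate and round up.
function padRows(rows, padding = 0.25) {
  return Math.ceil(rows * (1 + padding));
}
```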


Editing Datasets

After creating a dataset (empty or imported), you can edit it using the Dataset Editor.

Adding and Removing Rows

To add a single row:

  1. Double-click the last cell in the last row
  2. Press Enter
  3. A new row is created with sample data
  4. Fill in values and press Enter to add another row, or Tab to finish editing

To add multiple rows at once:

  1. Click: Add button (toolbar icon: + with row indicator)
  2. Enter number of rows to add
  3. Click OK

To remove rows:

  1. Select rows to remove (click row number on left, or drag to select multiple)
  2. Click: Remove button (toolbar icon: − with row indicator)
  3. Rows are deleted immediately

Easier Dataset Editing (v4.2+)

Before Load Tester 4.2, deleting rows required either:

  • Selecting each row individually and manually deleting it
  • Exporting the dataset to Excel, removing data, and re-importing

With Load Tester 4.2 and later, you can simply:

  • Highlight all the rows you want to remove
  • Click the "remove dataset row" icon

Adding rows is just as easy: simply click the "add dataset rows" icon.


Adding and Removing Fields (Columns)

To add a field:

  1. Click: Edit Fields... button
  2. In the Edit Dataset Fields dialog, type the new field name at the bottom of the list and press Enter to add it
  3. Click OK

The new field appears as the last column in the dataset with empty values.

To rename a field:

  1. Click: Edit Fields... button
  2. Select the field in the list
  3. Click: Rename button (or double-click field name)
  4. Enter new name
  5. Click OK

To remove a field:

  1. Click: Edit Fields... button
  2. Select the field in the list
  3. Click: Remove button
  4. Click OK

All values for that field are deleted from all rows.


Editing Cell Values

To edit a single cell:

  1. Double-click the cell
  2. Type the new value
  3. Press Enter to move to the next cell below, or Tab to move to the next cell right
  4. Press ESC to cancel changes

To copy/paste data:

  • Select cells (click and drag, or Shift+click)
  • Ctrl+C (Cmd+C on macOS) to copy
  • Ctrl+V (Cmd+V on macOS) to paste

Filling Fields with Generated Data

Instead of manually entering hundreds of rows of data, Load Tester can automatically generate random or sequential values to fill a field.

Step 1: Select the Field to Fill

  1. Click the column heading (field name) to select the entire field
  2. Click: Fill... button

The Fill Dataset Field dialog opens.


Step 2: Choose Generation Method

Load Tester provides three generation types:

| Method | Description | Best For |
| --- | --- | --- |
| Random | Generate random alphabetic or numeric strings | Usernames, passwords, random IDs, test data variation |
| Sequence | Generate sequences of numeric strings | User IDs, order numbers, sequential identifiers |
| List | Select strings from pre-populated lists | Common names, email domains, product categories |

Step 3: Configure Generation Settings

For Random or Sequence:

  • Quantity: Number of values to generate (defaults to total rows in dataset)
  • Width: Length of each generated value (e.g., 8 characters for passwords)
  • Data Type: Alphabetic, Numeric, or Alphanumeric

For List:

  • Select list: Choose from pre-populated lists (e.g., common first names, last names, email domains)

Step 4: Preview and Apply

  1. Click: Generate Values button
  2. Preview appears on the right showing the first several generated values
  3. If satisfied, click OK to save values into the dataset
  4. If not, adjust settings and regenerate

Example: Generating 1000 random usernames:

  • Method: Random
  • Quantity: 1000
  • Width: 10
  • Data Type: Alphabetic

Result: jkdfhgkslp, mnbvcxzaqw, poiuytrewq, etc.
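For intuition, the Random/Alphabetic fill behaves roughly like the sketch below. This is not Load Tester's actual generator, just an illustration of fixed-width random alphabetic strings:

```javascript
// Illustration of a Random/Alphabetic fill: fixed-width random lowercase
// strings (not Load Tester's actual implementation).
function randomAlphabetic(width) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let out = "";
  for (let i = 0; i < width; i++) {
    out += letters[Math.floor(Math.random() * letters.length)];
  }
  return out;
}

// Quantity: 1000, Width: 10
const usernames = Array.from({ length: 1000 }, () => randomAlphabetic(10));
```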


Advanced: JavaScript Data Sources

Sometimes you need to generate dynamic data during the load test that can't be pre-populated in a dataset: a unique UUID for each request, a fresh timestamp, a computed hash.

Load Tester supports JavaScript data sources that execute during the test to provide these dynamic values.

Use Case: Generating Dynamic UUIDs

Problem: Your application uses UUIDs in URL path elements:

http://mysite/path1/123e4567-e89b-12d3-a456-426614174000

Each request needs a unique UUID. You can't pre-generate UUIDs in a dataset because you don't know how many requests will be made during the test.

Solution: Use a JavaScript data source to generate UUIDs dynamically.


Step 1: Select the Field to Configure

  1. Click on the transaction with the UUID in the URL
  2. Open Fields View: Window → Show View → Fields View
  3. Switch to PATH view mode: Use the dropdown on the right to select PATH
  4. Locate the UUID path element in the Fields View

Step 2: Configure Script Datasource

  1. Double-click the path element with the UUID
  2. Field Assignment dialog opens
  3. Datasource: Select Script from the dropdown

Step 3: Write JavaScript Function

In the Script editor, enter:

function getValue(user_state) {
    return java.util.UUID.randomUUID();
}

How this works:

  • The return value from getValue(user_state) is substituted every time this URL is called during the load test
  • You can use any valid JavaScript
  • JavaScript can call Java functions using the java. prefix
  • In this case, we're calling java.util.UUID.randomUUID() to generate a UUID

Step 4: Test and Apply

  1. The JavaScript dialog dynamically executes the script as you type
  2. Results box shows the return value (you should see a UUID like a1b2c3d4-...)
  3. If the result looks correct, click OK

Now, during the load test, every time this URL is requested, a new UUID is generated dynamically.

JavaScript Interop with Java

This technique is surprisingly powerful and can help with all sorts of tricky situations in complex web applications. JavaScript's standard library is modest, but Java's is enormous, and you can call any Java function from JavaScript using the java. prefix.

Examples:

  • java.util.UUID.randomUUID() - Generate UUIDs
  • java.lang.System.currentTimeMillis() - Get current timestamp
  • java.lang.Math.random() - Generate random numbers


Using Datasets in Test Cases

After creating a dataset, you need to link it to fields in your test case so the values are actually used.

Automatic Linking (User Identity Wizard)

For username/password datasets, the User Identity Wizard automatically links the dataset to login fields:

  1. Right-click test case in Navigator → Properties
  2. Navigate to: User Identity tab
  3. Select: Use dataset for credentials
  4. Choose dataset: Select your dataset from the dropdown
  5. Map fields:
     • Username field: Select dataset column containing usernames
     • Password field: Select dataset column containing passwords
  6. Click OK

Manual Linking (Fields View)

For any field in your test case (not just username/password), you can manually link a dataset using the Fields View:

  1. Open test case in Test Case Editor
  2. Open Fields View: Window → Show View → Fields View
  3. Select the transaction containing the field you want to configure
  4. Locate the field in Fields View (e.g., search_term, product_id)
  5. Double-click the field to open Field Assignment dialog
  6. Datasource: Select Dataset from dropdown
  7. Choose dataset: Select the dataset from the list
  8. Choose field: Select the dataset column (field) to use
  9. Click OK

Now when the test case runs, Load Tester will substitute values from the dataset into that field.


Reloading Datasets from External Files

If you imported a dataset from an external file (CSV, text), you can easily re-import the data after modifying the original file. This is useful when:

  • You're generating test data from a database and need to refresh it
  • You're editing the data in Excel and want to reload changes into Load Tester

Automatic Reload

  1. Open Dataset Editor (double-click dataset in Navigator)
  2. Click: Reload button

Load Tester automatically re-imports the dataset using the same settings (file path, separator, etc.) that were used for the original import.

If the file location changed or the import settings need adjustment:

  1. Click: ... button (next to Reload)
  2. The Import Dataset dialog opens with the original settings pre-filled
  3. Adjust settings as needed (choose new file, change separator, etc.)
  4. Click OK to re-import

Troubleshooting Datasets

Load Test Fails: "Dataset rows exhausted"

Symptom: Load test terminates with an error message about running out of dataset rows.

Cause: Your dataset doesn't have enough rows for the load test, and Reusable is disabled.

Solution:

Option 1: Enable Reusable (recommended)

  1. Open Dataset Editor (double-click dataset in Navigator)
  2. Check: Reusable checkbox
  3. Save: Ctrl+S (Cmd+S on macOS)

Option 2: Add more rows to the dataset

  • Calculate rows needed using the formulas in How Many Rows Do You Need?
  • Add rows using the Dataset Editor or re-import a larger file

Load Test Fails: "Dataset rows conflict"

Symptom: Load test fails with errors about dataset row conflicts or concurrent access.

Cause: Your dataset has Sharable disabled, but you're trying to run more concurrent users than available rows.

Example: 100 concurrent users, but dataset only has 50 rows and Sharable = OFF.

Solution:

Option 1: Enable Sharable (if data reuse is acceptable)

  1. Open Dataset Editor
  2. Check: Sharable checkbox
  3. Save

Option 2: Add more rows to the dataset

  • Ensure dataset has at least as many rows as maximum concurrent virtual users
  • Use the Fill feature to quickly generate additional rows

Dataset Import Preview Looks Wrong

Symptom: When importing a dataset, the preview shows columns misaligned or data in wrong fields.

Likely causes:

  1. Wrong field separator: You selected "comma" but the file uses tabs (or vice versa)
  2. First row setting incorrect: You enabled "Use first row as field names" but the first row contains data (or vice versa)

Solution:

  • Try different field separator: Comma, Tab, Semicolon, Space
  • Toggle "Use first row as field names" and check the preview
  • Open the file in a text editor to verify what separator is actually used

JavaScript Data Source Not Working

Symptom: JavaScript datasource shows an error, or the generated value is blank/incorrect.

Likely causes:

  1. Syntax error in JavaScript: Missing semicolon, unclosed brace, etc.
  2. Wrong Java class path: java.util.UUID works, but UUID.randomUUID() doesn't (missing java. prefix)
  3. Function doesn't return a value: Missing return statement

Solution:

  • Check the Results box in the JavaScript editor; it shows errors and return values
  • Test incrementally: Start with a simple script like return "test"; and add complexity
  • Use Java interop correctly: Always use java. prefix for Java functions

Ask the AI to Help with Datasets

If you're struggling with dataset configuration:

I need to create a dataset with 500 unique usernames and passwords for
a load test with 100 concurrent users running for 30 minutes. My test
case takes about 5 minutes per iteration. How many rows do I need, and
what lifespan/reusable settings should I use?

The AI can:

  • Calculate exact row requirements based on your test parameters
  • Recommend lifespan and reusable/sharable settings
  • Guide you through dataset creation and configuration
  • Help troubleshoot dataset-related errors during load tests
  • Explain JavaScript data source syntax for dynamic generation

Best Practices

1. Use Datasets for User Identity

Why: Real applications behave differently when 1000 users all log in as the same person. Session conflicts, server-side caching, and rate limiting can all produce unrealistic results.

How: Create a dataset with enough usernames/passwords for all virtual users, with Reusable = ON (so users cycle through credentials if needed).


2. Add 25% Padding to Row Estimates

Why: Load engine imbalance, test case duration variations, and ramp timing inaccuracies can cause the test to use more rows than calculated.

How: Calculate rows needed using the formulas above, then multiply by 1.25 and round up.


3. Use Fill Feature for Large Datasets

Why: Manually entering 1000 rows of test data is tedious and error-prone.

How: Create a dataset with the right fields, then use Fill... to generate random or sequential values automatically.


4. Import Real Data When Possible

Why: Real customer data (anonymized/sanitized) produces more realistic load test results than randomly generated strings.

How: Export data from your production database (after removing PII), import into Load Tester as CSV. Use representative product IDs, search terms, and transaction patterns.


5. Test Dataset Configuration Before Load Testing

Why: Dataset configuration errors (wrong lifespan, too few rows, sharable issues) are easier to diagnose with a small replay than a full load test.

How:

  1. Create dataset and link to test case
  2. Run a small replay (2-5 virtual users)
  3. Verify each virtual user gets different data (check Fields View during replay)
  4. Only then: Scale up to full load test

Next Steps

After configuring datasets, you're ready to customize other aspects of your test case:

For using datasets during load tests:

For authentication datasets specifically: