About
Simulation lets you compare two workflow configurations, a selected workflow and a counter workflow, by running them in parallel across multiple automated iterations. Each simulation batch produces individual run results for both workflows, which are then evaluated and scored so you can objectively measure the impact of changes before activating a new version for live traffic.
Simulation is designed for comparing workflow versions or configurations head-to-head. Each simulation run executes both workflows independently and captures their outputs for side-by-side evaluation.
How Simulation Works
Two Workflows in Parallel
Each simulation run executes two workflows simultaneously — the selected workflow (your candidate) and the counter workflow (your baseline). Both receive the same inputs.
Multiple Iterations
You configure how many simulation iterations to run. The platform spawns that many parallel run pairs, so the comparison rests on a sample of results rather than a single run. How meaningful that sample is depends on the number of iterations you choose.
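To make the run-pair idea concrete, here is a minimal local sketch. It does not call the platform: `run_workflow`, `run_batch`, and the workflow names are hypothetical stand-ins, and the thread pool is just one way to model parallel execution.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(workflow_id: str, inputs: dict) -> dict:
    """Hypothetical stand-in for executing a single workflow run."""
    return {
        "workflow": workflow_id,
        "inputs": inputs,
        "output": f"{workflow_id} handled {inputs['message']}",
    }


def run_batch(selected: str, counter: str, inputs: dict,
              iterations: int, max_workers: int = 4) -> list:
    """Spawn one run pair per iteration; both runs get the same inputs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        submitted = [
            (pool.submit(run_workflow, selected, inputs),
             pool.submit(run_workflow, counter, inputs))
            for _ in range(iterations)
        ]
        # Collect results in iteration order once each pair has finished.
        return [
            {"iteration": i, "selected": s.result(), "counter": c.result()}
            for i, (s, c) in enumerate(submitted)
        ]


pairs = run_batch("checkout-v2", "checkout-v1", {"message": "hello"}, iterations=5)
```

Each entry in `pairs` holds one selected run and one counter run for the same iteration, which is the unit that later gets evaluated and compared.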
Automated Evaluation
Each individual run can be evaluated automatically using a configured evaluation workflow, which scores the output against defined criteria and contributes to the overall simulation summary.
Comparison Summary
After all iterations complete, the simulation produces a summary — total runs, success and failure counts, average scores, and score distribution — for both the selected and counter workflows side by side.
Setting Up a Simulation
Open the Simulations Panel
Navigate to your workflow in the dashboard and click the Simulations tab. Click Create Simulation to begin.

Select the Two Workflows
Choose the selected workflow (the version or configuration you want to test) and the counter workflow (the baseline you are comparing against). These can be different versions of the same workflow or two entirely different workflows.
Configure the Number of Runs
Set how many simulation iterations to execute. Each iteration creates one run pair — one run for the selected workflow and one for the counter. More iterations give you more reliable comparison data.
Simulation Run States
Batch Status
Pending
The simulation batch has been created and the platform is beginning to spawn individual run pairs.
In Progress
Run pairs are actively executing. You can monitor progress as individual runs complete.
Completed
All configured iterations have finished. The evaluation summary is available for review.
Cancelled
The simulation batch was stopped before all iterations completed. Runs already in progress finish, but no new pairs are spawned.
Failed
An error prevented the batch from completing. Check individual run details for diagnostics.
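If you track batch status in your own tooling, the five states can be modelled as an enum. The sketch below is an assumption-laden illustration: the status strings mirror the labels above, but the `ALLOWED` transition table is inferred from the descriptions and is not a documented platform contract.

```python
from enum import Enum


class BatchStatus(Enum):
    PENDING = "Pending"
    IN_PROGRESS = "In Progress"
    COMPLETED = "Completed"
    CANCELLED = "Cancelled"
    FAILED = "Failed"


# Assumed transitions, inferred from the state descriptions (not an official spec).
ALLOWED = {
    BatchStatus.PENDING: {BatchStatus.IN_PROGRESS, BatchStatus.CANCELLED, BatchStatus.FAILED},
    BatchStatus.IN_PROGRESS: {BatchStatus.COMPLETED, BatchStatus.CANCELLED, BatchStatus.FAILED},
    BatchStatus.COMPLETED: set(),
    BatchStatus.CANCELLED: set(),
    BatchStatus.FAILED: set(),
}


def can_transition(current: BatchStatus, target: BatchStatus) -> bool:
    """Return True if the assumed table allows moving from current to target."""
    return target in ALLOWED[current]


def is_terminal(status: BatchStatus) -> bool:
    """A status is terminal when no further transitions are allowed."""
    return not ALLOWED[status]
```

A polling script could, for example, stop checking a batch once `is_terminal` returns true for its status.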
Reviewing Simulation Results
Once a simulation batch completes, navigate to the batch to view its results.
Evaluation Summary
The summary compares both workflows across all iterations:
- Total runs — number of iterations executed
- Success / Failure counts — how many runs completed vs. failed for each workflow
- Average evaluation score — mean score across all evaluated runs
- Score distribution — breakdown of scores across defined ranges
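The summary fields above can be reproduced from raw run records. The following is a sketch under stated assumptions: the record shape (`status`, `score`), the 0 to 1 score scale, and the `low`/`medium`/`high` ranges are all hypothetical, since the actual ranges depend on how your evaluation workflow is configured.

```python
from collections import Counter
from statistics import mean


def summarize(runs: list, bucket_edges: tuple = (0.5, 0.8)) -> dict:
    """Summarize runs shaped like {"status": "success" | "failure", "score": float | None}."""
    low_edge, high_edge = bucket_edges
    # Only runs that were actually evaluated contribute to score statistics.
    scores = [r["score"] for r in runs if r.get("score") is not None]

    def bucket(score: float) -> str:
        if score < low_edge:
            return "low"
        if score < high_edge:
            return "medium"
        return "high"

    return {
        "total_runs": len(runs),
        "success": sum(r["status"] == "success" for r in runs),
        "failure": sum(r["status"] == "failure" for r in runs),
        "average_score": mean(scores) if scores else None,
        "distribution": dict(Counter(bucket(s) for s in scores)),
    }


selected_runs = [
    {"status": "success", "score": 0.9},
    {"status": "success", "score": 0.7},
    {"status": "failure", "score": None},
    {"status": "success", "score": 0.4},
]
selected_summary = summarize(selected_runs)
```

Running `summarize` once for the selected workflow and once for the counter workflow gives two dictionaries that can be placed side by side.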
Individual Run Details
Drill into any individual iteration to see the full run details for both the selected and counter workflow — every node that fired, the outputs produced, variables collected, and the evaluation result for that specific run.
Comparing Batches
You can run multiple simulation batches against the same simulation configuration. Compare batch summaries over time to track how your workflows evolve across iterations of improvement.
Stopping a Simulation
If you need to stop a simulation batch before it completes:
- Click Stop on the running batch
- The batch status changes to Cancelled
- Run pairs already in progress finish naturally — no new pairs are spawned after the stop
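The stop behaviour described above (in-flight pairs finish, nothing new is spawned) can be modelled with a small state object. This is a simplified, single-threaded sketch; `SimulationBatch` and its methods are hypothetical names and do not represent the platform's internals.

```python
class SimulationBatch:
    """Toy model of a batch that stops spawning once a stop is requested."""

    def __init__(self, iterations: int):
        self.iterations = iterations
        self.spawned = 0
        self.in_progress = set()
        self.finished = []
        self.stop_requested = False
        self.status = "Pending"

    def spawn_next(self):
        """Start the next run pair, or return None if stopped or fully spawned."""
        if self.stop_requested or self.spawned >= self.iterations:
            return None
        pair_id = self.spawned
        self.spawned += 1
        self.in_progress.add(pair_id)
        self.status = "In Progress"
        return pair_id

    def stop(self):
        """Request a stop: the status becomes Cancelled and spawning ends."""
        self.stop_requested = True
        self.status = "Cancelled"

    def finish(self, pair_id: int):
        """Mark an in-progress pair as finished; allowed even after a stop."""
        self.in_progress.remove(pair_id)
        self.finished.append(pair_id)
        if not self.stop_requested and len(self.finished) == self.iterations:
            self.status = "Completed"


batch = SimulationBatch(iterations=5)
first = batch.spawn_next()
second = batch.spawn_next()
batch.stop()
blocked = batch.spawn_next()   # no new pair after the stop
batch.finish(first)            # in-flight pairs still finish
batch.finish(second)
```

In this model a cancelled batch ends with two finished pairs out of five, so any summary built from it covers only the pairs that were already running when the stop was requested.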
Best Practices
Use a stable counter workflow
The counter workflow is your baseline. Keep it pinned to a known-good version so that differences in evaluation scores are attributable to your selected workflow changes, not baseline drift.
Run enough iterations
A small number of runs may not produce reliable comparison data. Run at least enough iterations to cover the expected variability in inputs and LLM outputs.
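One rough way to judge whether you have run enough iterations is to look at the uncertainty around the average score difference. The sketch below assumes scores are paired per iteration (both workflows receive the same inputs) and uses a normal approximation with `z = 1.96`. The function name is hypothetical, and for very small samples a t-based interval would be more appropriate.

```python
from math import sqrt
from statistics import mean, stdev


def score_difference_interval(selected_scores, counter_scores, z: float = 1.96):
    """Return (mean difference, lower bound, upper bound) for paired scores."""
    if len(selected_scores) != len(counter_scores):
        raise ValueError("scores must be paired one-to-one per iteration")
    if len(selected_scores) < 2:
        raise ValueError("need at least two iterations to estimate spread")
    diffs = [s - c for s, c in zip(selected_scores, counter_scores)]
    avg = mean(diffs)
    # Standard error of the mean difference shrinks as iterations increase.
    std_err = stdev(diffs) / sqrt(len(diffs))
    return avg, avg - z * std_err, avg + z * std_err


avg, low, high = score_difference_interval([8, 9, 7, 6], [5, 7, 6, 4])
```

If the interval is wide or straddles zero, the observed difference may be noise, and running more iterations is a reasonable next step.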
Review failures before activating
If the selected workflow has a higher failure rate than the counter workflow, investigate the individual failed runs before activating it for live traffic.
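This check can be expressed as a small pre-activation gate. The record shape (`run_id`, `status`) and the function names below are assumptions for illustration, not fields guaranteed by the platform.

```python
def failure_rate(runs: list) -> float:
    """Fraction of runs whose status is 'failure'; 0.0 for an empty list."""
    if not runs:
        return 0.0
    return sum(r["status"] == "failure" for r in runs) / len(runs)


def failed_runs_to_review(selected_runs: list, counter_runs: list) -> list:
    """Return IDs of failed selected runs when the selected failure rate is higher."""
    if failure_rate(selected_runs) <= failure_rate(counter_runs):
        return []
    return [r["run_id"] for r in selected_runs if r["status"] == "failure"]


selected = [
    {"run_id": "s1", "status": "success"},
    {"run_id": "s2", "status": "failure"},
    {"run_id": "s3", "status": "success"},
    {"run_id": "s4", "status": "failure"},
]
counter = [
    {"run_id": "c1", "status": "success"},
    {"run_id": "c2", "status": "failure"},
    {"run_id": "c3", "status": "success"},
    {"run_id": "c4", "status": "success"},
]
to_review = failed_runs_to_review(selected, counter)
```

An empty list means the selected workflow failed no more often than the baseline; a non-empty list names the runs worth opening in the individual run details first.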
Next Steps
Workflow Versioning
Create and activate versions once simulation confirms your changes perform better
Building a Workflow
Go back to the workflow builder to refine nodes and edges
Node Types
Reference for all node types to help diagnose simulation run failures
Workflow Webhooks
Subscribe to run completion events for automated post-simulation reporting
