Evaluation States
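Every evaluation moves through a defined lifecycle. The `completed`, `failed` and `cancelled` states are final: once an evaluation reaches one of them, its state no longer changes.
State Descriptions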
| State | Description | Typical duration |
|---|---|---|
| `pending` | Evaluation created, waiting to start | Seconds |
| `running` | Evaluators actively processing samples | Minutes to hours |
| `in_review` | Automated scoring complete, awaiting human review | Hours to days |
| `completed` | All scoring finished | N/A (final state) |
| `failed` | An error occurred during the evaluation | N/A (final state) |
| `cancelled` | Manually stopped by a user | N/A (final state) |
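If your own code branches on evaluation status, modeling the lifecycle explicitly avoids scattering string comparisons. The Python sketch below assumes the status arrives as exactly the lowercase strings in the table; `EvaluationState` and `TERMINAL_STATES` are hypothetical helpers, not part of any SDK.

```python
from enum import Enum


class EvaluationState(str, Enum):
    """Evaluation lifecycle states, using the values from the table above."""

    PENDING = "pending"
    RUNNING = "running"
    IN_REVIEW = "in_review"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"

    @property
    def is_terminal(self) -> bool:
        # completed, failed and cancelled are final states.
        return self in TERMINAL_STATES


# Defined outside the class so it is not turned into an enum member.
TERMINAL_STATES = frozenset(
    {EvaluationState.COMPLETED, EvaluationState.FAILED, EvaluationState.CANCELLED}
)
```

For example, `EvaluationState("in_review").is_terminal` is `False`, because human review may still change the outcome.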
Triggering Evaluations
Evaluations can be triggered in several ways:
Manual (Dashboard/SDK)
Start an evaluation on demand from the dashboard or through the SDK.
CI/CD Integration
Automatically run evaluations on code changes.
Scheduled Evaluations
Run evaluations on a recurring schedule.
Webhook-Triggered
Trigger evaluations from external events.
Progress Monitoring
Polling Status
Using Callbacks
Streaming Progress
Wait Helper
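Polling and a wait helper come down to the same loop: fetch the status, stop at a final state, otherwise sleep and try again. The Python sketch below is SDK-agnostic. `get_status` stands in for whatever SDK or API call returns the current state string, and `wait_for_evaluation` is a hypothetical helper, not the platform's own wait helper.

```python
import time
from typing import Callable, Collection

# Final states from the "State Descriptions" table.
TERMINAL_STATES = frozenset({"completed", "failed", "cancelled"})


def wait_for_evaluation(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
    stop_states: Collection[str] = TERMINAL_STATES,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns one of stop_states.

    Raises TimeoutError if no stop state is reached within `timeout` seconds.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in stop_states:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"evaluation still {status!r} after {timeout}s")
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
```

Because `in_review` can last hours to days, a CI job that only needs automated scores might pass `stop_states={"in_review", "completed", "failed", "cancelled"}` rather than block on human review.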
Progress States
During execution, individual samples have their own states:
| State | Description |
|---|---|
| `queued` | Waiting to be processed |
| `processing` | Evaluator running |
| `scored` | Automated scoring complete |
| `flagged` | Needs human review |
| `reviewed` | Human review complete |
| `failed` | Error processing sample |
| `skipped` | Excluded from evaluation |
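Per-sample states can be rolled up into a progress figure. The Python sketch below assumes you can list every sample's state string. Which states count as "done" is an assumption made here: `scored`, `reviewed`, `failed` and `skipped` are treated as finished, while `flagged` is reported separately because it still waits on a reviewer. `summarize_progress` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumption: samples in these states need no further automated work.
# "flagged" is counted separately because it still waits on a human reviewer.
DONE_STATES = ("scored", "reviewed", "failed", "skipped")


def summarize_progress(sample_states: Iterable[str]) -> Dict[str, object]:
    """Summarize per-sample states into counts and a completion percentage."""
    counts = Counter(sample_states)
    total = sum(counts.values())
    done = sum(counts[state] for state in DONE_STATES)
    return {
        "total": total,
        "done": done,
        "awaiting_review": counts["flagged"],
        "percent_done": round(100 * done / total, 1) if total else 0.0,
        "by_state": dict(counts),
    }
```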
Error Handling
Evaluation-Level Errors
Sample-Level Errors
Common Error Types
| Error code | Description | Resolution |
|---|---|---|
| `evaluator_timeout` | Evaluator exceeded its timeout | Increase the timeout or simplify the evaluator |
| `invalid_sample` | Sample data is malformed | Check the sample against the expected schema |
| `evaluator_error` | Evaluator raised an exception | Check the evaluator logs |
| `quota_exceeded` | Usage limit reached | Upgrade your plan or wait before retrying |
| `rate_limited` | Too many concurrent evaluations | Add retry logic with backoff |
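For `rate_limited`, a common form of retry logic is exponential backoff with jitter, sketched below in Python. `EvaluationError` is a hypothetical exception type standing in for however your SDK or HTTP client reports the error code. Only `rate_limited` is retried by default, since the other codes in the table call for a fix (or a longer wait) rather than an immediate retry.

```python
import random
import time
from typing import Callable, Collection, TypeVar

T = TypeVar("T")


class EvaluationError(Exception):
    """Hypothetical exception carrying one of the error codes listed above."""

    def __init__(self, code: str, message: str = "") -> None:
        super().__init__(message or code)
        self.code = code


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retryable: Collection[str] = frozenset({"rate_limited"}),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff and full jitter on retryable codes."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except EvaluationError as err:
            # Re-raise non-retryable errors and the final failed attempt.
            if err.code not in retryable or attempt == max_attempts:
                raise
            # Upper bound doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Randomizing the delay ("full jitter") keeps many clients that were rate-limited at the same moment from retrying in lockstep.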
Retry Failed Samples
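Before re-running failed samples, it helps to separate failures worth retrying from those that need a fix first (for example, `invalid_sample` requires correcting the sample data). The Python sketch below assumes each sample result exposes its state and, for failures, one of the error codes above. `SampleResult`, `group_failures` and `ids_to_retry` are hypothetical names, and including `evaluator_timeout` as retryable assumes you raise the timeout before retrying.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Collection, Dict, List, Optional, Sequence


@dataclass
class SampleResult:
    """Hypothetical per-sample result record."""

    sample_id: str
    state: str  # a sample state such as "scored" or "failed"
    error_code: Optional[str] = None  # set only when state == "failed"


def group_failures(results: Sequence[SampleResult]) -> Dict[str, List[str]]:
    """Map each error code to the IDs of the samples that failed with it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for result in results:
        if result.state == "failed":
            groups[result.error_code or "unknown"].append(result.sample_id)
    return dict(groups)


def ids_to_retry(
    results: Sequence[SampleResult],
    retry_codes: Collection[str] = frozenset({"rate_limited", "evaluator_timeout"}),
) -> List[str]:
    """Return IDs of failed samples whose error code is considered retryable."""
    return [
        r.sample_id
        for r in results
        if r.state == "failed" and r.error_code in retry_codes
    ]
```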
Cancellation
Best Practices
Set Appropriate Timeouts
Configure timeouts according to your evaluator's complexity: slower evaluators need longer timeouts to avoid spurious `evaluator_timeout` errors.
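As a client-side illustration of what a timeout does (not how the platform enforces `evaluator_timeout`, which this section does not describe), the Python sketch below runs one evaluator call under a time limit with `concurrent.futures`. `run_with_timeout` and its parameters are hypothetical. One practical way to choose `timeout_s` is to time the evaluator on representative samples and leave generous headroom.

```python
import concurrent.futures
from typing import Any, Callable


def run_with_timeout(
    evaluator: Callable[[Any], Any], sample: Any, timeout_s: float
) -> Any:
    """Run evaluator(sample), raising TimeoutError if it takes longer than timeout_s.

    Caveat: Python threads cannot be killed, so a timed-out evaluator keeps
    running in the background; this only stops the caller from waiting on it.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(evaluator, sample)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"evaluator_timeout: exceeded {timeout_s}s") from None
    finally:
        # Do not block on a still-running evaluator.
        pool.shutdown(wait=False)
```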
Use Idempotency Keys
Use idempotency keys to prevent duplicate evaluations when a CI/CD job is retried or triggered more than once.
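A common approach is to derive the key deterministically from whatever defines the run, so a re-triggered job for the same commit produces the same key. The Python sketch below assumes the key is an opaque string you supply when creating the evaluation; the inputs chosen here (commit SHA, dataset ID, evaluator config) and the name `idempotency_key` are illustrative, and how the platform accepts the key is not covered in this section.

```python
import hashlib
import json
from typing import Any, Mapping


def idempotency_key(commit_sha: str, dataset_id: str, config: Mapping[str, Any]) -> str:
    """Derive a stable key from the inputs that define an evaluation run.

    Identical inputs always produce the same key; any change produces a new one.
    """
    payload = json.dumps(
        {"commit": commit_sha, "dataset": dataset_id, "config": dict(config)},
        sort_keys=True,  # key order in config must not change the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

In GitHub Actions, for example, the commit is available as the `GITHUB_SHA` environment variable.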
Handle Partial Results
Always handle the case where an evaluation completes only partially, for example when some samples fail or are skipped.
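One way to handle partial results is to aggregate only the samples that produced a score and report how many that was, so a mean computed over a fraction of the dataset is not mistaken for a full result. The Python sketch below assumes per-sample scores are available as a mapping in which `None` marks a sample without a score; `ScoreSummary`, `summarize_scores`, `is_reliable` and the 95% default are illustrative.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ScoreSummary:
    mean_score: Optional[float]  # None when no sample produced a score
    scored: int
    total: int

    @property
    def coverage(self) -> float:
        """Fraction of samples that produced a score."""
        return self.scored / self.total if self.total else 0.0


def summarize_scores(scores: Mapping[str, Optional[float]]) -> ScoreSummary:
    """Aggregate per-sample scores, where None marks a sample with no score."""
    valid = [s for s in scores.values() if s is not None]
    mean = sum(valid) / len(valid) if valid else None
    return ScoreSummary(mean_score=mean, scored=len(valid), total=len(scores))


def is_reliable(summary: ScoreSummary, min_coverage: float = 0.95) -> bool:
    """Treat the mean as trustworthy only if enough samples were scored."""
    return summary.mean_score is not None and summary.coverage >= min_coverage
```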
Set Up Alerts
Configure alerts for evaluation failures.
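Alerting often needs two triggers: the evaluation as a whole failing, and an evaluation that completes but with too many failed samples. The Python sketch below shows that decision, assuming you already have the final status and sample counts. The name `alert_reasons` and the 5% default threshold are illustrative, and delivering the alert (email, chat, paging) is left to whatever channel you use.

```python
from typing import List


def alert_reasons(
    status: str,
    failed_samples: int,
    total_samples: int,
    max_failure_rate: float = 0.05,
) -> List[str]:
    """Return reasons to raise an alert; an empty list means no alert."""
    reasons: List[str] = []
    if status == "failed":
        reasons.append("evaluation failed")
    if total_samples > 0:
        rate = failed_samples / total_samples
        if rate > max_failure_rate:
            reasons.append(
                f"sample failure rate {rate:.1%} exceeds {max_failure_rate:.1%}"
            )
    return reasons
```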
