Overview
Evaluations are the core of Rubric. They automatically score your AI’s outputs against clinical criteria like triage accuracy, red flag detection, and guideline compliance.
Prerequisites
Before running an evaluation, you need:
- A project containing the AI outputs you want to score (e.g. proj_abc123 in the examples below)
- A dataset to evaluate, or a date range of logged traffic to filter on
- At least one evaluator selected and configured
Available Evaluators
- Triage Accuracy: Measures whether the AI assigned the correct urgency level. Penalizes under-triage more heavily than over-triage.
- Red Flag Detection: Checks if the AI identified critical symptoms requiring immediate attention.
- Guideline Compliance: Scores adherence to clinical protocols and decision trees.
- Hallucination Detection: Identifies claims not supported by the patient’s input.
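Each evaluator is passed to an evaluation as an entry with a type and an optional config, as shown in the SDK example below. That example covers triage_accuracy and red_flag_detection; as a rough sketch, the other two evaluators might be wired in like this. The config fields shown here (guideline_set in particular) are illustrative assumptions, not documented parameters:

# Illustrative entries for the remaining evaluators.
# The "guideline_set" config key is an assumption, not a documented parameter.
additional_evaluators = [
    {
        "type": "guideline_compliance",
        "config": {
            "guideline_set": "adult_triage_v2"  # Hypothetical protocol set name
        }
    },
    {
        "type": "hallucination_detection",
        "config": {}  # Assumed to work with defaults
    }
]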
Run an Evaluation
Via Dashboard
1. Navigate to your project.
2. Click Run Evaluation.
3. Select evaluators and configure weights.
4. Choose a dataset or date range.
5. Click Start.
Via SDK
from rubric import Rubric
client = Rubric()
# Create and run an evaluation
evaluation = client.evaluations.create(
    name="Weekly Triage Review",
    project="proj_abc123",
    dataset="ds_xyz789",  # Or use filters instead
    evaluators=[
        {
            "type": "triage_accuracy",
            "config": {
                "severity_weights": {
                    "under_triage": 5.0,  # Penalize more heavily
                    "over_triage": 1.0
                }
            }
        },
        {
            "type": "red_flag_detection",
            "config": {
                "protocols": ["chest_pain", "headache", "stroke"]
            }
        }
    ]
)

print(f"Evaluation started: {evaluation.id}")
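The dataset argument above has a filter-based alternative (see the comment on that line and step 4 of the dashboard flow). A minimal sketch of that variant, assuming a filters parameter that accepts a date range; the parameter name and its keys are assumptions, not confirmed SDK fields:

# Hypothetical filter-based run instead of a fixed dataset.
# The "filters" parameter and its keys are assumptions.
evaluation = client.evaluations.create(
    name="Last Week of Production Traffic",
    project="proj_abc123",
    filters={
        "start_date": "2024-01-01",
        "end_date": "2024-01-07"
    },
    evaluators=[{"type": "triage_accuracy"}]
)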
Monitor Progress
Evaluations run asynchronously. Check progress via:
# Poll for completion
evaluation = client.evaluations.get("eval_abc123")
print(f"Status: {evaluation.status}")
print(f"Progress: {evaluation.progress.completed}/{evaluation.progress.total}")
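For scripted runs, you can wrap the status check in a simple polling loop. A minimal sketch, assuming the status field ends up as "completed" or "failed"; the terminal values are inferred from the webhook event names below, not confirmed:

import time

# Poll until the evaluation reaches a terminal state.
# Terminal status strings are assumptions inferred from the webhook events.
while True:
    evaluation = client.evaluations.get("eval_abc123")
    if evaluation.status in ("completed", "failed"):
        break
    print(f"Progress: {evaluation.progress.completed}/{evaluation.progress.total}")
    time.sleep(30)  # Avoid hammering the API

print(f"Finished with status: {evaluation.status}")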
Or use webhooks for real-time updates:
# Configure webhook in dashboard or via API
client.webhooks.create(
    url="https://your-app.com/webhooks/rubric",
    events=["evaluation.completed", "evaluation.failed"]
)
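On the receiving end, your endpoint only needs to accept a POST and branch on the event name. A minimal receiver sketch using Flask; the payload shape (event, data.id) is an assumption, so adjust it to the actual webhook schema and add signature verification if Rubric provides it:

from flask import Flask, request

app = Flask(__name__)

@app.route("/webhooks/rubric", methods=["POST"])
def rubric_webhook():
    payload = request.get_json(force=True)
    event = payload.get("event")  # Payload shape is an assumption
    if event == "evaluation.completed":
        evaluation_id = payload.get("data", {}).get("id")
        print(f"Evaluation {evaluation_id} completed")
    elif event == "evaluation.failed":
        print("Evaluation failed; check the dashboard for details")
    return "", 200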
View Results
Once complete, examine results in the dashboard or via API:
results = client.evaluations.get_results("eval_abc123")

print(f"Overall Score: {results.overall_score}%")
for evaluator in results.evaluators:
    print(f"{evaluator.name}: {evaluator.score}%")

# Get samples that failed specific criteria
failed_triage = client.evaluations.get_samples(
    "eval_abc123",
    evaluator="triage_accuracy",
    status="failed"
)
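From there, each failed sample can be pulled up for manual review. The attribute names below (id, input, expected, actual) are assumptions about the sample object, so adjust them to whatever fields your SDK version actually returns:

# Review a handful of under-triaged cases by hand.
# Attribute names on each sample are assumptions.
for sample in failed_triage[:10]:
    print(sample.id)
    print("  Patient input:", sample.input)
    print("  Expected triage:", sample.expected)
    print("  Model triage:", sample.actual)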
Interpret Scores
| Score Range | Interpretation | Action |
| --- | --- | --- |
| 90-100% | Excellent | Monitor for regression |
| 80-89% | Good | Review edge cases |
| 70-79% | Needs improvement | Analyze failure patterns |
| < 70% | Critical | Immediate review required |
Safety-critical evaluators like Red Flag Detection should maintain > 95% accuracy. Any missed red flag should trigger immediate review.
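One way to operationalize these thresholds is a gate in CI or a scheduled job that fails the run when a safety-critical evaluator drops below its floor. A sketch under the assumption that evaluator.name matches the evaluator type string and that scores are the same percentages shown above:

import sys

results = client.evaluations.get_results("eval_abc123")

# Minimum acceptable scores; red_flag_detection gets the stricter 95% floor.
# Assumes evaluator.name matches the evaluator type string.
thresholds = {"red_flag_detection": 95.0, "triage_accuracy": 80.0}

failing = [
    e.name
    for e in results.evaluators
    if e.name in thresholds and e.score < thresholds[e.name]
]

if failing:
    print(f"Below threshold: {', '.join(failing)} -- immediate review required")
    sys.exit(1)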
Compare Evaluations
Track progress over time by comparing evaluations:
comparison = client.evaluations.compare([
    "eval_week1",
    "eval_week2",
    "eval_week3"
])

for metric in comparison.metrics:
    print(f"{metric.name}: {metric.trend}")  # "improving", "stable", "regressing"
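The trend values can also drive automated alerts; aside from the comparison object above, everything here is plain Python:

# Surface any metric the comparison marks as regressing.
regressing = [m.name for m in comparison.metrics if m.trend == "regressing"]
if regressing:
    print("Regressing metrics:", ", ".join(regressing))
    # e.g. notify the clinical review channel or open a ticket here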
Next Steps