## Overview
Rubric’s data model is built around six core objects that work together to enable healthcare AI evaluation:

- **Datasets**: Collections of samples for evaluation
- **Tasks**: Individual items needing human review
- **Models**: AI models being evaluated
- **Evaluations**: Automated scoring runs
- **Reviewers**: Clinicians who review AI outputs
- **Scores**: Evaluation results and metrics
## Datasets
Datasets are collections of samples that share a common purpose, typically test sets for evaluation.

### Schema
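A minimal TypeScript sketch of what a `Dataset` record might look like, inferred only from the description above; the field names are assumptions, not Rubric's confirmed schema:

```typescript
// Hypothetical shape for a Dataset record; field names are assumptions,
// inferred only from the description above.
interface Dataset {
  id: string;
  name: string;
  description?: string; // the common purpose the samples share
  sampleCount: number;  // number of samples in the collection
  createdAt: string;    // ISO 8601 timestamp
}
```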
### Usage
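A usage sketch assuming a hypothetical `rubric` client; the package name, constructor, and method names are all assumptions rather than Rubric's documented SDK:

```typescript
// Hypothetical client setup and calls; the package name, constructor,
// and method names are assumptions.
import { Rubric } from "rubric";

const rubric = new Rubric({ apiKey: process.env.RUBRIC_API_KEY });

// Create a dataset to hold a held-out test set (illustrative values).
const dataset = await rubric.datasets.create({
  name: "triage-test-set-v1",
  description: "Held-out triage conversations for evaluation",
});
```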
## Samples
Samples are the individual data points within a dataset:
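A sketch of what a `Sample` might contain, under the same caveat that the field names are assumptions:

```typescript
// Hypothetical shape for a Sample; field names are assumptions.
interface Sample {
  id: string;
  datasetId: string;                  // the dataset this sample belongs to
  input: string;                      // e.g. a patient message to evaluate
  expectedOutput?: string;            // reference answer, if one exists
  metadata?: Record<string, unknown>; // arbitrary sample-level metadata
}
```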
## Tasks

Tasks represent individual items requiring human review. They’re created automatically when AI outputs need clinical oversight.

### Schema
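A sketch of a `Task` record: the status values come from the flow table below, while the remaining field names are assumptions:

```typescript
// Status values come from the flow table below; the other field names
// are assumptions.
type TaskStatus =
  | "pending"
  | "assigned"
  | "in_progress"
  | "completed"
  | "skipped";

interface Task {
  id: string;
  sampleId: string;    // the sample under review
  modelOutput: string; // the AI output needing clinical oversight
  status: TaskStatus;
  reviewerId?: string; // set once the task is assigned
  createdAt: string;
}
```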
### Task Status Flow
| Status | Description |
|---|---|
| `pending` | Awaiting assignment |
| `assigned` | Assigned to a reviewer |
| `in_progress` | Reviewer is actively working |
| `completed` | Review submitted |
| `skipped` | Reviewer skipped (reassigned) |
### Usage
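A usage sketch with the same hypothetical `rubric` client as above; the method names are assumptions:

```typescript
// Hypothetical calls; method names are assumptions.
const pending = await rubric.tasks.list({ status: "pending" });

// Assign the first pending task to a reviewer (illustrative ID).
await rubric.tasks.assign(pending[0].id, { reviewerId: "rev_123" });
```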
## Models
Models represent the AI systems being evaluated. Use them to track different versions and configurations.

### Schema
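A sketch of a `Model` record; the field names are assumptions based on the description above:

```typescript
// Hypothetical shape for a Model; field names are assumptions.
interface Model {
  id: string;
  name: string;                     // e.g. "triage-assistant"
  version: string;                  // track different versions
  config?: Record<string, unknown>; // configuration under evaluation
  createdAt: string;
}
```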
### Usage
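Another hedged sketch against the assumed client; the method and field names are illustrative:

```typescript
// Hypothetical call; method and field names are assumptions.
const model = await rubric.models.register({
  name: "triage-assistant",
  version: "2.1.0",
  config: { temperature: 0.2 },
});
```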
## Evaluations
Evaluations are automated scoring runs that assess AI performance against a dataset.

### Schema
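A sketch of an `Evaluation` record; field names are assumptions, and `status` is left as a plain string because the status values are not enumerated in this section:

```typescript
// Hypothetical shape for an Evaluation run; field names are assumptions.
interface Evaluation {
  id: string;
  modelId: string;   // the model being scored
  datasetId: string; // the dataset it is scored against
  status: string;    // see the status flow below
  createdAt: string; // ISO 8601 timestamp
}
```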
### Evaluation Status Flow
### Usage
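A usage sketch with the same assumed client; method names and IDs are illustrative:

```typescript
// Hypothetical calls; method names are assumptions.
const evaluation = await rubric.evaluations.create({
  modelId: "model_abc",
  datasetId: "dataset_xyz",
});

// Fetch the run later to check its status and scores.
const run = await rubric.evaluations.get(evaluation.id);
console.log(run.status);
```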
## Reviewers
Reviewers are clinicians who provide human oversight on AI outputs.

### Schema
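A sketch of a `Reviewer` record: the credential values come from the table below, while the other field names are assumptions:

```typescript
// Credential values come from the table below; the other field names
// are assumptions.
type CredentialType = "MD" | "DO" | "NP" | "PA" | "RN" | "LPN";

interface Reviewer {
  id: string;
  name: string;
  credential: CredentialType; // determines what they can review
  email: string;
}
```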
### Credential Types
| Type | Description | Can Review |
|---|---|---|
| MD | Doctor of Medicine | All clinical decisions |
| DO | Doctor of Osteopathic Medicine | All clinical decisions |
| NP | Nurse Practitioner | Triage, symptom assessment |
| PA | Physician Assistant | Triage, symptom assessment |
| RN | Registered Nurse | Protocol compliance, documentation |
| LPN | Licensed Practical Nurse | Basic documentation review |
### Usage
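A usage sketch against the same assumed client; the method, field names, and example values are illustrative:

```typescript
// Hypothetical call; method and field names are assumptions.
const reviewer = await rubric.reviewers.create({
  name: "Jane Doe",
  credential: "MD", // determines what she can review (see table above)
  email: "jane.doe@example.com",
});
```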
## Scores
Scores are the evaluation results, from both automated evaluators and human reviewers.

### Schema
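A sketch of a `Score` record: the source values come from the table below, while the other field names are assumptions:

```typescript
// Source values come from the table below; the other field names are
// assumptions.
type ScoreSource = "evaluator" | "reviewer" | "consensus";

interface Score {
  id: string;
  source: ScoreSource;
  metric: string;        // e.g. "accuracy" (illustrative)
  value: number;         // the metric value
  taskId?: string;       // present for reviewer scores (assumption)
  evaluationId?: string; // present for evaluator scores (assumption)
}
```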
### Score Sources
| Source | Description |
|---|---|
| `evaluator` | Automated evaluation score |
| `reviewer` | Human reviewer score |
| `consensus` | Aggregated from multiple reviewers |
