Data Modalities
Rubric supports evaluation across all major healthcare AI data types:
- Voice & Audio: Patient calls, triage conversations, voice assistants
- Clinical Notes: SOAP notes, discharge summaries, visit documentation
- Medical Imaging: DICOM studies, X-rays, CT, MRI, pathology slides
Voice & Audio
| Feature | Description |
|---|---|
| Audio file support | WAV, MP3, M4A, FLAC up to 2 hours |
| Transcript formats | JSON with speaker labels and timestamps |
| Real-time streaming | WebSocket API for live call evaluation |
| Multi-speaker | Automatic speaker diarization |
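For illustration, a diarized transcript payload with speaker labels and timestamps might be shaped roughly like the following. The field names here (`call_id`, `segments`, and so on) are assumptions for the sketch, not the documented Rubric schema.

```python
# Illustrative only: the exact transcript schema is an assumption, not the
# documented Rubric format. This shows the general shape of a diarized
# transcript with speaker labels and timestamps.
import json

transcript = {
    "call_id": "example-call-001",          # hypothetical identifier
    "audio_format": "wav",
    "duration_seconds": 312.4,
    "segments": [
        {"speaker": "patient", "start": 0.0, "end": 4.2,
         "text": "I've had chest pain since this morning."},
        {"speaker": "nurse",   "start": 4.5, "end": 9.1,
         "text": "Is the pain radiating to your arm or jaw?"},
    ],
}

print(json.dumps(transcript, indent=2))
```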
Clinical Notes
| Feature | Description |
|---|---|
| Document types | SOAP, H&P, Progress notes, Discharge summaries |
| Structured extraction | ICD-10, CPT, SNOMED CT code validation |
| Section parsing | Automatic section identification |
| Template support | Custom documentation templates |
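To make the code-validation idea concrete, here is a minimal, standalone format check for ICD-10-CM style codes. It is a simplified sketch, not Rubric's validator, and it only checks the general code shape rather than membership in the actual code set.

```python
# Simplified ICD-10-CM *format* check (not a lookup against the real code set).
# Rubric's actual structured-extraction validation is richer; this only shows the idea.
import re

ICD10_PATTERN = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")

def looks_like_icd10(code: str) -> bool:
    """Return True if the string matches the general ICD-10-CM code shape."""
    return bool(ICD10_PATTERN.match(code.strip().upper()))

extracted_codes = ["I10", "E11.9", "J45.909", "NOT-A-CODE"]
for code in extracted_codes:
    print(code, "->", "plausible" if looks_like_icd10(code) else "invalid format")
```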
Medical Imaging (DICOM)
| Feature | Description |
|---|---|
| Modalities | CR, CT, MR, US, PT, MG, DX, and more |
| PACS integration | DICOMweb (WADO-RS, STOW-RS) |
| Coordinate systems | Pixel, anatomical, and normalized coordinates |
| Series handling | Multi-frame and multi-series support |
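As a rough sketch of the PACS integration row, the snippet below retrieves study metadata over DICOMweb (WADO-RS) and reads the Modality tag from the DICOM JSON response. The base URL and StudyInstanceUID are placeholders, and authentication is omitted.

```python
# Sketch of pulling study metadata over DICOMweb (WADO-RS), as listed in the
# table for PACS integration. The base URL and StudyInstanceUID below are
# placeholders, not real endpoints.
import requests

PACS_BASE = "https://pacs.example.org/dicom-web"        # placeholder
STUDY_UID = "1.2.840.113619.2.55.3.604688119.971.1"     # placeholder UID

resp = requests.get(
    f"{PACS_BASE}/studies/{STUDY_UID}/metadata",
    headers={"Accept": "application/dicom+json"},
    timeout=30,
)
resp.raise_for_status()

# DICOM JSON keys are hex tags; (0008,0060) is Modality.
for instance in resp.json():
    modality = instance.get("00080060", {}).get("Value", ["?"])[0]
    print("Instance modality:", modality)
```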
Evaluation Framework
The core of Rubric: automated clinical evaluation powered by healthcare-specific evaluators.
Evaluation Types
- Model Output Accuracy: Validates that AI model outputs are correct and match expected results. Use cases: classification accuracy, entity extraction, structured output validation.
- Clinical Safety: Evaluates whether AI outputs meet clinical safety standards and don't cause patient harm. Checks: red flag detection, contraindication identification, escalation appropriateness.
- Hallucination Detection: Identifies when AI generates information not grounded in the source data. Methods: citation verification, fact checking, source attribution analysis.
- Completeness & Coverage: Measures whether AI captured all relevant information from the input. Metrics: recall, coverage score, missing element identification.
Metrics
| Metric | Description |
|---|---|
| Clinical Accuracy | Validates medical information correctness against clinical guidelines |
| Sensitivity / Specificity | Measures true positive and true negative rates for clinical decisions |
| Rubric-Based Scoring | Multi-dimensional scoring using customizable clinical rubrics |
| Custom Metrics | Define your own metrics for specialized evaluation needs |
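As a concrete reference for the Sensitivity / Specificity row, the plain-Python snippet below computes both rates from binary predictions. It is illustrative and not tied to the Rubric SDK.

```python
# Plain-Python sensitivity/specificity from binary predictions, to make the
# metric definitions in the table concrete. Not tied to the Rubric SDK.
def sensitivity_specificity(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    return sensitivity, specificity

# Example: flagging notes that require escalation (1 = escalate)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(sensitivity_specificity(y_true, y_pred))  # (0.75, 0.75)
```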
Human Review Design
Configure how clinical experts review AI outputs:
- Review templates: Pre-built forms for common clinical review tasks
- Grading rubrics: Multi-criteria scoring with weighted dimensions (see the sketch after this list)
- Annotation tools: Highlight, comment, and label AI outputs
- Side-by-side comparison: View AI output alongside source data
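A minimal sketch of weighted, multi-criteria scoring is shown below. The dimension names and weights are invented for illustration rather than taken from a built-in Rubric template.

```python
# Minimal sketch of multi-criteria scoring with weighted dimensions.
# The dimensions and weights are illustrative, not a built-in Rubric template.
RUBRIC = {
    "clinical_accuracy": 0.4,
    "completeness":      0.3,
    "safety":            0.2,
    "clarity":           0.1,
}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (0-5 scale) into a single weighted score."""
    assert abs(sum(RUBRIC.values()) - 1.0) < 1e-9, "weights should sum to 1"
    return sum(RUBRIC[dim] * scores[dim] for dim in RUBRIC)

reviewer_scores = {"clinical_accuracy": 5, "completeness": 4, "safety": 5, "clarity": 3}
print(round(weighted_score(reviewer_scores), 2))  # 4.5
```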
Consensus & Disagreement Handling
When multiple reviewers evaluate the same output:
| Feature | Description |
|---|---|
| Multi-reviewer assignment | Route samples to 2+ reviewers for consensus |
| Adjudication workflows | Escalate disagreements to senior reviewers |
| Inter-rater reliability | Calculate Cohen’s kappa and agreement metrics |
| Tie-breaking rules | Configurable resolution for split decisions |
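To show what the inter-rater reliability row measures, here is a small, self-contained Cohen's kappa calculation for two reviewers. It is a teaching sketch, not the platform's implementation.

```python
# Cohen's kappa for two reviewers, computed from scratch to illustrate the
# inter-rater reliability metric in the table above. Not Rubric's internal code.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

a = ["safe", "unsafe", "safe", "safe", "unsafe", "safe"]
b = ["safe", "unsafe", "unsafe", "safe", "unsafe", "safe"]
print(round(cohens_kappa(a, b), 3))  # 0.667
```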
Evaluation Versioning
Track changes to your evaluation configurations over time:
- Version history: Full audit trail of evaluation changes
- Rollback support: Revert to previous evaluation versions
- Change comparison: Diff view between evaluation versions
- Release management: Tag and deploy evaluation versions
Comparing Model Runs
Compare model versions, prompts, or configurations with statistical rigor.
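One common way to compare two runs with statistical rigor is a paired bootstrap over per-sample scores. The `experiments.py` style sketch below illustrates that idea under stated assumptions; it is not the Rubric SDK, and the scores are made up.

```python
# experiments.py (illustrative). A paired bootstrap over per-sample scores is
# one way to compare two model runs with statistical rigor; this is a sketch,
# not Rubric's SDK.
import random

def paired_bootstrap(scores_a, scores_b, iters=10_000, seed=42):
    """Return the fraction of bootstrap resamples where run B beats run A."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    b_wins = 0
    for _ in range(iters):
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_b > mean_a:
            b_wins += 1
    return b_wins / iters

# Per-sample accuracy (1 = correct) for two prompt versions on the same dataset
run_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
run_b = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
print("P(B > A) ~", paired_bootstrap(run_a, run_b))
```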
Reproducibility Guarantees
Ensure consistent evaluation results:
- Deterministic evaluation: Seeded random sampling and consistent ordering (sketched after this list)
- Environment pinning: Lock evaluator versions and dependencies
- Input hashing: Verify dataset integrity across runs
- Audit logging: Complete record of evaluation parameters and results
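The sketch below illustrates two items from the list: seeded sampling for deterministic evaluation and hashing a dataset file to verify integrity across runs. The file path and seed value are placeholders.

```python
# Sketch of deterministic sampling and dataset fingerprinting for
# reproducible evaluation. File paths and the seed value are placeholders.
import hashlib
import random

def dataset_fingerprint(path: str) -> str:
    """SHA-256 of the dataset file, recorded alongside results for audit logs."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def sample_cases(case_ids: list[str], k: int, seed: int = 1234) -> list[str]:
    """Deterministic sample: same seed and same input order -> same subset."""
    rng = random.Random(seed)
    return sorted(rng.sample(case_ids, k))

cases = [f"case-{i:04d}" for i in range(100)]
print(sample_cases(cases, k=5))
# print(dataset_fingerprint("eval_dataset.jsonl"))  # placeholder path
```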
Observability & Logging
Real-time visibility into your healthcare AI in production.
- Structured Logging: Log inputs, outputs, and metadata with healthcare-specific schemas
- Real-time Dashboard: Monitor evaluation metrics, error rates, and trends
- Alerting: Get notified when metrics degrade or safety thresholds are breached
- Tracing: Track requests through multi-step AI pipelines
Logging Example
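A minimal `logging.py` style example using only the standard library is shown below. The field names and the `phi_redacted` flag are assumptions for the sketch rather than a required Rubric schema.

```python
# logging.py (illustrative). Structured JSON logging with healthcare-oriented
# fields using only the standard library; the field names are assumptions,
# not a required Rubric schema.
import json
import logging

logger = logging.getLogger("triage_assistant")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_model_call(call_id: str, model: str, prompt_tokens: int,
                   output_tokens: int, escalation_flag: bool) -> None:
    """Emit one structured log line per model call."""
    logger.info(json.dumps({
        "event": "model_call",
        "call_id": call_id,            # hypothetical identifier
        "model": model,
        "prompt_tokens": prompt_tokens,
        "output_tokens": output_tokens,
        "escalation_flag": escalation_flag,
        "phi_redacted": True,          # redact PHI before logging
    }))

log_model_call("example-call-001", "triage-v2", 512, 87, escalation_flag=False)
```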
Human Expert Network
Route AI outputs to clinical experts for review, feedback, and ground truth generation.
Who Reviews
Our network includes credentialed healthcare professionals across specialties:
- Physicians: Board-certified MDs and DOs
- Nurses: RNs and NPs with clinical experience
- Coders: Certified medical coders (CPC, CCS, RHIA)
- Dietitians: Registered dietitians and nutritionists
- Mental Health Coaches: Licensed counselors and therapists
- Allied Health: Physical therapists, pharmacists, and more
Credentialing & Verification
All reviewers undergo rigorous verification:
| Check | Description |
|---|---|
| License verification | Active license confirmed with state boards |
| Education validation | Degrees verified with institutions |
| Background check | Criminal and sanctions screening |
| Skills assessment | Domain-specific competency testing |
| Ongoing monitoring | Continuous license and sanctions monitoring |
Reviewer Assignment Logic
Intelligent matching of reviews to qualified experts (a simplified sketch follows the list):
- Credential matching: Route to reviewers with appropriate licenses
- Specialty alignment: Match clinical domain expertise
- Workload balancing: Distribute work evenly across pool
- Availability windows: Respect reviewer schedules and time zones
- Performance-based routing: Prioritize high-quality reviewers
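A highly simplified version of this matching logic might look like the sketch below: filter by credential and specialty, then pick the least-loaded eligible reviewer. The data model is invented for illustration and omits availability windows and performance-based routing.

```python
# Simplified sketch of the matching logic described above: filter by
# credential and specialty, then balance workload. The data model is invented.
from dataclasses import dataclass

@dataclass
class Reviewer:
    name: str
    credential: str        # e.g. "MD", "RN", "CPC"
    specialties: set[str]
    open_assignments: int

def assign(reviewers: list[Reviewer], required_credential: str, specialty: str):
    eligible = [r for r in reviewers
                if r.credential == required_credential and specialty in r.specialties]
    if not eligible:
        return None                     # in practice: escalate or widen the pool
    return min(eligible, key=lambda r: r.open_assignments)   # workload balancing

pool = [
    Reviewer("A", "MD", {"cardiology"}, open_assignments=3),
    Reviewer("B", "MD", {"cardiology", "internal medicine"}, open_assignments=1),
    Reviewer("C", "RN", {"cardiology"}, open_assignments=0),
]
print(assign(pool, "MD", "cardiology").name)  # "B"
```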
Conflict-of-Interest Controls
Ensure unbiased reviews:
| Control | Description |
|---|---|
| Blinded review | Hide customer identity from reviewers |
| Employer exclusions | Block reviews of competitor organizations |
| Relationship declarations | Reviewers disclose potential conflicts |
| Rotation policies | Prevent over-familiarity with specific outputs |
Quality Assurance & Calibration
Maintain consistent, high-quality reviews:
- Gold standard datasets: Test reviewers against known-correct answers
- Inter-rater reliability: Monitor agreement across reviewers
- Calibration sessions: Regular alignment on scoring criteria
- Performance tracking: Individual reviewer quality metrics
- Feedback loops: Share aggregated feedback with reviewers
Security & Compliance
- HIPAA Compliant: BAA available, PHI handling, audit logs
- SOC 2 Type II: Annual audits, security controls
- Encryption: AES-256 at rest, TLS 1.3 in transit
- Access Control: RBAC, SSO, MFA support
Data Handling
- PHI De-identification: Automatic PII/PHI detection and redaction (a simplified sketch follows this list)
- Data Residency: Choose US, EU, or custom regions
- Retention Policies: Configurable retention with secure deletion
- Audit Logging: Complete audit trail for compliance
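As a toy illustration of the PHI de-identification item, the snippet below redacts a few obvious identifier patterns with regular expressions. Real de-identification covers far more than this (names, dates, addresses, free-text identifiers), so treat it purely as a sketch.

```python
# Highly simplified regex-based redaction, only to illustrate the idea of
# PHI de-identification. Real de-identification needs far more than a few
# patterns; pattern order matters because substitutions are applied in sequence.
import re

PATTERNS = {
    "MRN":   re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

note = "Patient (MRN: 00123456) called from 555-867-5309 about refill."
print(redact(note))
```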
Integrations
- EHR Systems: Epic, Cerner, Meditech
- Voice Platforms: Twilio, Vonage, Amazon Connect
- PACS: DICOMweb, Orthanc, dcm4chee
- LLM Providers: OpenAI, Anthropic, Azure, AWS Bedrock
- CI/CD: GitHub Actions, GitLab CI, Jenkins
- Monitoring: Datadog, Grafana, PagerDuty
