Overview

Not all healthcare AI evaluations require clinical expertise. This guide helps you understand when clinical review is necessary and how to set it up properly.

Clinical vs Non-Clinical

What Requires Clinical Review

Clinical Decisions

  • Triage level appropriateness
  • Diagnostic suggestions
  • Treatment recommendations
  • Red flag assessment

Patient Safety

  • Missed critical symptoms
  • Inappropriate escalation/de-escalation
  • Contraindication detection
  • Emergency recognition

What Doesn’t Require Clinical Review

| Evaluation Type | Reason |
| --- | --- |
| Latency/performance | Technical metric |
| Format compliance | Structural check |
| Keyword extraction | Pattern matching |
| Sentiment analysis | Linguistic analysis |
| Factual accuracy (non-medical) | General knowledge |

Decision Matrix

When in doubt, apply a simple test: if the evaluation judges a clinical decision or a patient-safety outcome, route it to a credentialed reviewer; if it measures a technical, structural, or linguistic property, automate it.
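This decision can be encoded as a small helper. A minimal sketch (the category set and function name are illustrative, not part of the Rubric SDK):

```python
# Evaluation types that judge clinical decisions or patient safety
CLINICAL_TYPES = {
    "triage_accuracy",
    "red_flag_detection",
    "diagnostic_appropriateness",
    "treatment_safety",
}

def needs_clinical_review(evaluation_type: str) -> bool:
    """Return True if this evaluation type requires a credentialed reviewer."""
    return evaluation_type in CLINICAL_TYPES

print(needs_clinical_review("triage_accuracy"))   # True
print(needs_clinical_review("response_latency"))  # False
```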

Credential Requirements

Reviewer Credentials

| Credential | Abbreviation | Scope |
| --- | --- | --- |
| Doctor of Medicine | MD | All clinical decisions |
| Doctor of Osteopathic Medicine | DO | All clinical decisions |
| Nurse Practitioner | NP | Triage, symptom assessment, care planning |
| Physician Assistant | PA | Triage, symptom assessment |
| Registered Nurse | RN | Protocol compliance, patient communication |
| Licensed Practical Nurse | LPN | Documentation review |

Credential Configuration

# Configure credential requirements by evaluation type
client.projects.update(
    project="patient-triage",
    
    credential_requirements={
        # Emergency decisions require physician review
        "emergency_triage": {
            "required_credentials": ["MD", "DO"],
            "fallback_credentials": ["NP"],  # If no MD available
            "require_verification": True
        },
        
        # Urgent can be NP/PA
        "urgent_triage": {
            "required_credentials": ["MD", "DO", "NP", "PA"],
            "require_verification": True
        },
        
        # Routine can be RN
        "routine_triage": {
            "required_credentials": ["MD", "DO", "NP", "PA", "RN"],
            "require_verification": True
        },
        
        # Documentation review
        "documentation_quality": {
            "required_credentials": ["RN", "LPN"],
            "require_verification": False  # Internal staff
        }
    }
)

Credential Verification

# Add verified reviewer
reviewer = client.reviewers.create(
    user="user_abc123",
    name="Dr. Sarah Chen",
    email="[email protected]",
    
    credentials=[
        {
            "type": "MD",
            "license_number": "A12345",
            "state": "CA",
            "issued_date": "2015-06-01",
            "expiration_date": "2026-12-31",
            "verification_method": "state_board_api",
            "verified_at": "2025-01-15T10:30:00Z"
        },
        {
            "type": "board_certification",
            "specialty": "internal_medicine",
            "board": "ABIM",
            "certification_number": "IM98765",
            "expiration_date": "2027-12-31"
        }
    ],
    
    specialties=["internal_medicine", "cardiology"]
)
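Before assigning reviews, it is worth confirming client-side that a reviewer's credentials are still current. A minimal sketch (the helper is ours; the dict shape mirrors the `credentials` entries above):

```python
from datetime import date

def active_credentials(credentials, today=None):
    """Filter out credentials whose expiration_date (ISO YYYY-MM-DD) has passed."""
    today = today or date.today()
    return [
        c for c in credentials
        if date.fromisoformat(c["expiration_date"]) >= today
    ]

creds = [
    {"type": "MD", "expiration_date": "2026-12-31"},
    {"type": "board_certification", "expiration_date": "2024-06-30"},
]

print([c["type"] for c in active_credentials(creds, today=date(2025, 1, 15))])
# ['MD']
```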

Evaluator Categories

Clinical Evaluators

These evaluators assess clinical decision-making quality:
Triage Accuracy

Requires: MD, DO, NP, PA. Assesses whether the AI assigned an appropriate urgency level.
{
    "type": "triage_accuracy",
    "clinical": True,
    "reviewer_credentials": ["MD", "DO", "NP", "PA"]
}
Red Flag Detection

Requires: MD, DO, NP. Evaluates identification of critical symptoms.
{
    "type": "red_flag_detection",
    "clinical": True,
    "reviewer_credentials": ["MD", "DO", "NP"]
}
Diagnostic Appropriateness

Requires: MD, DO. Reviews the AI’s diagnostic suggestions.
{
    "type": "diagnostic_appropriateness",
    "clinical": True,
    "reviewer_credentials": ["MD", "DO"]
}
Treatment Safety

Requires: MD, DO, PharmD. Evaluates medication and treatment recommendations.
{
    "type": "treatment_safety",
    "clinical": True,
    "reviewer_credentials": ["MD", "DO", "PharmD"]
}

Non-Clinical Evaluators

These evaluators can run fully automated:

| Evaluator | Type | Description |
| --- | --- | --- |
| response_latency | Performance | Response time measurement |
| format_compliance | Structural | Output format validation |
| keyword_extraction | NLP | Entity extraction accuracy |
| sentiment_score | NLP | Tone analysis |
| token_count | Resource | Usage metrics |
| cost_calculation | Resource | API cost tracking |

# Mix of clinical and non-clinical
evaluators = [
    # Clinical - requires human review for failures
    {"type": "triage_accuracy", "clinical": True},
    {"type": "red_flag_detection", "clinical": True},
    
    # Non-clinical - fully automated
    {"type": "response_latency", "clinical": False},
    {"type": "format_compliance", "clinical": False},
    {"type": "symptom_extraction", "clinical": False}
]
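Once results come back, only failed clinical checks need a human in the loop. A sketch of that routing (the function and result shape are illustrative, not SDK types):

```python
def route_results(results):
    """Send failed clinical checks to human review; everything else is just logged."""
    for r in results:
        if r["clinical"] and not r["passed"]:
            yield ("clinical_review", r["type"])
        else:
            yield ("automated_log", r["type"])

results = [
    {"type": "triage_accuracy", "clinical": True, "passed": False},
    {"type": "response_latency", "clinical": False, "passed": True},
]

print(list(route_results(results)))
# [('clinical_review', 'triage_accuracy'), ('automated_log', 'response_latency')]
```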

Regulatory Considerations

FDA Software as Medical Device (SaMD)

If your AI qualifies as SaMD, evaluation must meet regulatory requirements.

Note: this section provides general guidance only; consult regulatory experts for your specific situation.

| FDA Risk Level | Clinical Evaluation Requirements |
| --- | --- |
| Class I (Low) | Documented evaluation process |
| Class II (Moderate) | Clinical validation with physician oversight |
| Class III (High) | Extensive clinical trials, physician-led evaluation |

Documentation Requirements

# Generate FDA-ready evaluation reports
report = client.exports.create_regulatory_report(
    evaluation="eval_abc123",
    
    format="fda_samd",
    
    include=[
        "evaluation_protocol",
        "reviewer_credentials",
        "sample_selection_methodology",
        "statistical_analysis",
        "adverse_event_tracking",
        "audit_trail"
    ],
    
    certify={
        "clinical_oversight": "Dr. Jane Smith, MD",
        "qa_review": "Quality Team Lead"
    }
)

Audit Trail

Rubric maintains complete audit trails for compliance:
# Get audit log for an evaluation
audit_log = client.audit.get_evaluation_log(
    evaluation="eval_abc123",
    include=[
        "sample_access",
        "reviewer_actions",
        "score_modifications",
        "status_changes"
    ]
)

for entry in audit_log:
    print(f"{entry.timestamp}: {entry.action} by {entry.user}")

Custom Clinical Evaluators

Build evaluators that enforce clinical requirements:
from rubric import Evaluator

class ChestPainProtocolEvaluator(Evaluator):
    """Evaluates adherence to chest pain triage protocol."""
    
    name = "chest_pain_protocol"
    clinical = True
    required_credentials = ["MD", "DO", "NP"]
    
    # Required questions per protocol
    REQUIRED_QUESTIONS = [
        "pain_location",
        "pain_quality", 
        "pain_radiation",
        "associated_symptoms",
        "onset_timing",
        "exacerbating_factors",
        "medical_history"
    ]
    
    def evaluate(self, sample):
        transcript = sample.input["transcript"]
        questions_asked = self.extract_questions(transcript)
        
        missing = set(self.REQUIRED_QUESTIONS) - set(questions_asked)
        # Count only protocol questions, so extra questions don't inflate coverage
        covered = set(self.REQUIRED_QUESTIONS) & set(questions_asked)
        coverage = len(covered) / len(self.REQUIRED_QUESTIONS)
        
        # Flag for clinical review if protocol not followed
        needs_review = coverage < 0.8 or "pain_radiation" in missing
        
        return {
            "score": coverage,
            "passed": coverage >= 0.8,
            "needs_clinical_review": needs_review,
            "details": {
                "questions_asked": questions_asked,
                "missing_questions": list(missing),
                "coverage_percent": coverage * 100
            }
        }
    
    def extract_questions(self, transcript):
        # NLP logic to identify questions asked
        ...

# Register the evaluator
client.evaluators.register(ChestPainProtocolEvaluator)
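The scoring rule in `evaluate` can be exercised in isolation, without the NLP extraction step. A standalone sketch of the same logic (names here are ours, not the registered evaluator):

```python
REQUIRED_QUESTIONS = [
    "pain_location", "pain_quality", "pain_radiation",
    "associated_symptoms", "onset_timing",
    "exacerbating_factors", "medical_history",
]

def score_protocol(questions_asked):
    """Score protocol coverage and flag safety-critical gaps for clinical review."""
    missing = set(REQUIRED_QUESTIONS) - set(questions_asked)
    covered = set(REQUIRED_QUESTIONS) & set(questions_asked)
    coverage = len(covered) / len(REQUIRED_QUESTIONS)
    return {
        "score": coverage,
        "passed": coverage >= 0.8,
        # pain_radiation is safety-critical: always escalate if it was skipped
        "needs_clinical_review": coverage < 0.8 or "pain_radiation" in missing,
    }

result = score_protocol(["pain_location", "pain_quality", "onset_timing"])
print(result["passed"], result["needs_clinical_review"])  # False True
```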

Setting Up Clinical Review

Configure Review Rubric

# Clinical review rubric
client.rubrics.create(
    name="Triage Clinical Review",
    project="patient-triage",
    
    dimensions=[
        {
            "name": "triage_appropriateness",
            "question": "Was the triage level clinically appropriate?",
            "type": "categorical",
            "options": [
                {"value": "appropriate", "label": "Appropriate"},
                {"value": "over_triaged", "label": "Over-triaged (acceptable)"},
                {"value": "under_triaged", "label": "Under-triaged (unsafe)", "severity": "critical"}
            ],
            "required": True
        },
        {
            "name": "safety_assessment",
            "question": "Were all safety-critical symptoms addressed?",
            "type": "categorical",
            "options": [
                {"value": "all_addressed", "label": "All addressed"},
                {"value": "minor_gaps", "label": "Minor gaps"},
                {"value": "critical_missed", "label": "Critical symptoms missed", "severity": "critical"}
            ],
            "required": True
        },
        {
            "name": "correct_triage",
            "question": "If incorrect, what should the triage level be?",
            "type": "categorical",
            "options": ["emergency", "urgent", "semi_urgent", "routine"],
            "conditional": "triage_appropriateness != 'appropriate'"
        },
        {
            "name": "clinical_notes",
            "question": "Clinical rationale and notes",
            "type": "text",
            "required_on_failure": True,
            "min_length": 50
        }
    ],
    
    credential_requirements={
        "default": ["MD", "DO", "NP", "PA"],
        "on_critical_finding": ["MD", "DO"]  # Escalate to physician
    }
)
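The `conditional` and `required_on_failure` rules above can also be enforced client-side before a review is submitted. A minimal sketch (the validation helper is ours, not part of the SDK):

```python
def validate_review(review):
    """Check a review dict against the Triage Clinical Review rubric rules."""
    errors = []
    # correct_triage is required only when triage was judged incorrect
    if review.get("triage_appropriateness") != "appropriate" and not review.get("correct_triage"):
        errors.append("correct_triage required when triage was not appropriate")
    # clinical_notes must meet the minimum length on unsafe findings
    if review.get("triage_appropriateness") == "under_triaged":
        if len(review.get("clinical_notes", "")) < 50:
            errors.append("clinical_notes must be at least 50 characters")
    return errors

review = {"triage_appropriateness": "under_triaged", "clinical_notes": "Too short"}
print(validate_review(review))  # two errors: missing correct_triage, notes too short
```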

Review Interface

The clinical review interface includes:
  • Audio/Video playback (for voice/video data)
  • Transcript view with symptom highlighting
  • AI reasoning display showing extracted symptoms and decision logic
  • Structured rubric for consistent grading
  • Clinical notes field with required minimum detail
  • Reference materials (protocols, guidelines) accessible

Best Practices

Run non-clinical evaluators automatically, and reserve clinical review for decisions that truly need expertise:
# First pass: automated
auto_eval = client.evaluations.create(
    evaluators=[
        {"type": "response_latency", "clinical": False},
        {"type": "format_compliance", "clinical": False}
    ]
)

# Second pass: clinical (on flagged samples)
clinical_eval = client.evaluations.create(
    evaluators=[
        {"type": "triage_accuracy", "clinical": True}
    ],
    samples=flagged_sample_ids
)
Automate credential verification and expiration tracking:
# Set up expiration alerts
client.reviewers.configure_alerts(
    alert_before_expiration_days=[90, 30, 7],
    notify=["[email protected]", "reviewer_email"]
)
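The alert windows above are simple date arithmetic. A sketch of when notifications would fire (the helper name is illustrative):

```python
from datetime import date

ALERT_DAYS = [90, 30, 7]

def alerts_due(expiration_date, today):
    """Return the alert windows (in days) that fire on `today` for a credential."""
    days_left = (date.fromisoformat(expiration_date) - today).days
    return [d for d in ALERT_DAYS if days_left == d]

print(alerts_due("2026-12-31", date(2026, 10, 2)))  # [90]
```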
For regulatory compliance, ensure complete documentation:
  • Evaluation protocol definition
  • Reviewer credential verification
  • Sample selection methodology
  • All scores and rationale
  • Audit trail of all actions

Next Steps