Overview
Not all healthcare AI evaluations require clinical expertise. This guide helps you understand when clinical review is necessary and how to set it up properly.
Clinical vs Non-Clinical
What Requires Clinical Review
Clinical Decisions
Triage level appropriateness
Diagnostic suggestions
Treatment recommendations
Red flag assessment
Patient Safety
Missed critical symptoms
Inappropriate escalation/de-escalation
Contraindication detection
Emergency recognition
What Doesn’t Require Clinical Review
| Evaluation Type | Reason |
|---|---|
| Latency/performance | Technical metric |
| Format compliance | Structural check |
| Keyword extraction | Pattern matching |
| Sentiment analysis | Linguistic analysis |
| Factual accuracy (non-medical) | General knowledge |
Decision Matrix
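As a sketch, the decision matrix can be encoded as a simple routing helper. The evaluator-type sets below are illustrative (drawn from the lists above), and `needs_clinical_review` is a hypothetical helper, not part of the SDK:

```python
# Hypothetical routing helper: decide whether an evaluation type
# needs credentialed clinical review or can run fully automated.
CLINICAL_TYPES = {
    "triage_accuracy", "red_flag_detection",
    "diagnostic_appropriateness", "treatment_safety",
}
NON_CLINICAL_TYPES = {
    "response_latency", "format_compliance",
    "keyword_extraction", "sentiment_score",
}

def needs_clinical_review(evaluation_type: str) -> bool:
    if evaluation_type in CLINICAL_TYPES:
        return True
    if evaluation_type in NON_CLINICAL_TYPES:
        return False
    # Unknown types default to clinical review (fail safe)
    return True
```

Defaulting unknown types to clinical review keeps the failure mode safe: an unclassified evaluator costs reviewer time rather than skipping expert oversight.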
Credential Requirements
Reviewer Credentials
| Credential | Abbreviation | Scope |
|---|---|---|
| Doctor of Medicine | MD | All clinical decisions |
| Doctor of Osteopathy | DO | All clinical decisions |
| Nurse Practitioner | NP | Triage, symptom assessment, care planning |
| Physician Assistant | PA | Triage, symptom assessment |
| Registered Nurse | RN | Protocol compliance, patient communication |
| Licensed Practical Nurse | LPN | Documentation review |
Credential Configuration
```python
# Configure credential requirements by evaluation type
client.projects.update(
    project="patient-triage",
    credential_requirements={
        # Emergency decisions require physician review
        "emergency_triage": {
            "required_credentials": ["MD", "DO"],
            "fallback_credentials": ["NP"],  # If no MD available
            "require_verification": True
        },
        # Urgent can be NP/PA
        "urgent_triage": {
            "required_credentials": ["MD", "DO", "NP", "PA"],
            "require_verification": True
        },
        # Routine can be RN
        "routine_triage": {
            "required_credentials": ["MD", "DO", "NP", "PA", "RN"],
            "require_verification": True
        },
        # Documentation review
        "documentation_quality": {
            "required_credentials": ["RN", "LPN"],
            "require_verification": False  # Internal staff
        }
    }
)
```
Credential Verification
```python
# Add a verified reviewer
reviewer = client.reviewers.create(
    user="user_abc123",
    name="Dr. Sarah Chen",
    email="[email protected]",
    credentials=[
        {
            "type": "MD",
            "license_number": "A12345",
            "state": "CA",
            "issued_date": "2015-06-01",
            "expiration_date": "2026-12-31",
            "verification_method": "state_board_api",
            "verified_at": "2025-01-15T10:30:00Z"
        },
        {
            "type": "board_certification",
            "specialty": "internal_medicine",
            "board": "ABIM",
            "certification_number": "IM98765",
            "expiration_date": "2027-12-31"
        }
    ],
    specialties=["internal_medicine", "cardiology"]
)
```
Evaluator Categories
Clinical Evaluators
These evaluators assess clinical decision-making quality:
Triage Accuracy

Requires: MD, DO, NP, PA. Assesses whether the AI assigned the appropriate urgency level.

```python
{
    "type": "triage_accuracy",
    "clinical": True,
    "reviewer_credentials": ["MD", "DO", "NP", "PA"]
}
```

Red Flag Detection

Requires: MD, DO, NP. Evaluates identification of critical symptoms.

```python
{
    "type": "red_flag_detection",
    "clinical": True,
    "reviewer_credentials": ["MD", "DO", "NP"]
}
```

Diagnostic Appropriateness

Requires: MD, DO. Reviews the AI's diagnostic suggestions.

```python
{
    "type": "diagnostic_appropriateness",
    "clinical": True,
    "reviewer_credentials": ["MD", "DO"]
}
```

Treatment Safety

Requires: MD, DO, PharmD. Evaluates medication and treatment recommendations.

```python
{
    "type": "treatment_safety",
    "clinical": True,
    "reviewer_credentials": ["MD", "DO", "PharmD"]
}
```
Non-Clinical Evaluators
These can run fully automated:
| Evaluator | Type | Description |
|---|---|---|
| `response_latency` | Performance | Response time measurement |
| `format_compliance` | Structural | Output format validation |
| `keyword_extraction` | NLP | Entity extraction accuracy |
| `sentiment_score` | NLP | Tone analysis |
| `token_count` | Resource | Usage metrics |
| `cost_calculation` | Resource | API cost tracking |
```python
# Mix of clinical and non-clinical evaluators
evaluators = [
    # Clinical - requires human review for failures
    {"type": "triage_accuracy", "clinical": True},
    {"type": "red_flag_detection", "clinical": True},
    # Non-clinical - fully automated
    {"type": "response_latency", "clinical": False},
    {"type": "format_compliance", "clinical": False},
    {"type": "symptom_extraction", "clinical": False}
]
```
Regulatory Considerations
FDA Software as Medical Device (SaMD)
If your AI qualifies as SaMD, evaluation must meet regulatory requirements:
This section provides general guidance only. Consult with regulatory experts for your specific situation.
| FDA Risk Level | Clinical Evaluation Requirements |
|---|---|
| Class I (Low) | Documented evaluation process |
| Class II (Moderate) | Clinical validation with physician oversight |
| Class III (High) | Extensive clinical trials, physician-led evaluation |
Documentation Requirements
```python
# Generate FDA-ready evaluation reports
report = client.exports.create_regulatory_report(
    evaluation="eval_abc123",
    format="fda_samd",
    include=[
        "evaluation_protocol",
        "reviewer_credentials",
        "sample_selection_methodology",
        "statistical_analysis",
        "adverse_event_tracking",
        "audit_trail"
    ],
    certify={
        "clinical_oversight": "Dr. Jane Smith, MD",
        "qa_review": "Quality Team Lead"
    }
)
```
Audit Trail
Rubric maintains complete audit trails for compliance:
```python
# Get the audit log for an evaluation
audit_log = client.audit.get_evaluation_log(
    evaluation="eval_abc123",
    include=[
        "sample_access",
        "reviewer_actions",
        "score_modifications",
        "status_changes"
    ]
)

for entry in audit_log:
    print(f"{entry.timestamp}: {entry.action} by {entry.user}")
```
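For compliance archives, audit entries are often exported to a flat file. A minimal sketch, assuming entries expose the `timestamp`/`action`/`user` fields shown in the loop above (here as plain dicts; `audit_log_to_csv` is a hypothetical helper, not an SDK method):

```python
import csv
import io

def audit_log_to_csv(entries: list[dict]) -> str:
    """Serialize audit entries into CSV text for a compliance archive."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["timestamp", "action", "user"])
    writer.writeheader()
    for e in entries:
        writer.writerow({k: e[k] for k in ("timestamp", "action", "user")})
    return buf.getvalue()
```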
Custom Clinical Evaluators
Build evaluators that enforce clinical requirements:
```python
from rubric import Evaluator

class ChestPainProtocolEvaluator(Evaluator):
    """Evaluates adherence to the chest pain triage protocol."""

    name = "chest_pain_protocol"
    clinical = True
    required_credentials = ["MD", "DO", "NP"]

    # Required questions per protocol
    REQUIRED_QUESTIONS = [
        "pain_location",
        "pain_quality",
        "pain_radiation",
        "associated_symptoms",
        "onset_timing",
        "exacerbating_factors",
        "medical_history"
    ]

    def evaluate(self, sample):
        transcript = sample.input["transcript"]
        questions_asked = self.extract_questions(transcript)
        missing = set(self.REQUIRED_QUESTIONS) - set(questions_asked)
        coverage = len(questions_asked) / len(self.REQUIRED_QUESTIONS)

        # Flag for clinical review if the protocol was not followed
        needs_review = coverage < 0.8 or "pain_radiation" in missing

        return {
            "score": coverage,
            "passed": coverage >= 0.8,
            "needs_clinical_review": needs_review,
            "details": {
                "questions_asked": questions_asked,
                "missing_questions": list(missing),
                "coverage_percent": coverage * 100
            }
        }

    def extract_questions(self, transcript):
        # NLP logic to identify questions asked
        ...

# Register the evaluator
client.evaluators.register(ChestPainProtocolEvaluator)
```
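The `extract_questions` stub could be filled in with simple keyword matching as a starting point. The patterns below are illustrative only; a production version would use proper NLP rather than regular expressions:

```python
import re

# Illustrative patterns mapping protocol question IDs to phrasings
# (a small subset of REQUIRED_QUESTIONS, for demonstration)
QUESTION_PATTERNS = {
    "pain_location": r"where.*(pain|hurt)",
    "pain_quality": r"(describe|what kind of).*pain",
    "pain_radiation": r"(spread|radiat|move).*(arm|jaw|back)",
    "onset_timing": r"when.*(start|begin)",
}

def extract_questions(transcript: str) -> list[str]:
    """Return protocol question IDs whose patterns appear in the transcript."""
    lowered = transcript.lower()
    return [q for q, pat in QUESTION_PATTERNS.items()
            if re.search(pat, lowered)]
```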
Setting Up Clinical Review
```python
# Clinical review rubric
client.rubrics.create(
    name="Triage Clinical Review",
    project="patient-triage",
    dimensions=[
        {
            "name": "triage_appropriateness",
            "question": "Was the triage level clinically appropriate?",
            "type": "categorical",
            "options": [
                {"value": "appropriate", "label": "Appropriate"},
                {"value": "over_triaged", "label": "Over-triaged (acceptable)"},
                {"value": "under_triaged", "label": "Under-triaged (unsafe)", "severity": "critical"}
            ],
            "required": True
        },
        {
            "name": "safety_assessment",
            "question": "Were all safety-critical symptoms addressed?",
            "type": "categorical",
            "options": [
                {"value": "all_addressed", "label": "All addressed"},
                {"value": "minor_gaps", "label": "Minor gaps"},
                {"value": "critical_missed", "label": "Critical symptoms missed", "severity": "critical"}
            ],
            "required": True
        },
        {
            "name": "correct_triage",
            "question": "If incorrect, what should the triage level be?",
            "type": "categorical",
            "options": ["emergency", "urgent", "semi_urgent", "routine"],
            "conditional": "triage_appropriateness != 'appropriate'"
        },
        {
            "name": "clinical_notes",
            "question": "Clinical rationale and notes",
            "type": "text",
            "required_on_failure": True,
            "min_length": 50
        }
    ],
    credential_requirements={
        "default": ["MD", "DO", "NP", "PA"],
        "on_critical_finding": ["MD", "DO"]  # Escalate to physician
    }
)
```
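The escalation behavior of `on_critical_finding` can be sketched as follows: if any selected option carries `"severity": "critical"`, the review routes to the physician-only credential list. This is an illustrative client-side model of the rule, assuming plain-dict rubric and selection shapes; `required_reviewers` is a hypothetical helper, not an SDK method:

```python
def required_reviewers(selections: dict, rubric: dict) -> list[str]:
    """Return the credential list required for a set of rubric selections."""
    for dim in rubric["dimensions"]:
        chosen = selections.get(dim["name"])
        for opt in dim.get("options", []):
            # Only dict-style options can carry a severity flag
            if isinstance(opt, dict) and opt["value"] == chosen \
                    and opt.get("severity") == "critical":
                return rubric["credential_requirements"]["on_critical_finding"]
    return rubric["credential_requirements"]["default"]
```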
Review Interface
The clinical review interface includes:
Audio/Video playback (for voice/video data)
Transcript view with symptom highlighting
AI reasoning display showing extracted symptoms and decision logic
Structured rubric for consistent grading
Clinical notes field with required minimum detail
Reference materials (protocols, guidelines) accessible
Best Practices
Separate Clinical from Non-Clinical
Run non-clinical evaluators automatically; reserve clinical review for decisions that truly need expertise:

```python
# First pass: automated
auto_eval = client.evaluations.create(
    evaluators=[
        {"type": "response_latency", "clinical": False},
        {"type": "format_compliance", "clinical": False}
    ]
)

# Second pass: clinical (on flagged samples)
clinical_eval = client.evaluations.create(
    evaluators=[
        {"type": "triage_accuracy", "clinical": True}
    ],
    samples=flagged_sample_ids
)
```
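The `flagged_sample_ids` list comes from the first pass. A minimal sketch of how it might be derived, assuming each result exposes a `sample_id` and a list of per-evaluator outcomes with a `passed` flag (the result shape is an assumption, not the documented API):

```python
def flag_for_clinical_review(results: list[dict]) -> list[str]:
    """Collect IDs of samples that failed any automated check."""
    flagged = []
    for r in results:
        failed = [e for e in r["evaluations"] if not e["passed"]]
        if failed:
            flagged.append(r["sample_id"])
    return flagged
```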
Maintain Credential Currency

Automate credential verification and expiration tracking:

```python
# Set up expiration alerts
client.reviewers.configure_alerts(
    alert_before_expiration_days=[90, 30, 7],
    notify=["[email protected]", "reviewer_email"]
)
```
For regulatory compliance, ensure complete documentation:
Evaluation protocol definition
Reviewer credential verification
Sample selection methodology
All scores and rationale
Audit trail of all actions
Next Steps