Overview

Rubric accepts both audio recordings and transcripts for voice AI evaluation. This page covers supported formats, processing options, and best practices.

Audio Formats

Supported Formats

Format  Extension  Notes
WAV     .wav       Recommended for highest quality
MP3     .mp3       Good compression, widely supported
M4A     .m4a       AAC codec, Apple ecosystem
FLAC    .flac      Lossless compression
OGG     .ogg       Open format
WebM    .webm      Browser recordings
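
A pre-flight check can catch unsupported files before they are sent. The sketch below is a hypothetical client-side helper (`is_supported_audio` is an illustrative name, not part of the Rubric SDK) that compares a local file name's extension against the formats listed above.

```python
from pathlib import Path

# Extensions copied from the supported formats table.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg", ".webm"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file name ends in one of the listed extensions."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

This only inspects the file name, so it will not recognize signed URLs that carry a query string, and it does not confirm that the file contents match the extension.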

Quality Requirements

For best transcription accuracy, we recommend:
  • Sample rate: 16 kHz or higher
  • Bit depth: 16-bit or higher
  • Channels: Mono or stereo with clear speaker separation
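
These recommendations can be checked locally before a recording is logged. The following is a minimal sketch using Python's standard `wave` module; the function name and threshold constants are illustrative, and the check only works for uncompressed PCM WAV files (the `wave` module raises `wave.Error` for other encodings).

```python
import wave

# Thresholds taken from the recommendations above.
MIN_SAMPLE_RATE_HZ = 16_000
MIN_BIT_DEPTH = 16


def check_wav_quality(path: str) -> list:
    """Return a list of problems; an empty list means the file meets the recommendations."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        bit_depth = wav.getsampwidth() * 8  # sample width is reported in bytes
        channels = wav.getnchannels()

    problems = []
    if sample_rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {sample_rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if bit_depth < MIN_BIT_DEPTH:
        problems.append(f"bit depth {bit_depth} is below {MIN_BIT_DEPTH}")
    if channels not in (1, 2):
        problems.append(f"expected mono or stereo, found {channels} channels")
    return problems
```

The check cannot tell whether a stereo file has clear speaker separation; that still needs to be confirmed by listening or by how the recording was captured.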

Providing Audio

# Option 1: Public URL
client.calls.log(
    audio_url="https://storage.example.com/calls/call_123.wav",
    ...
)

# Option 2: Signed URL (recommended for private storage)
client.calls.log(
    audio_url="https://bucket.s3.amazonaws.com/call.wav?X-Amz-Signature=...",
    ...
)

# Option 3: Upload directly
with open("call.wav", "rb") as f:
    client.calls.log(
        audio=f,
        ...
    )

Transcript Format

Standard Format

{
  "transcript": [
    {
      "speaker": "agent",
      "text": "Thank you for calling. How can I help you today?",
      "start": 0.0,
      "end": 2.5
    },
    {
      "speaker": "patient",
      "text": "I've been having chest pain since this morning.",
      "start": 3.0,
      "end": 6.5
    }
  ]
}

Required Fields

speaker (string, required)
Speaker identifier. Use agent for AI and patient for the caller. Custom labels supported.

text (string, required)
The spoken text content.

Optional Fields

start (float)
Start time in seconds from beginning of audio.

end (float)
End time in seconds.

confidence (float)
Transcription confidence score (0-1).

words (array)
Word-level timestamps for fine-grained analysis.
{
  "words": [
    {"word": "chest", "start": 3.2, "end": 3.5, "confidence": 0.98},
    {"word": "pain", "start": 3.5, "end": 3.9, "confidence": 0.99}
  ]
}
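
Validating segments before sending them can surface malformed transcripts early. The sketch below checks one segment against the required and optional fields described above. The function name and the specific rules (for example, rejecting an `end` earlier than `start`) are assumptions for illustration; Rubric's own validation may differ.

```python
def _is_number(value) -> bool:
    """True for ints and floats, but not booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_segment(segment: dict) -> list:
    """Return a list of error messages; an empty list means the segment looks valid."""
    errors = []

    # Required fields: both must be present and must be strings.
    for field in ("speaker", "text"):
        if not isinstance(segment.get(field), str):
            errors.append(f"'{field}' is required and must be a string")

    # Optional numeric fields: checked only when present.
    for field in ("start", "end", "confidence"):
        if field in segment and not _is_number(segment[field]):
            errors.append(f"'{field}' must be a number")

    start, end = segment.get("start"), segment.get("end")
    if _is_number(start) and _is_number(end) and end < start:
        errors.append("'end' must not be earlier than 'start'")

    confidence = segment.get("confidence")
    if _is_number(confidence) and not 0 <= confidence <= 1:
        errors.append("'confidence' must be between 0 and 1")

    return errors
```

Running this over every entry in the `transcript` array and collecting the errors gives a simple report to review before calling `client.calls.log`.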

Speaker Diarization

If you provide audio without a transcript, Rubric can transcribe it and perform automatic speaker diarization:
client.calls.log(
    project="triage",
    audio_url="https://...",
    
    # Enable automatic transcription and diarization
    transcribe=True,
    diarize=True,
    
    # Hint: expected number of speakers
    expected_speakers=2,
    
    ai_decision={...}
)

Diarization Options

Option             Description
expected_speakers  Hint for number of speakers (improves accuracy)
speaker_labels     Map speaker IDs to roles: {"SPEAKER_00": "agent", "SPEAKER_01": "patient"}
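
To show what a `speaker_labels` mapping does conceptually, here is a small sketch that applies such a map to diarized segments on the client side. `relabel_speakers` is a hypothetical helper, not an SDK function, and leaving unmapped IDs unchanged is an assumption about the desired behavior.

```python
def relabel_speakers(segments: list, speaker_labels: dict) -> list:
    """Return copies of the segments with diarization IDs replaced by role names.

    IDs that are missing from the mapping are kept unchanged.
    """
    return [
        {**segment, "speaker": speaker_labels.get(segment["speaker"], segment["speaker"])}
        for segment in segments
    ]


labels = {"SPEAKER_00": "agent", "SPEAKER_01": "patient"}
diarized = [
    {"speaker": "SPEAKER_00", "text": "How can I help you today?"},
    {"speaker": "SPEAKER_01", "text": "I've been having chest pain."},
]
relabeled = relabel_speakers(diarized, labels)
```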

Transcript Sources

Rubric integrates with popular transcription services:

  • Deepgram: Real-time streaming transcription
  • AssemblyAI: High accuracy, medical vocabulary
  • Whisper: Open source, self-hosted option

Using Pre-transcribed Data

If you already have transcripts from your provider:
# Deepgram format
deepgram_response = {...}  # From Deepgram API

client.calls.log(
    project="triage",
    transcript_source="deepgram",
    transcript_raw=deepgram_response,
    ai_decision={...}
)
Rubric automatically normalizes the provider's response into the standard transcript format.
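
For intuition about what normalization involves, the sketch below groups a flat list of word-level results into speaker segments in the standard format. It is not Rubric's normalizer: the function name is hypothetical, and the input field names (`word`, `start`, `end`, `speaker`) are assumptions about a typical diarized word list. Real provider payloads nest these fields differently, so check the provider's documentation.

```python
def words_to_segments(words: list, speaker_labels: dict = None) -> dict:
    """Merge consecutive words from the same speaker into standard-format segments."""
    speaker_labels = speaker_labels or {}
    segments = []
    for word in words:
        # Map the provider's speaker ID to a role name when a mapping exists.
        speaker = speaker_labels.get(word["speaker"], str(word["speaker"]))
        if segments and segments[-1]["speaker"] == speaker:
            # Same speaker as the previous word: extend the current segment.
            segments[-1]["text"] += " " + word["word"]
            segments[-1]["end"] = word["end"]
        else:
            # Speaker changed (or first word): start a new segment.
            segments.append({
                "speaker": speaker,
                "text": word["word"],
                "start": word["start"],
                "end": word["end"],
            })
    return {"transcript": segments}


provider_words = [
    {"word": "Hello", "start": 0.0, "end": 0.4, "speaker": 0},
    {"word": "there", "start": 0.5, "end": 0.9, "speaker": 0},
    {"word": "Hi", "start": 1.2, "end": 1.4, "speaker": 1},
]
normalized = words_to_segments(provider_words, {0: "agent", 1: "patient"})
```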

Audio-Transcript Alignment

When both audio and transcript are provided, Rubric can:
  1. Verify alignment - Check that transcript matches audio
  2. Identify gaps - Find segments missing from transcript
  3. Flag discrepancies - Detect potential transcription errors
client.calls.log(
    audio_url="https://...",
    transcript=[...],
    
    # Enable alignment verification
    verify_alignment=True,
    alignment_tolerance=0.5  # seconds
)
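
The gap-identification step can be approximated from timestamps alone. The sketch below reports stretches of audio longer than a tolerance that no transcript segment covers. It assumes every segment has `start` and `end` values and that the total audio duration is known; how Rubric interprets `alignment_tolerance` internally is not documented here, so treat the comparison rule as an assumption.

```python
def find_gaps(transcript: list, audio_duration: float, tolerance: float = 0.5) -> list:
    """Return (gap_start, gap_end) spans longer than `tolerance` with no transcript coverage."""
    gaps = []
    covered_until = 0.0
    for segment in sorted(transcript, key=lambda s: s["start"]):
        if segment["start"] - covered_until > tolerance:
            gaps.append((covered_until, segment["start"]))
        # max() keeps the cursor correct when segments overlap.
        covered_until = max(covered_until, segment["end"])
    if audio_duration - covered_until > tolerance:
        gaps.append((covered_until, audio_duration))
    return gaps


segments = [
    {"speaker": "agent", "text": "How can I help you today?", "start": 0.0, "end": 2.5},
    {"speaker": "patient", "text": "I've been having chest pain.", "start": 3.0, "end": 6.5},
]
gaps = find_gaps(segments, audio_duration=10.0, tolerance=0.5)
```

A timestamp check like this only finds missing coverage. Detecting that the transcribed words differ from what was spoken requires comparing against the audio itself.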

Privacy & Redaction

Voice recordings may contain protected health information (PHI). Ensure proper handling per HIPAA requirements.

Automatic Redaction

Rubric can automatically detect and redact sensitive information:
client.calls.log(
    transcript=[...],
    
    # Enable PII detection and redaction
    redact_pii=True,
    pii_types=["name", "ssn", "dob", "address", "phone"]
)
The stored transcript will have redactions:
{
  "speaker": "patient",
  "text": "My name is [REDACTED_NAME] and my date of birth is [REDACTED_DOB].",
  "redactions": [
    {"type": "name", "start": 11, "end": 26},
    {"type": "dob", "start": 51, "end": 63}
  ]
}
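
To illustrate the shape of this output, here is a simplified pattern-based redactor that produces a `text` plus `redactions` record. It is a sketch only: the regular expressions cover just US-style SSNs and phone numbers, the function name is hypothetical, and it assumes the `start` and `end` offsets index into the redacted text. Production PII detection, including Rubric's, needs far more than regular expressions.

```python
import re

# Illustrative patterns only; real PII detection requires much broader coverage.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact(text: str, pii_types: list) -> dict:
    """Replace matches with [REDACTED_<TYPE>] and record their offsets in the new text."""
    matches = []
    for pii_type in pii_types:
        pattern = PATTERNS.get(pii_type)
        if pattern is None:
            continue  # no pattern defined for this type in the sketch
        for match in pattern.finditer(text):
            matches.append((match.start(), match.end(), pii_type))
    matches.sort()

    pieces, redactions = [], []
    cursor = 0      # position in the original text
    out_length = 0  # length of the redacted text built so far
    for start, end, pii_type in matches:
        if start < cursor:
            continue  # skip a match that overlaps one already redacted
        kept = text[cursor:start]
        placeholder = f"[REDACTED_{pii_type.upper()}]"
        pieces += [kept, placeholder]
        out_length += len(kept)
        redactions.append(
            {"type": pii_type, "start": out_length, "end": out_length + len(placeholder)}
        )
        out_length += len(placeholder)
        cursor = end
    pieces.append(text[cursor:])
    return {"text": "".join(pieces), "redactions": redactions}


result = redact("My SSN is 123-45-6789, call 555-123-4567.", ["ssn", "phone"])
```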
