Transcripts & Audio

Overview

Rubric accepts both audio recordings and transcripts for voice AI evaluation. This page covers supported formats, processing options, and best practices.

Audio Formats

Supported Formats

Format	Extension	Notes
WAV	`.wav`	Recommended for highest quality
MP3	`.mp3`	Good compression, widely supported
M4A	`.m4a`	AAC codec, Apple ecosystem
FLAC	`.flac`	Lossless compression
OGG	`.ogg`	Open format
WebM	`.webm`	Browser recordings
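Signed URLs put query parameters after the file name, so a naive `endswith(".wav")` check can reject valid inputs. The sketch below is a hypothetical client-side pre-check, not part of the Rubric SDK. It reads the extension from a local path, or from the path component of a URL, and compares it with the table above. `SUPPORTED_AUDIO_EXTENSIONS`, `audio_extension`, and `is_supported_audio` are illustrative names, and an extension is only a hint about the actual encoding.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical client-side list mirroring the Supported Formats table.
SUPPORTED_AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg", ".webm"}


def audio_extension(location: str) -> str:
    """Return the lowercase extension of a local path or URL.

    For URLs only the path component is inspected, so signed-URL query
    parameters such as ?X-Amz-Signature=... are ignored.
    """
    path = urlparse(location).path if "://" in location else location
    return PurePosixPath(path).suffix.lower()


def is_supported_audio(location: str) -> bool:
    """True if the extension appears in SUPPORTED_AUDIO_EXTENSIONS."""
    return audio_extension(location) in SUPPORTED_AUDIO_EXTENSIONS
```

For example, `is_supported_audio("https://bucket.s3.amazonaws.com/call.wav?X-Amz-Signature=abc")` is `True`, while `is_supported_audio("call.aac")` is `False`.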

Quality Requirements

For best transcription accuracy, we recommend:

Sample rate: 16kHz or higher
Bit depth: 16-bit or higher
Channels: Mono or stereo with clear speaker separation
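For WAV files, you can check these recommendations locally before upload with Python's standard `wave` module. The sketch below is an illustrative pre-flight check, not something Rubric requires. `check_wav_quality` is a hypothetical name, and the thresholds are the recommended values listed above. `wave` only parses PCM WAV headers, so compressed formats such as MP3 or M4A need a separate tool (for example `ffprobe`). Speaker separation cannot be checked from the header.

```python
import wave
from typing import BinaryIO, List, Union

# Recommended minimums from the Quality Requirements list.
MIN_SAMPLE_RATE_HZ = 16_000
MIN_SAMPLE_WIDTH_BYTES = 2  # 2 bytes per sample = 16-bit
MAX_CHANNELS = 2            # mono or stereo


def check_wav_quality(source: Union[str, BinaryIO]) -> List[str]:
    """Return human-readable problems; an empty list means the header meets the recommendations."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()

    problems = []
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if width < MIN_SAMPLE_WIDTH_BYTES:
        problems.append(f"bit depth {8 * width}-bit is below 16-bit")
    if channels > MAX_CHANNELS:
        problems.append(f"{channels} channels; expected mono or stereo")
    return problems
```

Calling `check_wav_quality("call.wav")` on a 16 kHz, 16-bit mono recording returns an empty list.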

Providing Audio

# Option 1: Public URL
client.calls.log(
    audio_url="https://storage.example.com/calls/call_123.wav",
    # ... other call fields
)

# Option 2: Signed URL (recommended for private storage)
client.calls.log(
    audio_url="https://bucket.s3.amazonaws.com/call.wav?X-Amz-Signature=...",
    # ... other call fields
)

# Option 3: Upload directly
with open("call.wav", "rb") as f:
    client.calls.log(
        audio=f,
        # ... other call fields
    )

Transcript Format

Standard Format

{
  "transcript": [
    {
      "speaker": "agent",
      "text": "Thank you for calling. How can I help you today?",
      "start": 0.0,
      "end": 2.5
    },
    {
      "speaker": "patient", 
      "text": "I've been having chest pain since this morning.",
      "start": 3.0,
      "end": 6.5
    }
  ]
}

Required Fields

speaker (string, required): Speaker identifier. Use `agent` for the AI and `patient` for the caller. Custom labels are supported.

text (string, required): The spoken text content.

Optional Fields

start (float): Start time in seconds from the beginning of the audio.

end (float): End time in seconds from the beginning of the audio.

confidence (float): Transcription confidence score, from 0 to 1.

words (array): Word-level timestamps for fine-grained analysis, for example:

{
  "words": [
    {"word": "chest", "start": 3.2, "end": 3.5, "confidence": 0.98},
    {"word": "pain", "start": 3.5, "end": 3.9, "confidence": 0.99}
  ]
}
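Validating segments on the client lets malformed entries fail fast before logging. This is a minimal sketch based only on the field descriptions above. `validate_segment` and `validate_transcript` are hypothetical helpers. Rubric's server-side validation may be stricter or more lenient. For example, this sketch does not check for overlapping segments.

```python
from typing import Any, Dict, List


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_segment(segment: Dict[str, Any]) -> List[str]:
    """Check one segment against the Required and Optional Fields."""
    errors = []
    for field in ("speaker", "text"):
        if not isinstance(segment.get(field), str):
            errors.append(f"'{field}' is required and must be a string")
    for field in ("start", "end", "confidence"):
        if field in segment and not _is_number(segment[field]):
            errors.append(f"'{field}' must be a number")
    start, end = segment.get("start"), segment.get("end")
    if _is_number(start) and _is_number(end) and end < start:
        errors.append("'end' is earlier than 'start'")
    confidence = segment.get("confidence")
    if _is_number(confidence) and not 0 <= confidence <= 1:
        errors.append("'confidence' must be between 0 and 1")
    if "words" in segment and not isinstance(segment["words"], list):
        errors.append("'words' must be an array")
    return errors


def validate_transcript(transcript: List[Dict[str, Any]]) -> Dict[int, List[str]]:
    """Map the index of each invalid segment to its errors; {} means all segments passed."""
    report = {}
    for index, segment in enumerate(transcript):
        errors = validate_segment(segment)
        if errors:
            report[index] = errors
    return report
```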

Speaker Diarization

If you provide audio without a transcript, Rubric can transcribe it and perform automatic speaker diarization:

client.calls.log(
    project="triage",
    audio_url="https://...",
    
    # Enable automatic transcription and diarization
    transcribe=True,
    diarize=True,
    
    # Hint: expected number of speakers
    expected_speakers=2,
    
    ai_decision={...}
)

Diarization Options

Option	Description
`expected_speakers`	Hint for number of speakers (improves accuracy)
`speaker_labels`	Map speaker IDs to roles: `{"SPEAKER_00": "agent", "SPEAKER_01": "patient"}`
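Rubric applies `speaker_labels` during diarization. You may want to preview a mapping, or relabel segments from your own diarizer before passing them as `transcript`. The following sketch shows the effect of such a mapping locally. `apply_speaker_labels` is a hypothetical helper. Keeping unmapped IDs unchanged is a choice made for this sketch, not documented Rubric behavior.

```python
from typing import Any, Dict, List


def apply_speaker_labels(
    segments: List[Dict[str, Any]],
    speaker_labels: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Return copies of the segments with diarization IDs replaced by role labels.

    IDs missing from the mapping are kept unchanged so no segment loses its speaker.
    """
    return [
        {**segment, "speaker": speaker_labels.get(segment["speaker"], segment["speaker"])}
        for segment in segments
    ]
```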

Transcript Sources

Rubric integrates with popular transcription services:

Service	Notes
Deepgram	Real-time streaming transcription
AssemblyAI	High accuracy, medical vocabulary
Whisper	Open source, self-hosted option
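Whisper does not perform speaker diarization, so its output needs speaker labels before it fits the standard format. The helper below is a minimal sketch that assigns one placeholder speaker to every segment. It assumes the dictionary returned by the open-source `openai-whisper` package's `transcribe()`, which has a top-level `segments` list whose entries carry `start`, `end`, and `text`. `whisper_to_segments` and the `"unknown"` label are hypothetical. In practice you would pair this with a separate diarization step.

```python
from typing import Any, Dict, List


def whisper_to_segments(result: Dict[str, Any], speaker: str = "unknown") -> List[Dict[str, Any]]:
    """Convert an openai-whisper transcribe() result to standard-format segments.

    Whisper does not diarize, so every segment receives the same placeholder speaker.
    Whitespace-only segments are skipped and Whisper's leading spaces are stripped.
    """
    segments = []
    for seg in result["segments"]:
        text = seg["text"].strip()
        if not text:
            continue
        segments.append({
            "speaker": speaker,
            "text": text,
            "start": float(seg["start"]),
            "end": float(seg["end"]),
        })
    return segments
```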

Using Pre-transcribed Data

If you already have transcripts from your provider:

# Deepgram format
deepgram_response = {...}  # From Deepgram API

client.calls.log(
    project="triage",
    transcript_source="deepgram",
    transcript_raw=deepgram_response,
    ai_decision={...}
)

Rubric automatically normalizes the provider's response into the standard transcript format.
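If you want to inspect or adjust Deepgram output yourself and pass it as `transcript` instead, here is a minimal sketch of one possible normalization. It groups consecutive same-speaker words into segments. It assumes a response from Deepgram's pre-recorded API with diarization enabled, so each entry in `results.channels[0].alternatives[0].words` carries an integer `speaker`. `punctuated_word` is only present when punctuation or smart formatting is on, hence the fallback to `word`. `deepgram_to_segments` and the `SPEAKER_00`-style fallback names are hypothetical and do not describe how Rubric normalizes internally.

```python
from typing import Any, Dict, List, Optional


def deepgram_to_segments(
    response: Dict[str, Any],
    speaker_labels: Optional[Dict[int, str]] = None,
) -> List[Dict[str, Any]]:
    """Group consecutive same-speaker words from a diarized Deepgram response into segments."""
    speaker_labels = speaker_labels or {}
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    segments: List[Dict[str, Any]] = []
    for w in words:
        speaker_id = w.get("speaker", 0)  # present only when diarization was enabled
        speaker = speaker_labels.get(speaker_id, f"SPEAKER_{speaker_id:02d}")
        # punctuated_word exists only with punctuation/smart formatting enabled.
        token = w.get("punctuated_word", w["word"])
        word = {"word": w["word"], "start": w["start"], "end": w["end"],
                "confidence": w.get("confidence")}
        if segments and segments[-1]["speaker"] == speaker:
            current = segments[-1]
            current["text"] += " " + token
            current["end"] = w["end"]
            current["words"].append(word)
        else:
            segments.append({"speaker": speaker, "text": token, "start": w["start"],
                             "end": w["end"], "words": [word]})
    return segments
```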

Audio-Transcript Alignment

When both audio and transcript are provided, Rubric can:

Verify alignment: check that the transcript matches the audio
Identify gaps: find audio segments missing from the transcript
Flag discrepancies: detect potential transcription errors

client.calls.log(
    audio_url="https://...",
    transcript=[...],
    
    # Enable alignment verification
    verify_alignment=True,
    alignment_tolerance=0.5  # seconds
)
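Rubric's alignment check works against the audio itself. The following minimal sketch illustrates just the gap-finding idea using timestamps alone. It lists stretches of the recording, longer than a tolerance, that no segment covers. `find_transcript_gaps` is hypothetical. You supply `audio_duration` yourself (for a WAV file, `getnframes() / getframerate()` from the `wave` module). The 0.5-second default mirrors the example above but does not imply it matches Rubric's exact `alignment_tolerance` semantics.

```python
from typing import Any, Dict, List, Tuple


def find_transcript_gaps(
    segments: List[Dict[str, Any]],
    audio_duration: float,
    tolerance: float = 0.5,
) -> List[Tuple[float, float]]:
    """Return (start, end) spans longer than `tolerance` seconds not covered by any segment."""
    gaps = []
    covered_until = 0.0
    for seg in sorted(segments, key=lambda s: s["start"]):
        if seg["start"] - covered_until > tolerance:
            gaps.append((covered_until, seg["start"]))
        # max() handles overlapping segments without moving the cursor backwards.
        covered_until = max(covered_until, seg["end"])
    if audio_duration - covered_until > tolerance:
        gaps.append((covered_until, audio_duration))
    return gaps
```

Normal pauses between turns also show up as gaps, so treat the result as a list of candidates for review rather than confirmed omissions.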

Privacy & Redaction

Voice recordings may contain protected health information (PHI). Handle them in accordance with HIPAA requirements.

Automatic Redaction

Rubric can automatically detect and redact sensitive information:

client.calls.log(
    transcript=[...],
    
    # Enable PII detection and redaction
    redact_pii=True,
    pii_types=["name", "ssn", "dob", "address", "phone"]
)

The stored transcript will have redactions:

{
  "speaker": "patient",
  "text": "My name is [REDACTED_NAME] and my date of birth is [REDACTED_DOB].",
  "redactions": [
    {"type": "name", "start": 11, "end": 26},
    {"type": "dob", "start": 51, "end": 65}
  ]
}
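Rubric performs the entity detection itself. The sketch below only illustrates how a stored redaction record like the one above can be built from detected spans. It takes `(type, start, end)` offsets into the original text, produced by some detector (a hypothetical input here). It then substitutes `[REDACTED_<TYPE>]` placeholders and records where each placeholder sits. `apply_redactions` is a hypothetical helper. The sketch assumes the `redactions` offsets index into the redacted text, consistent with the `name` span in the example, and that detected spans do not overlap.

```python
from typing import Any, Dict, List, Tuple


def apply_redactions(text: str, entities: List[Tuple[str, int, int]]) -> Dict[str, Any]:
    """Replace detected spans with [REDACTED_<TYPE>] placeholders.

    `entities` holds (type, start, end) offsets into the ORIGINAL text; the
    returned "redactions" offsets point into the REDACTED text.
    """
    pieces: List[str] = []
    redactions: List[Dict[str, Any]] = []
    cursor = 0   # position in the original text
    out_len = 0  # length of the redacted text built so far
    for pii_type, start, end in sorted(entities, key=lambda e: e[1]):
        kept = text[cursor:start]
        placeholder = f"[REDACTED_{pii_type.upper()}]"
        pieces.extend([kept, placeholder])
        out_len += len(kept)
        redactions.append({"type": pii_type, "start": out_len, "end": out_len + len(placeholder)})
        out_len += len(placeholder)
        cursor = end
    pieces.append(text[cursor:])
    return {"text": "".join(pieces), "redactions": redactions}
```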

Next Steps

Voice Overview

Patient Triage
