Overview
Rubric accepts both audio recordings and transcripts for voice AI evaluation. This page covers supported formats, processing options, and best practices.
Audio Formats
Supported Formats
| Format | Extension | Notes |
|---|---|---|
| WAV | .wav | Recommended for highest quality |
| MP3 | .mp3 | Good compression, widely supported |
| M4A | .m4a | AAC codec, Apple ecosystem |
| FLAC | .flac | Lossless compression |
| OGG | .ogg | Open format |
| WebM | .webm | Browser recordings |
Quality Requirements
For best transcription accuracy, we recommend:
- Sample rate: 16 kHz or higher
- Bit depth: 16-bit or higher
- Channels: Mono or stereo with clear speaker separation
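As a rough pre-upload check, the recommendations above can be tested locally for WAV files. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV only; the function name `check_wav_quality` and the warning wording are illustrative and not part of Rubric.

```python
import wave

# Thresholds taken from the recommendations above.
MIN_SAMPLE_RATE_HZ = 16_000
MIN_BIT_DEPTH = 16
MAX_CHANNELS = 2  # mono or stereo


def check_wav_quality(source):
    """Return a list of warnings for a WAV file path or binary file object."""
    with wave.open(source, "rb") as wav:
        sample_rate = wav.getframerate()
        bit_depth = wav.getsampwidth() * 8  # sample width is reported in bytes
        channels = wav.getnchannels()

    warnings = []
    if sample_rate < MIN_SAMPLE_RATE_HZ:
        warnings.append(f"sample rate {sample_rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if bit_depth < MIN_BIT_DEPTH:
        warnings.append(f"bit depth {bit_depth}-bit is below {MIN_BIT_DEPTH}-bit")
    if channels > MAX_CHANNELS:
        warnings.append(f"{channels} channels found; mono or stereo is recommended")
    return warnings
```

An empty list means the file meets the numeric recommendations. Speaker separation in stereo recordings cannot be judged from the header and still needs a listen.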
Providing Audio
Transcript Format
Standard Format
Required Fields
Speaker identifier. Use agent for the AI and patient for the caller. Custom labels are supported.
The spoken text content.
Optional Fields
Start time in seconds from beginning of audio.
End time in seconds.
Transcription confidence score (0-1).
Word-level timestamps for fine-grained analysis.
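To make the field rules above concrete, here is a minimal validation sketch. The key names `speaker`, `text`, `start`, `end`, and `confidence` are assumptions chosen for illustration, since this page does not show the exact field names; adjust them to match the actual schema.

```python
from numbers import Real

# Hypothetical key names for the two required fields.
REQUIRED_FIELDS = ("speaker", "text")


def validate_segment(segment):
    """Return a list of problems found in one transcript segment (a dict)."""
    errors = []

    # Required fields must be present and non-empty strings.
    for key in REQUIRED_FIELDS:
        value = segment.get(key)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{key}: required non-empty string")

    # Optional timestamps are seconds from the beginning of the audio.
    start = segment.get("start")
    end = segment.get("end")
    for key, value in (("start", start), ("end", end)):
        if value is not None and (not isinstance(value, Real) or value < 0):
            errors.append(f"{key}: must be a non-negative number of seconds")
    if isinstance(start, Real) and isinstance(end, Real) and end < start:
        errors.append("end: must not be earlier than start")

    # Optional confidence score is expected in the range 0-1.
    confidence = segment.get("confidence")
    if confidence is not None and not (
        isinstance(confidence, Real) and 0 <= confidence <= 1
    ):
        errors.append("confidence: must be between 0 and 1")

    return errors
```

Running a check like this before upload catches swapped timestamps and out-of-range scores early.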
Speaker Diarization
If you provide audio without a transcript, Rubric can perform automatic speaker diarization.
Diarization Options
| Option | Description |
|---|---|
| expected_speakers | Hint for number of speakers (improves accuracy) |
| speaker_labels | Map speaker IDs to roles: {"SPEAKER_00": "agent", "SPEAKER_01": "patient"} |
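The effect of a speaker_labels mapping can be pictured with a small sketch. It assumes diarized output arrives as a list of dicts with a `speaker` key, which is a hypothetical shape used only for illustration; the helper `apply_speaker_labels` is likewise not a Rubric function.

```python
def apply_speaker_labels(segments, speaker_labels):
    """Return copies of segments with diarization IDs replaced by role labels.

    IDs that are missing from the mapping are left unchanged.
    """
    return [
        {**seg, "speaker": speaker_labels.get(seg["speaker"], seg["speaker"])}
        for seg in segments
    ]


# Mapping in the form shown in the options table above.
labels = {"SPEAKER_00": "agent", "SPEAKER_01": "patient"}

# Hypothetical diarized segments.
diarized = [
    {"speaker": "SPEAKER_00", "text": "How can I help you today?"},
    {"speaker": "SPEAKER_01", "text": "I need to reschedule my appointment."},
    {"speaker": "SPEAKER_02", "text": "(background voice)"},
]

labeled = apply_speaker_labels(diarized, labels)
```

Leaving unmapped IDs as they are makes an unexpected extra speaker easy to spot in the result.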
Transcript Sources
Rubric integrates with popular transcription services:
- Deepgram: Real-time streaming transcription
- AssemblyAI: High accuracy, medical vocabulary
- Whisper: Open source, self-hosted option
Using Pre-transcribed Data
If you already have transcripts from your provider:
Audio-Transcript Alignment
When both audio and transcript are provided, Rubric can:
- Verify alignment - Check that transcript matches audio
- Identify gaps - Find segments missing from transcript
- Flag discrepancies - Detect potential transcription errors
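One way to picture the gap check is to compare segment timestamps against the audio duration. The sketch below is an illustration of that idea, not Rubric's actual alignment method; the `start` and `end` keys, the `find_gaps` helper, and the one-second default threshold are all assumptions.

```python
def find_gaps(segments, audio_duration, min_gap=1.0):
    """Return (start, end) intervals of audio not covered by any segment.

    Only uncovered stretches of at least `min_gap` seconds are reported.
    """
    gaps = []
    cursor = 0.0  # end of the audio covered so far

    # Walk the segments in time order, tracking the furthest covered point.
    for start, end in sorted((s["start"], s["end"]) for s in segments):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)

    # Check for uncovered audio after the last segment.
    if audio_duration - cursor >= min_gap:
        gaps.append((cursor, audio_duration))
    return gaps
```

For example, segments covering 0-2 s and 5-7 s of a 10-second recording leave two uncovered intervals, 2-5 s and 7-10 s. Silence also shows up as a gap, so results like these are candidates for review rather than confirmed omissions.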
