POST https://api.rubric.ai/v1/evaluations
from rubric import Rubric

client = Rubric()

evaluation = client.evaluations.create(
    name="Triage Accuracy - Weekly",
    project="proj_abc123",
    dataset="ds_xyz789",
    evaluators=[
        {
            "type": "triage_accuracy",
            "config": {
                "severity_weights": {
                    "under_triage": 5.0,
                    "over_triage": 1.0
                }
            }
        },
        {
            "type": "red_flag_detection",
            "config": {
                "protocols": ["chest_pain", "headache"]
            }
        }
    ],
    metadata={
        "triggered_by": "ci_pipeline",
        "model_version": "v2.4.1"
    }
)

print(f"Created evaluation: {evaluation.id}")
print(f"Status: {evaluation.status}")
{
  "id": "eval_def456",
  "object": "evaluation",
  "name": "Triage Accuracy - Weekly",
  "project": "proj_abc123",
  "dataset": "ds_xyz789",
  "status": "pending",
  "evaluators": [
    {
      "type": "triage_accuracy",
      "config": {
        "severity_weights": {
          "under_triage": 5.0,
          "over_triage": 1.0
        }
      }
    },
    {
      "type": "red_flag_detection",
      "config": {
        "protocols": ["chest_pain", "headache"]
      }
    }
  ],
  "progress": {
    "total": 0,
    "completed": 0,
    "failed": 0
  },
  "created_at": "2024-01-15T10:30:00Z",
  "started_at": null,
  "completed_at": null,
  "metadata": {
    "triggered_by": "ci_pipeline",
    "model_version": "v2.4.1"
  }
}
Create a new evaluation to assess a dataset using one or more evaluators. The evaluation runs asynchronously — use the Get Evaluation endpoint to check status.
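
Once created, you can poll the evaluation until it reaches a terminal status. A minimal polling sketch follows, assuming the SDK exposes the Get Evaluation endpoint as client.evaluations.retrieve(); the method name and the polling interval are assumptions, so check the Get Evaluation reference for the exact call.

import time

from rubric import Rubric

client = Rubric()

evaluation_id = "eval_def456"  # id returned by the create call above

# Poll until the run reaches a terminal state. retrieve() is an assumed
# method name; see the Get Evaluation reference for the exact SDK call.
while True:
    evaluation = client.evaluations.retrieve(evaluation_id)
    print(f"Status: {evaluation.status}")
    if evaluation.status in ("completed", "failed", "cancelled"):
        break
    time.sleep(10)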

Authentication

Authorization
string
required
Bearer token with write scope. Example: Bearer gr_live_xxxxxxxx
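
In the Python SDK, the token is supplied when constructing the client. A minimal sketch, assuming the constructor accepts an api_key argument and that the key is stored in a RUBRIC_API_KEY environment variable (both names are assumptions; see the SDK's client documentation):

import os

from rubric import Rubric

# The api_key argument and the RUBRIC_API_KEY variable name are assumptions;
# the token must carry the write scope to create evaluations.
client = Rubric(api_key=os.environ["RUBRIC_API_KEY"])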

Request Body

name
string
required
A descriptive name for this evaluation run. Useful for identifying evaluations in the dashboard.
project
string
required
The project ID to run this evaluation in. Must be a valid project you have access to.
dataset
string
required
The dataset ID containing samples to evaluate. All samples in the dataset will be processed.
evaluators
array
required
List of evaluator configurations to run against each sample.
metadata
object
Arbitrary key-value pairs to attach to this evaluation for filtering and organization.
run_async
boolean
default: true
Whether to run the evaluation asynchronously.
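
If you need the request to block until results are ready, the field above suggests disabling asynchronous execution. A minimal sketch, assuming the SDK forwards run_async through create() as shown:

from rubric import Rubric

client = Rubric()

# Synchronous run: the call returns only after every sample is evaluated.
# Passing run_async this way is an assumption based on the field above.
evaluation = client.evaluations.create(
    name="Triage Accuracy - Smoke Test",
    project="proj_abc123",
    dataset="ds_xyz789",
    evaluators=[
        {
            "type": "triage_accuracy",
            "config": {"severity_weights": {"under_triage": 5.0, "over_triage": 1.0}},
        }
    ],
    run_async=False,
)

print(f"Status: {evaluation.status}")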
Evaluator Types
See the Evaluators Reference for a complete list of built-in evaluator types and their configuration options.

Response

id
string
Unique identifier for the evaluation. Example: eval_def456
object
string
Always evaluation
name
string
The name provided for this evaluation
project
string
The project ID this evaluation belongs to
dataset
string
The dataset ID being evaluated
status
string
Current status: pending, running, completed, failed, cancelled
evaluators
array
The evaluator configurations for this evaluation
progress
object
Sample counts for this run: total, completed, and failed
created_at
string
ISO 8601 timestamp when the evaluation was created
started_at
string
ISO 8601 timestamp when processing started (null if pending)
completed_at
string
ISO 8601 timestamp when processing completed (null if not finished)
metadata
object
Custom metadata attached to this evaluation