POST https://api.rubric.ai/v1/evaluations
from rubric import Rubric

client = Rubric()

evaluation = client.evaluations.create(
    name="Triage Accuracy - Weekly",
    project="proj_abc123",
    dataset="ds_xyz789",
    evaluators=[
        {
            "type": "triage_accuracy",
            "config": {
                "severity_weights": {
                    "under_triage": 5.0,
                    "over_triage": 1.0
                }
            }
        },
        {
            "type": "red_flag_detection",
            "config": {
                "protocols": ["chest_pain", "headache"]
            }
        }
    ],
    metadata={
        "triggered_by": "ci_pipeline",
        "model_version": "v2.4.1"
    }
)

print(f"Created evaluation: {evaluation.id}")
print(f"Status: {evaluation.status}")
{
  "id": "eval_def456",
  "object": "evaluation",
  "name": "Triage Accuracy - Weekly",
  "project": "proj_abc123",
  "dataset": "ds_xyz789",
  "status": "pending",
  "evaluators": [
    {
      "type": "triage_accuracy",
      "config": {
        "severity_weights": {
          "under_triage": 5.0,
          "over_triage": 1.0
        }
      }
    },
    {
      "type": "red_flag_detection",
      "config": {
        "protocols": ["chest_pain", "headache"]
      }
    }
  ],
  "progress": {
    "total": 0,
    "completed": 0,
    "failed": 0
  },
  "created_at": "2024-01-15T10:30:00Z",
  "started_at": null,
  "completed_at": null,
  "metadata": {
    "triggered_by": "ci_pipeline",
    "model_version": "v2.4.1"
  }
}
Create a new evaluation to assess a dataset using one or more evaluators. The evaluation runs asynchronously — use the Get Evaluation endpoint to check status.
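
Once created, you can poll the evaluation until it reaches a terminal status. A minimal polling sketch follows, assuming the SDK exposes the Get Evaluation endpoint as client.evaluations.retrieve(); the method name and the polling interval are assumptions, so check the Get Evaluation reference for the exact call.

import time

from rubric import Rubric

client = Rubric()

evaluation_id = "eval_def456"  # id returned by the create call above

# Poll until the run reaches a terminal state. retrieve() is an assumed
# method name; see the Get Evaluation reference for the exact SDK call.
while True:
    evaluation = client.evaluations.retrieve(evaluation_id)
    print(f"Status: {evaluation.status}")
    if evaluation.status in ("completed", "failed", "cancelled"):
        break
    time.sleep(10)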

Authentication

Authorization
string
required
Bearer token with write scope. Example: Bearer gr_live_xxxxxxxx
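
In the Python SDK, the token is supplied when constructing the client. A minimal sketch, assuming the constructor accepts an api_key argument and that the key is stored in a RUBRIC_API_KEY environment variable (both names are assumptions; see the SDK's client documentation):

import os

from rubric import Rubric

# The api_key argument and the RUBRIC_API_KEY variable name are assumptions;
# the token must carry the write scope to create evaluations.
client = Rubric(api_key=os.environ["RUBRIC_API_KEY"])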

Request Body

name
string
required
A descriptive name for this evaluation run. Useful for identifying evaluations in the dashboard.
project
string
required
The project ID to run this evaluation in. Must be a valid project you have access to.
dataset
string
required
The dataset ID containing samples to evaluate. All samples in the dataset will be processed.
evaluators
array
required
List of evaluator configurations to run against each sample.
metadata
object
Arbitrary key-value pairs to attach to this evaluation for filtering and organization.
run_async
boolean
default: true
Whether to run the evaluation asynchronously.
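
If you need the request to block until results are ready, the field above suggests disabling asynchronous execution. A minimal sketch, assuming the SDK forwards run_async through create() as shown:

from rubric import Rubric

client = Rubric()

# Synchronous run: the call returns only after every sample is evaluated.
# Passing run_async this way is an assumption based on the field above.
evaluation = client.evaluations.create(
    name="Triage Accuracy - Smoke Test",
    project="proj_abc123",
    dataset="ds_xyz789",
    evaluators=[
        {
            "type": "triage_accuracy",
            "config": {"severity_weights": {"under_triage": 5.0, "over_triage": 1.0}},
        }
    ],
    run_async=False,
)

print(f"Status: {evaluation.status}")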
Evaluator Types
See the Evaluators Reference for a complete list of built-in evaluator types and their configuration options.

Response

id
string
Unique identifier for the evaluation. Example: eval_def456
object
string
Always evaluation
name
string
The name provided for this evaluation
project
string
The project ID this evaluation belongs to
dataset
string
The dataset ID being evaluated
status
string
Current status: pending, running, completed, failed, cancelled
evaluators
array
The evaluator configurations for this evaluation
progress
object
Sample counts for this run: total, completed, and failed
created_at
string
ISO 8601 timestamp when the evaluation was created
started_at
string
ISO 8601 timestamp when processing started (null if pending)
completed_at
string
ISO 8601 timestamp when processing completed (null if not finished)
metadata
object
Custom metadata attached to this evaluation