The first public benchmark for AI child safety. Learn more at korabench.ai.
Overview
KORA measures the safety of AI models interacting with children across 8 risk categories, 25 individual risks, 3 age groups (7–9, 10–12, 13–17), and 10 child motivations. It produces per-risk grades — failing, adequate, or exemplary — for both safety and three behavioral criteria.
The benchmark uses a 3-stage pipeline: seed generation creates short scenario outlines from the risk taxonomy, scenario expansion turns each seed into a detailed test scenario with validation, and test runs execute multi-turn conversations and judge the results.
Each test is a 3-turn conversation between a simulated child user and the target model. A judge model then evaluates the conversation on safety and on three behavioral criteria: anthropomorphism, epistemic humility, and human redirection. Scenarios can be tested with different prompt variants — default (no age context in the system prompt) and child (age-aware system prompt) — controlled via the --prompts flag.
Prerequisites
Node.js 25+
Yarn
AI Gateway API key — set the AI_GATEWAY_API_KEY environment variable for the AI SDK gateway. Copy .env.example to .env and fill in your key.
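A minimal .env needs only that one variable (the value below is a placeholder):
AI_GATEWAY_API_KEY=your-gateway-api-key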
Getting started
Install dependencies and build:
cp .env.example .env # then add your API key
yarn && yarn tsbuild
Run the benchmark with pre-built scenarios:
yarn kora run <target-model>
For example, to evaluate gpt-4o:
yarn kora run gpt-4o
Pipeline stages
generate-seeds
Generates a set of scenario seeds from the risk taxonomy.
yarn kora generate-seeds [model]
| Argument / Option | Description |
|---|---|
| [model] | Model to use for seed generation (default: gpt-5.2:high) |
| -o, --output <path> | Output JSONL file (default: data/scenarioSeeds.jsonl) |
| --seeds-per-task <count> | Seeds per risk/age/motivation combination (default: 8) |
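For example, to generate a smaller seed set with the default model and write it to a separate file (the count and path are illustrative):
yarn kora generate-seeds --seeds-per-task 2 -o data/scenarioSeeds.small.jsonl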
expand-scenarios
Transforms seeds into fully fleshed-out scenarios with validation.
yarn kora expand-scenarios [model] [user-model]
| Argument / Option | Description |
|---|---|
| [model] | Model to use for scenario expansion (default: gpt-4o) |
| [user-model] | Model to use for generating the first user message (default: deepseek-v3.2) |
| -i, --input <path> | Input seeds JSONL file (default: data/scenarioSeeds.jsonl) |
| -o, --output <path> | Output scenarios JSONL file (default: data/scenarios.jsonl) |
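For example, to expand that smaller seed file with the default models (file names illustrative):
yarn kora expand-scenarios -i data/scenarioSeeds.small.jsonl -o data/scenarios.small.jsonl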
run
Runs the benchmark against the target model.
yarn kora run <target-model> [judge-model] [user-model]
| Argument / Option | Description |
|---|---|
| <target-model> | Model to benchmark |
| [judge-model] | Model to use as judge (default: gpt-5.2:high:limited) |
| [user-model] | Model to use for simulating the child user (default: deepseek-v3.2) |
| -i, --input <path> | Input scenarios JSONL file (default: data/scenarios.jsonl) |
| -o, --output <path> | Output results JSON file (default: data/results.json) |
| --prompts <prompts> | Comma-separated prompt variants to test (default: default) |
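For example, to spell out the default judge and user models and write results to a separate file (the output path is illustrative):
yarn kora run gpt-4o gpt-5.2:high:limited deepseek-v3.2 -o data/results-gpt-4o.json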
All commands write to data/ by default. Commands are restartable — progress is tracked via temp files so interrupted runs resume where they left off.
Model configuration
Model registry (models.json)
Models are configured in a models.json file at the project root. The CLI searches for this file starting from the current directory and walking up. Each entry maps a model slug (used on the command line) to its configuration:
| Field | Required | Description |
|---|---|---|
| model | Yes | Provider/model identifier for the AI SDK gateway (e.g. openai/gpt-4o) |
| maxTokens | No | Maximum output tokens (default: 4000) |
| temperature | No | Sampling temperature |
| providerOptions | No | Provider-specific options passed through to the AI SDK |
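For example, a minimal entry for the gpt-4o slug used above might look like this (assuming the file is a plain slug-to-configuration map; only model is required, and the other values are illustrative):
{
  "gpt-4o": {
    "model": "openai/gpt-4o",
    "maxTokens": 4000,
    "temperature": 0
  }
}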
Authentication is handled via the AI_GATEWAY_API_KEY environment variable.
Custom models
Model slugs that start with custom- bypass the AI SDK gateway and are routed to packages/cli/src/models/customModel.ts. This lets you integrate any model backend — a local server, a custom API, or a model behind a proprietary SDK.
To add a custom model, edit models/customModel.ts and implement the Model interface:
export async function createCustomModel(modelSlug: string, _scenario: Scenario): Promise<Model> {
  return {
    async getTextResponse(request) {
      // request.messages contains the conversation (system, user, assistant messages).
      // request.maxTokens and request.temperature are optional hints.
      // Return the model's text response.
      throw new Error(`Custom model "${modelSlug}" is not implemented.`);
    },
    async getStructuredResponse(request) {
      // request.outputType is the Valibot schema for the expected output.
      // Return a parsed object matching the schema.
      throw new Error(`Custom model "${modelSlug}" is not implemented.`);
    },
  };
}
The factory receives:
modelSlug — the full slug (e.g. custom-my-model), so you can route to different backends.
scenario — the current Scenario being tested, available for context-aware implementations.
Both getTextResponse and getStructuredResponse are available — custom models can serve as the target model and, with a structured response implementation, as the judge too.
A new Model instance is created per scenario, so you can use the scenario data to customize behavior.
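As a rough sketch, a custom model backed by a hypothetical local HTTP endpoint could look like the following. The endpoint URL, the request and response shapes, and the return values are assumptions rather than part of the repository; check them against the Model interface in model.ts and adapt them to your backend.
// The existing Model and Scenario imports in customModel.ts are assumed to stay in place.
import * as v from "valibot";

export async function createCustomModel(modelSlug: string, _scenario: Scenario): Promise<Model> {
  // Hypothetical local backend that accepts chat messages and replies with { text: string }.
  const endpoint = "http://localhost:8080/v1/chat";

  async function complete(request: {
    messages: unknown;
    maxTokens?: number;
    temperature?: number;
  }): Promise<string> {
    const response = await fetch(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: modelSlug,
        messages: request.messages,
        maxTokens: request.maxTokens,
        temperature: request.temperature,
      }),
    });
    if (!response.ok) {
      throw new Error(`Custom model "${modelSlug}" request failed with status ${response.status}.`);
    }
    const body = (await response.json()) as { text: string };
    return body.text;
  }

  return {
    async getTextResponse(request) {
      // Assumed return shape; align with the Model interface in model.ts.
      return complete(request);
    },
    async getStructuredResponse(request) {
      // Ask the backend for JSON text, then validate it against the expected Valibot schema.
      const raw = await complete(request);
      return v.parse(request.outputType, JSON.parse(raw));
    },
  };
}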
Then use the slug on the command line like any other model:
yarn kora run custom-my-model
Evaluating a different model
To evaluate a new model, change only the <target-model> argument in the run command. Keep the judge and user models the same across evaluations for comparability.
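For example, to evaluate two targets back to back with the default judge and user models, writing each result set to its own file (paths illustrative):
yarn kora run gpt-4o -o data/results-gpt-4o.json
yarn kora run custom-my-model -o data/results-custom-my-model.json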
Risk taxonomy
The benchmark covers 8 risk categories with 25 individual risks. The full taxonomy, together with the list of child motivations, lives in packages/benchmark/data/ (risks.json and motivations.json).
Results format
The results JSON file (data/results.json by default) includes the following fields:
| Field | Description |
|---|---|
| target | Target model slug |
| judge | Judge model slug |
| user | User model slug |
| prompts | Prompt variants that were tested |
| sums.al | Total test count |
| sums.as | Safety grades: [failing, adequate, exemplary] |
| sums.an | Anthropomorphism: [failing, adequate, exemplary, occurrenceCount] |
| sums.eh | Epistemic humility: [failing, adequate, exemplary, occurrenceCount] |
| sums.hr | Human redirection: [failing, adequate, exemplary, occurrenceCount] |
Scores are grouped by risk category, risk, age range, and prompt variant. Two prompt variants are available:
default — no age context in the system prompt.
child — the system prompt includes the child's age range.
Use --prompts default,child to test both variants.
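For example:
yarn kora run gpt-4o --prompts default,child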
Cost and duration
Each pipeline stage makes the following API calls:
Seed generation: 1 call per (risk x age range x motivation) combination = 25 x 3 x 10 = 750 calls, producing 8 seeds each (6,000 seeds total).
Scenario expansion: 3–5 calls per seed (1 generate + 1 validate + 1 first user message on pass; up to 2 generate + 2 validate + 1 first user message on retry).
Test run: 7 calls per test (2 user responses + 3 target model responses + 2 judge responses), with 1 test per scenario per prompt variant.
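Assuming one scenario per seed and a single prompt variant, a full default run therefore makes roughly 6,000 × 7 = 42,000 calls in the test run stage alone.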
All commands run with a concurrency of 10 parallel tasks.
Project structure
.env.example                Environment variable template
models.json                 Model registry configuration
data/                       Scenario pipeline output (seeds, scenarios, results)
packages/
  benchmark/
    data/                   Risk taxonomy and motivations (risks.json, motivations.json)
    src/                    Core benchmark logic
      prompts/              Prompt templates for each pipeline stage
      model/                Domain types (scenario, risk, assessment, etc.)
      __tests__/            Test suites
      benchmark.ts          Core benchmark interface
      generateUserMessage.ts  User message generation
      kora.ts               KORA benchmark implementation
  cli/src/                  CLI package
    commands/               CLI command implementations
    __tests__/              CLI test suites
    models/                 Model-related modules
      model.ts              Model interface definition
      gatewayModel.ts       AI SDK gateway model implementation
      modelConfig.ts        Model registry loader
      customModel.ts        Custom model hook (edit to add your own)
    retry.ts                Retry with exponential backoff
    cli.ts                  CLI entry point
Development
yarn tsbuild # Type check
yarn test # Run tests
yarn lint # Lint
yarn pretty # Check formatting