AI safety · Independent evaluation

Detecting dishonesty in AI systems.

Tara Research advances AI safety through rigorous research and open evaluation infrastructure. We measure the propensity of AI systems to lie and benchmark the effectiveness of the safety techniques designed to prevent it, releasing our code, leaderboards, and scientific results openly.

Misalignment becomes catastrophic when models hide it. A misaligned model that is honest about its goals can be detected and corrected; one that is dishonest cannot. Robust detection and prevention of dishonesty are therefore foundational to every other oversight mechanism: safety measures such as monitoring and auditing work only if the AI system is not strategically misleading its evaluators. Tara exists to give the field an impartial, public measure of where we stand and which interventions actually help.

Research

Two parallel tracks, one shared infrastructure: measuring the propensity of AI systems to lie, and benchmarking the techniques designed to prevent it.

Approach

Open by default, impartial by design.

We release scenarios, code, and results publicly so other researchers can reproduce, extend, and challenge them. We hold no stake in any specific mitigation technique or in any AI lab; our role is to evaluate the field, not to compete in it.

Tara Research is a Dutch non-profit foundation (stichting) incorporated in Rotterdam, with a U.S. public-charity equivalency determination from NGOsource. The team operates from the Netherlands and Montreal.