
Explore our Agent Evals for Safety

Select an evaluation from the sidebar to view its details and test-run it.

View model benchmarks

Compare performance across different models

Test the eval as a human

Interactive testing environment

Access the eval with our Python API

Integrate evaluations with your own agents

Need evaluations that aren't listed?

We specialize in developing custom evaluations tailored to your needs.