16x Eval

Download 16x Eval

Iterate on prompts. Test different models. Find the best combo for your tasks.

Latest version: v0.0.53
Released: July 29, 2025

New updates from the latest releases

0.0.53

July 29, 2025

  • Fixed OpenRouter models with provider options not working

0.0.52

July 29, 2025

  • Added basic tool call support
    • Added Tool Library page to manage and create tools for evaluation
    • Added tools column and tool call column in evaluation table to display tool usage
  • Added technical writing category
  • Added the ability to copy a benchmark as Markdown
  • Various bug fixes and UI/UX improvements
[Screenshots: Release 0.0.52 - Tool Library page; Release 0.0.52 - Tool call support in evaluations]

0.0.51

July 23, 2025

  • Added average rating and ranking counts on the benchmark page for better comparison
  • Added rubrics support for experiments to provide more structured evaluation for human evaluators
  • Added sorting by prompt or response length in the eval table
  • Added more granular rating options (6.5, 5.5, 9.75, 8.25) for better evaluation precision
  • Various bug fixes and UI/UX improvements
[Screenshots: Release 0.0.51 - Average rating display; Release 0.0.51 - Rubrics support for experiments]


Build your own evals. End the best-model debate once and for all.