
Release Notes

Release notes for 16x Eval. Latest features and improvements.

0.0.46

June 17, 2025

  • Added evaluation comparison page to compare the results of two evaluations side by side
Release 0.0.46 - Evaluation comparison page

0.0.45

June 8, 2025

  • Added support for Gemini 2.5 Pro Preview (06-05)
  • Optimized the UX for adding OpenRouter models for new users

0.0.44

June 1, 2025

  • Moved the benchmark to a dedicated page with an option to select models
  • Added sorting options for experiments page
  • Added temperature as an advanced setting
  • Merged rating and notes modal for better UX
  • Various UI/UX improvements
Release 0.0.44 - Benchmark page

0.0.43

May 26, 2025

  • Added a new table view for experiments with improved layout and 3-column display on large screens
  • Added experiment linking with evaluation functions, so linked evaluation functions are enabled automatically
  • Added color-coded ranges to writing statistics for words per sentence and words per paragraph
  • Added response token count display in statistics
  • Added line wrapping option for better text readability
  • Added copy as markdown feature
  • Various UI/UX and performance improvements
Release 0.0.43 - Table view for experiments

0.0.42

May 23, 2025

  • Added support for Claude Sonnet 4 and Claude Opus 4 models

In our sample evaluation for coding and writing tasks, Claude Opus 4 dominated the other models, ranking as the best-performing model on all four tasks given.

Claude Sonnet 4 is also very impressive, finishing in the top one or two on every task and beating almost all other models.

Release 0.0.42 - Claude 4 Opus
Release 0.0.42 - Claude 4 Sonnet

0.0.41

May 22, 2025

  • Added archive functionality for experiments, prompts, and contexts to avoid cluttering the list
  • Added category grouping for evaluation functions and experiments in selection modal
  • Added category icons and ensured the category filter persists across pages
  • Improved image context handling and storage
  • Fixed dark mode colors
  • Various UI/UX improvements
Release 0.0.41 - Category grouping
Release 0.0.41 - Dark mode color fixes
Release 0.0.41 - Archive experiments, prompts, and contexts

0.0.39

May 21, 2025

  • Added reasoning token count from OpenRouter provider

0.0.38

May 20, 2025

  • Added search functionality across the app (experiments, prompts, contexts)
  • Added green and red highlights for target and penalty strings from evaluation functions in model responses
  • Added drag and drop support for re-ordering contexts
  • Added created at timestamps for evaluation functions and experiments
  • Added new options (8.75 and 9.25) for more granular rating
  • Replaced 3rd-party LLM library with our own send-prompt library to support more features and improve stability
  • Various UI/UX improvements and bug fixes
Release 0.0.38 - Search functionality
Release 0.0.38 - Target and penalty string highlights

0.0.37

May 15, 2025

  • Added support for system prompts
  • Added advanced settings for more configuration options
  • Added categories for prompts and contexts
  • Various UI/UX improvements
Release 0.0.37 - System prompts
Release 0.0.37 - Prompt and Context Categories

0.0.35

May 9, 2025

  • Added ability to duplicate and export a single evaluation function
  • Added support for penalizing occurrences of strings in evaluation functions
  • Added ability to copy evaluation as markdown
  • Added sorting by speed
  • Various UI/UX improvements and bug fixes

0.0.32

May 8, 2025

  • Various UI/UX improvements and bug fixes

0.0.31

May 7, 2025

  • Added notes feature for evaluations with improved UX
  • Added ability to cancel running evaluations
  • Added provider logos and icons for models
  • Added sorting by creation time
  • Improved UI/UX for various pages
  • Various code refactoring for better maintainability
Release 0.0.31 - Notes feature

0.0.30

May 1, 2025

  • Added performance metrics including speed, writing statistics, and reasoning response
  • Added ability to sort evaluations by rating or model name
  • Added model highlighting feature in experiments page
  • Increased timeout limit to 10 minutes
  • Improved experiment page UI/UX
  • Fixed image import and export functionality
Release 0.0.30 - Performance metrics and sorting
Release 0.0.30 - Experiment page

0.0.29

April 28, 2025

  • Added support for Azure OpenAI models
  • Added experiment categories to help organize experiments
  • Added auto-fill for custom model API settings (Fireworks for now)
  • Fixed bugs with sending prompts to custom models (you need to delete the existing custom models and create new ones)
  • Fixed temperature setting bug for OpenAI reasoning models
  • Various UI/UX improvements
Release 0.0.29 - Azure OpenAI models

0.0.28

April 26, 2025

  • Added support for creation time and token stats columns in the evaluation page
  • Improved export and import of evaluations containing images
  • Major code refactoring to support more features
Release 0.0.28 - Image context and token stats

0.0.26

April 26, 2025

  • Bug fixes related to app update (surprisingly, it's very tricky to get right)

0.0.24

April 25, 2025

  • Added support for image context. You can now send images as context to models that support it.
  • Added ability to import evaluations from a JSON file that was exported previously
  • Added a link to release notes in the settings page
  • Improved the UI/UX of app updates
  • Fixed various bugs
Release 0.0.24 - Image context

0.0.19

April 23, 2025

  • Fixed a bug where API keys could not be pasted into the API key input field
  • Fixed a bug where importing multiple files from the file system would fail
  • Added built-in exclusion filter for large files (>1MB) and binary files
    • Image file support is coming soon

0.0.16

April 23, 2025

  • Added dedicated page for managing evaluation functions
  • Added ability to check for updates and install updates on Settings page
  • UI/UX improvements
Release 0.0.16 - Evaluation functions page

0.0.10

April 21, 2025

  • Customizable columns for evaluation page
Release 0.0.10 - Customizable columns

0.0.9

April 20, 2025

  • Run evaluation on multiple models
  • Organize evaluations into experiments
  • Prompt library and context library
  • Built-in models and custom models
