0.0.71
September 10, 2025
- Added background evaluation, so evaluations can run without blocking the UI.
- Added custom model cost configuration for tracking the costs of custom models and OpenRouter models.
- Added support for the OpenRouter reasoning effort parameter on compatible models.
- Various bug fixes and UI/UX improvements.
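
As a rough illustration of what the custom model cost configuration in 0.0.71 makes possible, the sketch below estimates the cost of one request from per-token prices. The field names (`inputPerMillion`, `outputPerMillion`, `inputTokens`, `outputTokens`), the `estimateCost` helper, and the prices are hypothetical and are not taken from the app; the actual configuration format may differ.

```javascript
// Hypothetical cost configuration: prices in USD per one million tokens.
// These names and values are illustrative assumptions, not the app's schema.
const customModelCost = { inputPerMillion: 3.0, outputPerMillion: 15.0 };

// Estimate the cost of a single request from its token usage.
function estimateCost(usage, cost) {
  const input = (usage.inputTokens / 1_000_000) * cost.inputPerMillion;
  const output = (usage.outputTokens / 1_000_000) * cost.outputPerMillion;
  return input + output;
}

// Example usage: one million input tokens and one million output tokens.
const total = estimateCost(
  { inputTokens: 1_000_000, outputTokens: 1_000_000 },
  customModelCost
);
console.log(total);
```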
0.0.70
September 5, 2025
- Improved the reasoning effort display on the evaluation and benchmark pages.
- Included reasoning effort when copying an evaluation or benchmark as Markdown.
0.0.69
September 4, 2025
- Added support for user-defined JavaScript evaluation functions: you can now write your own JavaScript functions to evaluate model responses, including tool calls.
- Made the tool call display more readable.
- Various bug fixes and UI/UX improvements.
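
To give a sense of what a user-defined evaluation function from 0.0.69 might look like, here is a minimal sketch. The function name `evaluateResponse`, the shape of the `response` argument (`text`, `toolCalls`), the `get_weather` tool, and the `{ passed, score }` return value are all assumptions made for illustration; the signature the app actually expects may differ.

```javascript
// Hypothetical evaluation function. The input and output shapes are
// assumptions for illustration, not the app's documented interface.
function evaluateResponse(response) {
  // Treat a missing toolCalls array or text as empty.
  const toolCalls = response.toolCalls ?? [];
  const text = (response.text ?? "").toLowerCase();

  // Pass only if the model called the expected tool and mentioned the city.
  const calledWeatherTool = toolCalls.some((call) => call.name === "get_weather");
  const mentionsCity = text.includes("paris");

  const passed = calledWeatherTool && mentionsCity;
  return { passed, score: passed ? 1 : 0 };
}

// Example usage with a made-up response object.
const result = evaluateResponse({
  text: "The weather in Paris is sunny.",
  toolCalls: [{ name: "get_weather", arguments: { city: "Paris" } }],
});
console.log(result.passed);
```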