0.0.53
July 29, 2025
- Fixed OpenRouter models not working when provider options are set
0.0.52
July 29, 2025
- Added basic tool call support
- Added Tool Library page to manage and create tools for evaluation
- Added tools and tool call columns to the evaluation table to display tool usage
- Added technical writing category
- Added the ability to copy a benchmark as Markdown
- Various bug fixes and UI/UX improvements
0.0.51
July 23, 2025
- Added average rating and ranking counts to the benchmark page for easier comparison
- Added rubric support for experiments to give human evaluators a more structured evaluation
- Added sorting by prompt or response length in the evaluation table
- Added more granular rating options (5.5, 6.5, 8.25, 9.75) for better evaluation precision
- Various bug fixes and UI/UX improvements