16x Eval

Effortlessly evaluate prompts and models

Iterate on prompts and test different models. Find the best fit for your use case.

Your personal workspace for prompt engineering

Manage your prompts, contexts, and models in one place, locally on your machine. Test out different combinations and use cases with a few clicks.

Prompt Evaluation

Model Evaluation

Evaluation Function

Human Rating

Experiment Management

Context Library

BYOK API Integrations

Custom Models


16x Eval Video Demo

Learn how 16x Eval works in this video demo.


Run Evaluations and Compare Results

Create new evaluations by specifying a prompt, context, and models. Multiple contexts are combined and added to the final prompt.

You can select multiple models to evaluate the same prompt and context in parallel.
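To make the mechanics concrete, here is a minimal sketch in Python of what a run does conceptually: the selected contexts are joined into the final prompt, which is then sent to each selected model in parallel. This is an illustration using the openai client, not 16x Eval's actual implementation; the context strings and model names are placeholders.

```python
# Illustrative sketch (not 16x Eval's internals): combine contexts into a
# final prompt, then evaluate it across several models in parallel.
# Assumes the `openai` package and an OPENAI_API_KEY in the environment.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()

contexts = [
    "## Codebase summary\n...",  # placeholder context 1
    "## Style guide\n...",       # placeholder context 2
]
prompt = "Refactor the function below for readability."

# Multiple contexts are concatenated and added to the final prompt.
final_prompt = "\n\n".join(contexts + [prompt])

models = ["gpt-4o", "gpt-4o-mini"]  # placeholder model selection

def evaluate(model: str) -> tuple[str, str]:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": final_prompt}],
    )
    return model, response.choices[0].message.content

# Fan the same prompt and context out to every selected model concurrently.
with ThreadPoolExecutor() as pool:
    for model, output in pool.map(evaluate, models):
        print(f"=== {model} ===\n{output}\n")
```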

Experiments for Different Use Cases

Group evaluations into experiments to compare model and prompt performance across different use cases (coding, writing, question answering, etc.).

You can view the top models and prompts for each experiment on the Experiments page.


Built-in Models and Custom Models

16x Eval supports built-in models from providers such as OpenAI, Anthropic (Claude), Google (Gemini), DeepSeek, and OpenRouter.

We also support any other provider that offers OpenAI API compatibility, such as a locally running Ollama instance.
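As an example of what OpenAI API compatibility means in practice, Ollama serves an OpenAI-compatible endpoint at http://localhost:11434/v1, so any OpenAI-style client can talk to it. The snippet below is a minimal sketch using the openai Python client; it illustrates the underlying protocol rather than 16x Eval's own configuration flow (which happens in the app), and the model name is a placeholder for whichever model you have pulled locally.

```python
# Sketch: pointing an OpenAI-compatible client at a local Ollama server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",  # required by the client, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.2",  # placeholder: any model pulled via `ollama pull`
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```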