OpenAI Evals Framework
An OpenAI Evals Framework is an LLM-based system evaluation framework that is an open-source OpenAI project.
- Context:
- It can evaluate LLMs directly or systems that incorporate LLMs as components.
- It can include an open-source registry of challenging evals, and its Completion Function Protocol lets evals assess the behavior of arbitrary systems rather than only raw model completions.
- It can aim to facilitate building evals with minimal coding.
- It can support the evaluation of various system behaviors, including prompt chains or tool-using agents.
- It can store the eval data in its registry with Git-LFS, which must be installed to download and manage those evals.
- It can offer guidelines for writing your own completion functions (a minimal sketch appears below, after the See list) and for submitting new evals, including model-graded evals registered via custom YAML files.
- It can be installed and run locally via pip, and it provides pre-commit formatters for contributors (a run sketch appears at the end of the page).
- ...
- Example(s):
- Counter-Example(s):
- See: Large Language Model Evaluation, AI Model Evaluation, Git-LFS.
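The Completion Function Protocol referenced in the Context can be illustrated with a minimal sketch. It assumes the CompletionFn and CompletionResult interfaces exposed by evals.api, as described in the repository's completion-function documentation; the Echo* classes themselves are hypothetical examples, not part of the library.

```python
from typing import Any, Union

from evals.api import CompletionFn, CompletionResult


class EchoCompletionResult(CompletionResult):
    """Hypothetical result wrapper holding the completions an eval will grade."""

    def __init__(self, response: str) -> None:
        self.response = response

    def get_completions(self) -> list[str]:
        # Evals read the evaluated system's answer(s) from this list.
        return [self.response.strip()]


class EchoCompletionFn(CompletionFn):
    """Hypothetical completion function that simply echoes its prompt back.

    Anything exposing __call__(prompt, **kwargs) -> CompletionResult can be
    plugged in here, e.g. a prompt chain or a tool-using agent, not only a
    single model call.
    """

    def __call__(self, prompt: Union[str, list[dict[str, Any]]], **kwargs: Any) -> CompletionResult:
        text = prompt if isinstance(prompt, str) else str(prompt)
        return EchoCompletionResult(text)
```

Once such a class exists, the repository's documentation describes registering it in a completion-fn YAML file so the CLI can load it by name; the exact registration format is left to that documentation.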
References
2023
- (OpenAI, 2023) ⇒ OpenAI. (2023). “OpenAI Evals: A Framework for Evaluating LLMs.” In: GitHub Repository. https://github.com/openai/evals
- QUOTE: Evals is a framework for evaluating LLMs (large language models) or systems built using LLMs as components. It also includes an open-source registry of challenging evals... With Evals, we aim to make it as simple as possible to build an eval while writing as little code as possible. An "eval" is a task used to evaluate the quality of a system's behavior... To get set up with evals, follow the setup instructions below. You can also run and create evals using Weights & Biases... To run evals, you will need to set up and specify your OpenAI API key...
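To make the quoted setup notes concrete, here is a minimal run sketch. The oaieval gpt-3.5-turbo test-match invocation and the {"input": ..., "ideal": ...} sample shape follow the repository README; the file name samples.jsonl and the sample contents are illustrative assumptions, and a newly written samples file still needs a registry YAML entry before it can be run.

```python
import json
import os
import subprocess

# Assumption: `pip install evals` (or an editable install of the repo) has been
# done and Git-LFS has pulled the registry data. The framework reads the API
# key from the OPENAI_API_KEY environment variable.
if "OPENAI_API_KEY" not in os.environ:
    raise RuntimeError("Set OPENAI_API_KEY before running evals.")

# Run the small example eval mentioned in the README against a model.
subprocess.run(["oaieval", "gpt-3.5-turbo", "test-match"], check=True)

# Building a basic eval of your own starts from a JSONL samples file; each line
# pairs a chat-style "input" with an "ideal" answer.
sample = {
    "input": [
        {"role": "system", "content": "Answer with a single word."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "ideal": "Paris",
}
with open("samples.jsonl", "w") as handle:
    handle.write(json.dumps(sample) + "\n")
```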