OpenAI Evals Framework


An OpenAI Evals Framework is an LLM-based system evaluation framework that is an OpenAI open-source project.

  • Context:
    • It can incorporate LLMs as components.
    • It can include an open-source registry of challenging evals that assess system behavior through the Completion Function Protocol (a minimal sketch of this protocol appears after this list).
    • It can aim to facilitate building evals with minimal coding.
    • It can support the evaluation of various LLM-based systems, including prompt chains and tool-using agents.
    • It can use Git-LFS to download and manage evals from its registry.
    • It can offer guidelines for writing custom completion functions and for submitting model-graded evals defined in custom YAML files.
    • It can provide options for installing pre-commit formatters and for installing the package via pip to run evals locally (see the run sketch under the 2023 reference below).
    • ...
  • Example(s):
  • Counter-Example(s):
  • See: Large Language Model Evaluation, AI Model Evaluation, Git-LFS.

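The Completion Function Protocol referenced above can be illustrated with a short Python sketch. The sketch below is an illustration under stated assumptions rather than the framework's actual source: the class names (EchoCompletionFn, EchoCompletionResult) are hypothetical, and the shape of the interface (a callable that takes a prompt and returns a result object exposing get_completions()) mirrors the description in the repository's documentation.

    from typing import Any, List


    class EchoCompletionResult:
        """Wraps raw output so an eval can read completions uniformly."""

        def __init__(self, response: str) -> None:
            self.response = response

        def get_completions(self) -> List[str]:
            # An eval consumes the generated completions as a list of strings.
            return [self.response]


    class EchoCompletionFn:
        """A trivial completion function that echoes the prompt back.

        A real completion function would call an LLM, a prompt chain, or a
        tool-using agent here instead of echoing.
        """

        def __call__(self, prompt: Any, **kwargs: Any) -> EchoCompletionResult:
            return EchoCompletionResult(str(prompt))


    if __name__ == "__main__":
        fn = EchoCompletionFn()
        print(fn("Say hello").get_completions())

Because the protocol only requires this callable-plus-result shape, the same eval can be pointed at a plain model call, a prompt chain, or an agent without changing the eval itself.
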

References

2023

  • (OpenAI, 2023) ⇒ OpenAI. (2023). “OpenAI Evals: A Framework for Evaluating LLMs.” In: GitHub Repository. https://github.com/openai/evals
    • QUOTE: Evals is a framework for evaluating LLMs (large language models) or systems built using LLMs as components. It also includes an open-source registry of challenging evals... With Evals, we aim to make it as simple as possible to build an eval while writing as little code as possible. An "eval" is a task used to evaluate the quality of a system's behavior... To get set up with evals, follow the setup instructions below. You can also run and create evals using Weights & Biases... To run evals, you will need to set up and specify your OpenAI API key...
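As a hedged illustration of the quoted setup steps, the following Python sketch supplies the OpenAI API key through the environment and launches a local run by shelling out to the repository's oaieval command-line tool. The completion function name gpt-3.5-turbo and the eval name test-match follow the repository's README example and are assumptions that may differ in other installations.

    import os
    import subprocess

    # Supply the OpenAI API key via the environment, as the setup instructions require.
    env = dict(os.environ, OPENAI_API_KEY="sk-...")  # placeholder key

    # Run a single eval against a completion function; both arguments are illustrative.
    subprocess.run(["oaieval", "gpt-3.5-turbo", "test-match"], env=env, check=True)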