LLM Benchmarking System
An LLM Benchmarking System is an ML benchmarking system for LLMs.
- Context:
- It can (typically) evaluate Large Language Models across a variety of tasks and metrics to measure their performance and capabilities (see the sketch after this list).
- ...
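The following is a minimal sketch of such an evaluation loop, assuming a hypothetical `model_generate(prompt)` callable and simple exact-match scoring; real benchmarking systems add prompt templating, sampling controls, and many more metrics.

```python
# Minimal benchmarking-loop sketch. `model_generate` and the task data
# are hypothetical stand-ins, not the API of any real benchmark system.
from typing import Callable, Dict, List, Tuple

def benchmark(
    model_generate: Callable[[str], str],
    tasks: Dict[str, List[Tuple[str, str]]],  # task name -> (prompt, reference) pairs
) -> Dict[str, float]:
    """Score a model on each task by exact-match accuracy."""
    scores = {}
    for task_name, examples in tasks.items():
        correct = 0
        for prompt, reference in examples:
            prediction = model_generate(prompt)
            if prediction.strip() == reference.strip():
                correct += 1
        scores[task_name] = correct / len(examples)
    return scores

# Usage with a trivial stand-in "model" that always answers "4":
tasks = {"arithmetic": [("2+2=", "4"), ("3+5=", "8")]}
print(benchmark(lambda p: "4", tasks))  # {'arithmetic': 0.5}
```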
- Example(s):
- HELM LLM Benchmarking Framework (Holistic Evaluation of Language Models), which evaluates LLMs across 42 scenarios using 7 metric categories (accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency); a sketch of this kind of multi-metric aggregation follows this list.
- ...
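As a hedged illustration of multi-metric aggregation in the spirit of HELM's scenario-by-metric grid, the sketch below averages each metric across scenarios; the scenario names, metric names, and scores are invented for the example and are not HELM's actual aggregation method or results.

```python
# Illustrative aggregation of (scenario, metric) scores into one
# headline number per metric. All values here are made up.
from collections import defaultdict

results = {
    ("mmlu", "accuracy"): 0.71, ("mmlu", "calibration"): 0.62,
    ("cnn_dailymail", "accuracy"): 0.44, ("cnn_dailymail", "calibration"): 0.58,
}

per_metric = defaultdict(list)
for (scenario, metric), score in results.items():
    per_metric[metric].append(score)

# Average each metric across all scenarios.
summary = {metric: sum(vals) / len(vals) for metric, vals in per_metric.items()}
print(summary)  # {'accuracy': 0.575, 'calibration': 0.6}
```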
- Counter-Example(s):
- ...
- See: HELM, GLUE, SuperGLUE, BIG-bench, Benchmarking.