Large Language Model (LLM) Benchmarking Task
A Large Language Model (LLM) Benchmarking Task is an AI benchmarking task that assesses the performance of Large Language Models against defined evaluation criteria.
- Context:
- It can (often) involve comparing models in terms of accuracy, efficiency, and computational resource requirements, among other factors (see the sketch after this list).
- It can be supported by an LLM Benchmarking System.
- It can be tailored to highlight the relative strengths and weaknesses of different models in specific applications.
- ...
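As a rough illustration of the comparison described above, the following Python sketch scores a set of models on exact-match accuracy and mean response latency. The `generate` callable, the model names, and the test cases are hypothetical placeholders, not a particular LLM API or benchmark suite.

```python
# A minimal sketch of an LLM benchmarking harness, assuming a hypothetical
# generate(model_name, prompt) function that returns a model's text output.
import time
from typing import Callable, Dict, List, Tuple


def benchmark_models(
    generate: Callable[[str, str], str],
    model_names: List[str],
    test_cases: List[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Compare models on exact-match accuracy and mean latency (seconds)."""
    results: Dict[str, Dict[str, float]] = {}
    for model_name in model_names:
        correct = 0
        total_time = 0.0
        for prompt, expected in test_cases:
            start = time.perf_counter()
            answer = generate(model_name, prompt)
            total_time += time.perf_counter() - start
            if answer.strip().lower() == expected.strip().lower():
                correct += 1
        results[model_name] = {
            "accuracy": correct / len(test_cases),
            "mean_latency_s": total_time / len(test_cases),
        }
    return results
```

In practice, a full LLM Benchmarking System would add more metrics (e.g., token throughput, cost) and task-specific scoring rules, but the comparison loop follows this general shape.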
- Example(s):
- GLUE Benchmarking Task for general natural language understanding.
- SQuAD Benchmarking Task for question answering models.
- MMLU (Massive Multitask Language Understanding) Benchmarking Task for broad multi-subject knowledge and reasoning, scored as multiple-choice accuracy (see the sketch after this list).
- ...
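Benchmarks such as MMLU are typically scored as multiple-choice accuracy: each item has lettered options and a gold answer letter, and the score is the fraction of items where the model's predicted letter matches. The sketch below shows that scoring rule on toy data; it does not reproduce any official evaluation harness.

```python
# A hedged sketch of multiple-choice accuracy scoring (MMLU-style).
# The data structures below are illustrative only.
from typing import List


def score_multiple_choice(
    predictions: List[str],   # e.g., ["A", "C", "B", ...]
    gold_answers: List[str],  # e.g., ["A", "B", "B", ...]
) -> float:
    """Return the fraction of predicted letters matching the gold letters."""
    assert len(predictions) == len(gold_answers)
    matches = sum(
        pred.strip().upper() == gold.strip().upper()
        for pred, gold in zip(predictions, gold_answers)
    )
    return matches / len(gold_answers)


# Example usage with toy data:
print(score_multiple_choice(["A", "C", "B"], ["A", "B", "B"]))  # ~0.667
```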
- Counter-Example(s):
- A Machine Learning Model Development Task, which focuses on model development rather than model evaluation.
- See: Benchmarking Task, Large Language Model, Performance Evaluation.