SuperGLUE Benchmark
A SuperGLUE Benchmark is an NLU Benchmark that extends the GLUE Benchmark.
- Context:
- It can be available at: https://super.gluebenchmark.com/
- ...
- Example(s):
- the benchmark tasks introduced in Wang et al. (2019),
- …
- Counter-Example(s):
- a GLUE Benchmark.
- See: Natural Language Inference System, Lexical Entailment, Syntactic Parser, Morphological Analyzer, Word Sense Disambiguation, Lexical Semantic Relatedness, Logical Inference.
References
2022
- (Liang, Bommasani et al., 2022) ⇒ Percy Liang, Rishi Bommasani, et al. (2022). “Holistic Evaluation of Language Models.” doi:10.48550/arXiv.2211.09110
- QUOTE: ... As more general-purpose approaches to NLP grew, often displacing more bespoke task-specific approaches, new benchmarks such as SentEval (Conneau and Kiela, 2018), DecaNLP (McCann et al., 2018), GLUE (Wang et al., 2019b), and SuperGLUE (Wang et al., 2019a) co-evolved to evaluate their capabilities. In contrast to the previous class of benchmarks, these benchmarks assign each model a vector of scores to measure the accuracy for a suite of scenarios. In some cases, these benchmarks also provide an aggregate score (e.g. the GLUE score, which is the average of the accuracies for each of the constituent scenarios). ...
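The aggregate scoring described in the quote above can be illustrated with a short sketch. The snippet below computes a GLUE/SuperGLUE-style aggregate as the unweighted mean of per-task scores; the task names and score values are hypothetical placeholders, not official leaderboard results, and the official benchmark mixes accuracy with other metrics (e.g., F1) for some tasks.
```python
# Minimal sketch: a GLUE/SuperGLUE-style aggregate score computed as the
# unweighted average of per-task scores (hypothetical values for illustration).
per_task_scores = {
    "BoolQ": 0.80,  # accuracy
    "CB":    0.85,  # accuracy (the official benchmark also reports F1 here)
    "COPA":  0.78,
    "RTE":   0.74,
    "WiC":   0.71,
}

aggregate_score = sum(per_task_scores.values()) / len(per_task_scores)
print(f"Aggregate score: {aggregate_score:.3f}")
```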
2019a
- (Wang, Pruksachatkun et al., 2019) ⇒ Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. (2019). “SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems.” In: Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). arXiv:1905.00537
- QUOTE: ... In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. ...
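As a usage illustration, the individual SuperGLUE tasks can be loaded programmatically. The sketch below uses the Hugging Face datasets library to load the BoolQ task; this is a commonly used access path and an assumption here, not the software toolkit referenced in the paper.
```python
# Minimal sketch: loading one SuperGLUE task (BoolQ) via the Hugging Face
# `datasets` library. This is a common way to access the task data, not the
# toolkit released with Wang et al. (2019).
from datasets import load_dataset

boolq = load_dataset("super_glue", "boolq")  # splits: train / validation / test
example = boolq["train"][0]
print(example["question"])
print(example["passage"][:200])
print(example["label"])  # binary yes/no answer label
```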
2019b
- (SuperGLUE Benchmark, 2019) ⇒ https://super.gluebenchmark.com/ Retrieved: 2019-09-15
- QUOTE: In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced one year ago, offered a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently come close to the level of non-expert humans, suggesting limited headroom for further research.
We take into account the lessons learnt from the original GLUE benchmark and present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, improved resources, and a new public leaderboard.