LLM-based Natural Language Processing (NLP) System

From GM-RKB
(Redirected from LLM-based NLP system)

An LLM-based Natural Language Processing (NLP) System is an NLP system that is also an LLM-based system (one that critically relies on a large language model to perform its NLP tasks).
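As an illustration of what "critically relies on a large language model" can mean in practice, the following is a minimal sketch of a sentiment analysis component whose only decision logic is a prompt sent to an LLM. The names `LLM`, `build_prompt`, `classify_sentiment`, and `stub_llm` are hypothetical and are not taken from any particular library; `stub_llm` is a keyword rule standing in for a real model call so that the sketch runs on its own.

```python
from typing import Callable

# Hypothetical interface: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]

LABELS = ("positive", "negative", "neutral")


def build_prompt(text: str) -> str:
    """Wrap the input text in a classification instruction for the model."""
    return (
        "Classify the sentiment of the following text as one of "
        f"{', '.join(LABELS)}.\n"
        f"Text: {text}\n"
        "Sentiment:"
    )


def classify_sentiment(text: str, llm: LLM) -> str:
    """Return one of LABELS, or 'unknown' if the model output is not recognized."""
    raw = llm(build_prompt(text)).strip().lower()
    for label in LABELS:
        if raw.startswith(label):
            return label
    return "unknown"


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: demonstration only)."""
    # Recover the input text from the prompt and apply a trivial keyword rule.
    text = prompt.split("Text: ", 1)[1].rsplit("\nSentiment:", 1)[0]
    return " Positive" if "great" in text.lower() else " Negative"


if __name__ == "__main__":
    print(classify_sentiment("This movie was great", stub_llm))
```

In this sketch the system contributes only prompt construction and output parsing; replacing `stub_llm` with a call to an actual model would leave the rest of the code unchanged, which is one way to read the "critically relies on" condition in the definition.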



References

2023

  • chat
    • Q: Create a wiki table with 3 columns: "NLP Task Type", "Specific Benchmark Tasks", and "Pure LLM Solution?". In the first column, list 14 NLP task types. In the second column, list some specific benchmark tasks typically used to evaluate performance on each task type. In the third column, indicate whether a pure LLM can typically solve the task or if additional training and optimization may be required. Mark up each technical term or noun phrase using wiki link double square bracket annotation. Reply in a formatted text box.
    • A:
| NLP Task Type | Specific Benchmark Tasks | Pure LLM Solution? |
|---|---|---|
| Named Entity Recognition (NER) | CoNLL-2003 NER, OntoNotes 5.0 | No, additional training and optimization required |
| Sentiment Analysis | Stanford Sentiment Treebank (SST), IMDb movie review dataset | No, additional training and optimization required |
| Machine Translation | WMT (Workshop on Machine Translation), IWSLT (International Workshop on Spoken Language Translation) | No, additional training and optimization required |
| Text Summarization | CNN/Daily Mail dataset, Gigaword dataset | No, additional training and optimization required |
| Question Answering | SQuAD (Stanford Question Answering Dataset), Natural Questions (NQ) | No, additional training and optimization required |
| Part-of-Speech Tagging | Penn Treebank dataset, Universal Dependencies (UD) dataset | No, additional training and optimization required |
| Text Classification | AG News, Yelp Review dataset | No, additional training and optimization required |
| Coreference Resolution | CoNLL-2012 Shared Task, GAP (Gendered Ambiguous Pronouns) | No, additional training and optimization required |
| Semantic Parsing | GeoQuery, WikiSQL dataset | No, additional training and optimization required |
| Relation Extraction | SemEval Task 8, TACRED (TAC Relation Extraction Dataset) | No, additional training and optimization required |
| Dependency Parsing | Penn Treebank dataset, Universal Dependencies (UD) dataset | No, additional training and optimization required |
| Topic Modeling | 20 Newsgroups dataset, Reuters-21578 dataset | No, additional training and optimization required |
| Language Modeling | Penn Treebank dataset, WikiText-2, WikiText-103 | No, additional training and optimization required |
| Paraphrase Identification | Microsoft Research Paraphrase Corpus (MRPC), Quora Question Pairs dataset | No, additional training and optimization required |

2022