Automated Text Generation (NLG) Task
An Automated Text Generation (NLG) Task is an automated natural language processing task that produces natural language expressions, primarily text items.
- AKA: Automated Writing.
- Context:
- Task Input: input data, generation parameters, linguistic resources
- Task Output: NLG Task Output (Machine-Written Text).
- Task Measure: NLG Performance Measure, such as Syntactic Correctness, Intelligibility, Fluency, Coherence, Relevance, and Factual Accuracy.
- ...
- It can typically transform Structured Data into Natural Language Text through generation algorithms.
- It can typically implement Language Model through probabilistic approaches.
- It can typically maintain Text Coherence through discourse planning.
- It can typically ensure Linguistic Correctness through grammar rules.
- It can typically capture Domain Knowledge through specialized corpora.
- ...
- It can often require Text Planning for content organization.
- It can often implement Surface Realization for grammatical expression.
- It can often utilize Contextual Information for relevance enhancement.
- It can often incorporate Stylistic Variation for tone appropriateness.
- It can often employ Evaluation Metrics for quality assessment.
- ...
- It can range from being a Heuristic Language Generation Task to being a Data-Driven Language Generation Task.
- It can range from being a Domain-Specific NLG Task to being an Open-Domain NLG Task.
- It can range from being a Word Generation Task, to a Phrase Generation Task, to a Sentence Generation Task, to a Passage Generation Task, to a Document Generation Task, depending on its linguistic unit.
- It can range from being a Freeform NLG Task (such as chit chat) to being a Topic-based NLG Task.
- It can range from being a Shallow NLG Task to being a Deep NLG Task, depending on the level of linguistic and semantic understanding required.
- It can range from being an Automated Formal Writing Task to being a [[...
- It can range from being a Short-form NLG Task to being a Long-form NLG Task, based on the length and complexity of the generated text.
- It can range from being a Template-Based NLG Task to being a Neural NLG Task, depending on its generation approach.
- It can range from being a Rule-Based NLG Task to being a Statistical NLG Task, depending on its algorithmic foundation.
- It can range from being a Single-Stage NLG Task to being a Pipeline NLG Task, depending on its architectural design.
- It can range from being a Single-Language NLG Task to being a Multilingual NLG Task, depending on its language scope.
- It can range from being a Factual NLG Task to being a Creative NLG Task, depending on its generation purpose.
- It can range from being a Supervised NLG Task to being an Unsupervised NLG Task, depending on its learning paradigm.
- ...
- It can be solved by an Automated Text Generation System (that implements a text generation algorithm).
- It can be supported by a Natural Language Understanding Task.
- It can integrate with Human Feedback Mechanisms for quality improvement.
- It can incorporate Knowledge Graphs for factual accuracy.
- It can leverage Pre-trained Language Models for transfer learning.
- It can employ Reinforcement Learning for targeted optimization.
- ...
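The typical pattern above (Structured Data transformed into Natural Language Text via a generation algorithm) can be sketched in its simplest, template-based form. The record fields and template wording below are invented for illustration and do not come from any specific NLG system:

```python
# Hypothetical sketch: a minimal template-based data-to-text NLG step.
# A real system would add content planning, lexical choice, and
# surface realization; here a single template stands in for all of them.

def realize(record):
    """Map a structured data record to a natural language sentence."""
    template = "{city} will be {condition} with a high of {high} degrees."
    return template.format(**record)

data = {"city": "Lyon", "condition": "sunny", "high": 24}
print(realize(data))  # -> Lyon will be sunny with a high of 24 degrees.
```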
- Examples:
- NLG Task Types by Linguistic Depth:
- Shallow NLG Tasks, such as:
- Deep NLG Tasks, such as:
- NLG Task Types by Text Length:
- Short-form NLG Tasks:
- Long-form NLG Tasks:
- Automated Wikitext Generation (wikitext generation), such as Automated Wikipedia Page Creation.
- Automated Essay Writing for academic content.
- Automated Report Generation for business documentation.
- Narrative Text Generation for story creation.
- Technical Documentation Generation for instructional content.
- NLG Task Types by Domain Specificity:
- Domain-Specific NLG Tasks, such as:
- Open-Domain NLG Tasks, such as:
- NLG Task Types by Linguistic Unit:
- Word Generation Tasks, such as:
- Phrase Generation Tasks, such as:
- Sentence Generation Tasks, such as:
- Passage Generation Tasks, such as:
- Document Generation Tasks, such as:
- NLG Task Types by Topical Constraint:
- Freeform NLG Tasks, such as:
- Topic-based NLG Tasks, such as:
- Constrained NLG Tasks, such as:
- Generate Text(length=200, subject='history', vocabulary='advanced', tone='formal', structure='intro, body, conclusion', deadline='2023-12-31', sentiments='neutral', audience='adults') => "Introduction about the subject of history. ...."
- Guided Story Generation for controlled narrative.
- Structured Document Generation for formatted content.
- Tone-Specific Content Generation for stylistic adherence.
- NLG Task Types by Application:
- Writing Assistance Tasks, such as:
- Data-to-Text Generation Task, such as:
- Translation-Based Tasks, such as:
- NLG Task Types by Learning Approach:
- Supervised NLG Tasks, such as:
- Unsupervised NLG Tasks, such as:
- ...
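The parameterized Generate Text example above can be sketched as a hypothetical constrained-generation function. The signature, stub behavior, and bracketed placeholder output are assumptions for illustration, not a real system's API; a real implementation would plug a template engine or language model into the stub:

```python
# Hypothetical sketch of a Constrained NLG Task interface.
# Only the structure and length constraints are enforced here;
# the "generation" itself is a placeholder.

def generate_text(subject, length=200, tone="formal",
                  structure=("intro", "body", "conclusion")):
    """Return placeholder text that respects the requested structure."""
    sections = [f"[{part} on {subject}, {tone} tone]" for part in structure]
    text = " ".join(sections)
    return text[:length]  # enforce the length constraint (in characters)

print(generate_text("history"))
```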
- Counter-Examples:
- an Automated Natural Language Understanding (NLU) Task, which focuses on text interpretation rather than text generation.
- Human-Performed Language Generation Task, such as Human-Performed Writing, which relies on human cognition rather than automated systems.
- Automated Software Programming, which produces executable code rather than natural language.
- Automated Speech Generation, which creates audio output rather than textual content.
- Automated Image Generation, which produces visual content rather than textual content.
- Text Annotation Task, which enhances existing text rather than generating new text.
- Text Classification Task, which categorizes text samples rather than creating content.
- See: Content Planning, Document Structuring, Lexical Choice, Narrative Generation, Pragmatic Analysis, Semantic Analysis, Surface Realization, Text Planning, Text Structuring, Language Model, Natural Language Processing, Computational Linguistics, Text Generation Algorithm, Neural Text Generation, Machine Learning for NLP.
References
2021
- (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Natural-language_generation Retrieved:2021-2-20.
- Natural-language generation (NLG) is a software process that transforms structured data into natural language. It can be used to produce long form content for organizations to automate custom reports, as well as produce custom content for a web or mobile application. It can also be used to generate short blurbs of text in interactive conversations (a chatbot) which might even be read out by a text-to-speech system.
Automated NLG can be compared to the process humans use when they turn ideas into writing or speech. Psycholinguists prefer the term language production for this process, which can also be described in mathematical terms, or modeled in a computer for psychological research. NLG systems can also be compared to translators of artificial computer languages, such as decompilers or transpilers, which also produce human-readable code generated from an intermediate representation. Human languages tend to be considerably more complex and allow for much more ambiguity and variety of expression than programming languages, which makes NLG more challenging.
NLG may be viewed as the opposite of natural-language understanding (NLU): whereas in natural-language understanding, the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a concept into words. The practical considerations in building NLU vs. NLG systems are not symmetrical. NLU needs to deal with ambiguous or erroneous user input, whereas the ideas the system wants to express through NLG are generally known precisely. NLG needs to choose a specific, self-consistent textual representation from many potential representations, whereas NLU generally tries to produce a single, normalized representation of the idea expressed.[1]
NLG has existed since ELIZA was developed in the mid-1960s, but commercial NLG technology has only recently become widely available. NLG techniques range from simple template-based systems, like a mail merge that generates form letters, to systems that have a complex understanding of human grammar. NLG can also be accomplished by training a statistical model using machine learning, typically on a large corpus of human-written texts.
- ↑ Dale, Robert; Reiter, Ehud (2000). Building natural language generation systems. Cambridge, U.K.: Cambridge University Press. ISBN 978-0-521-02451-8.
2018
- (Lee et al., 2018) ⇒ Chris van der Lee, Emiel Krahmer, and Sander Wubben. (2018). “Automated Learning of Templates for Data-to-text Generation: Comparing Rule-based, Statistical and Neural Methods.” In: Proceedings of the 11th International Conference on Natural Language Generation (INLG 2018). DOI:http://dx.doi.org/10.18653/v1/W18-6504
- (Song et al., 2018) ⇒ Linfeng Song, Yue Zhang, Zhiguo Wang, and Daniel Gildea. (2018). “A Graph-to-Sequence Model for AMR-to-Text Generation.” In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018) Volume 1: Long Papers. DOI:10.18653/v1/P18-1150
- (Guo et al., 2018) ⇒ Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, and Jun Wang. (2018). “Long Text Generation via Adversarial Training with Leaked Information.” In: Proceedings of the Thirty-Second (AAAI) Conference on Artificial Intelligence (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th (AAAI) Symposium on Educational Advances in Artificial Intelligence (EAAI-18).
- (Fedus et al., 2018) ⇒ William Fedus, Ian Goodfellow, and Andrew M Dai. (2018). “MaskGAN: Better Text Generation via Filling in the ________". In: Proceedings of the Sixth International Conference on Learning Representations (ICLR-2018).
- (Clark et al., 2018) ⇒ Elizabeth Clark, Yangfeng Ji, and Noah A. Smith. (2018). “Neural Text Generation in Stories Using Entity Representations As Context.” In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Volume 1 (Long Papers). DOI:10.18653/v1/N18-1204.
- (Kudo & Richardson, 2018) ⇒ Taku Kudo, and John Richardson. (2018). “SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing.” In: arXiv preprint arXiv:1808.06226.
- (Zhu et al., 2018) ⇒ Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang, and Yong Yu. (2018). “Texygen: A Benchmarking Platform for Text Generation Models.” In: Proceedings of The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR 2018). DOI:10.1145/3209978.3210080.
2017
- (Semeniuta et al., 2017) ⇒ Stanislau Semeniuta, Aliaksei Severyn, and Erhardt Barth. (2017). “A Hybrid Convolutional Variational Autoencoder for Text Generation.” In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017). DOI:10.18653/v1/D17-1066.
- (Zhang et al., 2017) ⇒ Yizhe Zhang, Zhe Gan, Kai Fan, Zhi Chen, Ricardo Henao, Dinghan Shen, and Lawrence Carin. (2017). “Adversarial Feature Matching for Text Generation". In: Proceedings of the 34th International Conference on Machine Learning (ICML 2017).
- (Li et al., 2017) ⇒ Jiwei Li, Will Monroe, Tianlin Shi, Sebastien Jean, Alan Ritter, and Dan Jurafsky. (2017). “Adversarial Learning for Neural Dialogue Generation.” In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017). DOI:10.18653/v1/D17-1230.
- (Lin, Li, et al., 2017) ⇒ Kevin Lin, Dianqi Li, Xiaodong He, Ming-ting Sun, and Zhengyou Zhang. (2017). “Adversarial Ranking for Language Generation.” In: Proceedings of Advances in Neural Information Processing Systems 30 (NIPS-2017).
- (Che et al., 2017) ⇒ Tong Che, Yanran Li, Ruixiang Zhang, R. Devon Hjelm, Wenjie Li, Yangqiu Song, and Yoshua Bengio. (2017). “Maximum-Likelihood Augmented Discrete Generative Adversarial Networks.” In: arXiv preprint arXiv:1702.07983.
- (Yu et al., 2017a) ⇒ Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. (2017). “SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient.” In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017).
2017h
- https://github.com/pytorch/examples/tree/master/word_language_model
- QUOTE: This example trains a multi-layer RNN (Elman, GRU, or LSTM) on a language modeling task. By default, the training script uses the WikiText-2 dataset, provided. The trained model can then be used by the generate script to generate new text.
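Independent of the PyTorch example quoted above, the generate step of such a probabilistic language model can be illustrated with a toy bigram model: sample the next token from the distribution conditioned on the current token until an end marker appears. The vocabulary and probabilities below are invented for demonstration:

```python
# Illustrative sketch (not the PyTorch example itself): text generation
# by sampling from a toy bigram language model.
import random

bigram = {
    "<s>": [("the", 0.6), ("a", 0.4)],
    "the": [("cat", 0.5), ("dog", 0.5)],
    "a":   [("cat", 0.5), ("dog", 0.5)],
    "cat": [("sat", 1.0)],
    "dog": [("sat", 1.0)],
    "sat": [("</s>", 1.0)],
}

def sample_sentence(rng=random):
    """Generate one sentence by sampling tokens until the end marker."""
    token, out = "<s>", []
    while token != "</s>":
        words, probs = zip(*bigram[token])
        token = rng.choices(words, weights=probs)[0]
        if token != "</s>":
            out.append(token)
    return " ".join(out)

print(sample_sentence())  # e.g. "the cat sat"
```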
2016
- (Kusner & Hernández-Lobato, 2016) ⇒ Matt J. Kusner, and José Miguel Hernández-Lobato. (2016). “GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution.” In: arXiv:1611.04051.
2015a
- (Bahdanau et al., 2015) ⇒ Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. (2015). “Neural Machine Translation by Jointly Learning to Align and Translate.” In: Proceedings of the Third International Conference on Learning Representations, (ICLR-2015).