Intrinsic Natural Language Generation (NLG) Performance Measure
An Intrinsic Natural Language Generation (NLG) Performance Measure is an NLG measure that is an intrinsic measure (it evaluates the quality of generated text on its own terms, independently of its impact on downstream real-world systems).
- Context:
- It can focus on aspects such as Grammatical Correctness, Lexical Richness, Text Coherence, and Style Consistency.
- It can be an input to an NLG Evaluation Task (applied to various NLG systems).
- It can range from being a Human-Performed Intrinsic NLG Measure to being an Automated Intrinsic NLG Measure.
- It can range from being a Syntax-based NLG Performance Measure to being a Semantics-based NLG Performance Measure.
- …
- Example(s):
- an Automated Intrinsic NLG Measure, such as:
- a ROUGE Measure, used to evaluate the quality of summaries generated by a Text Summarization System.
- a BLEU Measure, primarily used for Machine Translation but adaptable to evaluating other types of Generated Text.
- a Lexical Diversity Measure, assessing the variety and richness of vocabulary used in the Generated Text (see the sketch after this list).
- a Syntax-based NLG Performance Measure (such as one based on grammatical correctness).
- a Human-Performed Intrinsic NLG Measure, such as:
- a Human-Led Evaluation, where a panel of experts or users rates the Generated Text on criteria such as Fluency, Coherence, and Relevance.
- …
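The automated measures above can be computed with off-the-shelf tooling. Below is a minimal Python sketch, assuming the NLTK and rouge-score packages are installed; the helper names, smoothing choice, and sample texts are illustrative assumptions, not part of any standard.

```python
# A minimal sketch of three automated intrinsic NLG measures.
# Assumptions: NLTK (pip install nltk) and Google's rouge-score package
# (pip install rouge-score) are available; helper names are illustrative.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer


def bleu(reference: str, candidate: str) -> float:
    """Sentence-level BLEU of a candidate against a single reference."""
    # Smoothing avoids a zero score when a higher-order n-gram is absent.
    smoother = SmoothingFunction().method1
    return sentence_bleu(
        [reference.lower().split()],
        candidate.lower().split(),
        smoothing_function=smoother,
    )


def type_token_ratio(text: str) -> float:
    """Lexical diversity as distinct word types over total word tokens."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


reference = "a quick brown fox leaps over the lazy dog"
generated = "the quick brown fox jumps over the lazy dog"

print(f"BLEU:             {bleu(reference, generated):.3f}")
print(f"Type-token ratio: {type_token_ratio(generated):.3f}")

# ROUGE-1 and ROUGE-L F1, as used in summarization evaluation.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
for name, score in scorer.score(reference, generated).items():
    print(f"{name}: F1 = {score.fmeasure:.3f}")
```

Note that each of these is intrinsic: it either compares the generated text against reference text or inspects properties of the text itself, without measuring any task outcome.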
- Counter-Example(s):
- An Extrinsic NLG Performance Measure, which instead evaluates generated text by its effectiveness in accomplishing a specific downstream task.
- A Search Engine Optimization (SEO) Performance Measure, which evaluates text based on its effectiveness in ranking on search engines.
- A User Engagement Metric in Social Media Analytics, which measures the impact of generated content on user interaction and engagement.
- See: ROUGE, BLEU, METEOR, Human Evaluation of Text, Automated Text Evaluation, NLG System.
References
2011
- (Crossley & McNamara, 2011) ⇒ Scott A. Crossley, and Danielle S. McNamara. (2011). “Understanding Expert Ratings of Essay Quality: Coh-Metrix Analyses of First and Second Language Writing.” International Journal of Continuing Engineering Education and Life Long Learning, 21(2-3).
- ABSTRACT: This article reviews recent studies in which human judgements of essay quality are assessed using Coh-Metrix, an automated text analysis tool. The goal of these studies is to better understand the relationship between linguistic features of essays and human judgements of writing quality. Coh-Metrix reports on a wide range of linguistic features, affording analyses of writing at various levels of text structure, including surface, text-base, and situation model levels. Recent studies have examined linguistic features of essay quality related to co-reference, connectives, syntactic complexity, lexical diversity, spatiality, temporality, and lexical characteristics. These studies have analysed essays written by both first language and second language writers. The results support the notion that human judgements of essay quality are best predicted by linguistic indices that correlate with measures of language sophistication such as lexical diversity, word frequency, and syntactic complexity. In contrast, human judgements of essay quality are not strongly predicted by linguistic indices related to cohesion. Overall, the studies portray high quality writing as containing more complex language that may not facilitate text comprehension.