Automated Text Understanding (NLU) Task
An Automated Text Understanding (NLU) Task is a text understanding task (a text comprehension task) that is an automated text processing task.
- Context:
- Task Input: a Digital Text Item.
- Task Output: a Semantic Representation.
- optional: a Reading Comprehension Goal (e.g. who are the characters in “War and Peace”).
- performance measures: Time, Recall, Precision, Comprehension Level (a reading comprehension measure); a minimal scoring sketch follows this list.
- It can be solved by an Automated Text Understanding System (that implements a text understanding algorithm).
- It can range from being a Shallow NLU Task (e.g. text intent understanding) to being a Deep NLU Task (Machine Reading / Machine Comprehension of Text (MCT)).
- It can range from being a Language-specific NLU Task (English Understanding Task, Mandarin Understanding Task, ...) to being a Language-independent NLU Task.
- It can range from being a General Text Understanding Task to being a Domain-Specific Text Understanding Task.
- It can range from being a Single-Utterance NL Understanding Task to being a Multiple-Utterance NL Discourse Interpretation Task.
- It can range from being a Short Text Understanding Task to being a Long Text Understanding Task.
- It can be an Information Seeking Task, such as a Strategic Reading Task.
- It can be instantiated in an Automated Reading Act.
- It can be preceded by or include a Text Preprocessing Task, such as Tokenization, Part-of-Speech Tagging, or Syntactic Parsing.
- It can be supported by a Semantic Natural Language Processing Task, such as a Semantic Parsing Task.
- It can support a Natural Language Interaction Task.
- It can be associated with a Reading Comprehension Assessment Task.
- ...
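The task input/output and performance measures above can be made concrete with a small sketch. The following Python snippet is a minimal, illustrative example, assuming (purely for illustration) that the Semantic Representation is a set of (subject, relation, object) triples; the NLUTaskInstance class, its field names, and the example triples are hypothetical and not part of any standard API.

```python
from dataclasses import dataclass

# Hypothetical representation: the Semantic Representation (task output) is
# modeled here as a set of (subject, relation, object) triples.
Triple = tuple[str, str, str]

@dataclass
class NLUTaskInstance:
    text: str              # the Digital Text Item (task input)
    gold: set[Triple]      # reference Semantic Representation

def score(predicted: set[Triple], gold: set[Triple]) -> dict[str, float]:
    """Precision/Recall of a predicted semantic representation against the gold one."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall}

# Invented example, echoing the Reading Comprehension Goal above:
instance = NLUTaskInstance(
    text="Pierre Bezukhov is a character in War and Peace.",
    gold={("Pierre Bezukhov", "character_in", "War and Peace")},
)
predicted = {
    ("Pierre Bezukhov", "character_in", "War and Peace"),
    ("War and Peace", "written_by", "Leo Tolstoy"),  # not in the gold set for this instance
}
print(score(predicted, instance.gold))  # {'precision': 0.5, 'recall': 1.0}
```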
- Example(s):
- NLU Task Types by Complexity: such as a Shallow NLU Task (e.g. text intent understanding) or a Deep NLU Task (e.g. Machine Reading / Machine Comprehension of Text).
- NLU Task Types by Input Length: such as a Short Text Understanding Task or a Long Text Understanding Task.
- NLU Task Types by Domain: such as a General Text Understanding Task or a Domain-Specific Text Understanding Task.
- NLU Task Types by Linguistic Unit: such as a Single-Utterance NL Understanding Task or a Multiple-Utterance NL Discourse Interpretation Task.
- NLU Benchmark Tasks (minimal instance formats for two of these are sketched after this list):
- Reading Comprehension Tasks:
- Extractive Question Answering, such as: SQuAD, NewsQA.
- Multiple-Choice Question Answering, such as: RACE, MCTest.
- Information Extraction Tasks:
- Relation Extraction, such as: TACRED, SemEval-2010 Task 8.
- Event Extraction, such as: ACE 2005, TAC KBP Event Nugget Detection.
- ...
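To make the benchmark families above more tangible, here is a minimal sketch of how single instances are commonly structured for extractive question answering (SQuAD-style) and relation extraction (TACRED-style). The passages, entities, and field names are simplified and invented for illustration; they do not reproduce the exact released dataset schemas.

```python
# Extractive QA (SQuAD-style): the answer is a character span of the context.
qa_instance = {
    "context": "Natasha Rostova and Pierre Bezukhov are characters in War and Peace.",
    "question": "Who are the characters in War and Peace?",
    "answers": [{"text": "Natasha Rostova and Pierre Bezukhov", "answer_start": 0}],
}
gold = qa_instance["answers"][0]
span = qa_instance["context"][gold["answer_start"]:gold["answer_start"] + len(gold["text"])]
assert span == gold["text"]  # exact match, a common reading-comprehension measure

# Relation extraction (TACRED-style): classify the relation holding between a
# marked subject entity and object entity within one sentence.
re_instance = {
    "sentence": "Leo Tolstoy wrote War and Peace.",
    "subject": "Leo Tolstoy",
    "object": "War and Peace",
    "relation": "author_of",  # label drawn from a fixed relation inventory
}
```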
- Counter-Example(s):
- See: Ambiguity Resolution, Contextual Query Language, Knowledge Representation, Language Modeling, NLP Application, Question Answering System, Semantic Web Technology, Syntax-Driven Semantic Parsing, Text Analytics.
References
2017a
- (Seo et al., 2017) ⇒ Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. (2017). “Bidirectional Attention Flow for Machine Comprehension.” In: Proceedings of ICLR 2017.
- QUOTE: Machine comprehension (MC), answering a query about a given context paragraph, requires modeling complex interactions between the context and the query. …
… The tasks of machine comprehension (MC) and question answering (QA) have gained significant popularity over the past few years within the natural language processing and computer vision communities.
2017b
- (Bengio, 2017) ⇒ Yoshua Bengio. (2017). https://futureoflife.org/wp-content/uploads/2017/01/Yoshua-Bengio.pdf
- QUOTE: What’s Missing (to achieve AGI) … Actually understanding language (also solves generating), requiring enough world knowledge / commonsense
2016
- (Liang, 2016) ⇒ Percy Liang. (2016). “Learning Executable Semantic Parsers for Natural Language Understanding.” In: Communications of the ACM Journal, 59(9). doi:10.1145/2866568
- QUOTE: A long-standing goal of artificial intelligence (AI) is to build systems capable of understanding natural language.
2015
- (Zhang & LeCun, 2015) ⇒ Xiang Zhang, and Yann LeCun. (2015). “Text Understanding from Scratch.” In: arXiv:1502.01710.
- QUOTE: Text understanding consists in reading texts formed in natural languages, determining the explicit or implicit meaning of each elements such as words, phrases, sentences and paragraphs, and making inferences about the implicit or explicit properties of these texts (Norvig, 1987). This problem has been traditionally difficult because of the extreme variability in language formation (Linell, 1982). To date, most ways to handle text understanding, be it a handcrafted parsing program or a statistically learnt model, have been resorted to the means of matching words statistics.
2013a
- (Burges, 2013) ⇒ Christopher J.C. Burges. (2013). “Towards the Machine Comprehension of Text: An Essay.” Microsoft Research Technical Report MSR-TR-2013-125.
- QUOTE: The Machine Comprehension of Text (MCT) has been a central goal of Artificial Intelligence for over fifty years. How does one even define “machine comprehension”? Researchers often invoke the Turing test to this end (a machine attains human level intelligence if its responses in a dialog with a human are indistinguishable from those of another human (Turing, 1950)), but as Levesque (2013) recently pointed out, this definition has resulted in workers focusing on the wrong task, namely, fooling humans, rather than achieving machine intelligence. But even if researchers could be persuaded to focus on the AI part of the Turing test, the test is still a false goal, in the sense that the typical user would be happy to know that she is having a dialog with a machine if this were a result of her knowing that no human could possibly be that smart. Perhaps shoehorning the research to meet the goal of appearing human-like is a red herring. Levesque also suggests multiple choice tests that require world knowledge (for example, to solve the anaphora problem) as a suitable replacement for the Turing test.
2013b
- (Waltinger et al., 2013) ⇒ Ulli Waltinger, Dan Tecuci, Mihaela Olteanu, Vlad Mocanu, and Sean Sullivan. (2013). “USI Answers: Natural Language Question Answering Over (Semi-) Structured Industry Data.” In: Twenty-Fifth IAAI Conference.
- QUOTE: Natural Language Understanding (NLU) has long been a goal of AI. Considered an AI-complete task, it consists of mapping natural language sentence into a complete, unambiguous, formal meaning representation expressed in a formal language which supports other tasks such as automated reasoning, or question answering.
Natural Language access to databases (NLIDB) is a NLU task where the target language is a structured query language (e.g. SQL).
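The NLIDB setting described in this quote can be illustrated with a deliberately tiny sketch: one hand-written pattern stands in for the semantic parser, and the question, table, and column names are invented. A real system such as USI Answers has far broader coverage; this only shows the input/output contract (natural-language question in, SQL out).

```python
import re

# Illustrative only: a single regular-expression "grammar" standing in for a
# learned semantic parser. The table and column names are hypothetical.
PATTERN = re.compile(r"which turbines have a capacity over (\d+) mw", re.IGNORECASE)

def nl_to_sql(question: str) -> str:
    """Map a natural-language question to an unambiguous formal representation (SQL)."""
    match = PATTERN.match(question.strip())
    if match is None:
        raise ValueError("question not covered by this toy grammar")
    return f"SELECT name FROM turbines WHERE capacity_mw > {match.group(1)};"

print(nl_to_sql("Which turbines have a capacity over 100 MW?"))
# -> SELECT name FROM turbines WHERE capacity_mw > 100;
```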
2010
- (Poon et al., 2010) ⇒ Hoifung Poon, Janara Christensen, Pedro Domingos, Oren Etzioni, Raphael Hoffmann, Chloe Kiddon, Thomas Lin, Xiao Ling, Mausam, Alan Ritter, Stefan Schoenmackers, Stephen Soderland, Dan Weld, Fei Wu, and Congle Zhang. (2010). “Machine Reading at the University of Washington.” In: Proceedings of the 24th Conference on Artificial Intelligence (AAAI 2010).
- QUOTE: Machine reading, or learning by reading, aims to extract knowledge automatically from unstructured text and apply the extracted knowledge to end tasks such as decision making and question answering.
- Ideally, a machine reading system should strive to satisfy the following desiderata:
- End-to-end: the system should input raw text, extract knowledge, and be able to answer questions and support other end tasks;
- High quality: the system should extract knowledge with high accuracy;
- Large-scale: the system should acquire knowledge at Web-scale and be open to arbitrary domains, genres, and languages;
- Maximally autonomous: the system should incur minimal human effort;
- Continuous learning from experience: the system should constantly integrate new information sources (e.g., new text documents) and learn from user questions and feedback (e.g., via performing end tasks) to continuously improve its performance.
2009
- (Jin et al., 2009) ⇒ Wei Jin, Hung Hay Ho, and Rohini K Srihari. (2009). “OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extraction.” In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2009). doi:10.1145/1557019.1557148
- QUOTE: Unfortunately, reading through all customer reviews is difficult, especially for popular items, the number of reviews can be up to hundreds or even thousands.
2006
- (Etzioni et al., 2006) ⇒ Oren Etzioni, Michele Banko, and Michael J. Cafarella. (2006). “Machine Reading.” In: Proceedings of the 21st AAAI Conference (AAAI 2006).
- QUOTE: The time is ripe for the AI community to set its sights on Machine Reading - the automatic, unsupervised understanding of text. … By “understanding text” I mean the formation of a coherent set of beliefs based on a textual corpus and a background theory.