2019 EncodeTagRealizeHighPrecisionTe

From GM-RKB

Subject Headings: LaserTagger; Sequence Tagging System; Text Generation Task; Text Editing Task; BERT-Encoder Network; Sentence Fusion Task; Sentence Splitting Task; Abstractive Summarization Task; Grammar Error Correction Task.

Notes

Cited By

Quotes

Abstract

We propose LaserTagger - a sequence tagging approach that casts text generation as a text editing task. Target texts are reconstructed from the inputs using three main edit operations: keeping a token, deleting it, and adding a phrase before the token. To predict the edit operations, we propose a novel model, which combines a BERT encoder with an autoregressive Transformer decoder. This approach is evaluated on English text on four tasks: sentence fusion, sentence splitting, abstractive summarization, and grammar correction. LaserTagger achieves new state-of-the-art results on three of these tasks, performs comparably to a set of strong seq2seq baselines with a large number of training examples, and outperforms them when the number of examples is limited. Furthermore, we show that at inference time tagging can be more than two orders of magnitude faster than comparable seq2seq models, making it more attractive for running in a live environment.

1. Introduction

Neural sequence-to-sequence (seq2seq) models provide a powerful framework for learning to translate source texts into target texts. Since their first application to machine translation (MT) (Sutskever et al., 2014) they have become the de facto approach for virtually every text generation task, including summarization (Tan et al., 2017), image captioning (Xu et al., 2015), text style transfer (Rao and Tetreault, 2018; Nikolov and Hahnloser, 2018; Jin et al., 2019), and grammatical error correction (Chollampatt and Ng, 2018; Grundkiewicz et al., 2019).

We observe that in some text generation tasks, such as the recently introduced sentence splitting and sentence fusion tasks, output texts highly overlap with inputs. In this setting, learning a seq2seq model to generate the output text from scratch seems intuitively wasteful. Copy mechanisms (Gu et al., 2016; See et al., 2017) allow for choosing between copying source tokens and generating arbitrary tokens, but although such hybrid models help with out-of-vocabulary words, they still require large training sets as they depend on output vocabularies as large as those used by the standard seq2seq approaches.

Encode: "Turing was born in 1912 . Turing died in 1954 ."
Tag: KEEP KEEP KEEP KEEP KEEP and he|DELETE DELETE KEEP KEEP KEEP KEEP
Realize: "Turing was born in 1912 and he died in 1954 ."
Figure 1: LASERTAGGER applied to sentence fusion.

In contrast, we propose learning a text editing model that applies a set of edit operations on the input sequence to reconstruct the output. We show that it is often enough to use a relatively small set of output tags representing text deletion, rephrasing and word reordering to be able to reproduce a large percentage of the targets in the training data. This results in a learning problem with a much smaller vocabulary size, and the output length fixed to the number of words in the source text. This, in turn, greatly reduces the number of training examples required to train accurate models, which is particularly important in applications where only a small amount of human-labeled data is available.
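
To illustrate how such a small tag vocabulary can be obtained, the following Python sketch collects the phrases that would need to be inserted to turn each source into its target and keeps only the most frequent ones. This is an illustration only: the alignment below uses difflib rather than the procedure described in the paper, and the function names are ours.

from collections import Counter
import difflib

def added_phrases(source_tokens, target_tokens):
    """Return target phrases that are not aligned to any source token.

    Approximation only: difflib's matching blocks stand in for the paper's
    own alignment procedure.
    """
    matcher = difflib.SequenceMatcher(a=source_tokens, b=target_tokens,
                                      autojunk=False)
    phrases = []
    for op, _, _, j1, j2 in matcher.get_opcodes():
        if op in ("insert", "replace"):
            phrases.append(" ".join(target_tokens[j1:j2]))
    return phrases

def build_phrase_vocabulary(pairs, vocab_size=500):
    """Keep the most frequently added phrases across the training pairs."""
    counts = Counter()
    for src, tgt in pairs:
        counts.update(added_phrases(src.split(), tgt.split()))
    return [phrase for phrase, _ in counts.most_common(vocab_size)]

# Example: the sentence-fusion pair from Figure 1.
pairs = [("Turing was born in 1912 . Turing died in 1954 .",
          "Turing was born in 1912 and he died in 1954 .")]
print(build_phrase_vocabulary(pairs))   # ['and he']

Targets whose added phrases all fall inside this vocabulary can then be expressed exactly with the tag set, which is what keeps the output space small.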

Our tagging approach, LASERTAGGER, consists of three steps (Fig. 1): (i) Encode builds a representation of the input sequence, (ii) Tag assigns edit tags from a pre-computed output vocabulary to the input tokens, and (iii) Realize applies a simple set of rules to convert tags into the output text tokens. An experimental evaluation of LASERTAGGER on four different text generation tasks shows that it yields comparable results to seq2seq models when we have tens of thousands of training examples and clearly outperforms them when the number of examples is smaller.
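
The Realize step in particular is purely rule-based. The minimal sketch below shows one way such rules could be applied, assuming each tag is KEEP or DELETE, optionally prefixed by a phrase to insert before the token; this tag encoding is our assumption for illustration, not necessarily the one used in the released implementation.

def realize(tokens, tags):
    """Apply edit tags to the input tokens to produce the output text.

    A tag is "KEEP" or "DELETE", optionally written as "<phrase>|KEEP" or
    "<phrase>|DELETE" to insert <phrase> before the corresponding token.
    """
    output = []
    for token, tag in zip(tokens, tags):
        phrase, _, base = tag.rpartition("|")   # phrase is "" if none
        if phrase:
            output.append(phrase)
        if base == "KEEP":
            output.append(token)
    return " ".join(output)

# Figure 1 example: delete ". Turing" and insert "and he".
tokens = "Turing was born in 1912 . Turing died in 1954 .".split()
tags = ["KEEP"] * 5 + ["and he|DELETE", "DELETE"] + ["KEEP"] * 4
print(realize(tokens, tags))
# Turing was born in 1912 and he died in 1954 .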

Our contributions are the following:

1) We demonstrate that many text generation tasks with overlapping inputs and outputs can be effectively treated as text editing tasks.

2) We propose LASERTAGGER — a sequence tagging-based model for text editing, together with a method for generating the tag vocabulary from the training data.

3) We describe two versions of the tagging model: (i) LASERTAGGERFF—a tagger based on BERT (Devlin et al., 2019) and (ii) LASERTAGGERAR—a novel tagging model combining the BERT encoder with an autoregressive Transformer decoder, which further improves the results over the BERT tagger.

4) We evaluate LASERTAGGER against strong seq2seq baseline models based on the BERT architecture. Our baseline models outperform previously reported state-of-the-art results on two tasks.

5) We demonstrate that a) LASERTAGGERAR achieves state-of-the-art or comparable results on 3 out of 4 examined tasks, b) LASERTAGGERFF is up to 100x faster at inference time with performance comparable to the state-of-the-art seq2seq models. Furthermore, both models: c) require much less training data compared to the seq2seq models, d) are more controllable and interpretable than seq2seq models due to the small vocabulary of edit operations, e) are less prone to typical seq2seq model errors, such as hallucination. The code will be available at: lasertagger.page.link/code

2. Related Work

Recent work discusses some of the difficulties of learning neural decoders for text generation (Wiseman et al., 2018; Prabhakaran et al., 2018). Conventional seq2seq approaches require large amounts of training data and are hard to control and constrain to desirable outputs. At the same time, many NLP tasks that appear to be full-fledged text generation tasks are natural testbeds for simpler methods. In this section we briefly review some of these tasks.

Text Simplification is a paraphrasing task that is known to benefit from modeling edit operations. Simple instances of this type are sentence compression systems that apply a drop operation at the token/phrase level (Filippova and Strube, 2008; Filippova et al., 2015), while more intricate systems also apply splitting, reordering, and lexical substitution (Zhu et al., 2010). Simplification has also been attempted with systems developed for phrase-based MT (Xu et al., 2016a), as well as with neural encoder-decoder models (Zhang and Lapata, 2017).

Independent of this work, Dong et al. (2019) recently proposed a text-editing model, similar to ours, for text simplification. The main differences from our work are: (i) they introduce an interpreter module which acts as a language model for the so-far-realized text, and (ii) they generate added tokens one-by-one from a full vocabulary rather than from an optimized set of frequently added phrases. The latter allows their model to generate more diverse output, but it may negatively affect the inference time, precision, and data efficiency of their model. Another recent model similar to ours is the Levenshtein Transformer (Gu et al., 2019), which performs text editing via a sequence of deletion and insertion actions.

Single-document summarization is a task that requires systems to shorten texts in a meaning-preserving way. It has been approached with deletion-based methods on the token level (Filippova et al., 2015) and the sentence level (Narayan et al., 2018; Liu, 2019). Other papers have used neural encoder-decoder methods (Tan et al., 2017; Rush et al., 2015; Paulus et al., 2017) to do abstractive summarization, which allows edits beyond mere deletion. This can be motivated by the work of Jing and McKeown (2000), who identified a small number of fundamental high-level editing operations that are useful for producing summaries (reduction, combination, syntactic transformation, lexical paraphrasing, generalization/specification, and reordering). See et al. (2017) extended a neural encoder-decoder model with a copy mechanism to allow the model to more easily reproduce input tokens during generation.

Out of available summarization datasets (Dernoncourt et al., 2018), we find the one by Toutanova et al. (2016) particularly interesting because (1) it specifically targets abstractive summarization systems, (2) the lengths of texts in this dataset (short paragraphs) seem well-suited for text editing, and (3) an analysis showed that the dataset covers many different summarization operations.

In Grammatical Error Correction (Ng et al., 2013, 2014) a system is presented with input texts, usually written by a language learner, and is tasked with detecting and fixing grammatical (and other) mistakes. Approaches to this task often incorporate task-specific knowledge, e.g., by designing classifiers for specific error types (Knight and Chander, 1994; Rozovskaya et al., 2014) that can be trained without manually labeled data, or by adapting statistical machine-translation methods (Junczys-Dowmunt and Grundkiewicz, 2014). Methods for the sub-problem of error detection are similar in spirit to sentence compression systems, in that they are implemented as word-based neural sequence labelers (Rei, 2017; Rei et al., 2017). Neural encoder-decoder methods are also commonly applied to the error correction task (Ge et al., 2018; Chollampatt and Ng, 2018; Zhao et al., 2019), but suffer from a lack of training data, which is why task-specific tricks need to be applied (Kasewa et al., 2018; Junczys-Dowmunt et al., 2018).

3. Text Editing as a Tagging Problem

Our approach to text editing is to cast it into a tagging problem. Here we describe its main components: (1) the tagging operations, (2) how to convert plain-text training targets into a tagging format, as well as (3) the realization step to convert tags into the final output text.
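
As a rough illustration of step (2), the sketch below greedily aligns a target to its source and emits one tag per source token, returning None when the target cannot be reconstructed with the given phrase vocabulary. The paper's actual conversion algorithm may differ in its details; this is only meant to convey the idea, and the greedy alignment is our simplification.

def convert_to_tags(source_tokens, target_tokens, phrase_vocabulary):
    """Greedily align target to source, emitting one tag per source token.

    Source tokens that continue matching the target are kept, the rest are
    deleted, and any skipped target span is attached as an added phrase,
    but only if that phrase is in the pre-computed phrase vocabulary.
    Returns None when the target is not expressible with these tags.
    """
    tags, t = [], 0
    for token in source_tokens:
        try:
            nxt = target_tokens.index(token, t)   # next match in the target
        except ValueError:
            tags.append("DELETE")
            continue
        phrase = " ".join(target_tokens[t:nxt])   # target span skipped over
        if phrase and phrase not in phrase_vocabulary:
            tags.append("DELETE")
            continue
        tags.append(f"{phrase}|KEEP" if phrase else "KEEP")
        t = nxt + 1
    if t != len(target_tokens):
        return None   # leftover target tokens cannot be produced
    return tags

# Figure 1 example with the phrase vocabulary {"and he"}:
src = "Turing was born in 1912 . Turing died in 1954 .".split()
tgt = "Turing was born in 1912 and he died in 1954 .".split()
print(convert_to_tags(src, tgt, {"and he"}))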

References

BibTeX

@inproceedings{2019_EncodeTagRealizeHighPrecisionTe,
  author    = {Eric Malmi and
               Sebastian Krause and
               Sascha Rothe and
               Daniil Mirylenka and
               Aliaksei Severyn},
  editor    = {Kentaro Inui and
               Jing Jiang and
               Vincent Ng and
               Xiaojun Wan},
  title     = {Encode, Tag, Realize: High-Precision Text Editing},
  booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural
               Language Processing and the 9th International Joint Conference on
               Natural Language Processing (EMNLP-IJCNLP 2019)},
  pages     = {5053--5064},
  publisher = {Association for Computational Linguistics},
  year      = {2019},
  url       = {https://doi.org/10.18653/v1/D19-1510},
  doi       = {10.18653/v1/D19-1510},
}

