LSPGS Wikipedia Long Sentences Summarization Task
An LSPGS Wikipedia Long Sentences Summarization Task is a Multi-Document Text Summarization Task that generates a Wikipedia article for a given topic using a Transformer-based Neural Network architecture.
- AKA: Text Summarization via Transformer Neural Network, Liu-Saleh-Pot-Goodrich-Sepassi Wikipedia Long Sentences Summarization Task, Liu's Wikipedia Long Sentences Summarization Task.
- Context:
- Task Input(s): Wikipedia Topic.
- Task Output(s): a generated Wikipedia article.
- Task Requirement(s):
- Benchmark Datasets:
- Performance Metrics: Test Perplexity; ROUGE-L Score.
- Baseline Models: seq2seq-attention Model (Bahdanau et al., 2015).
- It can be solved by an LSPGS Wikipedia Long Sentences Summarization System that implements an LSPGS Wikipedia Long Sentences Summarization Algorithm.
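The paper's T-DMCA model (Transformer Decoder with Memory-Compressed Attention) makes very long inputs (up to $L = 11000$ tokens) tractable by shortening the key/value sequence before attending. The following is a minimal NumPy sketch of that idea, not the authors' implementation: it compresses by strided mean-pooling, whereas the paper uses a learned strided convolution (kernel size and stride 3), and all names here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_compressed_attention(q, k, v, stride=3):
    """Single-head attention over compressed keys/values.

    Compressing k and v along the sequence axis shrinks the score
    matrix from (n_q, n_kv) to (n_q, n_kv // stride), which is what
    lets a decoder attend over sequences of thousands of tokens.
    Mean-pooling stands in for the paper's strided convolution.
    """
    n, d = k.shape
    m = n // stride
    k_c = k[:m * stride].reshape(m, stride, d).mean(axis=1)
    v_c = v[:m * stride].reshape(m, stride, d).mean(axis=1)
    scores = q @ k_c.T / np.sqrt(d)          # (n_q, m) score matrix
    return softmax(scores, axis=-1) @ v_c    # (n_q, d) output

# Toy usage: 8 query positions attending over a 9000-token memory.
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 64))
kv = rng.normal(size=(9000, 64))
print(memory_compressed_attention(q, kv, kv).shape)  # (8, 64)
```

In the paper this layer alternates with local-attention layers inside a decoder-only Transformer; the sketch shows only the compression step.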
- Example(s):
- Extractive methods comparison (Liu et al., 2018): the task's first stage is extractive, ranking input paragraphs before the neural model abstracts them (see the tf-idf sketch below, after the table).
- Model Performance (Liu et al., 2018):
{{class="wikitable" style="border:1px; text-align:left; solid black; border-spacing:1px; margin: 1em auto; width: 80%"
- Models Performance (Liu et al., 2018):
|- ! Model !!Test perplexity!! ROUGE-L. |- |seq2seq-attention, $L = 500 || 5.04952|| 12.7 |- |Transformer-ED, $L = 500 || 2.46645|| 34.2 |- |Transformer-D, $L = 4000$|| 2.22216 ||33.6 |- | Transformer-DMCA, no MoE-layer, $L = 11000$ ||2.05159 ||36.2 |- |Transformer-DMCA, MoE-128, $L = 11000 || 1.92871|| 37.9 |- |Transformer-DMCA, MoE-256, $L = 7500$|| 1.90325|| 38.8 |- |}
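In the table, Test Perplexity is the exponential of the model's average per-token cross-entropy (lower is better), and ROUGE-L is an F-measure over the longest common subsequence (LCS) between the generated and reference articles, reported ×100 (higher is better). A minimal sketch of ROUGE-L scoring follows, assuming whitespace tokenization and β = 1.2 (common choices in the summarization literature, not confirmed by the paper's text here):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if x == y
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[-1][-1]

def rouge_l_f(candidate, reference, beta=1.2):
    """LCS-based ROUGE-L F-score; beta weights recall over precision."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    p, rec = lcs / len(c), lcs / len(r)
    return (1 + beta ** 2) * p * rec / (rec + beta ** 2 * p)

print(rouge_l_f("the model generates the article",
                "the model generates a wikipedia article"))
```

The extractive stage referenced under Example(s) ranks candidate paragraphs from the source documents and keeps only the top-ranked ones as input to the Transformer; one of the rankers compared is tf-idf (Ramos, 2003). The sketch below is a hypothetical, simplified version of such a ranker (tfidf_rank and its scoring are this page's illustration, not the paper's code):

```python
import math
from collections import Counter

def tfidf_rank(paragraphs, query):
    """Rank paragraphs by the tf-idf weight of the query (topic) terms.

    Each paragraph is treated as a document; idf is computed over the
    given collection. Returns paragraph indices, best match first.
    """
    docs = [p.lower().split() for p in paragraphs]
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))

    def score(d):
        tf = Counter(d)
        return sum(tf[w] * math.log(n / df[w])
                   for w in query.lower().split() if w in tf)

    return sorted(range(n), key=lambda i: score(docs[i]), reverse=True)

paras = ["the eiffel tower is in paris",
         "pandas eat bamboo",
         "the tower was built in 1889 in paris"]
print(tfidf_rank(paras, "eiffel tower paris"))  # [0, 2, 1]
```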
- Counter-Example(s):
- See: Abstractive Text Summarization Task, Neural Abstractive Summarization Task, Self-Attention Mechanism, Sequence-to-Sequence (seq2seq) Neural Network, Natural Language Generation Task, Natural Language Understanding Task, Natural Language Translation Task, Natural Language Processing Task.
References
2018
- (Liu et al., 2018) ⇒ Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, and Noam Shazeer. (2018). “Generating Wikipedia by Summarizing Long Sequences.” In: Proceedings of the Sixth International Conference on Learning Representations (ICLR-2018).
2017
- (Vaswani et al., 2017) ⇒ Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. (2017). “Attention Is All You Need.” In: Advances in Neural Information Processing Systems (NIPS 2017).
2016a
- (Nallapati et al., 2016) ⇒ Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, and Bing Xiang. (2016). “Abstractive Text Summarization Using Sequence-to-sequence RNNs and Beyond.” In: Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL 2016). DOI:10.18653/v1/K16-1028.
2016b
- (Wu et al., 2016) ⇒ Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, et al. (2016). “Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation.” In: arXiv preprint arXiv:1609.08144.
2015
- (Bahdanau et al., 2015) ⇒ Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. (2015). “Neural Machine Translation by Jointly Learning to Align and Translate.” In: Proceedings of the Third International Conference on Learning Representations (ICLR-2015).
2005
- (Nenkova & Vanderwende, 2005) ⇒ Ani Nenkova, and Lucy Vanderwende. (2005). “The Impact of Frequency on Summarization.” Microsoft Research, Redmond, Washington, Technical Report MSR-TR-2005-101.
2004
- (Mihalcea & Tarau, 2004) ⇒ Rada Mihalcea, and Paul Tarau. (2004). “TextRank: Bringing Order into Texts.” In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP 2004).
2003a
- (Graff & Cieri, 2003) ⇒ David Graff, and Christopher Cieri. (2003). “English Gigaword.” Linguistic Data Consortium, Philadelphia.
- QUOTE: English Gigaword was produced by Linguistic Data Consortium (LDC) catalog number LDC2003T05 and ISBN 1-58563-260-0, and is distributed on DVD. This is a comprehensive archive of newswire text data in English that has been acquired over several years by the LDC.
Four distinct international sources of English newswire are represented here:
2003b
- (Ramos, 2003) ⇒ Juan Ramos. (2003). “Using TF-IDF to Determine Word Relevance in Document Queries.” In: Proceedings of the First Instructional Conference on Machine Learning, volume 242, pp. 133–142.