PPLRE Preprocessor

From GM-RKB

This section of the PPLRE Detailed Application Design documentation describes the PPLRE Preprocessor subsystem.

Overview

The PPLRE Preprocessor is the document preprocessing subsystem developed for the PPLRE Project. It extracts the relevant text from each document.

  • Input: A document in HTML or XML format
  • Output: The extracted abstract, title, authors, and year of publication, each converted to ASCII characters and placed into a separate file.
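
The input/output contract above can be illustrated with a minimal Python sketch. It is not the PPLRE implementation. It assumes, purely for illustration, that the fields appear in the HTML as a <title> element and as <meta name="..." content="..."> tags named "abstract", "authors", and "year"; the class MetaFieldParser, the functions to_ascii and preprocess, and the "<doc_id>.<field>.txt" file naming are all hypothetical. ASCII conversion here is done by Unicode decomposition followed by dropping any remaining non-ASCII characters, which is one possible approach among several.

```python
import unicodedata
from html.parser import HTMLParser
from pathlib import Path

# Hypothetical field names; the real PPLRE markup is not specified in this page.
FIELDS = ("title", "abstract", "authors", "year")


class MetaFieldParser(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> values."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            # Repeated names (e.g. several authors) are kept in document order.
            self.fields.setdefault(attrs["name"], []).append(attrs["content"])

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            title = "".join(self._title_parts).strip()
            self.fields.setdefault("title", []).append(title)

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def to_ascii(text):
    # Decompose accented characters, then drop anything still outside ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def preprocess(html, out_dir, doc_id):
    """Write one ASCII file per field and return the extracted text."""
    parser = MetaFieldParser()
    parser.feed(html)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = {}
    for field in FIELDS:
        text = to_ascii("; ".join(parser.fields.get(field, [])))
        (out / f"{doc_id}.{field}.txt").write_text(text + "\n", encoding="ascii")
        written[field] = text
    return written
```

Calling preprocess(html_text, "out", "doc1") under these assumptions would produce out/doc1.title.txt, out/doc1.abstract.txt, out/doc1.authors.txt, and out/doc1.year.txt. Dropping non-decomposable characters is lossy; a transliteration table would be needed if such characters must be preserved in some form.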

Wish List

  • Support extraction from PDF documents