FASTR System
A FASTR System is a Term Extraction System and Indexing System developed by Christian Jacquemin.
- AKA: FASTR.
- Context:
- It makes use of the TreeTagger System.
- See: Christian Jacquemin.
References
2017
- (FASTR, 2017) ⇒ http://perso.limsi.fr/jacquemi/FASTR/presentation-fastr.html Retrieved: 2017-07-02
- QUOTE: Fastr is a parser for term and variant recognition. Fastr takes as input a corpus and a list of terms and outputs the indexed corpus in which terms and variants are recognized.
Fastr can be used in two modes:
- controlled indexing: input consists of a corpus and a list of terms,
- free indexing: input only consists of a corpus, the list of terms is automatically acquired from the corpus.
- Fastr uses the following resources:
- the corpus and the list of terms are tagged by the TreeTagger;
- if available, a list of morphological families and a list of semantic links are used to calculate morphological and semantic variation. See the sample files ./lib/der-families-xx and ./lib/sem-classes-xx or ./lib/sem-links-xx for the format (xx is the name of the language [en|fr]).
- Perl modules are provided in order to generate these data from WordNet and CELEX for the English language.
- The formalism of Fastr is close to PATR-II.
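The two indexing modes described above can be illustrated with a small Python sketch. This is not FASTR's implementation: the function names (`controlled_index`, `acquire_terms`, `free_index`) are hypothetical, matching is plain verbatim lookup with no tagging or variant rules, and term acquisition is replaced by a crude stand-in that keeps word pairs seen at least twice.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def controlled_index(corpus, terms):
    """Controlled indexing: the input is a corpus plus a list of terms.

    Returns a dict mapping each document id to the listed terms that
    occur verbatim in that document (assumption: no variant handling).
    """
    index = {}
    for doc_id, text in corpus.items():
        tokens = tokenize(text)
        found = []
        for term in terms:
            words = tokenize(term)
            n = len(words)
            if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
                found.append(term)
        index[doc_id] = found
    return index


def acquire_terms(corpus, min_count=2):
    """Stand-in for term acquisition: keep adjacent word pairs that
    appear at least min_count times across the corpus (an assumption,
    not the method FASTR uses)."""
    counts = Counter()
    for text in corpus.values():
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return sorted(" ".join(pair) for pair, c in counts.items() if c >= min_count)


def free_index(corpus, min_count=2):
    """Free indexing: the input is only a corpus; the term list is
    acquired from the corpus and then used for indexing."""
    return controlled_index(corpus, acquire_terms(corpus, min_count))


corpus = {
    "d1": "Myeloid cell growth was measured.",
    "d2": "The myeloid cell line was stable.",
}
print(controlled_index(corpus, ["myeloid cell", "stem cell"]))
print(free_index(corpus))
```

The only difference between the two modes in this sketch is where the term list comes from: it is supplied by the caller in controlled mode and derived from the corpus in free mode.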
2009
- http://perso.limsi.fr/jacquemi/FASTR/
- FASTR - A Tool for Automatic Indexing (1988-2001)
- Fastr is a parser for term and variant recognition. Fastr takes as input a corpus and a list of terms and outputs the indexed corpus in which terms and variants are recognized.
- Fastr can be used in two modes:
- controlled indexing: input consists of a corpus and a list of terms,
- free indexing: input only consists of a corpus, the list of terms is automatically acquired from the corpus.
2001
- (Jacquemin, 2001) ⇒ Christian Jacquemin. (2001). “Spotting and Discovering Terms Through Natural Language Processing.” MIT Press. ISBN:0262100851
- QUOTE: In this book Christian Jacquemin shows how the power of natural language processing (NLP) can be used to advance text indexing and information retrieval (IR). Jacquemin's novel tool is FASTR, a parser that normalizes terms and recognizes term variants. Since there are more meanings in a language than there are words, FASTR uses a metagrammar composed of shallow linguistic transformations that describe the morphological, syntactic, semantic, and pragmatic variations of words and terms. The acquired parsed terms can then be applied for precise retrieval and assembly of information.
The use of a corpus-based unification grammar to define, recognize, and combine term variants from their base forms allows for intelligent information access to, or “linguistic data tuning” of, heterogeneous texts. FASTR can be used to do automatic controlled indexing, to carry out content-based Web searches through conceptually related alternative query formulations, to abstract scientific and technical extracts, and even to translate and collect terms from multilingual material. Jacquemin provides a comprehensive account of the method and implementation of this innovative retrieval technique for text processing.
1997
- (Jacquemin et al., 1997) ⇒ Christian Jacquemin, Judith Klavans, and Evelyne Tzoukermann. (1997). “Expansion of Multi-Word Terms for Indexing and Retrieval Using Morphology and Syntax.” In: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL 1997). DOI:10.3115/976909.979621.
- ABSTRACT: A system for the automatic production of controlled index terms is presented using linguistically-motivated techniques. This includes a finite-state part of speech tagger, a derivational morphological processor for analysis and generation, and a unification-based shallow-level parser using transformational rules over syntactic patterns. The contribution of this research is the successful combination of parsing over a seed term list coupled with derivational morphology to achieve greater coverage of multi-word terms for indexing and retrieval. Final results are evaluated for precision and recall, and implications for indexing and retrieval are discussed.
1994
- (Jacquemin, 1994) ⇒ Christian Jacquemin. (1994). “FASTR: A Unification-based Front-End to Automatic Indexing.” In: RIAO 1994: 34-48.
- ABSTRACT: Most natural language processing approaches to full-text information retrieval are based on indexing documents by the occurrences of controlled terms they contain. An important problem with this approach is that terms accept numerous variations, and can therefore cause many documents not to be retrieved although being relevant. For example, "myeloid leukaemia cells" and "myeloid and erythoid cell" are two occurrences of "myeloid cell" which cannot be detected without an account of local morpho-syntactic variations.
In this paper, we present a linguistic analysis of the observed variations and a three-tier constraint-based formalism for representing them. This technique has been implemented and results in FASTR, a natural language processing tool that extracts terms and their variants from full-text documents. We justify the choice of a unification-based formalism by its expressivity and by the addition of conceptual and computational devices which make the parser computationally tractable. Contrary to the generally accepted idea, high quality natural language processing through unification and industrial requirements can fit together, provided that the application is carefully designed in order to control and minimize data accesses and computation times.
The effectiveness of FASTR for extracting correct occurrences is supported by experiments on two English corpora of scientific abstracts and a list of 71,623 controlled terms. We report that an account of three kinds of variants (insertions, permutations and coordinations) increases recall by 16.7% without altering precision.
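To make the abstract's examples concrete, the following Python sketch detects insertion and coordination variants of a two-word term such as "myeloid cell". It is only an illustration under stated assumptions: FASTR itself uses a unification-based formalism over tagged text, whereas this sketch uses plain token matching, a naive plural-stripping `normalize` helper in place of morphological analysis, and a hypothetical `find_variants` function. Permutation variants are not covered.

```python
import re


def normalize(word):
    """Naive plural stripping; a stand-in for real lemmatization."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def tokens(text):
    """Lowercase, split into alphabetic tokens, and normalize each one."""
    return [normalize(w) for w in re.findall(r"[a-z]+", text.lower())]


def find_variants(term, text, max_insert=1):
    """Find occurrences of a two-word term 'modifier head' in text.

    Returns a list of (kind, matched_tokens) pairs, where kind is
    'exact', 'coordination' (modifier and/or X head), or 'insertion'
    (up to max_insert words between modifier and head).
    """
    modifier, head = tokens(term)
    toks = tokens(text)
    hits = []
    for i, tok in enumerate(toks):
        if tok != modifier:
            continue
        rest = toks[i + 1:]
        if rest[:1] == [head]:
            hits.append(("exact", toks[i:i + 2]))
        elif len(rest) >= 3 and rest[0] in ("and", "or") and rest[2] == head:
            hits.append(("coordination", toks[i:i + 4]))
        else:
            for k in range(1, max_insert + 1):
                if len(rest) > k and rest[k] == head:
                    hits.append(("insertion", toks[i:i + k + 2]))
                    break
    return hits


# The two variant examples quoted in the abstract.
print(find_variants("myeloid cell", "myeloid leukaemia cells"))
print(find_variants("myeloid cell", "myeloid and erythoid cell"))
```

An exact-match indexer would miss both example phrases, which is the recall loss the abstract attributes to ignoring local morpho-syntactic variation.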