TermExtractor System
A TermExtractor System was a terminology extraction system largely developed by Francesco Sclano at the Linguistic Computing Laboratory (LCL) at the University of Roma "La Sapienza".
- Example(s):
- Output on KDD-2009 Abstracts
| Term | Term Weight | Domain Relevance | Domain Consensus | Lexical Cohesion | Artificial Frequency |
|---|---|---|---|---|---|
| classification problem | 0.846 | 1.000 | 0.864 | 0.068 | 0.714 |
| social tag | 0.687 | 1.000 | 0.511 | 0.076 | 0.571 |
| optimization problem | 0.629 | 1.000 | 0.373 | 0.113 | 1.000 |
- See: OntoLearn, GlossExtractor.
References
2008
- (Velardi et al., 2008) ⇒ Paola Velardi, Roberto Navigli, and Pierluigi D'Amadio. (2008). “Mining the Web to Create Specialized Glossaries.” In: IEEE Intelligent Systems Journal, 23(5). doi:10.1109/MIS.2008.88
2007
- http://lcl2.uniroma1.it/termextractor/
- TermExtractor is a free software package for terminology extraction. The software helps a web community extract and validate the relevant terms of its domain of interest by submitting an archive of domain-related documents in any format.
Furthermore, TermExtractor is a useful starting point for Domain Ontology construction, Semantic Similarity, Knowledge Management, etc., since it identifies domain-relevant terms, which constitute the linguistic surface manifestation of domain concepts.
- (Sclano & Velardi, 2007) ⇒ Francesco Sclano, and Paola Velardi. (2007). “TermExtractor: A web application to learn the common terminology of interest groups and research communities.” In: Proceedings of the 9th Conference on Terminology and Artificial Intelligence (TIA 2007).
2006
- http://lists.w3.org/Archives/Public/semantic-web/2006Oct/0117.html
- TermExtractor is a software package for the automatic building, validation and maintenance of glossaries in the English language.
TermExtractor extracts the terminology consensually referred to in a specific application domain. The package takes as input a corpus of domain documents, parses the documents, and extracts a list of "syntactically plausible" terms (e.g. compounds, adjective-noun pairs, etc.). Document parsing assigns greater importance to terms with special text layouts (title, bold, italic, underlined, etc.). Two entropy-based measures, called Domain Relevance and Domain Consensus, are then applied. Domain Consensus selects only the terms that are consensually referred to throughout the corpus documents. Domain Relevance selects only the terms that are relevant to the domain of interest; it is computed with reference to a set of contrastive terminologies from different domains. Finally, the extracted terms are further filtered using Lexical Cohesion, which measures the degree of association among all the words in a terminological string. Accepted file formats are: txt, pdf, ps, dvi, tex, doc, rtf, ppt, xls, xml, html/htm, chm, wpd, and zip archives.
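The three filtering measures named in the 2006 description can be illustrated with a minimal Python sketch. This is not TermExtractor's code, and several parts are assumptions: the corpus format (a document id mapped to a token list), all function and variable names, and the per-token probability estimate. Domain Relevance is taken as a term's relative frequency in the target corpus divided by its maximum relative frequency over the target and contrastive corpora. Domain Consensus is taken as the entropy of the term's frequency distribution across the target documents. Lexical Cohesion uses a common formulation, n · f_t · log f_t / Σ_w f_w, over the term's n words. The tool's exact normalizations, and its layout-based weighting, may differ and are not modeled here.

```python
import math

# Assumed (hypothetical) input format: document id -> list of lowercase tokens.
Corpus = dict[str, list[str]]


def count_term(term: str, tokens: list[str]) -> int:
    """Count contiguous occurrences of a (possibly multi-word) term."""
    words = term.split()
    n = len(words)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)


def corpus_freq(term: str, corpus: Corpus) -> int:
    """Total occurrences of the term across all documents of a corpus."""
    return sum(count_term(term, toks) for toks in corpus.values())


def domain_relevance(term: str, target: Corpus, contrastive: list[Corpus]) -> float:
    """P(t|target) / max over all domains of P(t|D_k).

    Assumption: P(t|D) is estimated as term count / corpus token count.
    """
    def prob(corpus: Corpus) -> float:
        size = sum(len(toks) for toks in corpus.values())
        return corpus_freq(term, corpus) / size if size else 0.0

    probs = [prob(c) for c in [target, *contrastive]]
    best = max(probs)
    return probs[0] / best if best else 0.0


def domain_consensus(term: str, target: Corpus) -> float:
    """Entropy (natural log) of the term's frequency distribution over documents."""
    per_doc = [count_term(term, toks) for toks in target.values()]
    total = sum(per_doc)
    if total == 0:
        return 0.0
    return -sum((f / total) * math.log(f / total) for f in per_doc if f)


def lexical_cohesion(term: str, target: Corpus) -> float:
    """n * f_t * log(f_t) / sum of the corpus frequencies of the term's words."""
    words = term.split()
    f_t = corpus_freq(term, target)
    if f_t == 0:
        return 0.0
    word_total = sum(corpus_freq(w, target) for w in words)  # >= f_t, so nonzero
    return len(words) * f_t * math.log(f_t) / word_total


# Hypothetical toy corpora: a target domain and one contrastive domain.
ml = {
    "d1": "the classification problem is a hard classification problem".split(),
    "d2": "we study a classification problem with social tag data".split(),
}
news = {"n1": "the election problem dominated the news".split()}

if __name__ == "__main__":
    for term in ["classification problem", "the"]:
        dr = domain_relevance(term, ml, [news])
        dc = domain_consensus(term, ml)
        lc = lexical_cohesion(term, ml)
        print(f"{term}: DR={dr:.3f} DC={dc:.3f} LC={lc:.3f}")
```

On these toy corpora, "classification problem" gets a Domain Relevance of 1.0, a Domain Consensus of about 0.64 (it occurs twice in one document and once in the other), and a Lexical Cohesion of ln 3 ≈ 1.10. By contrast, "the" gets a Domain Relevance of only 3/17 ≈ 0.18, because it is relatively more frequent in the contrastive news corpus, and a Domain Consensus of 0, because it occurs in a single target document.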
2002
- (Navigli & Velardi, 2002) ⇒ Roberto Navigli, and Paola Velardi. (2002). “Semantic Interpretation of Terminological Strings.” In: Proceedings of the 6th International Conference on Terminology and Knowledge Engineering (TKE 2002)
- Keywords: OntoLearn
2001
- (Velardi et al., 2001) ⇒ Paola Velardi, Michele Missikoff, and Roberto Basili. (2001). “Identification of Relevant Terms to Support the Construction of Domain Ontologies.” In: Proceedings of the workshop on Human Language Technology and Knowledge Management (HLTKM 2001). doi:10.3115/1118220.1118225