TermExtractor System

| Term | Term Weight | Domain Relevance | Domain Consensus | Lexical Cohesion | Artificial Frequency |
|---|---|---|---|---|---|
| classification problem | 0.846 | 1.000 | 0.864 | 0.068 | 0.714 |
| social tag | 0.687 | 1.000 | 0.511 | 0.076 | 0.571 |
| optimization problem | 0.629 | 1.000 | 0.373 | 0.113 | 1.000 |

References

http://lcl2.uniroma1.it/termextractor/
- TermExtractor is a FREE software package for Terminology Extraction. The software helps a web community to extract and validate relevant domain terms in their interest domain, by submitting an archive of domain-related documents in any format.
  Furthermore, TermExtractor is a very useful starting point for Domain Ontology construction, Semantic Similarity, Knowledge Management, etc., since it allows the identification of domain-relevant terms, constituting the linguistic surface manifestation of domain concepts.
(Sclano & Velardi, 2007) ⇒ Francesco Sclano, and Paola Velardi. (2007). “TermExtractor: A web application to learn the common terminology of interest groups and research communities.” In: Proceedings of the 9th Conference on Terminology and Artificial Intelligence (TIA 2007).

http://lists.w3.org/Archives/Public/semantic-web/2006Oct/0117.html
- TermExtractor is a software package for automatically building, validating, and maintaining glossaries in English.
  TermExtractor extracts the terminology that is consensually referred to in a specific application domain. The package takes a corpus of domain documents as input, parses the documents, and extracts a list of "syntactically plausible" terms (e.g. compounds, adjective-nouns, etc.). Document parsing assigns greater importance to terms that carry layout emphasis (title, bold, italic, underlined, etc.). Two entropy-based measures, called Domain Relevance and Domain Consensus, are then applied. Domain Consensus selects only the terms that are consensually referred to throughout the corpus documents. Domain Relevance selects only the terms that are relevant to the domain of interest; it is computed with reference to a set of contrastive terminologies from different domains. Finally, the extracted terms are further filtered using Lexical Cohesion, which measures the degree of association among all the words in a terminological string. Accepted file formats are: txt, pdf, ps, dvi, tex, doc, rtf, ppt, xls, xml, html/htm, chm, wpd, and zip archives.
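
The two corpus-level filters described above can be illustrated with a minimal Python sketch. It is not TermExtractor's actual code: the formulas are one common formulation of these measures (relevance as a frequency ratio against contrastive corpora, consensus as normalized entropy over documents), and the function names, the input layout (each document as a list of already-extracted candidate terms), and the normalization by log(number of documents) are all assumptions made for illustration.

```python
import math

def domain_relevance(term, domain_docs, contrastive_corpora):
    """Assumed formulation: relative frequency of the term in the target
    domain divided by its highest relative frequency in any domain
    (target included). Returns a value in [0, 1]."""
    def rel_freq(docs):
        total = sum(len(doc) for doc in docs)
        count = sum(doc.count(term) for doc in docs)
        return count / total if total else 0.0

    target = rel_freq(domain_docs)
    best = max([target] + [rel_freq(corpus) for corpus in contrastive_corpora])
    return target / best if best else 0.0

def domain_consensus(term, domain_docs):
    """Assumed formulation: Shannon entropy of the term's distribution
    across the domain documents, normalized by log(number of documents)
    so that an evenly spread term scores 1 and a term confined to a
    single document scores 0."""
    counts = [doc.count(term) for doc in domain_docs]
    total = sum(counts)
    if total == 0 or len(domain_docs) < 2:
        return 0.0
    entropy = sum((c / total) * math.log(total / c) for c in counts if c)
    return entropy / math.log(len(domain_docs))

# Hypothetical toy data: each document is a list of candidate terms.
domain_docs = [
    ["social tag", "classification problem"],
    ["social tag", "social tag"],
    ["social tag"],
]
contrastive_corpora = [
    [["stock market", "social tag"], ["stock market"]],
]

relevance = domain_relevance("social tag", domain_docs, contrastive_corpora)
consensus = domain_consensus("social tag", domain_docs)
print(f"relevance={relevance:.3f} consensus={consensus:.3f}")
```

In this toy corpus "social tag" is more frequent in the target domain than in the contrastive one and appears in every document, so both scores come out high; a term that occurs in only one document would receive a consensus of 0 and be filtered out.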

(Navigli & Velardi, 2002) ⇒ Roberto Navigli, and Paola Velardi. (2002). “Semantic Interpretation of Terminological Strings.” In: Proceedings of the 6th International Conference on Terminology and Knowledge Engineering (TKE 2002).
- Keywords: OntoLearn