TermExtractor System

From GM-RKB

A TermExtractor System was a terminology extraction system developed largely by Francesco Sclano at the Linguistic Computing Laboratory (LCL) of the University of Rome "La Sapienza".

  • Example(s):
Term                   | Term Weight | Domain Relevance | Domain Consensus | Lexical Cohesion | Artificial Frequency
classification problem | 0.846       | 1.000            | 0.864            | 0.068            | 0.714
social tag             | 0.687       | 1.000            | 0.511            | 0.076            | 0.571
optimization problem   | 0.629       | 1.000            | 0.373            | 0.113            | 1.000


References

2006

  • http://lists.w3.org/Archives/Public/semantic-web/2006Oct/0117.html
    • TermExtractor is a software package for the automatic building, validation and maintenance of glossaries in the English language.

      TermExtractor extracts the terminology that is consensually used in a specific application domain. The package takes as input a corpus of domain documents, parses the documents, and extracts a list of "syntactically plausible" terms (e.g. compounds, adjective-nouns, etc.). Document parsing assigns greater importance to terms that appear with distinctive text layouts (title, bold, italic, underlined, etc.). Two entropy-based measures, called Domain Relevance and Domain Consensus, are then used. Domain Consensus is used to select only the terms that are consensually referred to throughout the corpus documents. Domain Relevance is used to select only the terms that are relevant to the domain of interest; it is computed with reference to a set of contrastive terminologies from different domains. Finally, extracted terms are further filtered using Lexical Cohesion, which measures the degree of association of all the words in a terminological string. Accepted file formats are: txt, pdf, ps, dvi, tex, doc, rtf, ppt, xls, xml, html/htm, chm, wpd, and zip archives.
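
      The two corpus-level filters described above can be illustrated with a minimal Python sketch. The quoted description gives no formulas, so the definitions here are assumptions chosen for illustration: Domain Relevance is taken as the term's relative frequency in the target domain divided by its highest relative frequency across all domains, and Domain Consensus as the normalized entropy of the term's distribution over the target documents. The function names and the input shape (one term-frequency Counter per document) are hypothetical and not taken from TermExtractor itself.

```python
import math
from collections import Counter


def term_probability(term, docs):
    """Relative frequency of `term` among all term occurrences in `docs`."""
    total = sum(sum(doc.values()) for doc in docs)
    count = sum(doc.get(term, 0) for doc in docs)
    return count / total if total else 0.0


def domain_relevance(term, target_docs, contrastive_corpora):
    """Assumed form: P(term | target) / max over all domains of P(term | domain).

    Scores 1.0 when the term is at least as frequent in the target domain
    as in every contrastive domain, and falls toward 0 otherwise.
    """
    p_target = term_probability(term, target_docs)
    p_all = [p_target] + [term_probability(term, c) for c in contrastive_corpora]
    p_max = max(p_all)
    return p_target / p_max if p_max else 0.0


def domain_consensus(term, target_docs):
    """Assumed form: entropy of the term's spread over documents, scaled to [0, 1].

    A term used evenly across all documents scores 1.0; a term confined
    to a single document scores 0.0.
    """
    counts = [doc.get(term, 0) for doc in target_docs]
    total = sum(counts)
    if total == 0 or len(target_docs) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c)
    return entropy / math.log(len(target_docs))


if __name__ == "__main__":
    # Toy data: each document is a Counter of candidate-term frequencies.
    target = [
        Counter({"social tag": 2, "user": 2}),
        Counter({"social tag": 2, "user": 2}),
    ]
    contrastive = [[Counter({"social tag": 1, "user": 7})]]
    print(domain_relevance("social tag", target, contrastive))
    print(domain_consensus("social tag", target))
```

      In a pipeline of this kind, a candidate term would be kept only if both scores exceed chosen thresholds, and the surviving terms would then go through a further filter such as Lexical Cohesion. The thresholds and the way the scores combine into a final term weight are not specified in the description above.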
