Translated Labs TermFinder System

From GM-RKB

A Translated Labs TermFinder System is a Terminology Extraction System managed by Translated Labs.

  • Example(s):

| # | Extracted term | Score |
|---|---|---|
| 1 | sequential gradient descent optimization | 71% |
| 2 | robust predictive modeling technique | 69% |
| 3 | piecewise metric index structure | 68% |
| 4 | estimated upper-bound confidence interval | 66% |
| 5 | user-supplied arbitrary likelihood function | 65% |
| 6 | algorithm | 63% |
| 7 | clustering | 61% |
| 8 | clustering quality measure | 60% |
| 9 | nearest neighbor algorithm | 58% |
| 10 | semi-nonnegative matrix tri-factorization | 58% |
| 11 | algorithms | 57% |
| 12 | datasets | 57% |



References

2009

  • http://labs.translated.net/
    • Introduction
      • Terminology is the sum of the terms which identify a specific topic. Extracting terminology is the process of extracting terminology from a text.

      • The idea is to compare the frequency of words in a given document with their frequency in the language. Words which appear very frequently in the document but rarely in the language are probably terms.
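The frequency comparison described above can be illustrated with a minimal sketch. It is not TermFinder's implementation: the function names (`tokenize`, `term_scores`), the add-one smoothing, and the toy reference counts are all assumptions made for illustration. Each word is scored by the ratio of its relative frequency in the document to its relative frequency in a reference corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_scores(document, reference_counts, reference_total):
    """Score each word by document frequency divided by reference frequency.

    reference_counts maps words to their counts in a generic corpus
    (hypothetical data here); reference_total is that corpus's size.
    """
    doc_counts = Counter(tokenize(document))
    doc_total = sum(doc_counts.values())
    scores = {}
    for word, count in doc_counts.items():
        doc_freq = count / doc_total
        # Add-one smoothing (an assumption) keeps unseen words from
        # causing a division by zero.
        ref_freq = (reference_counts.get(word, 0) + 1) / (reference_total + 1)
        scores[word] = doc_freq / ref_freq
    return scores


# Toy reference counts, invented for this example only.
reference_counts = {"the": 60000, "data": 500, "algorithm": 50, "clustering": 5}
document = "The clustering algorithm groups the data; the clustering step repeats."
scores = term_scores(document, reference_counts, reference_total=1_000_000)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy run, "clustering" scores far higher than "the", because "the" is frequent in the document and in the reference corpus alike.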

    • Technology
      • It applies Poisson statistics, Maximum Likelihood Estimation and Inverse Document Frequency to compare the frequency of words in a given document with their frequency in a generic corpus of 100 million words per language. It uses a probabilistic part-of-speech tagger to take into account the probability that a particular sequence of words could be a term. It creates n-grams of words by minimizing the relative entropy.
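The source does not say how these statistics are combined, so the following is only one plausible reading of the Poisson component, sketched under stated assumptions. If a word occurred in the document at its generic-corpus rate, its count would be roughly Poisson-distributed with mean equal to the document length times the corpus frequency, and a count far above that mean would be surprising. The names `poisson_tail` and `poisson_surprise` are hypothetical.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), summed term by term."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    total = 0.0
    i = k
    while True:
        # Poisson probability mass at i, computed in log space for stability.
        term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        total += term
        # Past the mean the terms only shrink, so stop once they are negligible.
        if i > lam and term <= 1e-16 * total:
            break
        i += 1
    return min(total, 1.0)


def poisson_surprise(count, doc_length, ref_freq):
    """Return -log10 of the chance of seeing at least `count` occurrences
    if the word occurred at its reference-corpus rate (an assumed model)."""
    tail = poisson_tail(count, doc_length * ref_freq)
    return -math.log10(tail) if tail > 0 else float("inf")


# Invented numbers: a rare word seen 5 times versus a common word seen
# 60 times, both in a 1000-word document.
rare = poisson_surprise(5, 1000, 1e-5)
common = poisson_surprise(60, 1000, 0.06)
```

Here the rare word's five occurrences are far more surprising than the common word's sixty, which matches the intuition that terms are words over-represented relative to the language.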

    • Why have we developed this?

      • Translated has developed this technology to help its translators to be aware of the difficulties in a document and to simplify the process of creating glossaries.
      • We also use it to improve search results in traditional search engines (e.g. Google) by giving a better estimate of how relevant a keyword is to a document.