2012 BootstrappedLanguageIdentificat
- (Mayer, 2012) ⇒ Uwe F. Mayer. (2012). “Bootstrapped Language Identification for Multi-site Internet Domains.” In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2012). ISBN:978-1-4503-1462-6 doi:10.1145/2339530.2339622
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%222012%22+Bootstrapped+Language+Identification+for+Multi-site+Internet+Domains
- http://dl.acm.org/citation.cfm?id=2339530.2339622&preflayout=flat#citedby
Quotes
Author Keywords
Abstract
We present an algorithm for language identification, in particular of short documents, for the case of an Internet domain with sites in multiple countries with differing languages. The algorithm is significantly faster than standard language identification methods, while providing state-of-the-art identification. We bootstrap the algorithm based on the language identification based on the site alone, a methodology suitable for any supervised language identification algorithm. We demonstrate the bootstrapping and algorithm on eBay email data and on Twitter status updates data. The algorithm is deployed at eBay as part of the back-office development data repository.
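To illustrate the bootstrapping idea in the abstract, the following is a minimal sketch and not code from the paper. It assumes each document arrives with the site it came from, uses the site's primary language as an initial (possibly noisy) training label, and then trains a supervised classifier on those labels. The character-trigram naive Bayes model, the `SITE_LANGUAGE` mapping, the example sites, and all function names are hypothetical choices made for illustration; the abstract only states that the approach works with any supervised language identification algorithm.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mapping from a site to its assumed primary language.
SITE_LANGUAGE = {"example.com": "en", "example.de": "de"}


def char_ngrams(text, n=3):
    """Return the character n-grams of a lower-cased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(labeled_docs):
    """Count character trigrams per language from (text, language) pairs."""
    counts = defaultdict(Counter)
    for text, lang in labeled_docs:
        counts[lang].update(char_ngrams(text))
    return counts


def classify(counts, text):
    """Pick the language with the highest add-one-smoothed log-likelihood."""
    vocab_size = len({gram for c in counts.values() for gram in c})
    best_lang, best_score = None, -math.inf
    for lang, c in counts.items():
        total = sum(c.values())
        score = sum(
            math.log((c[gram] + 1) / (total + vocab_size))
            for gram in char_ngrams(text)
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


def bootstrap(docs_with_site, rounds=1):
    """Train from site-derived labels, then optionally retrain on the
    model's own predictions (a simple self-training loop, assumed here)."""
    labeled = [(text, SITE_LANGUAGE[site]) for text, site in docs_with_site]
    model = train(labeled)
    for _ in range(rounds):
        labeled = [(text, classify(model, text)) for text, _ in docs_with_site]
        model = train(labeled)
    return model


# Toy corpus: (text, site) pairs; no language labels are supplied by hand.
DOCS = [
    ("where is my order", "example.com"),
    ("please ship the item today", "example.com"),
    ("thank you for the payment", "example.com"),
    ("wo ist meine bestellung", "example.de"),
    ("bitte senden sie den artikel heute", "example.de"),
    ("vielen dank für die zahlung", "example.de"),
]

model = bootstrap(DOCS)
english_guess = classify(model, "thank you for the item")
german_guess = classify(model, "vielen dank für den artikel")
```

The point of the sketch is that no hand-labeled data is needed: the site supplies the initial labels, and the trained model can then label documents independently of the site they came from.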
References
| Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
---|---|---|---|---|---|---|---|---|---|---|
2012 BootstrappedLanguageIdentificat | Uwe F. Mayer | | | Bootstrapped Language Identification for Multi-site Internet Domains | | | | 10.1145/2339530.2339622 | | 2012 |