2009 NameEthnicityClassificationfrom
- (Ambekar et al., 2009) ⇒ Anurag Ambekar, Charles Ward, Jahangir Mohammed, Swapna Male, and Steven Skiena. (2009). “Name-ethnicity Classification from Open Sources.” In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2009). doi:10.1145/1557019.1557032
Subject Headings:
Notes
- Categories and Subject Descriptors: I.2.1 Applications and Expert Systems: Cartography.
- General Terms: Algorithms, Experimentation.
Cited By
- http://scholar.google.com/scholar?q=%22Name-ethnicity+classification+from+open+sources%22+2009
- http://portal.acm.org/citation.cfm?doid=1557019.1557032&preflayout=flat#citedby
Quotes
Author Keywords
Ethnicity Detection, Name Classification, News Analysis, Social Science Research
Abstract
The problem of ethnicity identification from names has a variety of important applications, including biomedical research, demographic studies, and marketing. Here we report on the development of an ethnicity classifier where all training data is extracted from public, non-confidential (and hence somewhat unreliable) sources. Our classifier uses hidden Markov models (HMMs) and decision trees to classify names into 13 cultural/ethnic groups with individual group accuracy comparable to that of earlier binary (e.g., Spanish/non-Spanish) classifiers. We have applied this classifier to over 20 million names from a large-scale news corpus, identifying interesting temporal and spatial trends in the representation of particular cultural/ethnic groups.
References
| | Author | title | journal | doi | year |
|---|---|---|---|---|---|
| 2009 NameEthnicityClassificationfrom | Anurag Ambekar, Charles Ward, Jahangir Mohammed, Swapna Male, Steven Skiena | Name-ethnicity Classification from Open Sources | KDD-2009 Proceedings | 10.1145/1557019.1557032 | 2009 |