2012 MultiDomainActiveLearningforTex

From GM-RKB
Jump to navigation Jump to search

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

Active learning has been proven to be effective in reducing labeling efforts for supervised learning. However, existing active learning work has mainly focused on training models for a single domain. In practical applications, it is common to simultaneously train classifiers for multiple domains. For example, some merchant web sites (like Amazon.com) may need a set of classifiers to predict the sentiment polarity of product reviews collected from various domains (e.g., electronics, books, shoes). Though different domains have their own unique features, they may share some common latent features. If we apply active learning on each domain separately, some data instances selected from different domains may contain duplicate knowledge due to the common features. Therefore, how to choose the data from multiple domains to label is crucial to further reducing the human labeling efforts in multi-domain learning. In this paper, we propose a novel multi-domain active learning framework to jointly select data instances from all domains with duplicate information considered. In our solution, a shared subspace is first learned to represent common latent features of different domains. By considering the common and the domain-specific features together, the model loss reduction induced by each data instance can be decomposed into a common part and a domain-specific part. In this way, the duplicate information across domains can be encoded into the common part of model loss reduction and taken into account when querying. We compare our method with the state-of-the-art active learning approaches on several text classification tasks: sentiment classification, newsgroup classification and email spam filtering. The experiment results show that our method reduces the human labeling efforts by 33.2%, 42.9% and 68.7% on the three tasks, respectively.

References

;

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2012 MultiDomainActiveLearningforTexJian-Tao Sun
Lianghao Li
Xiaoming Jin
Sinno Jialin Pan
Multi-domain Active Learning for Text Classification10.1145/2339530.23397012012