2009 UsingGraphbasedMetricswithEmpir
- (Macskassy, 2009) ⇒ Sofus A. Macskassy. (2009). “Using Graph-based Metrics with Empirical Risk Minimization to Speed Up Active Learning on Networked Data.” In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2009). doi:10.1145/1557019.1557087
Subject Headings:
Notes
- Categories and Subject Descriptors: I.2.6 Artificial Intelligence: Learning — concept learning; J.4 Social and Behavioral Sciences: Miscellaneous; E.1 Data Structures: Graphs and Networks; G.2.2 Discrete Mathematics: Graph Theory — graph algorithms
- General Terms: Algorithms, Design, Experimentation, Performance
Cited By
- http://scholar.google.com/scholar?q=%22Using+graph-based+metrics+with+empirical+risk+minimization+to+speed+up+active+learning+on+networked+data%22+2009
- http://portal.acm.org/citation.cfm?doid=1557019.1557087&preflayout=flat#citedby
Quotes
Author Keywords
Active Learning, Statistical Relational Learning, Semisupervised Learning, Social Network Analysis, Betweenness Centrality, Closeness Centrality, Community Finding, Clustering, Empirical Risk Minimization, Within-Network Learning
Abstract
Active and semi-supervised learning are important techniques when labeled data are scarce. Recently a method was suggested for combining active learning with a semi-supervised learning algorithm that uses Gaussian fields and harmonic functions. This classifier is relational in nature: it relies on having the data presented as a partially labeled graph (also known as a within-network learning problem). This work showed yet again that empirical risk minimization (ERM) was the best method to find the next instance to label and provided an efficient way to compute ERM with the semi-supervised classifier. The computational problem with ERM is that it relies on computing the risk for all possible instances. If we could limit the candidates that should be investigated, then we could speed up active learning considerably. In the case where the data is graphical in nature, we can leverage the graph structure to rapidly identify instances that are likely to be good candidates for labeling. This paper describes a novel hybrid approach of using community finding and social network analytic centrality measures to identify good candidates for labeling and then using ERM to find the best instance in this candidate set. We show on real-world data that we can limit the ERM computations to a fraction of instances with comparable performance.
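The candidate-filtering idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes closeness centrality alone as the graph metric (the paper also uses community finding and betweenness), and the `risk` callable is a hypothetical placeholder for the ERM computation with the Gaussian-fields classifier. The function names `closeness_centrality`, `select_candidates`, and `pick_next`, and the toy graph, are illustrative assumptions.

```python
from collections import deque


def closeness_centrality(graph):
    """Closeness per node: (number of reachable nodes) / (sum of BFS distances)."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def select_candidates(graph, labeled, k):
    """Return the k unlabeled nodes with the highest closeness (ties broken by name)."""
    scores = closeness_centrality(graph)
    unlabeled = [n for n in graph if n not in labeled]
    return sorted(unlabeled, key=lambda n: (-scores[n], n))[:k]


def pick_next(graph, labeled, k, risk):
    """Evaluate the expensive risk function only on the candidate set."""
    candidates = select_candidates(graph, labeled, k)
    return min(candidates, key=risk)


# Toy graph (assumption): two triangles joined by the bridge c-d.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}

# Hypothetical stand-in for the estimated risk after labeling each node.
toy_risk = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.3, "e": 0.7, "f": 0.6}

candidates = select_candidates(graph, labeled=set(), k=2)
next_node = pick_next(graph, labeled=set(), k=2, risk=toy_risk.get)
```

In this sketch the risk function is called for only `k` nodes rather than for every unlabeled node, which is the source of the speed-up the abstract describes; how well the chosen metric ranks useful candidates is an empirical question that the paper addresses on real-world data.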
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2009 UsingGraphbasedMetricswithEmpir | Sofus A. Macskassy | | | Using Graph-based Metrics with Empirical Risk Minimization to Speed Up Active Learning on Networked Data | | KDD-2009 Proceedings | | 10.1145/1557019.1557087 | | 2009 |