Active Learning Task
An Active Learning Task is an iterative supervised learning task in which the learner may request additional labeled training records from a learning record set.
- Context:
- It can help with tasks that have Small Training Data.
- It can be solved by an Active Learning System.
- …
- Example(s):
- an Active Annotation Task.
- …
- Counter-Example(s):
- a Passive Learning Task, in which the learner has no control over its training set.
- See: Active Learning Theory.
References
2017a
- (Cohn, 2017) ⇒ David Cohn. (2017). “Active Learning.” In: Claude Sammut, and Geoffrey I. Webb (eds.). “Encyclopedia of Machine Learning and Data Mining.” Springer, Boston, MA.
- QUOTE: The term Active Learning is generally used to refer to a learning problem or system where the learner has some role in determining on what data it will be trained. This is in contrast to Passive Learning, where the learner is simply presented with a training set over which it has no control. Active learning is often used in settings where obtaining labeled data is expensive or time-consuming; by sequentially identifying which examples are most likely to be useful, an active learner can sometimes achieve good performance, using far less training data than would otherwise be required.
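The selection loop Cohn describes can be sketched as pool-based uncertainty sampling, one common way an active learner "sequentially identifies which examples are most likely to be useful." This is a minimal illustration, not the encyclopedia entry's own algorithm; the data, the `oracle` function, and all helper names are hypothetical:

```python
import numpy as np

# Sketch of pool-based active learning with uncertainty sampling.
# The learner repeatedly picks the unlabeled pool example whose predicted
# probability is closest to 0.5 (most uncertain) and asks an "oracle"
# (a human annotator, in practice) for its label.

rng = np.random.default_rng(0)

def fit_logistic(X, y, lr=0.5, steps=500):
    """Plain gradient-descent logistic regression; returns weights."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def predict_proba(X, w):
    return 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))

# Synthetic pool: two Gaussian blobs; labels are known only to the oracle.
X_pool = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y_true = np.array([0] * 100 + [1] * 100)   # hidden from the learner

labeled = [0, 100]                          # tiny seed set, one per class
unlabeled = [i for i in range(200) if i not in labeled]

for _ in range(10):                         # query budget of 10 extra labels
    w = fit_logistic(X_pool[labeled], y_true[labeled])
    p = predict_proba(X_pool[unlabeled], w)
    query = unlabeled[int(np.argmin(np.abs(p - 0.5)))]  # most uncertain point
    labeled.append(query)                   # oracle reveals y_true[query]
    unlabeled.remove(query)

w = fit_logistic(X_pool[labeled], y_true[labeled])
acc = np.mean((predict_proba(X_pool, w) > 0.5) == y_true)
print(f"accuracy after {len(labeled)} labels: {acc:.2f}")
```

The key design point is the query rule: replacing `np.argmin(np.abs(p - 0.5))` with a random draw from the pool recovers passive learning, which typically needs more labels for the same accuracy.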
2017b
- (Dasgupta, 2017) ⇒ Sanjoy Dasgupta. (2017). “Active Learning Theory.” In: Claude Sammut, and Geoffrey I. Webb (eds.). “Encyclopedia of Machine Learning and Data Mining.” Springer, Boston, MA.
- QUOTE: The term active learning applies to a wide range of situations in which a learner is able to exert some control over its source of data. For instance, when fitting a regression function, the learner may itself supply a set of data points at which to measure response values, in the hope of reducing the variance of its estimate. Such problems have been studied for many decades under the rubric of experimental design (Chernoff 1972; Fedorov 1972). More recently, there has been substantial interest within the machine learning community in the specific task of actively learning binary classifiers. This task presents several fundamental statistical and algorithmic challenges, and an understanding of its mathematical underpinnings is only gradually emerging. This brief survey will describe some of the progress that has been made so far.
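Dasgupta's regression example, where the learner "may itself supply a set of data points at which to measure response values, in the hope of reducing the variance of its estimate," can be made concrete with the standard formula for the variance of an OLS slope estimate, Var(b) = σ² / Σ(xᵢ − x̄)². The numbers below are an illustrative sketch, not drawn from the entry:

```python
import numpy as np

# Experimental-design view of active learning: when fitting y = a + b*x
# with noise variance sigma^2, the slope estimate has variance
# sigma^2 / sum((x_i - mean(x))^2), so the learner can *choose* the
# measurement points x_i to shrink it. Placing half the points at each
# endpoint of the design region is the classic optimal design for a line.

sigma2 = 1.0  # noise variance of each response measurement

def slope_variance(x):
    x = np.asarray(x, dtype=float)
    return sigma2 / np.sum((x - x.mean()) ** 2)

passive = [0.4, 0.5, 0.5, 0.6]   # points that happened to arrive, clustered
active = [0.0, 0.0, 1.0, 1.0]    # learner-chosen: half at each endpoint

var_passive = slope_variance(passive)   # 1.0 / 0.02 = 50.0
var_active = slope_variance(active)     # 1.0 / 1.0  = 1.0
print(var_passive, var_active)
```

With the same measurement budget of four points, the chosen design cuts the slope variance by a factor of 50, which is the sense in which the learner's control over its data source pays off.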
2017c
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Active_learning_(machine_learning) Retrieved:2017-12-24.
- Active learning is a special case of semi-supervised machine learning in which a learning algorithm is able to interactively query the user (or some other information source) to obtain the desired outputs at new data points. In statistics literature it is sometimes also called optimal experimental design.
There are situations in which unlabeled data is abundant but manually labeling is expensive. In such a scenario, learning algorithms can actively query the user/teacher for labels. This type of iterative supervised learning is called active learning. Since the learner chooses the examples, the number of examples to learn a concept can often be much lower than the number required in normal supervised learning. With this approach, there is a risk that the algorithm will be overwhelmed by uninformative examples.
Recent developments are dedicated to multi-label active learning, hybrid active learning and active learning in a single-pass (on-line) context, combining concepts from the field of Machine Learning (e.g., conflict and ignorance) with adaptive, incremental learning policies in the field of Online machine learning.
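The claim that "the number of examples to learn a concept can often be much lower" has a classic textbook illustration (a sketch added here, not taken from the article): learning a 1-D threshold concept over n sorted points. A passive learner may need labels for on the order of n points, while an active learner can binary-search for the decision boundary with roughly log₂(n) label queries. The names below are illustrative:

```python
# Active learning of a 1-D threshold concept: points left of a hidden
# threshold are labeled 0, points at or right of it are labeled 1. The
# learner binary-searches for the boundary, querying the oracle one
# label at a time.

def learn_threshold(points, oracle):
    """Binary-search the 0/1 boundary over sorted `points`.

    `oracle(x)` returns the true label of x.
    Returns (index of first label-1 point, number of label queries used).
    """
    lo, hi, queries = 0, len(points) - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(points[mid]) == 0:
            lo = mid + 1   # boundary lies strictly to the right
        else:
            hi = mid       # boundary is at mid or to the left
    return lo, queries

points = list(range(1024))
threshold = 700                      # hidden concept: label 1 iff x >= 700
oracle = lambda x: int(x >= threshold)

boundary, used = learn_threshold(points, oracle)
print(boundary, used)  # finds the boundary with ~10 queries, not 1024
```

A passive learner shown randomly labeled points would need to get lucky with examples near the boundary to pin it down this precisely, which is exactly the gap between passive and active label complexity that the passage describes.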
2010
- (Tomanek, 2010) ⇒ Katrin Tomanek. (2010). “Resource-aware Annotation through Active Learning.” Ph.D. Thesis, Dortmund University.
- ABSTRACT: The annotation of corpora has become a crucial prerequisite for information extraction systems which heavily rely on supervised machine learning techniques and therefore require large amounts of annotated training material. Annotation, however, requires human intervention and is thus an extremely costly, labor-intensive, and error-prone process. The burden of annotation is one of the major obstacles when well-established information extraction systems are to be applied to real-world problems and so a pressing research question is how annotation can be made more efficient. Most annotated corpora are built by collecting the documents to be annotated on a random sampling basis or based on simple keyword search. Only recently, more sophisticated approaches to select the base material in order to reduce annotation effort are being investigated. One promising direction is known as Active Learning (AL) where only examples of high utility for classifier training are selected for manual annotation. Because of this intelligent selection, classifiers of a certain target performance can be yielded with less labeled data points. This thesis centers around the question how AL can be applied as a resource-aware strategy for linguistic annotation. A set of requirements is defined and several approaches and adaptations to the standard form of AL are proposed to meet these requirements.
This includes: (1) a novel method to monitor and stop the AL-driven annotation process; (2) an approach to semi-supervised AL where only highly critical tokens have to actually be manually annotated while the rest is automatically tagged; (3) a discussion and empirical investigation of the reusability of actively drawn samples; (4) a comparative study how class imbalance can be reduced right upfront during AL-driven data acquisition; (5) two methods for selective sampling of examples which are useful for multiple learning problems; (6) an extensive evaluation of the proposed approaches to AL for Named Entity Recognition with respect to both savings in corpus size and actual annotation time; and finally (7) three methods how these approaches can be made cost-conscious so as to reduce annotation time even more.
2008a
- (Settles, 2008) ⇒ Burr Settles. (2008). “Curious Machines: Active Learning with Structured Instances.” Ph.D. Thesis, University of Wisconsin-Madison.
2008b
- (Olsson, 2008) ⇒ Fredrik Olsson. (2008). “Bootstrapping Named Entity Annotation by Means of Active Machine Learning.” Ph.D. Thesis, University of Gothenburg.