Averaged Perceptron Algorithm
An Averaged Perceptron Algorithm is a Perceptron Algorithm that returns the average of all intermediate parameter vectors produced during training (the "averaged parameters" method of Collins, 2002), rather than only the final parameter vector; a minimal sketch appears after the lists below.
- Example(s):
- Counter-Example(s):
- See: Perceptron Algorithm, Generalized Perceptron Model, Perceptron, Perceptron Training Algorithm.
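To make the definition concrete, the following is a minimal sketch of an averaged perceptron for binary classification. The function name, the NumPy representation, and the {-1, +1} label convention are illustrative assumptions; Collins' own setting, quoted below, is structured tagging.

```python
import numpy as np

def train_averaged_perceptron(X, y, T=10):
    """Train an averaged perceptron for binary classification.

    X: (n, d) array of feature vectors; y: (n,) labels in {-1, +1};
    T: number of passes over the training set.
    Returns the averaged parameter vector gamma.
    """
    n, d = X.shape
    alpha = np.zeros(d)   # current parameters (alpha^{t,i})
    total = np.zeros(d)   # running sum of alpha after each example
    for t in range(T):
        for i in range(n):
            if y[i] * np.dot(alpha, X[i]) <= 0:   # mistake (or on the boundary)
                alpha += y[i] * X[i]              # standard perceptron update
            total += alpha                        # accumulate alpha^{t,i}
    return total / (n * T)                        # gamma = sum / (nT)

# Usage on a toy linearly separable set:
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
gamma = train_averaged_perceptron(X, y)
print(np.sign(X @ gamma))   # -> [ 1.  1. -1. -1.]
```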
References
2002
- (Collins, 2002b) ⇒ Michael Collins. (2002). “Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with the Perceptron Algorithm.” In: Proceedings of the ACL Conference on Empirical Methods in Natural Language Processing (EMNLP 2002). doi:10.3115/1118693.1118694
- QUOTE: There is a simple refinement to the algorithm in figure 1, called the “averaged parameters” method. Define $\alpha^{t,i}_s$ to be the value for the $s$'th parameter after the $i$'th training example has been processed in pass $t$ over the training data. Then the “averaged parameters” are defined as $\gamma_s = \sum_{t=1 \cdots T , i=1 \cdots n} \alpha^{t,i}_s / nT$ for all $s = 1 \cdots d$. It is simple to modify the algorithm to store this additional set of parameters. Experiments in section 4 show that the averaged parameters perform significantly better than the final parameters $\alpha^{T,n}_s$. The theory in the next section gives justification for the averaging method.
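In an implementation, $\gamma$ does not require storing every intermediate $\alpha^{t,i}$, and even the per-example summation can be avoided. The following is a sketch of a common bookkeeping trick, under binary-classification assumptions; this particular trick is not from Collins' paper. It pays extra cost only on mistakes:

```python
import numpy as np

def train_averaged_perceptron_lazy(X, y, T=10):
    """Averaged perceptron with the counter trick: the accumulator u is
    touched only on mistakes, instead of summing alpha after every example."""
    n, d = X.shape
    w = np.zeros(d)   # current parameters (alpha in Collins' notation)
    u = np.zeros(d)   # counter-weighted sum of the updates
    c = 1             # example counter across all passes
    for t in range(T):
        for i in range(n):
            if y[i] * np.dot(w, X[i]) <= 0:
                w += y[i] * X[i]
                u += c * y[i] * X[i]
            c += 1
    # w - u/c equals the averaged parameters up to a positive scale factor,
    # which leaves the sign of every score, and hence every prediction, unchanged.
    return w - u / c
```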
Figure 1 (from the paper): the perceptron training algorithm for tagging.
Inputs: A training set of tagged sentences, $\left(w^i_{[1:n_i]}, t^i_{[1:n_i]}\right)$ for $i = 1 \cdots n$. A parameter $T$ specifying the number of iterations over the training set. A “local representation” $\phi$ which is a function that maps history/tag pairs to $d$-dimensional feature vectors. The global representation $\Phi$ is defined through $\phi$ as in Eq. 1.
Initialization: Set parameter vector $\overline{\alpha} = 0$.
Algorithm: For $t = 1 \cdots T$, $i = 1 \cdots n$: use the Viterbi algorithm to find the output of the model on the $i$'th training sentence with the current parameter settings, $z_{[1:n_i]} = \arg\max_{u_{[1:n_i]}} \Phi\left(w^i_{[1:n_i]}, u_{[1:n_i]}\right) \cdot \overline{\alpha}$; if $z_{[1:n_i]} \neq t^i_{[1:n_i]}$, update $\overline{\alpha} = \overline{\alpha} + \Phi\left(w^i_{[1:n_i]}, t^i_{[1:n_i]}\right) - \Phi\left(w^i_{[1:n_i]}, z_{[1:n_i]}\right)$.
Output: Parameter vector $\overline{\alpha}$.
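The loop in this figure can be sketched as follows, with the averaging from the quote above folded in. Here `phi` and `viterbi_decode` are hypothetical placeholders for the global feature map $\Phi$ and a Viterbi decoder over tag sequences; neither is implemented here.

```python
import numpy as np

def train_structured_perceptron(sentences, tag_seqs, phi, viterbi_decode, d, T=10):
    """Sketch of the training loop in the figure above.

    phi(words, tags) -> d-dimensional global feature vector (Phi in the figure);
    viterbi_decode(words, alpha) -> highest-scoring tag sequence under alpha.
    Both are hypothetical stand-ins that the caller must supply.
    """
    alpha = np.zeros(d)
    total = np.zeros(d)
    n = len(sentences)
    for t in range(T):
        for i in range(n):
            z = viterbi_decode(sentences[i], alpha)   # decode with current alpha
            if list(z) != list(tag_seqs[i]):          # mistake on sentence i
                # Move alpha toward the gold features, away from the predicted ones
                alpha += phi(sentences[i], tag_seqs[i]) - phi(sentences[i], z)
            total += alpha                            # accumulate alpha^{t,i}
    return total / (n * T)                            # averaged parameters gamma
```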