CRF Training Algorithm
A CRF Training Algorithm is a model training algorithm that solves a CRF training task (i.e., produces a conditional random field function).
- AKA: CRF Trainer.
- Context:
- It can range from being a Fully-Supervised CRF Training Algorithm to being a Semi-Supervised CRF Training Algorithm.
- It can be a Part Of a CRF-based Tagging Algorithm.
- It can require a CRF Feature Template.
- Example(s):
- It can be based on an Iterative Scaling Algorithm (Lafferty & al, 2001).
- It can be based on a Limited-Memory Quasi-Newton Algorithm (L-BFGS) (Tsai & al, 2006); see the illustrative sketch below.
- It can be based on a Convex Optimization Algorithm (Sha & Pereira, 2003a).
- It can be based on a Stochastic Gradient Optimization Algorithm (Vishwanathan & al, 2006).
- Counter-Example(s):
- See: CRF Label Bias, CRF Length Bias.
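The sketch below is an illustration only (it does not reproduce the method of any single cited paper): it builds a fully-supervised CRF trainer on the third-party sklearn-crfsuite library, using a toy CRF feature template, a toy labeled corpus, and CRFsuite's L-BFGS optimizer with ℓ1/ℓ2 penalty terms; the feature names and the two training sentences are invented for illustration.

```python
# Illustrative fully-supervised CRF trainer (sklearn-crfsuite / CRFsuite L-BFGS).
# The feature template, feature names, and toy sentences are invented for illustration.
import sklearn_crfsuite

def word2features(sent, i):
    """A tiny CRF feature template: the current word plus a one-word window."""
    word = sent[i]
    return {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "prev.lower": sent[i - 1].lower() if i > 0 else "BOS",
        "next.lower": sent[i + 1].lower() if i < len(sent) - 1 else "EOS",
    }

def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]

# Toy supervised training data: token sequences paired with label sequences.
train_sents = [["John", "lives", "in", "Paris"], ["Mary", "visited", "Berlin"]]
train_labels = [["B-PER", "O", "O", "B-LOC"], ["B-PER", "O", "B-LOC"]]

X_train = [sent2features(s) for s in train_sents]
y_train = train_labels

# L-BFGS training with elastic-net-style regularization:
# c1 weights the l1 penalty (induces sparsity), c2 weights the l2 penalty.
crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs",
    c1=0.1,
    c2=0.1,
    max_iterations=100,
    all_possible_transitions=True,
)
crf.fit(X_train, y_train)

# The fitted object is the trained conditional random field function (a tagger).
print(crf.predict([sent2features(["Anna", "flew", "to", "Rome"])]))
```

The same fit call works for any feature template that maps each token to a feature dictionary; only the word2features function above would change.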
References
2010
- (Lavergne & al, 2010) ⇒ Thomas Lavergne, Olivier Cappé, and François Yvon. (2010). "Practical Very Large Scale CRFs." In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010).
- QUOTE: In this paper, we address the issue of training very large CRFs, containing up to hundreds output labels and several billion features. Efficiency stems here from the sparsity induced by the use of a l1 penalty term. Based on our own implementation, we compare three recent proposals for implementing this regularization strategy.
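In standard notation (ours, not the quoted paper's), the ℓ1-regularized training criterion referred to in the quote minimizes the penalized negative conditional log-likelihood over the training pairs; the ℓ1 term drives many feature weights exactly to zero, which is the sparsity the authors exploit.

```latex
% l1-penalized CRF training objective (standard form; notation is illustrative).
\hat{\theta} = \arg\min_{\theta} \; -\sum_{i=1}^{N} \log p_{\theta}\!\left(y^{(i)} \mid x^{(i)}\right) \; + \; \rho_{1}\,\lVert \theta \rVert_{1}
```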
2006
- (Vishwanathan & al, 2006) ⇒ S. V. N. Vishwanathan, Nicol N. Schraudolph, Mark W. Schmidt, and Kevin P. Murphy. (2006). "Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods." In: Proceedings of the 23rd International Conference on Machine Learning (ICML 2006). doi:10.1145/1143844.1143966
- QUOTE: We apply Stochastic Meta-Descent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than limited-memory BFGS, the leading method reported to date. We report results for both exact and inexact inference techniques.
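As a simplified point of comparison (this is plain stochastic gradient descent with an ℓ2 penalty, not SMD, which additionally adapts a per-parameter gain vector), CRFsuite also exposes a stochastic-gradient trainer that can be swapped into the sketch above:

```python
# Stochastic-gradient CRF training via CRFsuite's 'l2sgd' algorithm
# (plain SGD with an l2 penalty; SMD's gain-vector adaptation is not included).
import sklearn_crfsuite

crf_sgd = sklearn_crfsuite.CRF(
    algorithm="l2sgd",
    c2=0.1,             # l2 regularization strength
    max_iterations=50,  # passes over the training data
)
# crf_sgd.fit(X_train, y_train)  # same feature/label format as the L-BFGS sketch
```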
2003
- (Sha & Pereira, 2003a) ⇒ Fei Sha, and Fernando Pereira. (2003). "Shallow Parsing with Conditional Random Fields." In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT-NAACL 2003). doi:10.3115/1073445.1073473
- QUOTE: To obtain these results, we had to abandon the original iterative scaling CRF training algorithm for convex optimization algorithms with better convergence properties.