CRF Training Algorithm
A [[CRF Training Algorithm]] is a [[graphical model training algorithm]] that accepts a [[CRF model family]] and can be applied by a [[CRF training system]] (to solve a [[CRF training task]] and produce a [[CRF structure]]).
* <B>AKA:</B> [[CRF Training Algorithm|Conditional Random Field Trainer]].
* <B>Context:</B>
** It can range from being a [[Fully-Supervised CRF Training Algorithm]] to being a [[Semi-Supervised CRF Training Algorithm]].
** It can be a [[Part Of]] a [[CRF-based Tagging Algorithm]].
** It can require a [[CRF Feature Template]].
* <B>Example(s):</B>
** It can be based on an [[Iterative Scaling Algorithm]] ([[2001_ConditionalRandomFields|Lafferty et al., 2001]]).
** It can be based on a [[Limited-Memory Quasi-Newton Algorithm]]/[[L-BFGS]] ([[2006_IntegratingLinguisticKnowledgeI|Tsai et al., 2006]]); a minimal sketch appears below this list.
** It can be based on a [[Convex Optimization Algorithm]] ([[2003_ShallowParsingwithConditionalRa|Sha & Pereira, 2003a]]).
** It can be based on a [[Stochastic Gradient Optimization Algorithm]] ([[2006_AcceleratedTrainingofConditiona|Vishwanathan et al., 2006]]).
* <B>Counter-Example(s):</B>
** a [[MEMM (Maximum-entropy Markov Model) Training Algorithm]].
** a [[CRF Inference Algorithm]].
** an [[HMM Training Algorithm]].
** an [[RNN Training Algorithm]].
* <B>See:</B> [[CRF Label Bias]], [[CRF Length Bias]].
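The [[L-BFGS]]-based training listed above can be illustrated with a minimal sketch using the third-party <code>sklearn-crfsuite</code> package (a wrapper around [[CRFsuite]]); the feature template and toy data here are illustrative assumptions, not a definitive recipe:
<pre>
# A minimal sketch of L-BFGS-based CRF training with the third-party
# sklearn-crfsuite package (pip install sklearn-crfsuite).
# The feature template and toy data are illustrative assumptions.
import sklearn_crfsuite

def word_features(sent, i):
    # A tiny CRF feature template: current token plus its neighbors.
    word = sent[i]
    return {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Toy training data: one token sequence with its label sequence.
sents = [["John", "lives", "in", "New", "York"]]
labels = [["B-PER", "O", "O", "B-LOC", "I-LOC"]]
X = [[word_features(s, i) for i in range(len(s))] for s in sents]

crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs",   # limited-memory quasi-Newton optimizer
    c1=0.1,              # l1 penalty weight (induces sparsity)
    c2=0.1,              # l2 penalty weight
    max_iterations=100,
)
crf.fit(X, labels)
print(crf.predict(X))    # predicted tags for the training sentence
</pre>
CRFsuite also exposes a stochastic-gradient trainer (<code>algorithm="l2sgd"</code>), corresponding to the [[Stochastic Gradient Optimization Algorithm]] option listed above.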
 
----
== References ==

=== 2016 ===
* http://www.datasciencecentral.com/profiles/blogs/conditional-random-fields-crf-short-survey
** QUOTE: The time complexity of the [[CRF Training Algorithm|training process]] is quite large: <math>O(m N T Q^2 n S)</math>, where:
*** m is the number of training iterations
*** N is the number of training data sequences
*** T is the average length of training sequences
*** Q is the number of class labels
*** n is the number of CRF features
*** S is the search time of the optimization algorithm (for example, the L-BFGS algorithm, which is considered a good choice here).
** In practical implementations, the computational time is often larger due to many other operations, such as numerical scaling and smoothing.
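As a toy back-of-the-envelope calculation (all values below are made-up assumptions, not figures from the survey), plugging illustrative numbers into the formula gives a sense of the scale:
<pre>
# Toy cost estimate from the O(m*N*T*Q^2*n*S) formula above.
# All values are illustrative assumptions, not benchmark figures.
m = 100        # training iterations
N = 10_000     # training data sequences
T = 25         # average sequence length
Q = 10         # class labels
n = 100_000    # CRF features
S = 1          # optimizer search time, treated as unit cost here

ops = m * N * T * Q**2 * n * S
print(f"~{ops:.2e} elementary operations")  # ~2.50e+14
</pre>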
 
=== 2010 ===
* ([[2010_PracticalVeryLargeScaleCRFs|Lavergne et al., 2010]]) ⇒ [[Thomas Lavergne]], [[Olivier Cappé]], and [[François Yvon]]. ([[2010]]). [http://www.aclweb.org/anthology-new/P/P10/P10-1052.pdf Practical Very Large Scale CRFs]. In: [[Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics]] ([[ACL 2010]]).
** QUOTE: In [[2010_PracticalVeryLargeScaleCRFs|this paper]], we address the issue of [[CRF Training Task|training]] [[Very Large CRF|very large CRFs]], containing up to [[Hundreds|hundreds]] of [[Output Label|output labels]] and [[Billions|several billion]] [[Predictor Feature|features]]. [[Efficient Algorithm|Efficiency]] stems here from the [[Sparse Data Structure|sparsity]] induced by the use of a <i>l</i><sup>1</sup> [[CRF Penalty Term|penalty term]]. Based on our own [[CRF System|implementation]], we compare three recent [[Proposed Algorithm|proposals]] for [[implementing]] this [[CRF Regularization Strategy|regularization strategy]].
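Written out generically (a sketch of the standard <i>l</i><sup>1</sup>-regularized objective, not the paper's exact notation), such a trainer minimizes
<math>\ell(\theta) = -\sum_{i=1}^{N} \log p_\theta\left(y^{(i)} \mid x^{(i)}\right) + \rho_1 \lVert \theta \rVert_1,</math>
where <math>\theta</math> is the [[Predictor Feature|feature]]-weight vector and <math>\rho_1</math> controls the penalty strength; the <math>\lVert \theta \rVert_1</math> term drives many weights exactly to zero, which produces the [[Sparse Data Structure|sparsity]] the quote refers to.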


=== 2006 ===
* ([[2006_AcceleratedTrainingofConditiona|Vishwanathan et al., 2006]]) ⇒ [[S. V. N. Vishwanathan]], [[Nicol N. Schraudolph]], [[Mark W. Schmidt]], and [[Kevin P. Murphy]]. ([[2006]]). [http://people.cs.ubc.ca/~murphyk/Papers/icml06_camera.pdf Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods]. In: [[Proceedings of the 23rd International Conference on Machine Learning]] ([[ICML-2006]]). [http://dx.doi.org/10.1145/1143844.1143966 doi:10.1145/1143844.1143966]
** QUOTE: We apply [[Stochastic Meta-Descent (SMD)]], a [[stochastic gradient optimization method]] with [[gain vector]] adaptation, to the [[CRF Training Task|training of Conditional Random Fields (CRFs)]]. On several [[large data set]]s, the resulting [[CRF Training Algorithm|optimizer]] converges to the same quality of solution over an [[order of magnitude faster]] than [[limited-memory BFGS]], the leading method reported to date. We report results for both [[exact]] and [[inexact inference technique]]s.
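The gain-vector adaptation mentioned above can be caricatured in a few lines; this is a simplified sketch of per-parameter gain adaptation only (the full [[Stochastic Meta-Descent (SMD)]] update also adapts gains via Hessian-vector products), and <code>grad_fn</code> is a hypothetical callback returning a stochastic gradient of the CRF negative log-likelihood:
<pre>
import numpy as np

def sgd_with_gain_adaptation(grad_fn, theta, n_steps=1000,
                             eta0=0.1, meta_rate=0.05):
    # Toy per-parameter gain adaptation, loosely in the spirit of SMD.
    # grad_fn(theta) is assumed to return a stochastic gradient of the
    # CRF negative log-likelihood (e.g., computed on one minibatch).
    eta = np.full_like(theta, eta0)      # per-parameter gain vector
    prev_grad = np.zeros_like(theta)
    for _ in range(n_steps):
        g = grad_fn(theta)
        # Grow a gain when successive gradients agree in sign;
        # shrink it when they disagree (a sign of oscillation).
        eta *= np.maximum(0.5, 1.0 + meta_rate * np.sign(g * prev_grad))
        theta = theta - eta * g
        prev_grad = g
    return theta
</pre>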


=== 2003 ===
* ([[2003_ShallowParsingwithConditionalRa|Sha & Pereira, 2003a]]) ⇒ [[Fei Sha]], and [[Fernando Pereira]]. ([[2003]]). [http://acl.ldc.upenn.edu/N/N03/N03-1028.pdf Shallow Parsing with Conditional Random Fields]. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT-NAACL 2003). [http://dx.doi.org/10.3115/1073445.1073473 doi:10.3115/1073445.1073473]
** QUOTE: To obtain these results, we had to abandon the original [[iterative scaling]] [[CRF Training Algorithm|CRF training algorithm]] for [[Convex Optimization Algorithm|convex optimization algorithm]]s with better [[convergence]] [[properties]].


----