MALLET Software Toolkit: Difference between revisions

From GM-RKB
m (Text replacement - "<P> [[" to "<P>  [[")
m (Text replacement - "ions]] " to "ion]]s ")
 
Line 33:
=== 2011 ===
* http://mallet.cs.umass.edu/
** [[MALLET Software Toolkit|MALLET]] is a [[Java-based package]] for [[statistical natural language processing]], [[document classification]], [[clustering]], [[topic modeling]], [[information extraction]], and other [[machine learning applications to text]].        <P>         [[MALLET Software Toolkit|MALLET]] includes sophisticated [[tool]]s for <B>[[document classification]]</B>: efficient routines for converting text to "features", a wide variety of algorithms (including [[Naïve Bayes]], [[Maximum Entropy]], and [[Decision Trees]]), and code for evaluating [[classifier performance]] using several commonly used metrics. [http://mallet.cs.umass.edu/classification.php Quick Start] [http://mallet.cs.umass.edu/classifier-devel.php Developer's Guide]        <P>        In addition to [[supervised classification|classification]], [[MALLET Software Toolkit|MALLET]] includes tools for <B>[[sequence tagging]]</B> for applications such as [[named-entity extraction from text]]. Algorithms include [[Hidden Markov Models]], [[Maximum Entropy Markov Models]], and [[Conditional Random Fields]]. These methods are implemented in an extensible system for [[finite state transducer]]s. [http://mallet.cs.umass.edu/sequences.php Quick Start] [http://mallet.cs.umass.edu/fst.php Developer's Guide]        <P>         [[Topic model]]s are useful for analyzing [[large collections of unlabeled text]]. The [[MALLET Software Toolkit|MALLET]] <B>topic modeling</B> toolkit contains efficient, [[sampling-based implementation]]s of [[Latent Dirichlet Allocation]], [[Pachinko Allocation]], and [[Hierarchical LDA]]. [http://mallet.cs.umass.edu/topics.php Quick Start]        <P>        Many of the algorithms in [[MALLET Software Toolkit|MALLET]] depend on <B>numerical optimization</B>. [[MALLET Software Toolkit|MALLET]] includes an efficient implementation of [[Limited Memory BFGS]], among many other [[optimization method]]s. 
[http://mallet.cs.umass.edu/optimization.php Developer's Guide]        <P>        In addition to sophisticated [[ML Tool|Machine Learning application]]s, [[MALLET Software Toolkit|MALLET]] includes [[routines for transforming text documents into numerical representation]]s that can then be processed efficiently. This process is implemented through a flexible system of "pipes", which handle distinct tasks such as [[tokenizing strings]], [[removing stopwords]], and [[converting sequences into count vectors]]. [http://mallet.cs.umass.edu/import.php Quick Start] [http://mallet.cs.umass.edu/import-devel.php Developer's Guide]        <P>        An add-on package to [[MALLET Software Toolkit|MALLET]], called [[GRMM]], contains support for [[inference in general graphical models]], and [[training of CRFs with arbitrary graphical structure]]. [http://mallet.cs.umass.edu/grmm/index.php About GRMM]
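The "pipes" idea quoted above (tokenizing strings, removing stopwords, converting sequences into count vectors) can be illustrated with a minimal, self-contained Java sketch. This is not MALLET's API: the class <code>PipeSketch</code> and its methods <code>tokenize</code>, <code>removeStopwords</code>, <code>toCounts</code>, and <code>pipeline</code> are hypothetical names chosen for illustration, and the tokenization rule and stoplist are assumptions. It only shows how distinct stages might be chained so that raw text goes in and a count representation comes out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration only: none of these names are MALLET classes.
class PipeSketch {

    // Stage 1: split a raw string into lowercase tokens of letters and digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 2: drop every token that appears in the stoplist.
    static List<String> removeStopwords(List<String> tokens, Set<String> stoplist) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stoplist.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Stage 3: turn the token sequence into a sparse count vector (token -> count).
    static Map<String, Integer> toCounts(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    // Chain the three stages in order, in the spirit of a serial pipe.
    static Function<String, Map<String, Integer>> pipeline(Set<String> stoplist) {
        Function<String, List<String>> first = PipeSketch::tokenize;
        return first
                .andThen(tokens -> removeStopwords(tokens, stoplist))
                .andThen(PipeSketch::toCounts);
    }

    public static void main(String[] args) {
        Set<String> stoplist = Set.of("the", "a", "of", "and"); // assumed stoplist
        Map<String, Integer> counts =
                pipeline(stoplist).apply("The cat and the hat: a tale of the cat.");
        System.out.println(counts); // prints {cat=2, hat=1, tale=1}
    }
}
```

Keeping each stage as a separate function means a stage can be swapped or reordered without touching the others, which is the design property the "pipes" description emphasizes. For the real classes and their signatures, see the Developer's Guide linked above.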


=== 2002 ===

Latest revision as of 07:29, 22 August 2024

A MALLET Software Toolkit is a Java-based machine learning toolkit that is designed for solving natural language processing tasks.



References

2011

2002