WVTool System
A WVTool System is a Java-based open source data transformation system for natural language processing tasks.
- AKA: Word & Web Vector Tool.
- See: RapidMiner System, GPL.
References
2009
- http://nemoz.org/joomla/content/view/43/83/lang,en/
- The Word & Web Vector Tool is a flexible Java library for statistical language modeling and integration of Web and Webservice based data sources. It supports the creation of word vector representations of text documents in the vector space model that is the point of departure for many text processing applications (e.g. text classification or information retrieval). Furthermore, it offers convenient interactive methods to extract data from structured sources, such as HTML or XML files. Finally, it allows external data to be integrated through Webservice APIs in a mashup-like way (e.g. for geo-mapping).
- The aim of the WVTool is to provide a simple-to-use, simple-to-extend pure Java library for text and web mining. It is tightly integrated with the RapidMiner Data Mining suite (formerly known as Yale), making it possible to apply data mining to text and web data in a convenient way. The WVTool bridges a gap between highly sophisticated linguistic packages on the one side and proprietary or specialized partial solutions on the other side.
- Related Links
- The WVTool contains an interface to the Websphinx personal web crawler.
- We integrate WordNet support using the Java WordNet Library.
- The WVTool contains the Snowball stemmer package.
- To support XPath expressions on any kind of malformed HTML code, we use the TagSoup HTML Parser.
- PDF document support is provided by the PDFBox package.
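The word vector representation mentioned in the quoted description can be illustrated with a minimal sketch. It does not use the WVTool API: the class `WordVectorSketch` and its methods `tokenize`, `buildVocabulary`, and `tfIdf` are hypothetical names chosen for illustration, and TF-IDF is assumed here as one common weighting scheme for the vector space model. Stemming, stop-word removal, and the HTML, XML, and PDF handling that the library description refers to are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the WVTool library.
final class WordVectorSketch {

    // Lowercase the text and split it on runs of non-letter characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Map every distinct term to a column index, in alphabetical order.
    static Map<String, Integer> buildVocabulary(List<List<String>> docs) {
        TreeMap<String, Integer> vocab = new TreeMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                vocab.put(term, 0);
            }
        }
        int index = 0;
        for (Map.Entry<String, Integer> entry : vocab.entrySet()) {
            entry.setValue(index++);
        }
        return vocab;
    }

    // Build one TF-IDF vector per document:
    // weight = (term count / document length) * ln(N / document frequency).
    static double[][] tfIdf(List<List<String>> docs, Map<String, Integer> vocab) {
        int n = docs.size();
        int[] docFreq = new int[vocab.size()];
        double[][] vectors = new double[n][vocab.size()];

        // First pass: raw term counts and document frequencies.
        for (int d = 0; d < n; d++) {
            for (String term : docs.get(d)) {
                vectors[d][vocab.get(term)] += 1.0;
            }
            for (int j = 0; j < docFreq.length; j++) {
                if (vectors[d][j] > 0) {
                    docFreq[j]++;
                }
            }
        }

        // Second pass: turn counts into TF-IDF weights.
        for (int d = 0; d < n; d++) {
            int length = docs.get(d).size();
            for (int j = 0; j < docFreq.length; j++) {
                if (length > 0 && docFreq[j] > 0) {
                    double tf = vectors[d][j] / length;
                    double idf = Math.log((double) n / docFreq[j]);
                    vectors[d][j] = tf * idf;
                }
            }
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                tokenize("The cat sat on the mat."),
                tokenize("The dog sat on the log."));
        Map<String, Integer> vocab = buildVocabulary(docs);
        double[][] vectors = tfIdf(docs, vocab);

        System.out.println("Vocabulary: " + vocab);
        for (double[] vector : vectors) {
            System.out.println(Arrays.toString(vector));
        }
    }
}
```

In this sketch, terms that occur in every document (such as "the" in the two sample sentences) receive a weight of zero, because their inverse document frequency is ln(1) = 0. The resulting rows could serve as input to a text classifier or a similarity search, which is the role the description assigns to word vectors.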