Text-Data Analytics Task
A Text-Data Analytics Task is a data analytics task that is also a text processing task (i.e., a task whose input is a text-rich dataset).
- Context:
- Input: Text Dataset.
- Output: Structured Data (see the sketch following this definition).
- It can (often) include Text Visualization.
- It can (often) be performed by a Text-Data Analyst.
- It can be supported by a Text-Data Analytics-Supporting System (such as a text mining system).
- It can range from being a General Text Data Analytics Task to being a Domain-Specific Text Data Analytics Task.
- It can range from being a Heuristic Text Mining Task to being a Data-Driven Text Mining Task.
- It can be supported by a Natural Language Processing (NLP) Task.
- It can be performed by a Text Analyst.
- ...
- Example(s):
- a Text Clustering Task.
- a Topic Tracking Task.
- a Text Classification Task.
- an Information Extraction Task.
- a Terminology Mining Task.
- a Sentiment Detection Task.
- a Biomedical Text Mining Task, Text Mining for Advertising, Text Mining for News and Blogs Analysis, Text Mining for Spam Filtering, ...
- a Literature Mining Task, if the text dataset is restricted to a domain's literature.
- a Knowledge-based Text Mining Task, such as ontological text mining.
- a Domain-Specific Text-Data Analytics Task, such as legal text analytics.
- Text Mining for the Semantic Web.
- …
- Counter-Example(s):
- See: Semantic Analysis Task; Cross-lingual Text Mining; Feature Construction in Text Mining; Feature Selection in Text Mining.
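The Input/Output contract above (a text dataset in, structured data out) can be illustrated with a minimal, hypothetical sketch: each document in a small toy corpus is reduced to a structured record of simple surface features. The corpus, field names, and feature choices below are illustrative assumptions only, not drawn from any cited source.

```python
# Minimal sketch of a text-data analytics task: text dataset in, structured data out.
# The corpus, field names, and feature choices are illustrative assumptions.
from collections import Counter
import re

documents = {
    "doc1": "Text mining turns unstructured text into structured data.",
    "doc2": "Clustering and classification are typical text mining tasks.",
}

def to_record(doc_id: str, text: str) -> dict:
    """Reduce one document to a structured record of simple surface features."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return {
        "doc_id": doc_id,
        "num_tokens": len(tokens),
        "num_types": len(counts),
        "top_terms": [term for term, _ in counts.most_common(3)],
    }

# The "structured data" output: one record (row) per input document.
records = [to_record(doc_id, text) for doc_id, text in documents.items()]
for row in records:
    print(row)
```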
References
2015a
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/text_mining Retrieved:2015-4-1.
- Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
Text analysis involves information retrieval, lexical analysis to study word frequency distributions, pattern recognition, tagging/annotation, information extraction, data mining techniques including link and association analysis, visualization, and predictive analytics. The overarching goal is, essentially, to turn text into data for analysis, via application of natural language processing (NLP) and analytical methods.
A typical application is to scan a set of documents written in a natural language and either model the document set for predictive classification purposes or populate a database or search index with the information extracted.
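The structure-then-derive-patterns process described in the quote above can be sketched as follows. This is a hedged illustration only, assuming scikit-learn is available; the toy corpus, cluster count, and other parameter values are arbitrary assumptions. It structures raw documents into a TF-IDF matrix, derives groupings with k-means, and inspects top terms per cluster as a simple interpretation step.

```python
# Sketch of the quoted process: structure the input text, derive patterns within the
# structured data, then interpret. Assumes scikit-learn; corpus and parameters are toy.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "Stock markets rallied as tech shares climbed.",
    "The central bank raised interest rates again.",
    "The home team won the championship game.",
    "The striker scored twice in the final match.",
]

# 1. Structuring: tokenize the input text into a document-term (TF-IDF) matrix.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# 2. Pattern derivation: group the structured representations with k-means.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# 3. Interpretation: inspect the top-weighted terms per cluster centroid.
terms = vectorizer.get_feature_names_out()
for c, centroid in enumerate(km.cluster_centers_):
    top = centroid.argsort()[::-1][:4]
    print(f"cluster {c}: {[terms[i] for i in top]}")
print("cluster assignments:", km.labels_.tolist())
```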
2015b
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/text_mining#Text_analysis_processes Retrieved:2015-4-1.
- Subtasks — components of a larger text-analytics effort — typically include:
- Information retrieval or identification of a corpus is a preparatory step: collecting or identifying a set of textual materials, on the Web or held in a file system, database, or content management system, for analysis.
- Although some text analytics systems apply exclusively advanced statistical methods, many others apply more extensive natural language processing, such as part of speech tagging, syntactic parsing, and other types of linguistic analysis.
- Named entity recognition is the use of gazetteers or statistical techniques to identify named text features: people, organizations, place names, stock ticker symbols, certain abbreviations, and so on. Disambiguation — the use of contextual clues — may be required to decide where, for instance, "Ford" can refer to a former U.S. president, a vehicle manufacturer, a movie star, a river crossing, or some other entity.
- Recognition of Pattern Identified Entities: Features such as telephone numbers, e-mail addresses, quantities (with units) can be discerned via regular expression or other pattern matches.
- Coreference: identification of noun phrases and other terms that refer to the same object.
- Relationship, fact, and event extraction: identification of associations among entities and other information in text.
- Sentiment analysis involves discerning subjective (as opposed to factual) material and extracting various forms of attitudinal information: sentiment, opinion, mood, and emotion. Text analytics techniques are helpful in analyzing sentiment at the entity, concept, or topic level and in distinguishing opinion holder and opinion object.
- Quantitative text analysis is a set of techniques stemming from the social sciences where either a human judge or a computer extracts semantic or grammatical relationships between words in order to find out the meaning or stylistic patterns of, usually, a casual personal text for the purpose of psychological profiling, etc.
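Two of the subtasks listed above, recognition of pattern-identified entities and (gazetteer-based) named entity recognition, can be sketched with standard-library regular expressions and a simple lookup list. The patterns, gazetteer entries, and sample sentence below are illustrative assumptions, not a production-grade recognizer.

```python
# Sketch of two subtasks from the list above: regex-based recognition of
# pattern-identified entities and a tiny gazetteer lookup for named entities.
# Patterns, gazetteer, and the sample text are illustrative assumptions.
import re

text = ("Contact Jane Doe at jane.doe@example.com or +1-555-0100; "
        "she covers Acme Corp and Ford for the Chicago office.")

# Pattern-identified entities: e-mail addresses and phone-number-like strings.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
phones = re.findall(r"\+?\d[\d\s().-]{7,}\d", text)

# Gazetteer-based named entity recognition: match against known-entity lists.
# Note: a bare gazetteer cannot disambiguate "Ford" (person vs. manufacturer);
# that requires the contextual disambiguation mentioned in the quote above.
gazetteer = {"Acme Corp": "ORG", "Ford": "ORG", "Chicago": "LOC", "Jane Doe": "PERSON"}
entities = [(name, label) for name, label in gazetteer.items() if name in text]

print("emails:  ", emails)      # ['jane.doe@example.com']
print("phones:  ", phones)      # ['+1-555-0100']
print("entities:", entities)    # e.g. [('Acme Corp', 'ORG'), ('Ford', 'ORG'), ...]
```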
2015c
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/text_mining#Applications Retrieved:2015-4-1.
- The technology is now broadly applied for a wide variety of government, research, and business needs. Applications can be sorted into a number of categories by analysis type or by business function. Using this approach to classifying solutions, application categories include:
- Enterprise Business Intelligence/Data Mining, Competitive Intelligence
- E-Discovery, Records Management
- National Security/Intelligence
- Scientific discovery, especially Life Sciences
- Sentiment Analysis Tools, Listening Platforms
- Natural Language/Semantic Toolkit or Service
- Publishing
- Automated ad placement.
- Search/Information Access
- Social media monitoring
2011
- (Mladenić, 2011b) ⇒ Dunja Mladenić. (2011). “Text Mining.” In: (Sammut & Webb, 2011) p.962
- QUOTE: The term text mining is used analogous to data mining when the data is text. As there are some data specificities when handling text compared to handling data from databases, text mining has a number of specific methods and approaches. Some of these are extensions of data mining and machine learning methods, while others are rather text-specific. Text mining approaches combine methods from several related fields, including machine learning, data mining, information retrieval, natural language processing, statistical learning, and the Semantic Web. Basic text mining approaches are also extended to enable handling different natural languages (cross-lingual text mining) and are combined with methods for handling different data types, such as links and graphs (link mining and link discovery, graph mining), images and video (multimedia mining).
2010
- (Wikipedia - Text Analytics, 2010) ⇒ http://en.wikipedia.org/wiki/Text_analytics
- The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation.[1] The term is roughly synonymous with text mining; indeed, Prof. Ronen Feldman modified a 2000 description of "text mining"[2] in 2004 to describe "text analytics."[3] The latter term is now used more frequently in business settings while "text mining" is used in some of the earliest application areas, dating to the 1980s,[4] notably life-sciences research and government intelligence.
Text analytics involves information retrieval, lexical analysis to study word frequency distributions, pattern recognition, tagging/annotation, information extraction, data mining techniques including link and association analysis, visualization, and predictive analytics. The overarching goal is, essentially, to turn text into data for analysis via application of natural language processing (NLP) and analytical methods.
The term also describes that application of text analytics to respond to business problems, whether independently or in conjunction with query and analysis of fielded, numerical data. It is a truism that 80 percent of business-relevant information originates in unstructured form, primarily text.[5] These techniques and processes discover and present knowledge – facts, business rules, and relationships – that is otherwise locked in textual form, impenetrable to automated processing.
A typical application is to scan a set of documents written in a natural language and either model the document set for predictive classification purposes or populate a database or search index with the information extracted.
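The "typical application" described above, modeling a document set for predictive classification, can be sketched as a small supervised pipeline. The example below assumes scikit-learn and uses made-up training documents, labels, and a query text; it is a sketch of the general idea rather than any cited system.

```python
# Sketch of the predictive-classification application described above.
# Assumes scikit-learn; training texts, labels, and the query are toy assumptions.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = [
    "Quarterly revenue grew and the stock price rose.",
    "The merger was approved by shareholders.",
    "The new vaccine showed strong results in trials.",
    "Researchers sequenced the genome of the virus.",
]
train_labels = ["business", "business", "life_sciences", "life_sciences"]

# Turn each document into a TF-IDF vector, then fit a linear classifier on top.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(train_texts, train_labels)

# Apply the fitted model to unseen text to obtain a predicted category.
print(classifier.predict(["New trial results for the vaccine were published."]))
```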
2009
- (Li, 2009) ⇒ Yanjun Li, organizer. (2009). “Special Session on Text and Web Mining - Call for Papers."
- QUOTE: Text mining has been defined as 'the automated discovery of new, previously unknown information by automatically extracting information from different written resources.' Text mining operates on structured data from XML files or unstructured or semi-structured data sets (such as email, full-text documents, and HTML files). Text mining applications include information extraction, topic tracking, summarization, categorization, clustering, concept linkage, information visualization, and question answering. Web mining is the application of data mining techniques to discover patterns from the World Wide Web and includes web usage mining, web content mining, and web structure mining. Web mining applications are in high demand since they can be used to improve the effectiveness of search engines.
2008
- (Bilisoly, 2008) ⇒ Roger Bilisoly. (2008). “Practical Text Mining with Perl.” Wiley Series on Methods and Applications in Data Mining
2005a
- (Kao & Poteet, 2005) ⇒ Anne Kao, and Steve Poteet. (2005). “Text Mining and Natural Language Processing: introduction for the special issue.” In: ACM SIGKDD Explorations Newsletter, 7(1). doi:10.1145/1089815.1089816
2005b
- (Chen et al., 2005) ⇒ Hsinchun Chen, Sherrilynne S. Fuller, and William Hersh. (2005). “Medical Informatics: knowledge management and data mining in biomedicine." Springer. ISBN:038724381X.
- QUOTE: Text mining aims to extract useful knowledge from textual data or documents (Hearst, 1999; Chen, 2001). Although text mining is often considered a subfield of data mining, some text mining techniques have originated from other disciplines, such as information retrieval, information visualization, computational linguistics, and information science. Examples of text mining applications include document classification, document clustering, entity extraction, information extraction, and summarization.
Most knowledge management, data mining, and text mining techniques involve learning patterns from existing data or information, and are therefore built upon the foundations of machine learning and artificial intelligence. In the following, we review several major paradigms in machine learning, important evaluation methodologies, and their applicability in biomedicine.
2004
- (Weiss et al., 2004) ⇒ Sholom M. Weiss, Nitin Indurkhya, Tong Zhang, and Fred Damerau. (2004). “Text Mining: Predictive Methods for Analyzing Unstructured Information.” Springer.
2001
- (Chen, 2001).