Data-Driven Sentiment Analysis Task
A Data-Driven Sentiment Analysis Task is a sentiment analysis task that is a data-driven analysis task.
- AKA: Opinion Mining.
- Context:
- Input: a Linguistic Expression (often a text item).
- optional: an Annotated Corpus.
- Output: a Sentiment Category Label.
- It can be solved by a Sentiment Analysis System that applies a Sentiment Analysis Algorithm.
- It can be a Sentiment Classification Task, such as:
- …
- Example(s):
- a Sentiment Analysis Task based on the Stanford Sentiment Treebank.
- …
- Counter-Example(s):
- See: Opinion Word.
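The input/output signature listed in the Context above (a text item plus an optional annotated corpus in, a sentiment category label out) can be illustrated with a minimal sketch. It assumes a multinomial Naive Bayes classifier with add-one smoothing as the learning algorithm; the function names `train` and `predict`, the tokenizer, and the toy corpus are hypothetical and chosen only for illustration. Any other supervised learner could take its place.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(corpus):
    """Build count tables from an annotated corpus of (text, label) pairs."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # token counts per label
    vocab = set()
    for text, label in corpus:
        tokens = tokenize(text)
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    # Tokens never seen in training are ignored.
    tokens = [t for t in tokenize(text) if t in vocab]
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for t in tokens:
            # Add-one (Laplace) smoothing over the vocabulary.
            score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy annotated corpus (illustrative data only).
corpus = [
    ("a wonderful, moving film", "positive"),
    ("great acting and a great story", "positive"),
    ("a dull and boring plot", "negative"),
    ("terrible pacing, boring script", "negative"),
]
model = train(corpus)
```

Calling `predict(model, "boring and dull")` then plays the role of the Sentiment Analysis System: the linguistic expression goes in and a single category label comes out.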
References
2015
- (Liu, 2015) ⇒ Bing Liu. (2015). “Sentiment Analysis: Opinions, Sentiment, and Emotion in Text.” Cambridge University Press. ISBN:9781107017894
- QUOTE: Sentiment analysis is the computational study of people's opinions, sentiments, emotions, and attitudes.
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/sentiment_analysis Retrieved:2015-5-25.
- Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials.
Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader).
2011
- (Sammut & Webb, 2011) ⇒ Claude Sammut, and Geoffrey I. Webb. (2011). “Opinion Mining.” In: (Sammut & Webb, 2011) p.743
2008
- (Ding et al., 2008) ⇒ Xiaowen Ding, Bing Liu, and Philip S. Yu. (2008). “A Holistic Lexicon-based Approach to Opinion Mining.” In: Proceedings of the International Conference on Web Search and Web Data Mining (WSDM 2008). doi:10.1145/1341531.1341561
- QUOTE: One of the important types of information on the Web is the opinions expressed in the user generated content, e.g., customer reviews of products, forum posts, and blogs. In this paper, we focus on customer reviews of products. In particular, we study the problem of determining the semantic orientations (positive, negative or neutral) of opinions expressed on product features in reviews. This problem has many applications, e.g., opinion mining, summarization and search.
- (Pang & Lee, 2008) ⇒ Bo Pang, and Lillian Lee. (2008). “Opinion Mining and Sentiment Analysis.” Now Publishers Inc.
- QUOTE: Thus, when broad interpretations are applied, “sentiment analysis” and “opinion mining” denote the same field of study (which itself can be considered a sub-area of subjectivity analysis). We have attempted to use these terms more or less interchangeably in this survey. This is in no small part because we view the field as representing a unified body of work, and would thus like to encourage researchers in the area to share terminology regardless of the publication venues at which their papers might appear.
2007
- (Archak et al., 2007) ⇒ Nikolay Archak, Anindya Ghose, and Panagiotis G. Ipeirotis. (2007). “Show Me the Money!: Deriving the pricing power of product features by mining consumer reviews.” In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2007). doi:10.1145/1281192.1281202.
- ABSTRACT: The increasing pervasiveness of the Internet has dramatically changed the way that consumers shop for goods. Consumer-generated product reviews have become a valuable source of information for customers, who read the reviews and decide whether to buy the product based on the information provided. In this paper, we use techniques that decompose the reviews into segments that evaluate the individual characteristics of a product (e.g., image quality and battery life for a digital camera). Then, as a major contribution of this paper, we adapt methods from the econometrics literature, specifically the hedonic regression concept, to estimate: (a) the weight that customers place on each individual product feature, (b) the implicit evaluation score that customers assign to each feature, and (c) how these evaluations affect the revenue for a given product. Towards this goal, we develop a novel hybrid technique combining text mining and econometrics that models consumer product reviews as elements in a tensor product of feature and evaluation spaces. We then impute the quantitative impact of consumer reviews on product demand as a linear functional from this tensor product space. We demonstrate how to use a low-dimension approximation of this functional to significantly reduce the number of model parameters, while still providing good experimental results. We evaluate our technique using a data set from Amazon.com consisting of sales data and the related consumer reviews posted over a 15-month period for 242 products. Our experimental evaluation shows that we can extract actionable business intelligence from the data and better understand the customer preferences and actions. We also show that the textual portion of the reviews can improve product sales prediction compared to a baseline technique that simply relies on numeric data.
2006
- (Esuli & Sebastiani, 2006) ⇒ Andrea Esuli, and Fabrizio Sebastiani. (2006). “SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining.” In: Proceedings of LREC 2006.
- QUOTE: Opinion mining (OM) is a recent subdiscipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses. OM has a rich set of applications, ranging from tracking users’ opinions about products or about political candidates as expressed in online forums, to customer relationship management. In order to aid the extraction of opinions from text, recent research has tried to automatically determine the “PN-polarity” of subjective terms, i.e. identify whether a term that is a marker of opinionated content has a positive or a negative connotation. Research on determining whether a term is indeed a marker of opinionated content (a subjective term) or not (an objective term) has been, instead, much more scarce. In this work we describe SENTIWORDNET, a lexical resource in which each WORDNET synset s is associated to three numerical scores Obj(s), Pos(s) and Neg(s), describing how objective, positive, and negative the terms contained in the synset are. The method used to develop SENTIWORDNET is based on the quantitative analysis of the glosses associated to synsets, and on the use of the resulting vectorial term representations for semi-supervised synset classification. The three scores are derived by combining the results produced by a committee of eight ternary classifiers, all characterized by similar accuracy levels but different classification behaviour. SENTIWORDNET is freely available for research purposes, and is endowed with a Web-based graphical user interface.
- (Choi et al., 2006) ⇒ Yejin Choi, Eric Breck, and Claire Cardie. (2006). “Joint Extraction of Entities and Relations for Opinion Recognition.” In: Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2006).
- QUOTE: We present an approach for the joint extraction of entities and relations in the context of opinion recognition and analysis. We identify two types of opinion-related entities — expressions of opinions and sources of opinions — along with the linking relation that exists between them. …
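The three-score representation described in the (Esuli & Sebastiani, 2006) quote above can be sketched as a small data structure. The quote states only that Obj(s), Pos(s) and Neg(s) are derived by combining the outputs of a committee of ternary classifiers; the sketch below assumes, purely for illustration, that each score is the fraction of committee members voting for that label. The names `SynsetScores` and `scores_from_votes` are hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Sequence

LABELS = ("objective", "positive", "negative")


class SynsetScores(NamedTuple):
    """Obj(s), Pos(s) and Neg(s) for one synset."""
    obj: float
    pos: float
    neg: float


def scores_from_votes(votes: Sequence[str]) -> SynsetScores:
    """Turn one ternary vote per committee member into three scores.

    Assumption: a score is the share of classifiers that chose the label.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    unknown = set(votes) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counts = Counter(votes)
    n = len(votes)
    return SynsetScores(
        obj=counts["objective"] / n,
        pos=counts["positive"] / n,
        neg=counts["negative"] / n,
    )


# A committee of eight classifiers: six vote positive, two vote objective.
example = scores_from_votes(["positive"] * 6 + ["objective"] * 2)
# example == SynsetScores(obj=0.25, pos=0.75, neg=0.0)
```

Under this assumption the three scores of a synset always sum to 1, so a synset can be partly objective and partly polar at the same time.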
2005
- (Choi et al., 2005) ⇒ Yejin Choi, Claire Cardie, Ellen Riloff, and Siddharth Patwardhan. (2005). “Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns.” In: Proceedings of the Human Language Technology Conference/Conference on Empirical Methods in Natural Language Processing (HLT-EMNLP 2005).
- QUOTE: Recent systems have been developed for sentiment classification, opinion recognition, and opinion analysis (e.g., detecting polarity and strength). We pursue another aspect of opinion analysis: identifying the sources of opinions, emotions, and sentiments. We view this problem as an information extraction task and adopt a hybrid approach that combines Conditional Random Fields (Lafferty et al., 2001) and a variation of AutoSlog (Riloff, 1996a). While CRFs model source identification as a sequence tagging task, AutoSlog learns extraction patterns. Our results show that the combination of these two methods performs better than either one alone. ...
- (Popescu & Etzioni, 2005) ⇒ Ana-Maria Popescu, and Oren Etzioni. (2005). “Extracting Product Features and Opinions from Reviews.” In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT-EMNLP 2005).
- QUOTE: Consumers are often forced to wade through many on-line reviews in order to make an informed product choice. This paper introduces Opine, an unsupervised information-extraction system which mines reviews in order to build a model of important product features, their evaluation by reviewers, and their relative quality across products. …
2004
- (Hu & Liu, 2004) ⇒ Minqing Hu, and Bing Liu. (2004). “Mining and Summarizing Customer Reviews.” In: Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004).
- QUOTE: … This summarization task is different from traditional text summarization because we only mine the features of the product on which the customers have expressed their opinions and whether the opinions are positive or negative. We do not summarize the reviews by selecting a subset or rewrite some of the original sentences from the reviews to capture the main points as in the classic text summarization. Our task is performed in three steps: (1) mining product features that have been commented on by customers; (2) identifying opinion sentences in each review and deciding whether each opinion sentence is positive or negative; (3) summarizing the results. This paper proposes several novel techniques to perform these tasks. Our experimental results using reviews of a number of products sold online demonstrate the effectiveness of the techniques.
- (Kim & Hovy, 2004) ⇒ Soo-Min Kim, and Eduard Hovy. (2004). “Determining the Sentiment of Opinions.” In: Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004). doi:10.3115/1220355.1220555
- QUOTE: Identifying sentiments (the affective parts of opinions) is a challenging problem. We present a system that, given a topic, automatically finds the people who hold opinions about that topic and the sentiment of each opinion. The system contains a module for determining word sentiment and another for combining sentiments within a sentence. We experiment with various models of classifying and combining sentiment at word and sentence levels, with promising results.
- (Wiebe et al., 2004) ⇒ Janyce M. Wiebe, Theresa Wilson, Rebecca Bruce, Matthew Bell, and Melanie Martin. (2004). “Learning Subjective Language.” In: Computational Linguistics, 30(3). doi:10.1162/0891201041850885
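The three-step task quoted from (Hu & Liu, 2004) above (mine product features, label opinion sentences as positive or negative, summarize) can be sketched as a toy pipeline. This is not the authors' technique: it assumes candidate feature terms are supplied in advance, that a feature counts as "commented on" once it appears in at least `min_count` sentences, and that sentence polarity comes from a tiny hand-made word list. All names and data below are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative opinion lexicon (an assumption, not a published resource).
POSITIVE = {"great", "good", "sharp", "excellent"}
NEGATIVE = {"poor", "bad", "blurry", "terrible"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def mine_features(sentences, candidates, min_count=2):
    """Step 1: keep candidate features mentioned in at least min_count sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(set(tokenize(sentence)) & candidates)
    return {feature for feature, c in counts.items() if c >= min_count}


def sentence_polarity(tokens):
    """Step 2: label a sentence by its positive minus negative word count."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # not treated as an opinion sentence


def summarize(sentences, candidates, min_count=2):
    """Step 3: count positive and negative opinion sentences per feature."""
    features = mine_features(sentences, candidates, min_count)
    summary = {f: {"positive": 0, "negative": 0} for f in features}
    for sentence in sentences:
        tokens = tokenize(sentence)
        polarity = sentence_polarity(tokens)
        if polarity is None:
            continue
        for feature in features & set(tokens):
            summary[feature][polarity] += 1
    return summary


reviews = [
    "The screen is great.",
    "Battery is poor.",
    "The screen looks sharp.",
    "The battery died; bad.",
    "I bought it on Monday.",
]
result = summarize(reviews, candidates={"screen", "battery", "monday"})
```

The output is a per-feature tally rather than a selection of original sentences, which is the distinction from classic text summarization that the quote draws.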
2003
- (Dave et al., 2003) ⇒ Kushal Dave, Steve Lawrence, and David M. Pennock. (2003). “Mining the Peanut Gallery: Opinion extraction and semantic classification of product reviews.” In: Proceedings of the 12th International Conference on World Wide Web (WWW 2003). doi:10.1145/775152.775226
- QUOTE: The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good).
2002
- (Pang et al., 2002) ⇒ Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. (2002). “Thumbs up?: Sentiment Classification Using Machine Learning Techniques.” In: Proceedings of the ACL-2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002).
- QUOTE: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
- (Turney, 2002) ⇒ Peter D. Turney. (2002). “Thumbs up or Thumbs Down?: Semantic orientation applied to unsupervised classification of reviews.” In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL 2002).
- QUOTE: This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word “excellent” minus the mutual information between the given phrase and the word “poor”. A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.
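The decision rule quoted from (Turney, 2002) above can be sketched in a few lines. With pointwise mutual information defined as PMI(a, b) = log2(p(a, b) / (p(a) p(b))), the difference PMI(phrase, “excellent”) − PMI(phrase, “poor”) simplifies to log2((hits(phrase with excellent) · hits(poor)) / (hits(phrase with poor) · hits(excellent))), because the phrase's own frequency cancels. The sketch assumes co-occurrence counts are already available; the `PhraseCounts` type, the function names, and the small smoothing constant are hypothetical choices made for illustration, not Turney's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class PhraseCounts:
    """Hypothetical co-occurrence counts for one extracted phrase."""
    near_excellent: float  # times the phrase co-occurs with "excellent"
    near_poor: float       # times the phrase co-occurs with "poor"


def semantic_orientation(counts, hits_excellent, hits_poor, smoothing=0.01):
    """PMI(phrase, "excellent") - PMI(phrase, "poor"), in bits.

    The smoothing constant (an assumption) avoids log(0) and division by zero.
    """
    numerator = (counts.near_excellent + smoothing) * hits_poor
    denominator = (counts.near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def classify_review(phrases, hits_excellent, hits_poor):
    """Recommend the item if the average semantic orientation is positive."""
    if not phrases:
        raise ValueError("a review needs at least one extracted phrase")
    scores = [semantic_orientation(p, hits_excellent, hits_poor) for p in phrases]
    average = sum(scores) / len(scores)
    return "recommended" if average > 0 else "not recommended"
```

For example, a phrase seen 8 times with “excellent” and 2 times with “poor”, when both reference words are equally frequent and smoothing is set to 0, has a semantic orientation of log2(4) = 2 bits.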