2008 LearningfromMultiTopicWebDocume

From GM-RKB
Latest revision as of 00:43, 19 August 2024

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

sub-document classification, contextual advertising, sensitive content detection, opinion mining

Abstract

Contextual advertising on web pages has become very popular recently and it poses its own set of unique text mining challenges. Often advertisers wish to either target (or avoid) some specific content on web pages which may appear only in a small part of the page. Learning for these targeting tasks is difficult since most training pages are multi-topic and need expensive human labeling at the sub-document level for accurate training. In this paper we investigate ways to learn for sub-document classification when only page-level labels are available; these labels only indicate if the relevant content exists in the given page or not. We propose the application of multiple-instance learning to this task to improve the effectiveness of traditional methods. We apply sub-document classification to two different problems in contextual advertising. One is “sensitive content detection” where the advertiser wants to avoid content relating to war, violence, pornography, etc. even if they occur only in a small part of a page. The second problem involves opinion mining from review sites: the advertiser wants to detect and avoid negative opinion about their product when positive, negative and neutral sentiments co-exist on a page. In both these scenarios we present experimental results to show that our proposed system is able to get good block-level labeling for free and improve the performance of traditional learning methods.
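The abstract's central idea is learning a block-level classifier when only page-level labels exist. This is multiple-instance learning: each page is a "bag" of blocks, and a page counts as positive when at least one of its blocks contains the target content. The sketch below illustrates that setup only. It is not the paper's system. It makes several assumptions: blocks are sparse bag-of-words dicts, a Noisy-OR bag model sits over a per-block logistic score, and training uses plain stochastic gradient ascent. The function names (`block_prob`, `page_prob`, `train_noisy_or_mil`) and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: a block is a sparse bag-of-words feature dict and a page is a
# list of blocks (a "bag" in multiple-instance terms). Names are hypothetical.
Block = Dict[str, float]
Page = List[Block]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def block_prob(w: Dict[str, float], b: float, block: Block) -> float:
    """Probability that a single block contains the target content."""
    return sigmoid(b + sum(w.get(f, 0.0) * v for f, v in block.items()))


def page_prob(w: Dict[str, float], b: float, page: Page) -> float:
    """Noisy-OR: the page is positive if at least one block is positive."""
    prod_neg = 1.0
    for block in page:
        prod_neg *= 1.0 - block_prob(w, b, block)
    return 1.0 - prod_neg


def train_noisy_or_mil(
    pages: Sequence[Page], labels: Sequence[int], epochs: int = 200, lr: float = 0.5
) -> Tuple[Dict[str, float], float]:
    """Fit block-level weights from page-level 0/1 labels only.

    Stochastic gradient ascent on the page-level log-likelihood. For block j
    of page i, the gradient w.r.t. its score is (t_i - P_i) / P_i * p_ij.
    """
    w: Dict[str, float] = {}
    b = 0.0
    for _ in range(epochs):
        for page, t in zip(pages, labels):
            # Block probabilities are computed once, before this page's update.
            probs = [block_prob(w, b, blk) for blk in page]
            prod_neg = 1.0
            for p in probs:
                prod_neg *= 1.0 - p
            # Clamp the page probability so the division below is safe.
            P = min(max(1.0 - prod_neg, 1e-12), 1.0 - 1e-12)
            for blk, p in zip(page, probs):
                g = (t - P) / P * p
                for f, v in blk.items():
                    w[f] = w.get(f, 0.0) + lr * g * v
                b += lr * g
    return w, b


if __name__ == "__main__":
    # Hypothetical "sensitive content" toy data with page-level labels only.
    pages = [
        [{"camera": 1.0, "review": 1.0}, {"war": 1.0, "bomb": 1.0}],    # label 1
        [{"recipe": 1.0}, {"war": 1.0, "troops": 1.0}],                 # label 1
        [{"camera": 1.0, "lens": 1.0}, {"recipe": 1.0, "pasta": 1.0}],  # label 0
        [{"review": 1.0}, {"lens": 1.0}],                               # label 0
    ]
    labels = [1, 1, 0, 0]
    w, b = train_noisy_or_mil(pages, labels)
    # Per-block probabilities: block-level labels without block-level annotation.
    for i, page in enumerate(pages):
        print(i, [round(block_prob(w, b, blk), 2) for blk in page])
```

The Noisy-OR bag model is one common choice for this setting because it encodes the "at least one block" assumption directly. For the opinion-mining task described above, the same sketch would apply with "contains negative opinion about the product" as the positive block class.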



References

2008 LearningfromMultiTopicWebDocume
Author: John C. Platt, Arun C. Surendran, Yi Zhang, Mukund Narasimhan
Title: Learning from Multi-topic Web Documents for Contextual Advertisement
Journal: KDD-2008 Proceedings
URL: http://research.microsoft.com/en-us/um/people/acsuren/kdd593-zhang.pdf
DOI: 10.1145/1401890.1402015
Year: 2008