2008 PLEDS

From GM-RKB

Subject Headings: Concept Mention Recognition Task.

Notes

Quotes

Index Terms:

  • Web mining, entity detection, personalization

Abstract

  • With the expansion of the internet, many specialized, high-profile sites have become available that bring very technical subject matter to readers with non-technical backgrounds. While the theme of these sites may be of interest to these readers, the posts themselves may contain terms that non-experts are unfamiliar with and may wish to know more about. We developed PLEDS, a personalized entity detection system which identifies interesting entities and provides related information for individual users by mining web logs and query logs. The experimental results of a systematic user study show that with PLEDS's aid, users can experience the benefits of an enriched internet surfing experience.

I. INTRODUCTION

  • With the rapid expansion of the internet, many specialized, high-profile sites have become available that bring highly technical subject matter to readers with non-technical backgrounds. For example, Gizmodo (http://gizmodo.com), Engadget (http://www.engadget.com), and Boing Boing (http://www.boingboing.net) are all popular user-driven sites that present articles containing terms a non-technical reader might not be familiar with. Although readers are interested in the themes of these sites, they may get lost in the technical terminology, which can lead to a negative experience for the reader and decreased readership for the site. For example, one post on digital cameras [1] discusses white balance, but the camera is of a style likely to appeal to amateur users who may be unfamiliar with the term. Such readers may spend time researching white balance on other sites, or may feel frustrated by the article and be less likely to return in the future. In either case, the user is drawn away from the website and is left with a negative experience overall.
  • It is important to provide services that let readers not only find additional information about more technical terms, but find it quickly as well. In fact, usability studies have shown that this is one of the chief concerns users express when reading articles online [8]. A naïve solution is to hyperlink terms to separate pages containing more detailed information, so that users can navigate to those pages. This approach is used notably by Wikipedia (http://www.wikipedia.com), but it is problematic for several reasons. Less experienced users are often reluctant to follow hyperlinks because they worry about "getting lost" [8] and losing sight of their original task. Also, navigating away from the article interrupts the flow of reading [9], resulting in a negative experience for the reader. Finally, hyperlinks used this way present the same information to all users. Some users may find that too many terms are tagged, while others find that not enough are tagged. In the former case, the extra information is not only unnecessary but, depending on the manner of display, excessive tagging may be distracting. In the latter case, frustration may arise from not being able to quickly find the information required. It is therefore necessary to develop a tool that is not only inline, but also personalized for each of its users.
  • To address the above challenges, we developed PLEDS, a personalized entity detection system which identifies interesting entities and provides related information inline. Here, an entity is a keyword or a meaningful short sequence of keywords. PLEDS mines individual and global query logs to find popular concepts and tags entities related to those concepts, thus finding, for each user, a different set of entities that are likely to interest that user. Information is presented in a small pop-up window only when a user clicks on a tagged entity, which avoids the problem of numerous pop-up windows appearing as the user unintentionally moves the mouse across the screen.
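The excerpt does not give PLEDS's actual scoring formula, so the following is only a minimal sketch of the general idea of mixing a user's own query log with the global log to decide which entities to tag. The n-gram counting, the mixing weight `alpha`, the `threshold`, and the function names `phrase_counts`, `score_entities`, and `entities_to_tag` are all hypothetical choices for illustration, not the authors' method.

```python
from collections import Counter

def phrase_counts(queries, max_len=3):
    """Count every contiguous word n-gram (n <= max_len) across a list of query strings."""
    counts = Counter()
    for q in queries:
        words = q.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts

def score_entities(candidates, user_log, global_log, alpha=0.7):
    """Hypothetical score: alpha * (relative frequency in the user's log)
    + (1 - alpha) * (relative frequency in the global log)."""
    local = phrase_counts(user_log)
    glob = phrase_counts(global_log)
    local_total = sum(local.values()) or 1   # avoid division by zero for empty logs
    global_total = sum(glob.values()) or 1
    scores = {}
    for c in candidates:
        key = c.lower()
        scores[c] = alpha * local[key] / local_total + (1 - alpha) * glob[key] / global_total
    return scores

def entities_to_tag(candidates, user_log, global_log, threshold=0.01, alpha=0.7):
    """Return candidates whose hypothetical score reaches the threshold, best first."""
    scores = score_entities(candidates, user_log, global_log, alpha)
    kept = [c for c, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda c: -scores[c])

if __name__ == "__main__":
    candidates = ["white balance", "aperture", "camera"]
    global_log = ["digital camera", "camera lens", "white balance"]
    print(entities_to_tag(candidates, ["white balance settings", "camera white balance"], global_log))
    # ['white balance', 'camera']
    print(entities_to_tag(candidates, ["aperture priority mode"], global_log))
    # ['aperture', 'camera', 'white balance']
```

With the same global log, the two users get different tag sets and orderings: the one whose history mentions white balance sees it ranked first, while the one who searched for aperture priority gets `aperture` tagged at the top. A larger `alpha` leans harder on the individual log; a smaller one falls back toward globally popular phrases.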

IV. EXPERIMENTAL RESULTS

  • In our user studies, we used a large, real web search query log from AOL (http://www.aol.com/), reduced in size through data cleaning and to improve performance. Data cleaning consisted of removing tuples that consisted solely of punctuation symbols or single letters. At the start of user testing, the global query log used by PLEDS contained 97,471 tuples (its size increases as the system is used). Initially there were 696 users in the system, each with an average of 140 tuples in their local web query log. This initial global query log yields 43,014 distinct co-occurrence phrases, which are reduced to 4,287 distinct co-occurrence phrases once they are normalized and passed through the threshold filter described above.
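The cleaning rule can be sketched as a simple filter. This assumes each log tuple is a hypothetical `(user_id, query)` pair and reads "single letters" as a query that is exactly one alphabetic character; the excerpt does not say how empty or digit-only queries were handled, so dropping empty queries and keeping single digits are assumptions here. `is_noise` and `clean_log` are illustrative names.

```python
import string

def is_noise(query: str) -> bool:
    """True if the query is empty, made only of punctuation (and spaces), or a single letter."""
    stripped = query.strip()
    if not stripped:
        return True  # assumption: empty queries are dropped as well
    if all(ch in string.punctuation or ch.isspace() for ch in stripped):
        return True  # punctuation-only query, e.g. "-" or "???"
    return len(stripped) == 1 and stripped.isalpha()  # single letter, e.g. "x"

def clean_log(tuples):
    """Keep only hypothetical (user_id, query) tuples whose query is not noise."""
    return [t for t in tuples if not is_noise(t[1])]

if __name__ == "__main__":
    log = [("u1", "white balance"), ("u1", "???"), ("u2", "x"),
           ("u2", "  .  "), ("u3", "c++"), ("u3", "3")]
    print(clean_log(log))
    # [('u1', 'white balance'), ('u3', 'c++'), ('u3', '3')]
```

Note that a query such as "c++" survives because it is not made solely of punctuation.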

A. Evaluation Methodology

  • Our evaluation methodology consisted of usability testing, conducted after optimized settings for certain PLEDS parameters had been chosen. The goal of the evaluation was to measure the quality of the entity recommendations provided by the system for a specific navigational session. We present the results of our study in the following section. Usability testing was conducted on PLEDS with a set of volunteers of varying backgrounds, from non-technical users to those highly skilled in browsing and navigating the internet. In total, there were 6 participants. Their educational backgrounds varied: 16.7% (one participant) held a high school diploma, 33.3% (two) a Bachelor's degree, and 50% (three) a Master's degree. Participants' computer use also ranged from 6 to 10 hours per week to 50+ hours per week, indicating that some have more opportunity than others to become familiar with the internet and other technical computer skills. Testing was conducted to determine how well PLEDS adapts to a user's interests and how it performs in comparison with Wikipedia.
  • Each user session consisted of two stages. In the first stage, participants used PLEDS to browse various Wikipedia pages for 15 minutes. During this time they were permitted to click on entities they were interested in, thus training the system. In the second stage, users were asked to look at one article of interest to them with entity tagging disabled, so that they saw only plain text, and to identify the entities they were interested in. When the session was over, these entities were compared with those identified by the system in its initial and final states, as well as with the entities tagged by Wikipedia, and the precision and recall of all three were measured.
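The comparison described above reduces to set-based precision and recall, with the participant's chosen entities as the reference set. A minimal sketch follows; the entity sets in the usage block are made-up toy data, not results from the study, and the case-insensitive exact matching and the 0.0 value for an empty set are conventions chosen here rather than details given in the excerpt.

```python
def precision_recall(tagged, relevant):
    """Set-based precision and recall of tagged entities against the user's chosen entities."""
    tagged = {e.lower() for e in tagged}      # case-insensitive exact matching (assumption)
    relevant = {e.lower() for e in relevant}
    hits = len(tagged & relevant)
    precision = hits / len(tagged) if tagged else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    user_choice = {"white balance", "aperture", "iso"}
    systems = {
        "PLEDS (initial)": {"camera", "white balance"},
        "PLEDS (final)": {"white balance", "aperture", "camera"},
        "Wikipedia": {"camera", "lens", "sensor", "iso", "flash"},
    }
    for name, tagged in systems.items():
        p, r = precision_recall(tagged, user_choice)
        print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

On this toy data the trained state scores higher on both measures than the initial state, which is the kind of difference the two-stage design is meant to expose; a real evaluation might also need fuzzier matching than exact strings.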


2008 PLEDS
Author: Kathleen Tsoukalas, Bin Zhou, Jian Pei, Davor Cubranic
Title: PLEDS: A Personalized Entity Detection System Based on Web Log Mining Techniques
DOI: 10.1109/WAIM.2008.62
Year: 2008