2000 WebUsageMiningDiscoveryandAppli

(Srivastava et al., 2000) ⇒ Jaideep Srivastava, Robert Cooley, Mukund Deshpande, and Pang-Ning Tan. (2000). “Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data.” In: ACM SIGKDD Explorations Newsletter Journal, 1(2). doi:10.1145/846183.846188

Subject Headings: Web Visitor Clustering.

Notes

Cited By

Quotes

Keywords

Data Mining, World Wide Web, Web Usage Mining.

Abstract

Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Web usage mining consists of three phases, namely preprocessing, pattern discovery, and pattern analysis. This paper describes each of these phases in detail. Given its application potential, Web usage mining has seen a rapid increase in interest, from both the research and practice communities. This paper provides a detailed taxonomy of the work in this area, including research efforts as well as commercial offerings. An up-to-date survey of the existing work is also provided. Finally, a brief overview of the WebSIFT system as an example of a prototypical Web usage mining system is given.

1. INTRODUCTION

The ease and speed with which business transactions can be carried out over the Web has been a key driving force in the rapid growth of electronic commerce. Specifically, ecommerce activity that involves the end user is undergoing a significant revolution. The ability to track users' browsing behavior down to individual mouse clicks has brought the vendor and end customer closer than ever before. It is now possible for a vendor to personalize his product message for individual customers at a massive scale, a phenomenon that is being referred to as mass customization.

The scenario described above is one of many possible applications of Web Usage mining, which is the process of applying data mining techniques to the discovery of usage patterns from Web data, targeted towards various applications. Data mining efforts associated with the Web, called Web mining, can be broadly divided into three classes, i.e. content mining, usage mining, and structure mining. Web Structure mining projects such as [34; 54] and Web Content mining projects such as [47; 21] are beyond the scope of this survey. An early taxonomy of Web mining is provided in [29], which also describes the architecture of the WebMiner system [42], one of the first systems for Web Usage mining. The proceedings of the recent WebKDD workshop [41], held in conjunction with the KDD-1999 conference, provide a sampling of some of the current research being performed in the area of Web Usage Analysis, including Web Usage mining. This paper provides an up-to-date survey of Web Usage mining, including both academic and industrial research efforts, as well as commercial offerings. Section 2 describes the various kinds of Web data that can be useful for Web Usage mining. Section 3 discusses the challenges involved in discovering usage patterns from Web data across the three phases of Web Usage mining: preprocessing, pattern discovery, and pattern analysis. Section 4 provides a detailed taxonomy and survey of the existing efforts in Web Usage mining, and Section 5 gives an overview of the WebSIFT system [31] as a prototypical example of a Web Usage mining system. Finally, Section 6 discusses privacy concerns, and Section 7 concludes the paper.
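As a rough illustration of how the three phases fit together, the following minimal Python sketch runs a toy pipeline over a handful of usage records. It is not the WebSIFT or WebMiner method: the record layout, the function names (preprocess, discover, analyze), the choice of page-pair co-occurrence as the pattern, and the 0.5 support threshold are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical usage records: (visitor IP address, page reference).
# Real server logs carry more fields; this layout is an assumption.
RECORDS = [
    ("10.0.0.1", "/home"), ("10.0.0.1", "/products"), ("10.0.0.1", "/cart"),
    ("10.0.0.2", "/home"), ("10.0.0.2", "/products"),
    ("10.0.0.3", "/home"), ("10.0.0.3", "/about"),
]

def preprocess(records):
    """Phase 1 (preprocessing): group page references into one page set per visitor."""
    visits = defaultdict(set)
    for visitor, page in records:
        visits[visitor].add(page)
    return list(visits.values())

def discover(visits):
    """Phase 2 (pattern discovery): count the visits in which each page pair co-occurs."""
    counts = Counter()
    for pages in visits:
        counts.update(combinations(sorted(pages), 2))
    return counts

def analyze(counts, n_visits, min_support=0.5):
    """Phase 3 (pattern analysis): keep pairs whose support reaches min_support."""
    return {
        pair: count / n_visits
        for pair, count in counts.items()
        if count / n_visits >= min_support
    }

visits = preprocess(RECORDS)
patterns = analyze(discover(visits), len(visits))
print(patterns)
```

With these sample records, only the pair /home and /products appears in at least half of the visits, so it is the only pattern that survives the analysis step.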

2. WEB DATA

One of the key steps in Knowledge Discovery in Databases [33] is to create a suitable target data set for the data mining tasks. In Web Mining, data can be collected at the server side, at the client side, or at proxy servers, or it can be obtained from an organization's database (which contains business data or consolidated Web data). Each type of data collection differs not only in terms of the location of the data source, but also in the kinds of data available, the segment of the population from which the data was collected, and its method of implementation.

There are many kinds of data that can be used in Web Mining. This paper classifies such data into the following types:

Content: The real data in the Web pages, i.e. the data the Web page was designed to convey to the users. This usually consists of, but is not limited to, text and graphics.
Structure: Data which describes the organization of the content. Intra-page structure information includes the arrangement of various HTML or XML tags within a given page. This can be represented as a tree structure, where the <html> tag becomes the root of the tree. The principal kind of inter-page structure information is hyper-links connecting one page to another.
Usage: Data that describes the pattern of usage of Web pages, such as IP addresses, page references, and the date and time of accesses.
User Profile: Data that provides demographic information about users of the Web site. This includes registration data and customer profile information.
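To make the Structure category concrete, the following minimal Python sketch builds a tag tree for a single page and collects its outgoing hyper-links, using the standard library html.parser module. The sample page, the StructureParser class, and the outline helper are hypothetical names introduced for this example, and the sketch assumes well-formed HTML with properly nested tags.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

class StructureParser(HTMLParser):
    """Collects intra-page structure (a tag tree) and inter-page structure (links)."""

    def __init__(self):
        super().__init__()
        self.root = None   # first top-level element, expected to be <html>
        self.stack = []    # currently open elements, innermost last
        self.links = []    # href values of <a> tags

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        if self.stack:
            self.stack[-1]["children"].append(node)
        elif self.root is None:
            self.root = node
        if tag not in VOID_TAGS:
            self.stack.append(node)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        # Assumes properly nested tags; mismatched end tags are ignored.
        if self.stack and self.stack[-1]["tag"] == tag:
            self.stack.pop()

def outline(node, depth=0):
    """Return the tag tree as indented lines, one tag per line."""
    lines = ["  " * depth + node["tag"]]
    for child in node["children"]:
        lines.extend(outline(child, depth + 1))
    return lines

# Hypothetical sample page used only for illustration.
PAGE = (
    "<html><head><title>Shop</title></head>"
    '<body><h1>Shop</h1><p>See <a href="/products.html">products</a>.</p></body>'
    "</html>"
)

parser = StructureParser()
parser.feed(PAGE)
print("\n".join(outline(parser.root)))
print(parser.links)
```

The printed outline starts at the html tag and indents head, body, and their descendants beneath it, while the links list holds the single hyper-link target found on the page.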

2.1 Data Sources

…

References


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2000 WebUsageMiningDiscoveryandAppli	Pang-Ning Tan Jaideep Srivastava Robert Cooley Mukund Deshpande			Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data				10.1145/846183.846188		2000