2007 ExtractingChatbotKnowledgefromO

(Huang et al., 2007) ⇒ Jizhou Huang, Ming Zhou, and Dan Yang. (2007). “Extracting Chatbot Knowledge from Online Discussion Forums.” In: Proceedings of the 20th International Joint Conference on Artificial Intelligence.

Subject Headings: Chatbot, Online Discussion Forum.

Notes

Cited By

Quotes

Abstract

This paper presents a novel approach for extracting high-quality <thread-title, reply> pairs as chat knowledge from online discussion forums so as to efficiently support the construction of a chatbot for a certain domain. Given a forum, the high-quality <thread-title, reply> pairs are extracted using a cascaded framework. First, the replies logically relevant to the thread title of the root message are extracted from all the replies with an SVM classifier, based on correlations such as structure and content. Then, the extracted <thread-title, reply> pairs are ranked with a ranking SVM based on their content quality. Finally, the Top-N <thread-title, reply> pairs are selected as chatbot knowledge. Results from experiments conducted on a movie forum show that the proposed approach is effective.
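
The cascade described in the abstract can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the two toy features (title-word overlap and capped reply length) and all helper names (`reply_features`, `extract_chat_pairs`, `pairwise_transform`, `train_linear_svm`) are hypothetical, both stages reuse the same toy features instead of the paper's structure and content features, and the SVMs are fitted with a small Pegasos-style subgradient solver rather than an SVM library. The ranking SVM is reduced to binary classification over feature differences between replies of the same thread, the usual ranking-SVM construction.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> Vector:
    """Linear SVM (hinge loss + L2 penalty) fitted with Pegasos-style stochastic
    subgradient descent. Labels must be +1/-1; there is no separate bias term,
    so append a constant 1.0 feature when a bias is needed."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1.0      # margin checked before updating
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the L2 term
            if violated:                              # hinge-loss subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def pairwise_transform(groups: List[List[Tuple[Vector, float]]]) -> Tuple[List[Vector], List[int]]:
    """Ranking-SVM reduction: every pair of replies from the same thread with
    different quality labels becomes a feature-difference vector labelled +1 if
    the first reply is better, else -1; both orientations are emitted."""
    X: List[Vector] = []
    y: List[int] = []
    for items in groups:
        for (xi, qi), (xj, qj) in itertools.combinations(items, 2):
            if qi == qj:
                continue
            d = [a - b for a, b in zip(xi, xj)]
            label = 1 if qi > qj else -1
            X += [d, [-v for v in d]]
            y += [label, -label]
    return X, y


def reply_features(title: str, reply: str) -> Vector:
    """Hypothetical toy features: share of title words repeated in the reply,
    and reply length in words divided by 20, capped at 1.0."""
    title_words = set(title.lower().split())
    reply_words = reply.lower().split()
    overlap = len(title_words & set(reply_words)) / max(len(title_words), 1)
    return [overlap, min(len(reply_words) / 20.0, 1.0)]


def extract_chat_pairs(threads: List[Tuple[str, List[str]]], clf_w: Vector,
                       rank_w: Vector, top_n: int) -> List[Tuple[str, str]]:
    scored: List[Tuple[float, str, str]] = []
    for title, replies in threads:
        for reply in replies:
            f = reply_features(title, reply)
            if dot(clf_w, f + [1.0]) > 0:  # stage 1: keep replies classified as relevant
                scored.append((dot(rank_w, f), title, reply))  # stage 2: quality score
    scored.sort(key=lambda s: s[0], reverse=True)  # stage 3: best-scoring pairs first
    return [(title, reply) for _, title, reply in scored[:top_n]]
```

For example, with hand-set weights `clf_w = [2.0, 0.0, -1.0]` (keep a reply when more than half of the title words reappear in it) and `rank_w = [0.0, 1.0]` (prefer longer replies), `extract_chat_pairs(threads, clf_w, rank_w, top_n=2)` drops off-topic replies such as "lol" and returns the two longest remaining pairs. In practice both weight vectors would be learned with `train_linear_svm`, the ranker's on the output of `pairwise_transform`.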

1 Introduction

A chatbot is a conversational agent that interacts with users in a certain domain or on a certain topic using natural language sentences. Normally, a user asks a question or makes a comment, and the chatbot answers the question, makes a comment, or initiates a new topic. Many chatbots have been deployed on the Internet for purposes such as information seeking, site guidance, and FAQ answering, each in a strictly limited domain. Well-known chatbot systems include ELIZA [Weizenbaum, 1966], PARRY [Colby, 1973] and ALICE[1]. Most existing chatbots consist of a dialog management module that controls the conversation process and a chatbot knowledge base that supplies responses to user input. A typical chatbot knowledge base contains a set of templates that match user inputs and generate responses. However, the templates currently used in chatbots are hand coded. Therefore, constructing a chatbot knowledge base is time-consuming, and the result is difficult to adapt to new domains.
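
The template mechanism described above can be illustrated with a deliberately simplified, AIML-inspired matcher. This is a hypothetical sketch, not ALICE's actual engine: the pattern syntax (a `*` wildcard whose match fills `{0}` in the response), the first-match-wins ordering, and the names `normalize`, `compile_pattern`, and `respond` are assumptions made for illustration. It also makes the cost of hand coding concrete: every kind of input the bot should handle needs its own hand-written pattern and response.

```python
import re
from typing import List, Tuple

# A hypothetical template: (input pattern with "*" wildcards, response text).
Template = Tuple[str, str]


def normalize(text: str) -> str:
    # Replace punctuation (except hyphens) with spaces and collapse whitespace.
    return " ".join(re.sub(r"[^\w\s-]", " ", text).split())


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    # Each "*" becomes a capture group; everything else is matched literally.
    parts = [re.escape(p) for p in pattern.split("*")]
    return re.compile("^" + "(.+?)".join(parts) + "$", re.IGNORECASE)


def respond(templates: List[Template], user_input: str,
            default: str = "Tell me more.") -> str:
    text = normalize(user_input)
    for pattern, response in templates:  # first matching template wins
        m = compile_pattern(pattern).match(text)
        if m:
            return response.format(*m.groups())  # fill "{0}", "{1}", ... with wildcard text
    return default
```

With templates ordered from specific to general, such as `[("HELLO", "Hi! What movie have you seen lately?"), ("WHO DIRECTED *", "I am not sure who directed {0}."), ("*", "Tell me more.")]`, the input "Who directed Jaws?" yields "I am not sure who directed Jaws.".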

An online discussion forum is a web community that allows people to discuss common topics, exchange ideas, and share information in a certain domain, such as sports or movies. Creating threads and posting replies are the major user behaviors in forum discussions. The large repositories of archived threads and replies in online discussion forums contain a great deal of human knowledge on many topics. In addition to this rich information, authors' reply styles are diverse. We believe that the high-quality replies of a thread, if mined, could be of great value for constructing a chatbot for certain domains.

In this paper, we propose a novel approach for extracting high-quality <thread-title, reply> pairs from online discussion forums to supplement a chatbot knowledge base. Given a forum, the high-quality <thread-title, reply> pairs are extracted using a cascaded framework. First, the replies logically relevant to the thread title of the root message are extracted from all the replies with an SVM classifier, based on correlations such as structure and content. Then, the extracted <thread-title, reply> pairs are ranked with a ranking SVM based on their content quality. Finally, the Top-N <thread-title, reply> pairs are selected as chatbot knowledge.

The rest of this paper is organized as follows. Important related work is introduced in Section 2. Section 3 outlines the characteristics of online discussion forums and explains the challenges of extracting stable <thread-title, reply> pairs. Section 4 presents our proposed cascaded framework. Experimental results are reported in Section 5. Section 6 compares our approach with other related work. Conclusions and future work are presented in Section 7.

2 Related Work

By “chatbot knowledge extraction” throughout this paper, we mean extracting <input, response> pairs from online resources. Based on our study of the literature, there is no published

References
