2004 SimplSemantAnnotationsInText
- (Gerke, 2004) ⇒ Sebastian Gerke. (2004). “Simplifying Semantic Annotations in Text.” In: Thesis.
Subject Headings:
Notes
Cited By
~ 1 http://scholar.google.com/scholar?cluster=4578543490266638172
Quotes
Abstract
- The advantages of semantic annotations are extensive: they allow more detailed queries than conventional search engines, precise question answering instead of returning more or less relevant resources, and rule-based reasoning. Briefly: they can save work. But the creation of semantic metadata requires additional effort that decreases the overall benefit of semantic technologies. The advantage of semantic metadata should outweigh its additional creation cost. This thesis proposes different techniques for simplifying the creation of semantic annotations in text, with a focus on semantic wikis as a semantic authoring environment. Different problems of semantic authoring are tackled in this thesis. First, a wiki parser that allows syntax-independent storage of content is presented. It allows users to use their preferred wiki syntax; they do not have to adapt to a fixed syntax. A wiki data model is introduced to store the wiki content. Existing data on the desktop is often spread into “data islands”: different applications already store their data in machine-readable form, but usually it is not directly available to other applications. It is not possible to create links between resources from different applications. In this thesis, an architecture that facilitates integration of existing data into one semantic application is presented. It extends ActiveRDF, a data-store-independent RDF library that allows object-oriented access to RDF data, to provide access to this desktop data as if it were RDF data. That lets programmers seamlessly combine data from RDF data stores and desktop applications in their programs. Another problem in semantic knowledge acquisition and authoring is that users have to use a shared vocabulary to create meaning out of annotations. This is a tedious task because finding the appropriate vocabulary often implies looking up vocabulary specifications to ensure correct usage of the vocabulary.
To simplify vocabulary lookup, two systems for suggesting annotations for a resource are presented in this thesis. The first suggestion system is based on natural language text, whilst the second is based on existing annotations. The natural-language-based system applies keyword matching in the text against keywords and local names of URIs in RDF schemas. Due to this simple approach, the results of this system are not optimal. It can serve as a foundation for more sophisticated algorithms for annotation suggestion based on natural language text. For the second approach, where the suggestions are based on existing annotations, two different algorithms are proposed. One uses a classifier approach to identify similar resources and then suggests annotations that are often used among these similar resources. Similar resources are determined by a similarity measure. Different variations of the algorithm were tested. The qualitative results are quite good (F1 ≥ 0.85), but the runtime performance does not scale well: it depends linearly on the number of resources in the knowledge base. Using only a subset of all resources does not yield satisfactory runtime performance (> 2 s) without sacrificing too much qualitative performance. The second algorithm for annotation suggestion based on existing annotations uses co-occurrences of predicates. A lookup table containing co-occurrences of all predicates can be computed beforehand. At query time, only this table is used to generate suggestions. The query runtime performance then depends only on the number of different predicates used, not on the number of resources. That means that a bigger knowledge base does not necessarily result in worse runtime performance. Evaluations of the algorithm show that suggestions are generated in about 0.01 seconds, yielding an F1 measure that is slightly better than that of the similarity-based algorithm (F1 = 0.87).
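The co-occurrence idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The function names (`build_cooccurrence_table`, `suggest_predicates`), the toy knowledge base `kb`, and the scoring rule (summing raw co-occurrence counts) are all assumptions made for illustration. The sketch only shows the general shape: a table is precomputed once from the existing annotations, and at query time suggestions are read from that table without touching the individual resources.

```python
from collections import defaultdict
from itertools import permutations


def build_cooccurrence_table(resources):
    """Precompute how often each ordered pair of predicates is used on the same resource.

    `resources` maps a resource id to the set of predicates used on it
    (a hypothetical, simplified view of a knowledge base).
    """
    table = defaultdict(lambda: defaultdict(int))
    for predicates in resources.values():
        for p, q in permutations(predicates, 2):
            table[p][q] += 1
    return table


def suggest_predicates(table, used, top_n=3):
    """Suggest predicates that co-occur with the ones already used on a resource.

    Scoring by summed raw counts is an assumption of this sketch;
    ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(int)
    for p in used:
        for q, count in table.get(p, {}).items():
            if q not in used:
                scores[q] += count
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [q for q, _ in ranked[:top_n]]


# Toy knowledge base (hypothetical data, not taken from the thesis).
kb = {
    "alice": {"foaf:name", "foaf:mbox", "foaf:knows"},
    "bob": {"foaf:name", "foaf:mbox"},
    "carol": {"foaf:name", "foaf:knows"},
    "paper1": {"dc:title", "dc:creator"},
}
table = build_cooccurrence_table(kb)
print(suggest_predicates(table, {"foaf:name"}))
```

In this sketch the query loops only over the predicates already used and their table entries, which mirrors the abstract's point that query time depends on the number of distinct predicates rather than on the number of resources.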
References
- E. Adar, D. Karger, and L. A. Stein. Haystack: per-user information environments. In CIKM ’99: Proceedings of the eighth International Conference on Information and knowledge management, pp. 413–422. ACM Press, New York, NY, USA, 1999.
- R. Agrawal, T. Imieliński, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD ’93: Proceedings of the 1993 ACM SIGMOD International Conference on Management of data, pp. 207–216. ACM Press, New York, NY, USA, 1993.
- J. Aycock and R. N. Horspool. Schrödinger’s token. Software - Practice and Experience, 31:803–814, 2001.
- D. Beckett. Turtle - Terse RDF Triple Language. http://www.dajobe.org/2004/01/turtle/, 2004.
- T. Berners-Lee. Notation 3 - a readable language for data on the web. http://www.w3.org/DesignIssues/Notation3.html, 1998.
- T. Berners-Lee and M. Fischetti. Weaving the Web - The Original Design and Ultimate Destiny of the World Wide Web by its Inventor. Harper San Francisco, 1999.
- D. Brickley and R. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. Recommendation, W3C, February 2004.
- A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the web. In Selected papers from the sixth International Conference on World Wide Web, pp. 1157–1166. Elsevier Science Publishers Ltd., Essex, UK, 1997.
- F. Ciravegna. (LP)2, an adaptive algorithm for information extraction from web-related texts. In IJCAI-2001 Workshop on Adaptive Text Extraction and Mining. 2001.
- B. Davis, S. Handschuh, H. Cunningham, and V. Tablan. Further use of controlled natural language for semantic annotation. In: Proceedings of the 1st International Workshop on Applications and Business Aspects of the Semantic Web (SEBIZ 2006). 2006.
- D. Dhyani, W. K. Ng, and S. S. Bhowmick. A survey of web metrics. ACM Comput. Surv., 34(4):469–503, 2002.
- D. Fensel, H. Lausen, A. Polleres, M. Stollberg, et al. Enabling Semantic Web Services. Springer, Berlin, October 2006.
- S. Handschuh. Creating Ontology-based Metadata by Annotation for the Semantic Web. Ph.D. thesis, University of Karlsruhe, 2005.
- C. Hayes, P. Massa, P. Avesani, and P. Cunningham. An on-line evaluation framework for recommender systems. In Workshop on Personalization and Recommendation in E-Commerce. 2002.
- J. Hayes and C. Gutierrez. Bipartite graphs as intermediate model for RDF. In Third International Semantic Web Conference (ISWC2004), vol. 3298 of Lecture Notes in Computer Science, pp. 47–61. Springer-Verlag, Hiroshima, Japan, November 2004.
- J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst., 22(1):5–53, 2004.
- I. Herman. W3C Semantic Web Activity. http://www.w3.org/2001/sw/.
- M. A. W. Houtsma and A. N. Swami. Set-oriented mining for association rules in relational databases. In ICDE ’95: Proceedings of the Eleventh International Conference on Data Engineering, pp. 25–33. IEEE Computer Society, Washington, DC, USA, 1995.
- J. Kahan, M. Koivunen, E. Prud’Hommeaux, and R. Swick. Annotea: An open RDF infrastructure for shared web annotations. In WWW Conf., pp. 623–632. 2001.
- G. Klyne and J. J. Carroll. Resource description framework (RDF): Concepts and abstract syntax. Recommendation, W3C, February 2004. http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/.
- B. Lund, T. Hammond, M. Flack, and T. Hannay. Social Bookmarking Tools (II): A Case Study - Connotea. D-Lib Magazine, 11(4), April 2005.
- A. Maedche and S. Staab. Semi-automatic engineering of ontologies from text. In: Proceedings of the 12th International Conference on Software and Knowledge Engineering. Chicago, USA. July 2000.
 | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
---|---|---|---|---|---|---|---|---|---|---|
2004 SimplSemantAnnotationsInText | Sebastian Gerke | | | Simplifying Semantic Annotations in Text | Thesis | | http://www.eyaloren.org/pubs/gerke thesis.pdf | | | 2004 |