2014 DigitalLibraryTechnologiesCompl
- (Fox & Torres, 2014) ⇒ Edward A. Fox, and Ricardo da Silva Torres (editors). (2014). “Digital Library Technologies: Complex Objects, Annotation, Ontologies, Classification, Extraction, and Security.” In: Synthesis Lectures on Information Concepts, Retrieval, and Services Journal, March 2014. doi:10.2200/S00566ED1V01Y201401ICR033
Subject Headings: Digital Library
Quotes
Author Keywords
- 5S framework; annotation; CINET; classification; complex objects; Crisis/Tragedy/Recovery network (CTRnet); digital libraries (DLs); ETDs; fingerprints; Flickr; formalization; network science; OAI-ORE; ontologies; security; subdocuments; SuperIDR; superimposed information; text extraction
Abstract
Since the early 1990s, digital libraries (DLs) have introduced new technologies and have also leveraged, enhanced, and integrated related technologies. These efforts have been enriched through a formal approach, e.g., the 5S (Societies, Scenarios, Spaces, Structures, Streams) framework, which is discussed in two earlier volumes in this series. This volume should help advance work not only in DLs, but also in the WWW and other information systems.
Drawing upon four completed dissertations (Kozievitch, Murthy, Park, Yang) and three in-process dissertations (Elsherbiny, Farag, Srinivasan), as well as the efforts of collaborating researchers and scores of related publications, presentations, tutorials, and reports, this book should advance the DL field with regard to at least six key technologies. By integrating surveys of the state of the art, new research, connections with formalization, case studies, and exercises/projects, this book can serve as a computing or information science textbook. It can support studies in cyber-security, document management, hypertext/hypermedia, information retrieval (IR), knowledge management, library and information science (LIS), multimedia, and machine learning.
Chapter 1, with a case study on fingerprint collections, focuses on complex (composite, compound) objects, connecting DL and related work on buckets, DCC, and OAI-ORE. Chapter 2, on annotations as in hypertext/hypermedia, emphasizes parts of documents (images as well as text) and the management of superimposed information. The SuperIDR system and prototype efforts with Flickr should motivate further development and standardization related to annotation, which would benefit all DL and WWW users. Chapter 3, on ontologies, explains how they help with browsing, query expansion, focused crawling, and classification. This chapter connects DLs with the Semantic Web and uses CTRnet as an example. Chapter 4, on (hierarchical) classification, leverages LIS theory as well as machine learning, and is important for DLs as well as the WWW. Chapter 5, on extraction from text, covers document segmentation, as well as how to construct a database from heterogeneous collections of references (from ETDs), i.e., by converting reference strings to canonical forms. Chapter 6 surveys the security approaches used in information systems and explains how those approaches can apply to digital libraries that are not fully open.
Given this rich content, those interested in DLs will be able to find solutions to key problems, using the right technologies and methods. We hope this book will help show how formal approaches can enhance the development of suitable technologies and how they can be better integrated with DLs and other information systems.
Preface
Because of the importance of digital libraries, we integrated, organized, and condensed our related findings and publications into a single-volume version of this book series, ultimately over 600 pages in length, which was successfully used in a semester-long class in 2011 and field-tested at different universities. To make it easier for others to address their need for a digital library textbook, we have reorganized the original book into four parts, covering introduction and theoretical foundations, key issues, technologies/extensions, and applications. We are confident that this third book, and the others in the series, address digital library-related needs in many computer science, information science, and library science (e.g., LIS) courses, as well as the requirements of researchers, developers, and practitioners.
The main reason for our confidence is that our 5S (Societies, Scenarios, Spaces, Structures, Streams) framework has broad descriptive power. This is supported in part by the recent expansion of interest related to each of the five Ss, e.g., Social networks, Scenario-based design, geoSpatial databases, Structure-based approaches (e.g., databases, metadata, ontologies, XML), and data Stream management systems.
The first book, Theoretical Foundations for Digital Libraries, the essential opening to the four-book series, has three main parts. Chapter 1 is the key to 5S, providing a theoretical foundation for the field of digital libraries in a gentle, intuitive, and easy-to-apply manner. Chapter 2 explains how 5S can be applied to digital libraries in two ways. First, it covers the most important services of digital libraries: browsing, searching, discovery, and visualization. Second, it demonstrates how 5S helps with the design, implementation, and evaluation of an integrated digital library (ETANADL, for archaeology). The third part of book 1, made up of five appendices, demonstrates how 5S enables a formal treatment of digital libraries. This third part is freely accessible online at https://sites.google.com/a/morganclaypool.com/dlibrary/.
Book 1 Appendix A gives a small set of definitions that cover the mathematical preliminaries underlying our work. Appendix B builds on that set to define each of the five Ss, and then uses them to define what we consider a minimal digital library. This allows people asking “Is X a digital library?” to answer that question definitively. Appendix C moves from a minimalist perspective to show how 5S can be used in a real, interesting, and complex application domain: archaeology. Appendix D builds upon all the definitions in Appendices A–C to describe some key results of using 5S, including lemmas, proofs, and 5SSuite (software based on 5S). Finally, Appendix E, the Glossary, explains key terminology. Book 1 concludes with an extensive bibliography and a helpful index.
The second book in the series, Key Issues Regarding Digital Libraries: Evaluation and Integration, discusses two key issues in the digital library field: evaluation and integration. It covers the Information Life Cycle, metrics, and software to help evaluate digital libraries. It uses both archaeology and electronic theses and dissertations for additional context, since addressing quality in highly distributed digital libraries is particularly challenging. The following two books of this series further elaborate the 5S framework and provide a comprehensive overview of related work on digital libraries.
This book, third in the series, describes six case studies of extensions beyond a minimal digital library. Its chapters cover Complex Objects, Annotation, Ontologies, Classification, Text Extraction, and Security. Regarding Complex Objects: While many digital libraries focus on digital objects and/or metadata objects, they could easily be extended with support for complex objects, to handle aggregation and packaging. Fingerprint matching provides a useful context, since there are complex inter-relationships among crime scenes, latent fingerprints, individuals, hands, fingers, fingerprints, and images. Regarding Annotation: This builds upon work on superimposed information, closely related to hypertext, hypermedia, and subdocuments. A case study covers the management of fish images. Regarding Ontologies: We address this key area of knowledge management, which is also integral to the Semantic Web. As a context, we consider our Crisis, Tragedy, and Recovery Network, which is quite broad and involves interesting ontology development problems. Regarding Classification: We cover this core area of information retrieval and machine learning, as well as Library and Information Science (LIS). The context is electronic theses and dissertations (ETDs), since many of these works have no categories that can be found in their catalog or metadata records, and since none are categorized at the level of chapters. Regarding Text Extraction: Our coverage is also in the context of ETDs, where the high-level structure should be identified, and where the valuable and voluminous sets of references can be isolated and converted to canonical representations. Regarding Security: While many digital libraries support open access, it has been clear since the early 1990s that industrial acceptance of digital library systems and technologies depends on their being trusted, which requires an integrated approach to security.
The final book, Digital Library Applications: CBIR, Education, Social Networks, eScience/Simulation, and GIS, fourth in the series, focuses on digital library applications from a 5S perspective. Regarding CBIR: We move into the multimedia field, focusing on Content-based Image Retrieval (CBIR) and making use, for context, of the previously discussed work on fish images and CTRnet. Regarding Education: We describe systems for collecting, sharing, and providing access to educational resources, namely the AlgoViz and Ensemble systems. This matters because there has been considerable investment in digital libraries to help in education, motivated by the fact that devising high-quality educational resources is expensive, which makes sharing and reuse highly beneficial. Regarding Social Networks: We address very popular current issues on the Societies side, namely Social Networks and Personalization. Regarding e-Science/Simulation: There has been only limited adaptation and extension of digital libraries to this important domain. Simulation helps many disciplines test models and predictions on computers, addressing questions not feasible through other approaches to experimentation. More broadly, in keeping with progress toward e-Science, where data sets and shared information support much broader theories and investigations, we cover (using the SimDL and CINET projects as context) storing and archiving, as well as access and visualization, dealing not only with metadata but also with specifications of experiments, experimental results, and derivative versions: summaries, findings, reports, and publications. Regarding Geospatial Information (GIS): Many GIS-related technologies are now readily available in cell phones, cameras, and GPS systems. Our coverage (which uses the CTRnet project as context) connects these technologies with metadata, images, and maps.
How can computer scientists connect with all this? Although some of the early curricular guidelines for computing advocated coverage of information, and current guidelines refer to the area of Information Management, courses in this area have generally focused instead on either data or knowledge. Fortunately, Virginia Tech has had graduate courses on information retrieval since the early 1970s and a senior course on “Multimedia, Hypertext, and Information Access” since the early 1990s. Now, there are offerings at many universities on multimedia, or with titles including keywords like “Web” or “search”. Perhaps parts of this book series will provide a way for computing programs to address all areas of information management, building on a firm, formal, integrated approach. Further, computing professionals should feel comfortable with particular Ss, especially Structures (as in data structures) and Spaces (as in vector spaces), and to a lesser extent Streams (related to multimedia) and Scenarios (related to human-computer interaction). Today, especially, there is growing interest in Societies (as in social networks).
How can information scientists connect with all this? Clearly, they are at home with “information” as a key construct. Streams (e.g., sequences of characters or bitstreams) provide a first basis for all types of information. Coupled with Structures, they lead to all types of structured streams, as in documents and multimedia. Spaces may be less clear, but GIS systems are becoming ubiquitous, connecting with GPS, cell phone, Twitter, and other technologies. Scenarios, especially in the form of Services, are at the heart of most information systems. Societies, including users, groups, organizations, and a wide variety of social networks, are central, especially with human-centered design. Thus, information science can easily connect with 5S, and digital libraries are among the most important types of information systems. Accordingly, this book series may fit nicely into capstone courses in information science or information systems. Further, our handling of “information” goes well beyond the narrow view associated with electrical engineering or even computer science; we connect content representations with context and application, across a range of human endeavors, and with semantics, pragmatics, and knowledge.
How can library scientists connect with all this? One might argue that many of the librarians of the future must be trained as digital librarians. Thus, all four books should fit nicely into library science programs. While they could fit into theory or capstone courses, they also might serve well in introductory courses, if the more formal parts are skipped. On the other hand, they could be distributed across the program. Thus, the first book might work well early in a library school program, the second book could fit midway in the program, and the last two books might be covered in specialized courses that connect with technologies or applications. Further, those studying archival science might find the entire series to be of interest, though some topics like preservation are not covered in detail.
How can researchers connect with all this? We hope that those interested in formal approaches will help us expand the coverage of concepts reported herein. A wonderful goal would be to have an elegant formal basis and useful framework for all types of information systems. We also hope that the theses and dissertations related to this volume, all online (thanks to Virginia Tech’s ETD initiative), will provide even more in-depth coverage of the key topics covered herein. We hope you can build on this foundation to aid in your own research, as you advance the field further.
How can developers connect with all this? We hope that the concepts, ideas, methods, techniques, systems, and approaches described herein will guide you to develop, implement, and deploy even better digital libraries. There should be less time spent “reinventing the wheel.” Perhaps this will stimulate the emergence of a vibrant software and services industry as more and more digital libraries emerge. Further, if there is agreement on key concepts, then there should be improvements in interoperability, integration, and understanding. Accordingly, we hope you can leverage this work to advance practices as well as provide better systems and services.
Even if you, the reader, do not fit clearly into the groups discussed above, we hope you will nevertheless find this book series interesting. Given the rich content, we trust that those interested in digital libraries, or in related systems, will find this book intellectually satisfying, illuminating, and helpful. We hope the full series will help move digital libraries forward into a science as well as a practice. We hope too that this four-book series will broadly address the needs of the next generation of digital librarians. Please share with us and others the ways in which you found these books useful and helpful!
References
| | Author | title | doi | year |
|---|---|---|---|---|
| 2014 DigitalLibraryTechnologiesCompl | Edward A. Fox, Ricardo da Silva Torres | Digital Library Technologies: Complex Objects, Annotation, Ontologies, Classification, Extraction, and Security | 10.2200/S00566ED1V01Y201401ICR033 | 2014 |