This paper deals with the problem of modeling Web information resources using expert knowledge and personalized user information for improved Web searching capabilities. We propose a "Web information space" model, which is composed of Web-based information resources (HTML/XML [Hypertext Markup Language/Extensible Markup Language] documents on the Web), expert advice repositories (domain-expert-specified metadata for information resources), and personalized information about users (captured as user profiles that indicate users' preferences about experts as well as users' knowledge about topics).

Expert advice, the heart of the Web information space model, is specified using topics and relationships among topics (called metalinks), along the lines of the recently proposed topic maps. Topics and metalinks constitute metadata that describe the contents of the underlying HTML/XML Web resources. The metadata specification process is semiautomated, and it exploits XML DTDs (Document Type Definitions) to allow domain-expert-guided mapping of DTD elements to topics and metalinks. The expert advice is stored in an object-relational database management system (DBMS).

To demonstrate the practicality and usability of the proposed Web information space model, we created a prototype expert advice repository of more than one million topics/metalinks for the DBLP (Database and Logic Programming) Bibliography data set. We also present a query interface that provides sophisticated querying facilities for DBLP Bibliography resources using the expert advice repository.
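To make the notions of topics, metalinks, and expert-guided element mapping more concrete, the following minimal Python sketch derives topics and metalinks from a single bibliography-style XML record. It is an illustration under stated assumptions, not the system described in this paper: the mapping table `ELEMENT_TO_TOPIC_TYPE`, the classes `Topic` and `Metalink`, the metalink type `AuthorOf`, and the sample record are all hypothetical, and the mapping is keyed by element name as a stand-in for the DTD-driven mapping.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Hypothetical expert-supplied mapping: XML element name -> topic type.
# Elements without an entry (e.g. <year>) produce no topic.
ELEMENT_TO_TOPIC_TYPE = {"author": "Author", "title": "Paper"}


@dataclass(frozen=True)
class Topic:
    name: str
    topic_type: str


@dataclass(frozen=True)
class Metalink:
    link_type: str   # hypothetical relationship name, e.g. "AuthorOf"
    source: Topic
    target: Topic


def extract_advice(xml_text):
    """Return (topics, metalinks) derived from one XML record."""
    record = ET.fromstring(xml_text)
    topics = []
    for child in record:
        topic_type = ELEMENT_TO_TOPIC_TYPE.get(child.tag)
        if topic_type and child.text:
            topics.append(Topic(child.text.strip(), topic_type))
    papers = [t for t in topics if t.topic_type == "Paper"]
    authors = [t for t in topics if t.topic_type == "Author"]
    # Relate every author topic to every paper topic in the same record.
    metalinks = [Metalink("AuthorOf", a, p) for a in authors for p in papers]
    return topics, metalinks


# Made-up sample record, used only to exercise the sketch.
SAMPLE = """<article>
  <author>A. Author</author>
  <author>B. Author</author>
  <title>An Example Paper</title>
  <year>2002</year>
</article>"""

topics, metalinks = extract_advice(SAMPLE)
```

In this sketch the expert's contribution is confined to the mapping table, while the extraction itself is automatic, which loosely mirrors the semiautomated specification process mentioned above.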
Introduction

Due to the enormous growth of the World Wide Web in the last decade, the Web today hosts very large information repositories containing huge volumes of data in almost every kind of medium. However, because the Web lacks both a centralized governing authority and a strict schema characterizing its data (an absence that has clearly promoted this incredible growth), finding relevant information on the Web is a major struggle.

At present, 85% of Internet users are reported to use search engines for information retrieval on the Web. Most of these search engines employ either manual or automatic indexing with various refinements and optimizations, such as ranking algorithms that make use of links (Kobayashi & Takeda, 2000). Yet even the largest of these engines cannot cover more than 40% of the available Web pages, and the need for better search services that retrieve the most relevant information is increasing (Barfourosh, Nezhad, Anderson, & Perlis, 2002). To this end, a more recent and promising approach is to index the Web by using metadata and annotations. It may be impossible to provide metadata for all Web resources, but several information-rich resources and domains can still benefit from such an approach. Along with the rapid acceptance of XML (Extensible Markup Language) (Bray, Paoli, Sperberg-McQueen, & Maler, 2000) as a Web data exchange format, several frameworks to capture and model the Web in terms of metadata objects have been proposed (i.e., Semant...