Purpose: Grid computing, cloud computing (CC), utility computing and software as a service are emerging technologies that are predicted to consolidate in the future into meta-level computing services that bring everything beneath one umbrella. The purpose of this study is to foster understanding and differentiation of the use of the three aforementioned types of computing technologies, and of software as a service, by both public and private libraries to meet their expectations and strategic objectives.
Design/methodology/approach: The approach in this study is a review that compares the four computing technologies, with a brief analysis for researching and designing the mind map of a new meta-level computing service approach, taking into consideration the need for new economic tariff and pricing models as well as service-level agreements.
Findings: Since consolidation and integration of computing services is anticipated, a study of these four advanced computing technologies and their methodologies is presented through their definitions, characteristics, functionalities, advantages and disadvantages. This is a well-timed technological advancement for libraries.
Practical implications: It appears that the future of library services will become even more integrated, running over CC platforms based on usage rather than just storage of data.
Social implications: Libraries will become an open, useful resource to all and sundry in a global context, which will have huge societal benefits never imagined before.
Originality/value: Concisely addresses the strategies, functional characteristics, advantages and disadvantages by comparing these technologies from several service aspects, with a view to assisting in creating the next generation outer space computing.
Indexing the Web is becoming a laborious task for search engines as the Web grows exponentially in size and distribution. Presently, the most effective known approach to overcome this problem is the use of focused crawlers. A focused crawler applies a suitable algorithm to detect the pages on the Web that relate to its topic of interest. For this purpose we propose a custom method that uses specific HTML elements of a page to predict the topical focus of every page reachable through an unvisited link on the current page. The pages recognized as on-topic must then be sorted by their relevance to the main topic of the crawler before they are actually downloaded. In the Treasure-Crawler, we use a hierarchical structure called the T-Graph, which serves as an exemplary guide for assigning an appropriate priority score to each unvisited link. These URLs are later downloaded in order of this priority. This paper outlines the architectural design and presents the implementation, test results and performance evaluation of the Treasure-Crawler system. The Treasure-Crawler is evaluated in terms of information retrieval criteria such as recall and precision, both with values close to 0.5. Achieving such an outcome supports the significance of the proposed approach.
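The prioritized download and the evaluation criteria described above can be illustrated with a minimal Python sketch. It is not the Treasure-Crawler implementation: the `Frontier` class, the `precision_recall` helper and the numeric scores are hypothetical names and values introduced here, and the T-Graph scoring itself is assumed to happen elsewhere and simply hand a score to `push`.

```python
import heapq
from itertools import count


class Frontier:
    """Hypothetical fetch queue: unvisited URLs ordered by priority score."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._tie = count()  # insertion counter, breaks ties between equal scores

    def push(self, url, score):
        """Queue a URL once; return False if it was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the highest first.
        heapq.heappush(self._heap, (-score, next(self._tie), url))
        return True

    def pop(self):
        """Return the (url, score) pair with the highest score."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


def precision_recall(retrieved, relevant):
    """Standard IR definitions over sets of page identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    frontier = Frontier()
    # The scores below are made-up placeholders for T-Graph priority scores.
    frontier.push("http://example.org/a", 0.2)
    frontier.push("http://example.org/b", 0.9)
    frontier.push("http://example.org/c", 0.5)
    print(frontier.pop())
    print(precision_recall(["a", "b"], ["b", "c"]))
```

Negating the score keeps the standard-library min-heap usable as a max-priority queue, and the insertion counter keeps the order stable when two links receive the same score.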
Abstract: The two significant tasks of a focused Web crawler are finding relevant topic-specific documents on the Web and analytically prioritizing them for later effective and reliable download. For the first task, we propose a custom algorithm that fetches and analyzes the most effective HTML structural elements of the page, as well as the topical boundary and anchor text of each unvisited link, from which the topical focus of an unvisited page can be predicted with high accuracy. Our method thus combines link-based and content-based approaches. For the second task, we propose a scoring function for the relevant URLs that uses the T-Graph (Treasure Graph) to assist in prioritizing the unvisited links that will later be put into the fetching queue. Our Web search system is called the Treasure-Crawler. This research paper presents the architectural design of the Treasure-Crawler system, which satisfies the principal requirements of a focused Web crawler, and supports the correctness of the system structure, including all its modules, through illustrations and test results.
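As a rough illustration of predicting an unvisited page's topical focus from the HTML of the current page, the following Python sketch scores each link by how many topic terms appear in its anchor text and in the page title. This is a simplified stand-in, not the authors' algorithm: the choice of elements (title and anchor text only, without the topical boundary), the weights `w_anchor` and `w_title`, and all function and class names are assumptions made for the example.

```python
import re
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the page title and (href, anchor text) pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False
        self._href = None
        self._anchor = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._anchor = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join(self._anchor).strip()))
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._href is not None:
            self._anchor.append(data)


def tokens(text):
    """Lowercased alphanumeric word set of a text fragment."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(text, topic_terms):
    """Fraction of topic terms that occur in the text."""
    if not topic_terms:
        return 0.0
    return len(tokens(text) & topic_terms) / len(topic_terms)


def score_links(html, topic_terms, w_anchor=0.7, w_title=0.3):
    """Map each href to a weighted score; the weights are arbitrary assumptions."""
    parser = LinkExtractor()
    parser.feed(html)
    topic = {term.lower() for term in topic_terms}
    title_score = overlap(parser.title, topic)
    return {
        href: w_anchor * overlap(anchor, topic) + w_title * title_score
        for href, anchor in parser.links
    }


if __name__ == "__main__":
    page = (
        "<html><head><title>Library services</title></head><body>"
        '<a href="/a">digital library catalog</a> <a href="/b">weather</a>'
        "</body></html>"
    )
    print(score_links(page, {"library", "catalog"}))
```

Scores produced this way could serve as a first relevance filter before a separate prioritization step orders the surviving links for fetching.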