Although software reuse presents clear advantages for programmer productivity and code reliability, it is not practiced enough. One of the reasons for the only moderate success of reuse is the lack of software libraries that facilitate the actual locating and understanding of reusable components. This paper describes a technology for automatically assembling large software libraries that promote software reuse by helping the user locate the components closest to her/his needs. Software libraries are automatically assembled from a set of unorganized components by using information retrieval techniques. The construction of the library is done in two steps. First, attributes are automatically extracted from natural language documentation by using a new indexing scheme based on the notions of lexical affinities and quantity of information. Then, a hierarchy for browsing is automatically generated using a clustering technique that draws only on the information provided by the attributes. Thanks to the free-text indexing scheme, tools following this approach can accept free-style natural language queries. This technology has been implemented in the GURU system, which h
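The first step described above, extracting attributes from lexical affinities weighted by quantity of information, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the five-word window, the use of -log2 of relative corpus frequency as the quantity of information, the scoring of a pair as its frequency times the summed information of its two words, and all function names. Every word of the document is assumed to appear in the corpus counts.

```python
import math
import re
from collections import Counter

WINDOW = 5  # assumed maximum distance, in words, between the two words of a pair


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_affinities(text, window=WINDOW):
    """Count unordered pairs of distinct words co-occurring within `window` positions."""
    words = tokenize(text)
    pairs = Counter()
    for i, w in enumerate(words):
        for v in words[i + 1 : i + 1 + window]:
            if v != w:
                pairs[tuple(sorted((w, v)))] += 1
    return pairs


def information(word, corpus_counts, total):
    """Assumed quantity of information: -log2 of the word's relative corpus frequency."""
    return -math.log2(corpus_counts[word] / total)


def rank_attributes(doc, corpus_counts, total, top=3):
    """Rank the document's word pairs by frequency times summed information (assumed score)."""
    scores = {}
    for (a, b), freq in lexical_affinities(doc).items():
        info = information(a, corpus_counts, total) + information(b, corpus_counts, total)
        scores[(a, b)] = freq * info
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top]
```

Under these assumptions, frequent function words score low because they carry little information, so the top-ranked pairs tend to be content-bearing ones; the second step (clustering on these attributes) is not sketched here.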
Most of the work on XML query and search has stemmed from the publishing and database communities, mostly for the needs of business applications. Recently, the Information Retrieval community began investigating the XML search issue to answer information discovery needs. Following this trend, we present here an approach where information needs can be expressed in an approximate manner as pieces of XML documents or "XML fragments" of the same nature as the documents that are being searched. We present an extension of the vector space model for searching XML collections via XML fragments and ranking results by relevance. We describe how we have extended a full-text search engine to comply with this model. The value of the proposed method is demonstrated by the relatively high precision of our system, which was among the top performers in the recent INEX workshop. Our results indicate that certain queries are more appropriate than others for the extended vector space model. Specifically, queries with relatively specific contexts but vague information needs are best situated to reap the benefit of this model. Finally, our results show that one method may not fit all types of queries and that it could be worthwhile to use different solutions for different applications.
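To make the idea of a vector space over XML fragments concrete, the following is a minimal sketch, not the system described in the abstract. It assumes that the indexing unit is a (term, context) pair, where the context is the path of tags leading to the term, that weights are raw counts, and that a query context matches a document context when its tags appear in the same order within the document path. The resemblance function and all names are hypothetical choices made for illustration.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def term_contexts(xml_text):
    """Count (term, context) pairs; the context is the tuple of tags down to the term.

    Only element text is indexed; tail text after child elements is ignored here.
    """
    pairs = Counter()

    def walk(node, path):
        path = path + (node.tag,)
        for term in (node.text or "").lower().split():
            pairs[(term, path)] += 1
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())
    return pairs


def is_subsequence(q, d):
    """True if the tags of q appear in d in the same order (not necessarily adjacent)."""
    it = iter(d)
    return all(tag in it for tag in q)


def context_resemblance(q, d):
    """Assumed resemblance: higher when the query path covers more of the document path."""
    return (1 + len(q)) / (1 + len(d)) if is_subsequence(q, d) else 0.0


def score(query_xml, doc_xml):
    """Cosine-style score in which matching terms are discounted by context resemblance."""
    q_pairs = term_contexts(query_xml)
    d_pairs = term_contexts(doc_xml)
    dot = 0.0
    for (q_term, q_ctx), q_w in q_pairs.items():
        for (d_term, d_ctx), d_w in d_pairs.items():
            if q_term == d_term:
                dot += q_w * d_w * context_resemblance(q_ctx, d_ctx)
    q_norm = math.sqrt(sum(w * w for w in q_pairs.values()))
    d_norm = math.sqrt(sum(w * w for w in d_pairs.values()))
    return dot / (q_norm * d_norm) if q_norm and d_norm else 0.0
```

With a document such as `<book><title>xml search</title><author>smith</author></book>`, the fragment `<book><title>xml</title></book>` scores higher than `<title>xml</title>`, and `<author>xml</author>` scores zero, since the term occurs only under a different context. A real engine would add term weighting such as idf, which this sketch omits.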
The ability to aggregate huge volumes of queries over a large population of users allows search engines to build precise models for a variety of query-assistance features such as query recommendation, correction, etc. Yet, no matter how much data is aggregated, the long-tail distribution implies that a large fraction of queries are rare. As a result, most query assistance services perform poorly or are not even triggered on long-tail queries. We propose a method to extend the reach of query assistance techniques (and in particular query recommendation) to long-tail queries by reasoning about rules between query templates rather than individual query transitions, as currently done in query-flow graph models. As a simple example, if we recognize that 'Montezuma' is a city in the rare query "Montezuma surf" and if the rule '