XML retrieval is facing new challenges when applied to heterogeneous XML documents, where next to nothing about the document structure can be taken for granted. We have developed solutions where some of the heterogeneity issues are addressed. Our fragment selection algorithm selectively divides a heterogeneous document collection into equi-sized fragments with full-text content. If the content is considered too data-oriented, it is not accepted. The algorithm needs no information about element names. In addition, three techniques for fragment expansion are presented, all of which yield a 13-17% average improvement in average precision. These techniques and algorithms are among the first steps in developing document-type-independent indexing methods for the full text in heterogeneous XML collections.
Article disponible en ligne : http://www.cs.otago.ac.nz/homepages/andrew/papers/2010-4.pdfInternational audienceINEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2009 evaluation campaign, which consisted of a wide range of tracks: Ad hoc, Book, Efficiency, Entity Ranking, Interactive, QA, Link the Wiki, and XML Mining. INEX in running entirely on volunteer effort by the IR research community: anyone with an idea and some time to spend, can have a major impact
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.