We compare the performance of two database selection algorithms reported in the literature. Their performance is compared using a common testbed designed specifically for database selection techniques. The testbed is a decomposition of the TREC TIPSTER data into 236 subcollections. The databases from our testbed were ranked using both the gGlOSS and CORI techniques and compared to a baseline derived from TREC relevance judgements. We examined the degree to which CORI and gGlOSS approximate this baseline. Our results confirm our earlier observation that the gGlOSS Ideal(l) ranks do not estimate relevance-based ranks well. We also find that CORI is a uniformly better estimator of relevance-based ranks than gGlOSS for the test environment used in this study. Part of the advantage of the CORI algorithm can be explained by a strong correlation between gGlOSS and a size-based baseline (SBR). We also find that CORI produces consistently accurate rankings on testbeds ranging from 100 to 921 sites. However, for a given level of recall, search effort appears to scale linearly with the number of databases.
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Distributed searching is cast in three parts: database selection, query processing, and results merging. In this paper we examine the effect of database selection on retrieval performance. We look at retrieval performance in three different distributed retrieval testbeds and distill some general results. First, we find that good database selection can result in better retrieval effectiveness than can be achieved in a centralized database. Second, we find that good performance can be achieved when only a few sites are selected, and that the performance generally increases as more sites are selected. Finally, we find that when database selection is employed, it is not necessary to maintain collection-wide information (CWI), e.g. global idf. Local information can be used to achieve superior performance. This means that distributed systems can be engineered with more autonomy and less cooperation. This work suggests that improvements in database selection can lead to broader improvements in retrieval performance, even in centralized (i.e. single database) systems. Given a centralized database and a good selection mechanism, retrieval performance can be improved by decomposing that database conceptually and employing a selection step.
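The selection step described above can be sketched as follows. This is a deliberately simplified stand-in, not the selection algorithm from the paper: databases are scored by summing per-database document frequencies of the query terms (local information only, no collection-wide statistics), and the top k are searched. The database names and statistics are invented for illustration.

```python
def select_databases(query_terms, db_stats, k=3):
    """Rank databases by a simple document-frequency score and keep the top k.

    db_stats maps database name -> {term: document frequency in that database}.
    Only local per-database statistics are used; no global idf is required.
    """
    scores = {
        name: sum(stats.get(t, 0) for t in query_terms)
        for name, stats in db_stats.items()
    }
    # Sort database names by score, best first, and keep the top k.
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical per-database term statistics.
dbs = {
    "news":    {"database": 5, "selection": 2},
    "patents": {"database": 1},
    "web":     {"selection": 9, "database": 4},
}
print(select_databases(["database", "selection"], dbs, k=2))  # ['web', 'news']
```

A real system would then broadcast the query only to the selected databases and merge their result lists, which is where the autonomy benefit noted above comes from.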
We describe a testbed for database selection techniques and an experiment conducted using this testbed. The testbed is a decomposition of the TREC/TIPSTER data that allows analysis of the data along multiple dimensions, including collection-based and temporal-based analysis. We characterize the subcollections in this testbed in terms of number of documents, queries against which the documents have been evaluated for relevance, and distribution of relevant documents. We then present initial results from a study conducted using this testbed that examines the effectiveness of the gGlOSS approach to database selection. The databases from our testbed were ranked using the gGlOSS techniques and compared to the gGlOSS Ideal(l) baseline and a baseline derived from TREC relevance judgements. We have examined the degree to which several gGlOSS estimate functions approximate these baselines. Our initial results confirm that the gGlOSS estimators are excellent predictors of the Ideal(l) ranks but that the Ideal(l) ranks do not estimate relevance-based ranks well.
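One common way to quantify how closely an estimated database ranking approximates a baseline ranking, as in the comparisons above, is a rank correlation such as Spearman's rho. The sketch below assumes both rankings cover the same set of databases; the database names are hypothetical, and this is not necessarily the measure used in the paper.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation between two rankings of the same databases.

    rank_a and rank_b are lists of database ids, best first.
    Returns 1.0 for identical rankings, -1.0 for exactly reversed ones.
    """
    n = len(rank_a)
    pos_b = {db: i for i, db in enumerate(rank_b)}
    # Sum of squared rank differences.
    d2 = sum((i - pos_b[db]) ** 2 for i, db in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


baseline  = ["db1", "db2", "db3", "db4"]  # e.g. relevance-based ranks
estimated = ["db2", "db1", "db3", "db4"]  # e.g. ranks from an estimator
print(spearman_rho(estimated, baseline))  # 0.8
```

A value near 1.0 would indicate that the estimator orders databases almost exactly as the baseline does; values near 0 indicate little agreement.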
This paper describes an algorithm for calculating the biovolume of cells with simple shapes, such as bacteria, flagellates, and simple ciliates, from a 2-dimensional digital image. The method can be adapted to any image analysis system which allows access to the binary cell image (i.e., the pixels, or (x,y) points, composing the cell). The cell image is rotated to a standard orientation (horizontal), and a solid of revolution is calculated by digital integration. Verification and a critical assessment of the method are presented. The algorithm accounts for irregularities in cell shape that conventional methods based on length, width, and geometrical formulas do not. Key terms: Bacteria, protist, biomass, digital image analysis
Accurate measurements of the biomass of bacteria and protists from environmental samples depend primarily upon visual microscopic methods for enumeration and cell sizing. Epifluorescence microscopy is the method of choice in aquatic sciences, since algal pigments fluoresce specific colors and the use of fluorochromes allows the discrimination of living cells from detrital particles (9,11,13,17). Since these methods are tedious and subjective when conducted visually, there is much interest in automation by computerized image analysis (4,19,20). This technique allows numerous, more precise, and more detailed cell measurements to be made. For the purpose of ecological energy and nutrient flow modelling, cell biovolume must be converted into biomass units, usually carbon or nitrogen. Accurate measures of both cell biovolume and carbon or nitrogen cell content are necessary for good conversion factors. Since these volume estimates are based on cubed linear measurements, their errors can equal or exceed those associated with the carbon or nitrogen determinations (4). The variation in estimated cell volume due to linear measurement error can easily exceed the variation of different biovolume-to-carbon conversion factors, which for bacteria are currently controversial (6,7,16) and, until recently (5), were unmeasured for protozoa. Previous methods of estimating the biovolume of microscopic organisms have generally involved measuring overall cell size in two dimensions and applying a geometric formula to infer a three-dimensional structure (1,3,10,15). A typical method would be to measure length and width and use the formulas for a sphere and cylinder to calculate volume. We have developed a method using an image-analyzed epifluorescence microscopy system. The method is similar to one used by Brownlee (8) to measure ciliate volumes, but is more flexible and automated. We feel that the implicit assumptions of this method are met and that it will be of benefit wherever there is difficulty obtaining direct measurements of the object in question. MATERIALS AND METHODS. Algorithm Description. A prerequisite to implementing this algorithm is the ability to obtain a digital image of the object in question. The main assumption of the integration algorithm is that the shape of the object to be measured is symme...
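The solid-of-revolution integration described above can be sketched as follows: with the cell rotated to horizontal, each pixel column's filled height is taken as the diameter of a circular cross-section, and the column volumes are summed. This is a minimal illustration under that symmetry assumption, not the paper's implementation; the function name, the toy binary image, and the uniform pixel size are invented for illustration.

```python
import math


def biovolume(binary_image, pixel_size):
    """Estimate cell volume as a solid of revolution about the horizontal axis.

    binary_image: 2-D list of 0/1 rows, with the cell already rotated so its
    long axis is horizontal. Each column's filled pixel count * pixel_size is
    treated as the diameter of a circular cross-section, and the volume is the
    digital integral sum over columns of pi * r^2 * dx.
    """
    n_cols = len(binary_image[0])
    volume = 0.0
    for x in range(n_cols):
        diameter = sum(row[x] for row in binary_image) * pixel_size
        radius = diameter / 2
        volume += math.pi * radius ** 2 * pixel_size  # disc volume for this column
    return volume


# A 3-pixel-wide, 3-pixel-tall "rod": every column has diameter 3 * pixel_size.
img = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
]
print(biovolume(img, pixel_size=1.0))  # 3 * pi * 1.5**2 ≈ 21.206
```

Because each column is integrated independently, a bulge or taper in the cell outline changes that column's diameter and hence its disc volume, which is how this approach captures shape irregularities that a single length-and-width formula cannot.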