In this paper we present an analysis of an AltaVista Search Engine query log consisting of approximately 1 billion entries for search requests over a period of six weeks. This represents almost 285 million user sessions, each an attempt to fill a single information need. We present an analysis of individual queries, query duplication, and query sessions. We also present results of a correlation analysis of the log entries, studying the interaction of terms within queries. Our data supports the conjecture that web users differ significantly from the user assumed in the standard information retrieval literature. Specifically, we show that web users type in short queries, mostly look at the first 10 results only, and seldom modify the query. This suggests that traditional information retrieval techniques may not work well for answering web search requests. The correlation analysis showed that the most highly correlated items are constituents of phrases. This result indicates it may be useful for search engines to consider search terms as parts of phrases even if the user did not explicitly specify them as such.
We present algorithms for computing similarity relations of labeled graphs. Similarity relations have applications for the refinement and verification of reactive systems. For finite graphs, we present an O(mn) algorithm for computing the similarity relation of a graph with n vertices and m edges (assuming m 2 n). For effectively presented infinite graphs, we present a symbolic similarity-checking procedure that terminates if a finite similarity relation exists. We show that 2D rectangular automata, which model discrete reactive systems with continuous environments, define effectively presented infinite graphs with finite similarity relations. It follows that the refinement problem and the VCTL' model-checking problem are decidable for 2D rectangular automata.
We give a linear-time algorithm for single-source shortest paths in planar graphs with nonnegative edge-lengths. Our algorithm also yields a linear-time algorithm for maximum flow in a planar graph with the source and sink on the same face. For the case where negative edge-lengths are allowed, we give an algorithm requiring O(n 4Â3 log(nL)) time, where L is the absolute value of the most negative length. This algorithm can be used to obtain similar bounds for computing a feasible flow in a planar network, for finding a perfect matching in a planar bipartite graph, and for finding a maximum flow in a planar graph when the source and sink are not on the same face. We also give parallel and dynamic versions of these algorithms. ] 1997 Academic Press
Consider the following Online Boolean Matrix-Vector Multiplication problem: We are given an n × n matrix M and will receive n column-vectors of size n, denoted by v 1 , . . . , v n , one by one. After seeing each vector v i , we have to output the product M v i before we can see the next vector. A naive algorithm can solve this problem using O(n 3 ) time in total, and its running time can be slightly improved to O(n 3 / log 2 n) [Williams SODA'07]. We show that a conjecture that there is no truly subcubic (O(n 3− )) time algorithm for this problem can be used to exhibit the underlying polynomial time hardness shared by many dynamic problems. For a number of problems, such as subgraph connectivity, Pagh's problem, dfailure connectivity, decremental single-source shortest paths, and decremental transitive closure, this conjecture implies tight hardness results. Thus, proving or disproving this conjecture will be very interesting as it will either imply several tight unconditional lower bounds or break through a common barrier that blocks progress with these problems. This conjecture might also be considered as strong evidence against any further improvement for these problems since refuting it will imply a major breakthrough for combinatorial Boolean matrix multiplication and other long-standing problems if the term "combinatorial algorithms" is interpreted as "Strassenlike algorithms" [Ballard et al. SPAA'11].The conjecture also leads to hardness results for problems that were previously based on diverse problems and conjectures -such as 3SUM, combinatorial Boolean matrix multiplication, triangle detection, and multiphase -thus providing a uniform way to prove polynomial hardness results for dynamic algorithms; some of the new proofs are also simpler or even become trivial. The conjecture also leads to stronger and new, non-trivial, hardness results, e.g., for the fullydynamic densest subgraph and diameter problems.
When using traditional search engines, users have to formulate queries to describe their information need. This paper discusses a different approach to Web searching where the input to the search process is not a set of query terms, but instead is the URL of a page, and the output is a set of related Web pages. A related Web page is one that addresses the same topic as the original page. For example, www.washingtonpost.com is a page related to www.nytimes.com, since both are online newspapers.We describe two algorithms to identify related Web pages. These algorithms use only the connectivity information in the Web (i.e., the links between pages) and not the content of pages or usage information. We have implemented both algorithms and measured their runtime performance. To evaluate the effectiveness of our algorithms, we performed a user study comparing our algorithms with Netscape's 'What's Related' service (http://home. netscape.com/escapes/related/). Our study showed that the precision at 10 for our two algorithms are 73% better and 51% better than that of Netscape, despite the fact that Netscape uses both content and usage pattern information in addition to connectivity information.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.