We address the problem of clustering the refinements of a user search query. The clusters computed by our proposed algorithm can be used to improve the selection and placement of the query suggestions proposed by a search engine, and can also serve to summarize the different aspects of information relevant to the original user query. Our algorithm clusters refinements based on their likely underlying user intents by combining document click and session co-occurrence information. At its core, our algorithm operates by performing multiple random walks on a Markov graph that approximates user search behavior. A user study performed on top search engine queries shows that our clusters are rated better than corresponding clusters computed using approaches that use only document click or only session co-occurrence information.
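The random-walk idea in the abstract above can be illustrated with a toy sketch. This is not the authors' algorithm: the graph, the transition probabilities, the `doc:` naming, and the rule of grouping refinements by their most likely end document are all assumptions made for illustration. Refinements are treated as transient states and clicked documents as absorbing states, and the walk is propagated as a probability distribution rather than sampled.

```python
from collections import defaultdict

def absorption_distribution(transitions, start, steps=50):
    """Approximate where a walk started at `start` ends up after `steps` steps.

    `transitions` maps a state to {next_state: probability}. States with no
    outgoing entries are treated as absorbing (here: clicked documents).
    """
    mass = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for state, p in mass.items():
            out = transitions.get(state)
            if not out:
                nxt[state] += p  # absorbing state keeps its probability mass
                continue
            for target, q in out.items():
                nxt[target] += p * q
        mass = dict(nxt)
    # Report only the mass that has reached absorbing states.
    return {s: p for s, p in mass.items() if not transitions.get(s)}

def cluster_by_dominant_document(transitions, refinements):
    """Group refinements by the absorbing state they most likely reach (assumed rule)."""
    clusters = defaultdict(list)
    for refinement in refinements:
        dist = absorption_distribution(transitions, refinement)
        clusters[max(dist, key=dist.get)].append(refinement)
    return dict(clusters)

# Hypothetical toy graph: click and session transitions with made-up weights.
toy_graph = {
    "jaguar car": {"doc:cars": 0.8, "jaguar price": 0.2},
    "jaguar price": {"doc:cars": 1.0},
    "jaguar animal": {"doc:wildlife": 0.9, "jaguar habitat": 0.1},
    "jaguar habitat": {"doc:wildlife": 1.0},
}
clusters = cluster_by_dominant_document(toy_graph, list(toy_graph))
```

In this toy graph the two car-related refinements end at one document and the two animal-related refinements at the other, so they fall into separate groups.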
Transmission of infectious diseases, propagation of information, and spread of ideas and influence through social networks are all examples of diffusion. In such cases we say that a contagion spreads through the network, a process that can be modeled by a cascade graph. Studying cascades and network diffusion is challenging due to missing data. Even a single missing observation in a sequence of propagation events can significantly alter our inferences about the diffusion process. We address the problem of missing data in information cascades. Specifically, given only a fraction C′ of the complete cascade C, our goal is to estimate the properties of the complete cascade C, such as its size or depth. To estimate the properties of C, we first formulate a k-tree model of cascades and analytically study its properties in the face of missing data. We then propose a numerical method that, given a cascade model and an observed cascade C′, can estimate properties of the complete cascade C. We evaluate our methodology using information propagation cascades in the Twitter network (70 million nodes and 2 billion edges), as well as information cascades arising in the blogosphere. Our experiments show that the k-tree model is an effective tool to study the effects of missing data in cascades. Most importantly, we show that our method (and the k-tree model) can accurately estimate properties of the complete cascade C even when 90% of the data is missing.
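The missing-data setting in the abstract above can be made concrete with a small simulation. This is only a sketch under stated assumptions, not the paper's k-tree analysis or its numerical method: the complete cascade is taken to be a complete k-ary tree, every node is assumed to go missing independently with the same probability, and size is estimated by naive rescaling. The function names are hypothetical.

```python
import random

def complete_k_tree(k, depth):
    """Return {node: parent} for a complete k-ary tree; the root's parent is None."""
    parent = {0: None}
    frontier = [0]
    next_id = 1
    for _level in range(depth):
        new_frontier = []
        for node in frontier:
            for _child in range(k):
                parent[next_id] = node
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return parent

def observe(parent, keep_prob, rng):
    """Observed cascade C': keep each node independently with probability keep_prob."""
    return {node for node in parent if rng.random() < keep_prob}

def estimate_size(observed, keep_prob):
    """Naive estimate of |C| from |C'| (assumes uniform, independent missingness)."""
    return len(observed) / keep_prob

rng = random.Random(0)          # fixed seed so the run is repeatable
tree = complete_k_tree(k=2, depth=3)
estimates = [estimate_size(observe(tree, 0.5, rng), 0.5) for _ in range(2000)]
mean_estimate = sum(estimates) / len(estimates)
```

Rescaling works for size under these assumptions because each node is counted with known probability. Depth has no equally simple correction, since removing nodes breaks the observed tree into fragments.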
This paper uncovers a new phenomenon in web search that we call domain bias: a user's propensity to believe that a page is more relevant just because it comes from a particular domain. We provide evidence of the existence of domain bias in click activity as well as in human judgments via a comprehensive collection of experiments. We begin by studying the difference between the domains that a search engine surfaces and those that users click. Surprisingly, we find that despite changes in the overall distribution of surfaced domains, there has not been a comparable shift in the distribution of clicked domains. Users seem to have learned the landscape of the internet and their click behavior has thus become more predictable over time. Next, we run a blind domain test, akin to a Pepsi/Coke taste test, to determine whether domains can shift a user's opinion of which page is more relevant. We find that domains can actually flip a user's preference about 25% of the time. Finally, we demonstrate the existence of systematic domain preferences, even after factoring out confounding issues such as position bias and relevance, two factors that have been used extensively in past work to explain user behavior. The existence of domain bias has numerous consequences including, for example, the importance of discounting click activity from reputable domains.
Analysis of a comprehensive set of features extracted from blogs for prediction of movie sales is presented. We use correlation, clustering and time-series analysis to study which features are the best predictors.
Speech produced by current synthesis technology is difficult to understand in everyday noisy situations. Although there is a significant body of work on how humans modify their speech in noise, the results have yet to be implemented in a synthesizer. Algorithms capable of processing and incorporating these modifications may lead to improved speech intelligibility of assistive communication aids and more generally of spoken dialogue systems. We describe our efforts in building the Loudmouth synthesizer, which emulates human modifications to speech in noise. A perceptual experiment indicated that Loudmouth achieved a statistically significant gain in intelligibility compared to a standard synthesizer in noise.