Performance evaluation is an important issue in Web search engine research. Traditional evaluation methods rely heavily on human effort and are therefore time-consuming. Using clickthrough data analysis, we propose an automatic method for evaluating search engine performance. The method automatically generates navigational query topics and their answers from search users' querying and clicking behavior. Experimental results on a commercial Chinese search engine's user logs show that the automatic method produces evaluation results similar to those of traditional assessor-based methods.
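To make the idea concrete, the following is a minimal sketch of how navigational topics and answers might be mined from click logs and then used to score a ranking. It is not the method of the paper, whose details the abstract does not give. The click-concentration heuristic, the function names `mine_navigational_topics` and `mean_reciprocal_rank`, the thresholds `min_clicks` and `min_share`, and the choice of mean reciprocal rank as the metric are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def mine_navigational_topics(click_log, min_clicks=3, min_share=0.8):
    """Return {query: answer_url} for queries whose clicks concentrate on one URL.

    click_log is an iterable of (query, clicked_url) pairs. Both thresholds
    are illustrative assumptions, not values taken from the paper.
    """
    clicks_by_query = defaultdict(Counter)
    for query, url in click_log:
        clicks_by_query[query][url] += 1

    topics = {}
    for query, counts in clicks_by_query.items():
        total = sum(counts.values())
        top_url, top_count = counts.most_common(1)[0]
        # Treat the query as navigational when one URL dominates its clicks.
        if total >= min_clicks and top_count / total >= min_share:
            topics[query] = top_url
    return topics


def mean_reciprocal_rank(topics, rankings):
    """Score an engine: rankings maps each query to its ordered result URLs."""
    if not topics:
        return 0.0
    total = 0.0
    for query, answer in topics.items():
        results = rankings.get(query, [])
        if answer in results:
            total += 1.0 / (results.index(answer) + 1)
    return total / len(topics)
```

A query whose clicks are spread evenly over many URLs is skipped under this heuristic, so only queries with a dominant destination contribute to the score.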
Uber's business is highly real-time in nature. Petabytes of data are collected every day from end users such as Uber drivers, riders, restaurants, and eaters. This data holds a great deal of valuable information, and many decisions must be made within seconds for use cases such as customer incentives, fraud detection, and machine learning model prediction. In addition, there is a growing need to expose this capability to different categories of users, including engineers, data scientists, executives, and operations personnel, which adds to the complexity. In this paper, we present the overall architecture of the real-time data infrastructure and identify three scaling challenges that we must continuously address for each component in the architecture. At Uber, we rely heavily on open-source technologies for the key areas of the infrastructure. On top of this open-source software, we add significant improvements and customizations to make the solutions fit Uber's environment and to bridge the gaps in meeting Uber's unique scale and requirements. We then highlight several important use cases and show their real-time solutions and tradeoffs. Finally, we reflect on the lessons we learned as we built, operated, and scaled these systems.
CCS CONCEPTS: • Information systems → Stream management; • Computer systems organization → Real-time system architecture.
Searching an organization's document repositories for experts is a frequently encountered problem in intranet information management. This paper proposes a candidate-centered model, referred to as the Candidate Description Document (CDD)-based retrieval model. The expertise evidence about an expert candidate, which is scattered across repositories, is mined and aggregated automatically to form a profile called the candidate's CDD, which represents the candidate's knowledge. We present the model from its foundations through its logical development and argue in favor of this model for expert finding. We devise and compare different strategies for exploiting a variety of expertise evidence. Experiments on the TREC enterprise corpora, in comparison with non-CDD methods, demonstrate that the CDD-based model achieves significant and consistent improvements in performance.
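The aggregation step can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the paper's model: evidence is taken to be any document that mentions the candidate's name, the CDD is a plain bag of words, and candidates are ranked by length-normalized query term frequency. The names `build_cdds` and `rank_experts` are hypothetical.

```python
from collections import Counter, defaultdict


def build_cdds(documents, candidates):
    """Aggregate the text of every document mentioning a candidate into a CDD.

    documents is a list of strings; candidates maps a candidate id to the
    name as it appears in text. Name matching by substring is an assumption.
    """
    cdds = defaultdict(Counter)
    for text in documents:
        lowered = text.lower()
        tokens = lowered.split()
        for cand_id, name in candidates.items():
            if name.lower() in lowered:
                cdds[cand_id].update(tokens)
    return cdds


def rank_experts(query, cdds):
    """Rank candidate ids by normalized query term frequency in their CDD."""
    terms = query.lower().split()
    scores = {}
    for cand_id, profile in cdds.items():
        length = sum(profile.values())
        matched = sum(profile[term] for term in terms)
        scores[cand_id] = matched / length if length else 0.0
    # Highest score first; ties are broken by candidate id for determinism.
    return sorted(scores, key=lambda cand_id: (-scores[cand_id], cand_id))
```

Because retrieval runs over one profile per candidate rather than over individual documents, any standard document ranking function could replace the simple term-frequency score used here.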