Modern search engines are good enough to answer popular commercial queries with mostly highly relevant documents. However, our experiments show that user behavior on such relevant commercial sites may differ from one website to another, even when the sites carry the same relevance label. Search engines therefore face the challenge of ranking results that are equally relevant under the traditional relevance grading approach. To solve this problem, we propose considering additional facets of relevance, such as trustability, usability, design quality, and quality of service. To let a ranking algorithm take these facets into account, we proposed a number of features that capture the quality of a web page along these dimensions. We aggregated the new facets into a single label, commercial relevance, that represents the cumulative quality of the site. We extrapolated commercial relevance labels to the entire learning-to-rank dataset and used a weighted sum of commercial and topical relevance instead of the default relevance labels. To evaluate our method, we created new DCG-like metrics and conducted offline evaluation as well as online interleaving experiments, demonstrating that a ranking algorithm that takes the proposed facets of relevance into account is better aligned with user preferences.
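The label combination and the DCG-style scoring mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the weights or the exact metric definitions, so the mixing weight `alpha`, the function names, and the use of the standard log2-discounted DCG formula are all assumptions made for illustration, not the authors' definitions.

```python
import math
from typing import Optional, Sequence


def combined_label(topical: float, commercial: float, alpha: float = 0.5) -> float:
    """Weighted sum of commercial and topical relevance.

    `alpha` is a hypothetical mixing weight; the abstract does not state
    the value used by the authors.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * commercial + (1.0 - alpha) * topical


def dcg(gains: Sequence[float], k: Optional[int] = None) -> float:
    """Standard DCG: each gain is discounted by log2(rank + 1), ranks start at 1.

    The paper's "DCG-like" metrics may differ; this is the textbook form.
    """
    top = gains if k is None else gains[:k]
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(top, start=1))


def dcg_with_combined_labels(
    ranking: Sequence[tuple], alpha: float = 0.5, k: Optional[int] = None
) -> float:
    """Score a ranking of (topical, commercial) label pairs with combined gains."""
    gains = [combined_label(t, c, alpha) for t, c in ranking]
    return dcg(gains, k)
```

With `alpha = 0`, the score reduces to DCG over topical labels alone, so two results with equal topical relevance tie; any positive `alpha` lets the commercial label break that tie.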
Search engines currently face the problem of websites that distribute malware. In this paper we present a novel, efficient algorithm that learns to detect this kind of spam. We use a bipartite graph with two types of nodes, each forming a layer in the graph: websites and file hostings (FH). An edge connects a website to a file hosting when a file can be downloaded from the hosting via a link on the website. The performance of this spam detection method was verified using two sets of ground truth labels: manual assessments by antivirus analysts and automatically generated assessments obtained from antivirus companies. We demonstrate that the proposed method is able to detect new types of malware even before the best-known antivirus solutions are able to detect them.
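The bipartite structure described in this abstract can be sketched as two adjacency maps, one per layer. The abstract does not specify the learning algorithm, so the score propagation below is a generic stand-in (repeated neighbor averaging from labeled seed sites), not the authors' method; all names and the example data in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_bipartite(
    edges: Iterable[Tuple[str, str]]
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build both adjacency maps from (website, file_hosting) edges.

    An edge means a file on the hosting is downloadable via a link on the website.
    """
    site_to_fh: Dict[str, Set[str]] = defaultdict(set)
    fh_to_site: Dict[str, Set[str]] = defaultdict(set)
    for site, fh in edges:
        site_to_fh[site].add(fh)
        fh_to_site[fh].add(site)
    return site_to_fh, fh_to_site


def propagate(
    edges: Iterable[Tuple[str, str]],
    seeds: Dict[str, float],
    iterations: int = 10,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Assumed stand-in: spread malware scores between the two layers.

    `seeds` maps some websites to known scores (1.0 = malicious, 0.0 = clean).
    Seed scores stay fixed; every other node takes the mean of its neighbors.
    """
    site_to_fh, fh_to_site = build_bipartite(edges)
    site_score = {s: seeds.get(s, 0.0) for s in site_to_fh}
    fh_score: Dict[str, float] = {}
    for _ in range(iterations):
        # Hosting layer: average over the websites linking to each hosting.
        fh_score = {
            fh: sum(site_score[s] for s in sites) / len(sites)
            for fh, sites in fh_to_site.items()
        }
        # Website layer: seeds keep their label, others average their hostings.
        site_score = {
            s: seeds[s] if s in seeds else sum(fh_score[f] for f in fhs) / len(fhs)
            for s, fhs in site_to_fh.items()
        }
    return site_score, fh_score
```

For example, with edges `[("a", "h1"), ("b", "h1"), ("c", "h2")]` and seeds `{"a": 1.0, "c": 0.0}`, the unlabeled site `b` shares a hosting with the malicious seed `a`, so its score rises toward 1.0, while `h2` stays at 0.0.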