Spamdexing is one of the biggest threats facing modern search engines (SEs). Spammers now use a wide range of content-generation techniques to fill search engine result pages (SERPs) with low-quality web pages. Spam web pages are generally insufficient, irrelevant, and improper results for users. Many researchers in academia and industry are working to identify spam web pages, but no single universally effective method has yet been developed for identifying all of them. We believe that tackling content spam requires improved methods. This article is an attempt in that direction: it proposes a framework for identifying spam web pages. The framework uses stop words, keyword density, a spam keywords database, part-of-speech (POS) ratio, and copied-content algorithms. The WEBSPAM-UK2006 and WEBSPAM-UK2007 datasets were used to conduct the experiments and obtain threshold values. A promising F-measure of 77.38% illustrates the effectiveness and applicability of the proposed method.
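To make two of the content signals named above more concrete, the following is a minimal sketch of how a stop-word ratio and a keyword density might be computed for a page's text. It is not the authors' implementation: the tiny `STOP_WORDS` set, the tokenizer, and the function names `stop_word_ratio` and `keyword_density` are all assumptions made for illustration, and the abstract does not state the thresholds or exact definitions the framework uses.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def stop_word_ratio(text):
    """Fraction of tokens that are stop words (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in STOP_WORDS for t in tokens) / len(tokens)


def keyword_density(text, top_n=3):
    """Share of non-stop-word tokens taken by each of the top_n most frequent words."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {word: count / len(tokens) for word, count in counts.most_common(top_n)}
```

Under these assumptions, a page stuffed with a few repeated keywords would show an unusually low stop-word ratio and an unusually high density for its top keywords. Where to draw the line between normal and spam values is a matter for thresholds obtained from labeled data, which this sketch does not attempt.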
Multiclass classification of unlabeled images using computer vision and image processing is currently an important problem. In this research, we focus on constructing a high-level feature detector for class-driven unlabeled data. We propose a normalized restricted Boltzmann machine (NRBM) to form a robust network model. The proposed NRBM is designed to reduce dimensionality and improve feature extraction by learning more appropriate features of the data. To increase the learning convergence rate and reduce the complexity of the NRBM, we add Polyak averaging when updating parameters during training. We train the proposed NRBM on five variants of the Modified National Institute of Standards and Technology (MNIST) benchmark dataset. The experiments show that the proposed NRBM is more robust to noisy data than state-of-the-art approaches.
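The general idea behind Polyak averaging can be sketched as keeping a running mean of the parameter iterates alongside the ordinary updates. The example below is an assumption-laden illustration and not the NRBM training procedure: it applies plain gradient descent to a toy objective rather than RBM training, and the names `polyak_average`, `descend_with_polyak`, and `grad_fn` are hypothetical.

```python
def polyak_average(avg, params, step):
    """Incrementally update the mean of the iterates seen so far.

    `step` is the 1-based index of the current update, so after the call
    `avg` equals the arithmetic mean of the first `step` parameter iterates.
    """
    return [a + (p - a) / step for a, p in zip(avg, params)]


def descend_with_polyak(grad_fn, init, lr=0.1, steps=100):
    """Gradient descent on a toy objective, returning (last iterate, averaged iterate)."""
    params = list(init)
    avg = list(init)  # overwritten by the first iterate when step == 1
    for t in range(1, steps + 1):
        grads = grad_fn(params)
        params = [p - lr * g for p, g in zip(params, grads)]
        avg = polyak_average(avg, params, t)
    return params, avg


# Toy usage: minimize f(x) = x^2, whose gradient is 2x (hypothetical objective).
last, averaged = descend_with_polyak(lambda p: [2.0 * x for x in p], [1.0], lr=0.1, steps=3)
```

With noisy gradients the averaged iterate typically fluctuates less than the last iterate, which is the usual motivation for the technique. How the averaging is combined with the NRBM's own update rule is not specified in the abstract.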