World Wide Web is the most useful source of information. In order to achieve high productivity of publishing, the webpages in many websites are automatically populated by using the common templates with contents. The templates provide readers easy access to the contents guided by consistent structures. However, for machines, the templates are considered harmful since they degrade the accuracy and performance of web applications due to irrelevant terms in templates. Thus, template detection techniques have received a lot of attention recently to improve the performance of search engines, clustering, and classification of web documents. In this paper, we present novel algorithms for extracting templates from a large number of web documents which are generated from heterogeneous templates. We cluster the web documents based on the similarity of underlying template structures in the documents so that the template for each cluster is extracted simultaneously. We develop a novel goodness measure with its fast approximation for clustering and provide comprehensive analysis of our algorithm. Our experimental results with real-life data sets confirm the effectiveness and robustness of our algorithm compared to the state of the art for template detection algorithms.
PurposeA botnet is a network of computers on the internet infected with software robots (or bots). There are numerous botnets, and some of them control millions of computers. Cyber criminals use botnets to launch spam e‐mails and denial of service attacks; and commit click fraud and data theft. Governments use botnets for political purposes or to wage cyber warfare. The purpose of this paper is to review the botnet threats and the responses to the botnet threats.Design/methodology/approachThe paper describes how botnets are created and operated. Then, the paper discusses botnets in terms of architecture, attacking behaviors, communication protocols, observable botnet activities, rally mechanisms, and evasion techniques. Finally, the paper reviews state‐of‐the‐art techniques for detecting and counteracting botnets, and also legal responses to botnet threats.FindingsBotnets have become the platform for many online threats such as spam, denial of service attacks, phishing, data thefts, and online frauds. Security researchers must develop technology to detect and take down botnets, and governments must develop capacity to crack down on botmasters and botnets. Individual computer owners must diligently take measures to keep their computers from becoming members of botnets.Originality/valueThe paper provides a review of current status of botnets and a summary of up‐to‐date responses to botnets in both technical and legal aspects, which can be used as a stepping stone for further research.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.