One of the most annoying problems on the Internet is spam. Many approaches to fighting spam have been proposed over the years, most of which involve scanning the entire contents of e-mail messages in an attempt to detect suspicious keywords and patterns. Although such approaches are relatively effective, they also have some disadvantages. An interesting question is therefore whether spam can be detected effectively without analyzing the entire contents of e-mail messages. The contribution of this paper is an alternative spam detection approach that relies solely on analyzing the origin (IP address) of e-mail messages, as well as links within the messages to websites (URIs). Compared to analyzing suspicious keywords and patterns, detecting and analyzing URIs is relatively simple. The IP addresses and URIs are compared to various kinds of blacklists; a hit increases the probability that the message is spam. Although the idea of using blacklists is well known, the novel idea proposed in this paper is the concept of 'bad neighborhoods'. To validate our approach, a prototype has been developed and tested on our university's mail server, and its output was compared to SpamAssassin and the mail server's log files. The comparison showed that our prototype has remarkably good detection capabilities (comparable to SpamAssassin) while putting only a small load on the mail server.
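The abstract does not give implementation details, so the following is only a minimal sketch of how blacklist and "bad neighborhood" scoring could be combined. The blacklist contents, the /24 neighborhood size, the threshold, and the weights are all illustrative assumptions, not values from the paper; real deployments would query DNSBL/URIBL services rather than local sets.

```python
import ipaddress

# Illustrative blacklists only; a real setup would query DNSBL/URIBL services.
IP_BLACKLIST = {"203.0.113.7", "203.0.113.42", "198.51.100.9"}
URI_BLACKLIST = {"spam-example.test", "phish-example.test"}

def neighborhood_score(sender_ip, threshold=2):
    """Count blacklisted hosts in the sender's /24 'neighborhood'.

    If the count reaches the (assumed) threshold, the whole neighborhood is
    treated as bad, even when the sender itself is not listed.
    """
    network = ipaddress.ip_network(f"{sender_ip}/24", strict=False)
    listed = sum(1 for ip in IP_BLACKLIST if ipaddress.ip_address(ip) in network)
    return 1.0 if listed >= threshold else 0.0

def spam_probability(sender_ip, uri_domains):
    """Combine three pieces of evidence: direct IP hit, bad neighborhood, URI hit."""
    score = 0.0
    if sender_ip in IP_BLACKLIST:
        score += 0.5                      # sender itself is blacklisted
    score += 0.3 * neighborhood_score(sender_ip)
    if any(domain in URI_BLACKLIST for domain in uri_domains):
        score += 0.4                      # message links to a blacklisted site
    return min(score, 1.0)

# Sender is not listed itself, but sits in a neighborhood with two listed hosts
# and links to a blacklisted domain: score 0.7.
print(spam_probability("203.0.113.99", ["spam-example.test"]))
```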
Proposed EU regulations on data retention could require every provider to keep accounting logs of its customers' Internet usage. Although the technical consequences of these requirements have been investigated by consultancy companies, this paper investigates what this accounting data could be, how it can be obtained, and how much storage it requires. Our research shows that every gigabyte of network traffic results in approximately 400 kilobytes of accounting data when using our refinements to existing methods for storing accounting data, which is a factor of twenty less than previously assumed.
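A back-of-the-envelope estimate based on the figure quoted in the abstract (roughly 400 kilobytes of accounting data per gigabyte of traffic). The monthly traffic volume and retention period below are illustrative assumptions, not results from the paper.

```python
# Storage estimate for a retention scenario, using the abstract's per-gigabyte figure.
KB_PER_GB_TRAFFIC = 400                        # from the abstract
PREVIOUS_ESTIMATE = 20 * KB_PER_GB_TRAFFIC     # earlier assumptions were about 20x higher

monthly_traffic_gb = 10_000   # assumed: 10 TB of customer traffic per month
retention_months = 12         # assumed retention period

storage_gb = monthly_traffic_gb * retention_months * KB_PER_GB_TRAFFIC / 1_000_000
print(f"~{storage_gb:.0f} GB of accounting data per year")   # ~48 GB
```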
The popularity of RSS and similar feed formats is growing fast. This paper gives an overview of the standards and implementations in this field and analyzes whether they allow scheduling the retrieval of feed updates. As will be shown, such support is very limited, and current feed readers therefore poll providers at fixed rates. The measurements performed as part of our study show that, in general, a clear mismatch exists between the fixed polling rate of feed readers and the rate at which providers update their feeds; a significant performance gain is therefore possible by embedding scheduling information within feeds. This paper proposes a scheduling approach that both reduces the lag in updates for active feeds and reduces wasted resources for less active feeds. Simulations show that our approach reduces the perceived lag by twenty percent, while having the same resource requirements as a fixed-rate algorithm.
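The abstract does not specify the proposed scheduling algorithm, so the sketch below shows only a generic adaptive-polling idea: shorten the interval for feeds that changed since the last poll and back off for feeds that did not, within fixed bounds. The bounds and multipliers are assumptions made for this example.

```python
MIN_INTERVAL = 5 * 60        # seconds; assumed lower bound on the polling interval
MAX_INTERVAL = 24 * 60 * 60  # seconds; assumed upper bound

def next_interval(current_interval, feed_changed):
    """Adapt the polling interval to the feed's observed activity."""
    if feed_changed:
        return max(MIN_INTERVAL, current_interval / 2)   # poll active feeds more often
    return min(MAX_INTERVAL, current_interval * 1.5)     # back off for quiet feeds

# Example: a feed that is active at first, then goes quiet.
interval = 60 * 60
for changed in [True, True, False, False, False]:
    interval = next_interval(interval, changed)
    print(f"next poll in {interval / 60:.0f} minutes")
```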