Keywords: phishing, email, filtering, semantic attacks, learning
A growing number of emails purport to come from a trusted entity and attempt to deceive users into providing account or identity information; these are commonly known as "phishing" emails. Traditional spam filters do not adequately detect these undesirable emails, which causes problems for both consumers and businesses wishing to do business online. From a learning perspective, this is a challenging problem. At first glance, it appears to be a simple text classification problem, but the classification is confounded by the fact that the class of "phishing" emails is nearly identical to the class of real emails. We propose a new method for detecting these malicious emails called PILFER. By incorporating features specifically designed to highlight the deceptive methods used to fool users, we are able to correctly classify over 92% of phishing emails while maintaining a false positive rate on the order of 0.1%. These results are obtained on a dataset of approximately 860 phishing emails and 6950 non-phishing emails. The accuracy of PILFER on this dataset is significantly better than that of SpamAssassin, a widely used spam filter.
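To make the reported rates concrete, the following minimal sketch converts the percentages quoted in the abstract into approximate email counts. The abstract gives only rounded figures ("over 92%", "on the order of 0.1%", "approximately 860" and "6950"), so the results are rough magnitudes rather than the paper's exact counts; the variable names are illustrative only.

```python
# Figures quoted in the abstract; all of them are approximate.
phishing_emails = 860         # "approximately 860 phishing emails"
legitimate_emails = 6950      # "6950 non-phishing emails"
detection_rate = 0.92         # "over 92%", treated here as a lower bound
false_positive_rate = 0.001   # "on the order of 0.1%"

# Phishing emails correctly flagged, and those that slip through.
caught = phishing_emails * detection_rate
missed = phishing_emails - caught

# Legitimate emails wrongly flagged as phishing.
false_alarms = legitimate_emails * false_positive_rate

print(f"phishing caught: roughly {caught:.0f} or more")
print(f"phishing missed: roughly {missed:.0f} or fewer")
print(f"legitimate emails flagged: roughly {false_alarms:.0f}")
```

Under these assumptions, a false positive rate near 0.1% corresponds to only a handful of legitimate emails being misclassified in a corpus of this size.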
The dramatic growth of the Internet has created a new problem for users: location of the relevant sources of documents. This article presents a framework for (and experimentally analyzes a solution to) this problem, which we call the text-source discovery problem. Our approach consists of two phases. First, each text source exports its contents to a centralized service. Second, users present queries to the service, which returns an ordered list of promising text sources. This article describes GlOSS, Glossary of Servers Server, with two versions: bGlOSS, which provides a Boolean query retrieval model, and vGlOSS, which provides a vector-space retrieval model. We also present hGlOSS, which provides a decentralized version of the system. We extensively describe the methodology for measuring the retrieval effectiveness of these systems and provide experimental evidence, based on actual data, that all three systems are highly effective in determining promising text sources for a given query.
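The two-phase approach described above can be sketched as follows. This is a minimal illustration, not the GlOSS implementation: it assumes each source exports only its document count and per-term document frequencies, and it ranks sources for a conjunctive (AND) query by an estimated match count computed under a term-independence assumption. The names `SourceSummary`, `estimate_matches`, and `rank_sources` are hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SourceSummary:
    """Hypothetical summary that a text source exports to the central service."""
    name: str
    num_docs: int
    doc_freq: Dict[str, int]  # term -> number of documents containing the term


def estimate_matches(summary: SourceSummary, terms: List[str]) -> float:
    """Estimate how many documents contain all query terms.

    Assumption: terms occur independently within a source, so the estimate
    is num_docs times the product of each term's document fraction.
    """
    if summary.num_docs == 0:
        return 0.0
    estimate = float(summary.num_docs)
    for term in terms:
        estimate *= summary.doc_freq.get(term, 0) / summary.num_docs
    return estimate


def rank_sources(summaries: List[SourceSummary],
                 terms: List[str]) -> List[Tuple[str, float]]:
    """Return (source name, estimate) pairs, most promising source first.

    Sources whose estimate is zero are dropped from the list.
    """
    scored = [(s.name, estimate_matches(s, terms)) for s in summaries]
    scored = [pair for pair in scored if pair[1] > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Phase 1: sources export their summaries (made-up example data).
summaries = [
    SourceSummary("small_source", 100, {"phishing": 50, "email": 40}),
    SourceSummary("large_source", 1000, {"phishing": 10, "email": 500}),
    SourceSummary("unrelated_source", 10, {"email": 5}),
]

# Phase 2: a user query is answered with an ordered list of sources.
ranking = rank_sources(summaries, ["phishing", "email"])
print(ranking)
```

The central service never sees the documents themselves in this sketch, only compact statistics, which is what makes it cheap to consult before contacting any individual source.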
The performance of distributed text document retrieval systems is strongly influenced by the organization of the inverted index. This paper compares the performance impact on query processing of various physical organizations for inverted lists. We present a new probabilistic model of the database and queries. Simulation experiments determine which variables most strongly influence response time and throughput. This leads to a set of design trade-offs over a range of hardware configurations and new parallel query processing strategies.
Accessing many data sources aggravates problems for users of heterogeneous distributed databases. Database administrators must deal with fragile mediators, that is, mediators with schemas and views that must be significantly changed to incorporate a new data source. When implementing translators of queries from mediators to data sources, database implementors must deal with data sources that do not support all the functionality required by mediators. Application programmers must deal with graceless failures for unavailable data sources. Queries simply return failure and no further information when data sources are unavailable for query processing. The Distributed Information Search COmponent (Disco) addresses these problems. Data modeling techniques manage the connections to data sources, and sources can be added transparently to the users and applications. The interface between mediators and data sources flexibly handles different query languages and different data source functionality. Query rewriting and optimization techniques rewrite queries so they are efficiently evaluated by sources. Query processing and evaluation semantics are developed to process queries over unavailable data sources. In this article we describe (a) the distributed mediator architecture of Disco, (b) the data model and its modeling of data source connections, (c) the interface to underlying data sources and the query rewriting process, and (d) query processing semantics. We describe several advantages of our system.