Niju Shrestha scite author profile

Niju Shrestha

2 Publications

3 Citation Statements Received

15 Citation Statements Given


Affiliations

University of Alabama at Birmingham

Publications


Consolidating client names in the lobbying disclosure database using efficient clustering techniques

Kharel, Shrestha, Zhang et al. 2014


A fuzzy-matching clustering algorithm is applied to cluster similar client names in the Lobbying Disclosure Database. Because of errors and inconsistencies in manual typing, a client's name often has multiple representations, including misspelled names and sometimes shorthand forms, which makes it difficult to associate lobbying activities and interests with a single client. There is therefore a need to consolidate the various forms of the same client's name into one group (cluster). For efficient clustering, we apply a series of preprocessing techniques before calculating the string distance between two client names. An optimized threshold selection is adopted, which helps improve clustering accuracy. Single-linkage hierarchical clustering is then used to cluster the client names. The algorithm proves effective at clustering similar client names, and it also helps to find a representative name for each client cluster.
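The pipeline this abstract outlines (preprocessing, a string distance, a threshold, single-linkage clustering, and a representative name per cluster) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the suffix list, the normalized Levenshtein distance, the fixed threshold of 0.2 (standing in for the paper's optimized threshold selection), and choosing the most frequent spelling as the representative name are all hypothetical. The sketch relies on the fact that cutting a single-linkage dendrogram at height t yields the connected components of the graph that links every pair of names at distance at most t.

```python
import re
from collections import Counter

# Hypothetical list of legal-form suffixes removed during preprocessing;
# the paper's actual preprocessing steps are not listed in the abstract.
SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd", "company"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def distance(a: str, b: str) -> float:
    """Edit distance divided by the longer string's length, giving a value in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def single_linkage_clusters(names, threshold=0.2):
    """Cut a single-linkage dendrogram at `threshold` (hypothetical fixed value).

    This equals the connected components of the graph that links every pair
    of normalized names whose distance is <= threshold, found with union-find.
    """
    keys = [normalize(n) for n in names]
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if distance(keys[i], keys[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


def representative(cluster):
    """Hypothetical rule: the most frequent raw spelling (ties go to the first seen)."""
    return Counter(cluster).most_common(1)[0][0]
```

For example, with `names = ["Acme Corp.", "ACME Corporation", "Acme Corp", "Acme Corp.", "Globex Inc", "Globx Inc."]`, `single_linkage_clusters(names)` returns two clusters, the four Acme variants and the two Globex variants, whose representatives are "Acme Corp." and "Globex Inc". Because the sketch compares all pairs of names, its cost grows quadratically with the number of names; a larger database would likely need blocking or indexing.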


High-Performance Classification of Phishing URLs Using a Multi-modal Approach with MapReduce

Shrestha, Kharel, Britt et al. 2015


Classifying phishing websites can be expensive, both computationally and financially, given a large enough volume of suspect sites. A distributed cloud environment can reduce the computational time and financial cost significantly. To test this idea, we apply a multi-modal feature classification algorithm to classify phishing websites in a non-distributed environment and in several distributed environments. A multi-modal approach combines both visual and text features for classification. The implementation extracts color and histogram features from a screenshot of a phishing website, and text from its HTML source code. Feature extraction and comparison are performed using the MapReduce framework. Implementing the multi-modal approach in a distributed environment reduces both runtime and financial cost. Our results show that our system is 30 times faster than existing state-of-the-art systems for phishing website classification.

Keywords: Phishing, MapReduce, Color code

Contributions: The contributions of this paper are as follows:
1. We develop a high-performance multi-modal phishing website classification system using MapReduce.
2. We conduct performance evaluations to demonstrate the significant performance and cost advantages of our system over the existing state of the art.
3. We conduct extensive experiments on a real cloud using Amazon EMR and Amazon S3.

Organization: The rest of the paper explores the details of the multi-modal phishing classification algorithm and its performance in a non-distributed versus a distributed environment. In Section II, we discuss related research. Section III presents our approach and algorithms. We describe our distance measures for the classification task in Section IV, and the classification technique in Section V. In Section VI, we discuss our experimental setup and results, and provide an analysis in
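The multi-modal, map/reduce structure described in this abstract can be illustrated with a single-process Python sketch. It is not the authors' system, which ran MapReduce on Amazon EMR with Amazon S3. Here a screenshot is assumed to be already decoded into a list of (r, g, b) pixel tuples, the visual feature is a quantized color histogram compared by histogram intersection, and the text feature is a word set compared by Jaccard similarity; these are swapped-in measures, and the bin count, equal weighting, and 0.6 flagging threshold are hypothetical. The paper's own distance measures and classification technique (its Sections IV and V) are not reproduced. The built-in `map` and `functools.reduce` stand in for the mapper and reducer phases.

```python
import re
from collections import Counter
from functools import reduce

BINS = 4  # hypothetical: 4 levels per RGB channel, i.e. 64 color bins


def color_histogram(pixels):
    """Normalized, quantized color histogram of (r, g, b) pixels in 0-255."""
    step = 256 // BINS
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of bin-wise minimums of two normalized histograms."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


def text_tokens(html):
    """Crude text feature: strip tags, then take the set of lowercase words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_site(site, target, w_visual=0.5):
    """Map phase: emit (site_id, weighted visual + text similarity to a target page)."""
    visual = histogram_intersection(color_histogram(site["pixels"]), target["hist"])
    textual = jaccard(text_tokens(site["html"]), target["tokens"])
    return site["id"], w_visual * visual + (1 - w_visual) * textual


def reduce_flagged(acc, pair, threshold=0.6):
    """Reduce phase: accumulate ids whose combined score meets the hypothetical threshold."""
    site_id, score = pair
    return acc + [site_id] if score >= threshold else acc
```

Given a `target` dict holding a precomputed `"hist"` and `"tokens"` for a known page, and a list of suspect `sites`, the call `reduce(reduce_flagged, map(lambda s: map_site(s, target), sites), [])` returns the ids of sites whose combined score reaches the threshold. Each `map_site` call is independent of the others, which is the property that lets a real MapReduce job spread the map phase across workers.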


scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

