Phishing is a serious web security problem in which attackers mimic legitimate websites to deceive online users and steal their sensitive information. Phishing detection can be treated as a typical classification problem in data mining, where the classifier is constructed from a large number of website features. There is high demand for identifying the set of features that, when mined, most improves the predictive accuracy of the classifiers. This paper investigates feature selection with the aim of determining the effective set of features in terms of classification performance. We compare two known feature selection methods in order to determine the smallest set of features for phishing detection using data mining. Experimental tests on a data set with a large number of features have been carried out using the Information Gain and Correlation Features set methods. Further, two data mining algorithms, namely C4.5 and IREP, have been trained on different sets of selected features to show the pros and cons of the feature selection process. We have been able to identify new knowledge in the form of rules that show vital correlations among significant features.
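As a rough illustration of the Information Gain criterion mentioned in this abstract, the sketch below ranks categorical features by how much they reduce the entropy of the class label. It is a minimal sketch, not the authors' implementation: the feature names (`has_ip_address`, `uses_https`), the toy rows, and the helper names are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return (feature_name, gain) pairs sorted from most to least informative."""
    names = rows[0].keys()
    gains = {name: information_gain([r[name] for r in rows], labels) for name in names}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: feature names and values are made up for illustration.
rows = [
    {"has_ip_address": 1, "uses_https": 0},
    {"has_ip_address": 1, "uses_https": 1},
    {"has_ip_address": 0, "uses_https": 0},
    {"has_ip_address": 0, "uses_https": 1},
]
labels = ["phishing", "phishing", "legitimate", "legitimate"]

ranking = rank_features(rows, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data the first feature separates the classes perfectly and the second carries no information, so a selection step that keeps only high-gain features would retain the first and drop the second.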
Autism is a developmental condition associated with healthcare costs; early screening of autism symptoms can therefore reduce these costs. The autism screening process involves presenting a series of questions for parents, caregivers, and family members to answer on behalf of the child to determine whether the child may have autistic traits. Existing autism screening tools, such as the Autism Quotient (AQ), often involve many questions that also require careful design, which makes the screening process lengthy. One potential solution to improve the efficiency and accuracy of screening is the use of fuzzy rules in data mining. Fuzzy rules can be extracted automatically from past controls and cases to form a screening classification system. This system can then be used to forecast whether individuals have any autistic traits instead of relying on conventional rules written by domain experts. This paper evaluates fuzzy rule-based data mining for forecasting autistic symptoms in children to address the aforementioned problem. Empirical results show that the fuzzy data mining model performs well in terms of predictive accuracy and sensitivity but, surprisingly, yields lower-than-expected specificity when compared with other rule-based data mining models.
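The three evaluation measures named in this abstract (predictive accuracy, sensitivity, and specificity) can be computed from a screening model's predictions as sketched below. This is a minimal sketch under stated assumptions: the label strings `"case"` and `"control"`, the function name `screening_metrics`, and the example predictions are hypothetical and are not taken from the paper's data.

```python
def screening_metrics(actual, predicted, positive="case"):
    """Compute accuracy, sensitivity, and specificity for a binary screening task.

    Sensitivity is the share of true cases that were flagged;
    specificity is the share of true controls that were cleared.
    """
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(actual),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical predictions, made up for illustration only.
actual = ["case", "case", "case", "control", "control", "control", "control", "control"]
predicted = ["case", "case", "control", "control", "control", "control", "case", "case"]

metrics = screening_metrics(actual, predicted)
print(metrics)
```

A model can score well on sensitivity while scoring poorly on specificity, as in this toy example, which is why the two rates are reported separately rather than folded into accuracy alone.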
In this paper, an offline holistic handwritten Arabic text recognition system based on Principal Component Analysis (PCA) and Support Vector Machine (SVM) classifiers is proposed. The proposed system consists of three primary stages: preliminary processing, feature extraction using PCA, and classification using polynomial, linear, and Gaussian SVM classifiers. In the proposed system, the text skeleton is first extracted and the text images are normalized to a uniform size so that the global features of the Arabic words can be extracted using PCA. The recognition performance of the proposed system was evaluated on version 2 of the IFN/ENIT database of handwritten Arabic text using the polynomial, linear, and Gaussian SVM classifiers. The classification results of the proposed system were compared with the results produced by a benchmark TRS based on the Discrete Cosine Transform (DCT) method, using several normalization sizes of Arabic text images. The experimental results support the effectiveness of the proposed system in holistic recognition of handwritten Arabic text.
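The three SVM variants named in this abstract differ only in the kernel function used to compare two feature vectors. The sketch below writes out the common textbook forms of the linear, polynomial, and Gaussian kernels. It is an illustration only: the abstract does not state the kernel parameters, so `degree`, `gamma`, and `coef0` and their default values are assumptions, and the short vectors stand in for PCA feature vectors.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: the plain dot product of the two vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree.

    The parameter values are assumptions for illustration.
    """
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2).

    The gamma value is an assumption for illustration.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical low-dimensional feature vectors standing in for PCA outputs.
u = [1.0, 2.0]
v = [3.0, 4.0]

print(linear_kernel(u, v))
print(polynomial_kernel(u, v))
print(gaussian_kernel(u, v))
```

The linear kernel yields a linear decision boundary in the PCA feature space, while the polynomial and Gaussian kernels allow curved boundaries at the cost of extra parameters to tune.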