In pattern recognition and pattern matching, methods based on deep learning models have recently attracted considerable research attention by achieving strong performance. In this paper, we propose the use of a convolutional neural network to recognize multifont offline Urdu handwritten characters in an unconstrained environment. We also propose a novel dataset of Urdu handwritten characters, since no publicly available dataset of this kind exists. A series of experiments is performed on the proposed dataset. The accuracy achieved for character recognition is among the best reported in the literature for the same task.
We applied t-distributed stochastic neighbor embedding (t-SNE) to visualize Urdu handwritten numerals (digits). The data set consists of 28 × 28 images of handwritten Urdu numerals and was created by inviting authors from different categories of native Urdu speakers. One of the challenging and critical issues for the correct visualization of Urdu numerals is the shape similarity between some of the digits. This issue was resolved using t-SNE, by exploiting the local and global structures of the large data set at different scales. The global structure consists of geometrical features, and the local structure is the pixel-based information for each class of Urdu digits. We introduce a novel approach that fuses these two independent spaces using Euclidean pairwise distances in a principled way. The fusion matrix embedded with t-SNE helps to locate each data point in a two- (or three-) dimensional map. Furthermore, our proposed approach focuses on preserving the local structure of the high-dimensional data while mapping it to a low-dimensional plane. The visualizations produced by t-SNE outperformed those of other classical techniques such as principal component analysis (PCA) and auto-encoders (AE) on our handwritten Urdu numeral dataset.
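The abstract does not say how the two distance spaces are combined, so the following is only a minimal sketch of one plausible fusion, assuming each Euclidean distance matrix is scaled to [0, 1] and the two are blended with a weight `alpha`. The function names, the normalization step, and `alpha` are all hypothetical and are not taken from the paper.

```python
import math


def pairwise_euclidean(points):
    """Return the full matrix of Euclidean distances between all points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def normalize(matrix):
    """Scale a distance matrix so its largest entry is 1 (assumed step)."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [row[:] for row in matrix]
    return [[value / peak for value in row] for row in matrix]


def fuse_distances(pixel_features, geometric_features, alpha=0.5):
    """Blend local (pixel) and global (geometric) distances.

    alpha is a hypothetical weight: 1.0 keeps only pixel distances,
    0.0 keeps only geometric distances.
    """
    d_local = normalize(pairwise_euclidean(pixel_features))
    d_global = normalize(pairwise_euclidean(geometric_features))
    return [
        [alpha * a + (1 - alpha) * b for a, b in zip(row_local, row_global)]
        for row_local, row_global in zip(d_local, d_global)
    ]


# Toy data: three samples with 2 pixel values and 1 geometric feature each.
pixels = [[0, 0], [3, 4], [0, 0]]
geometry = [[0], [1], [2]]
fused = fuse_distances(pixels, geometry)
```

A fused matrix of this kind could then be passed to a t-SNE implementation that accepts precomputed distances, for example scikit-learn's `TSNE` with `metric="precomputed"` and `init="random"`.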
The objective of this study was to assess the potential of a machine vision (MV) approach for the classification of eight citrus varieties. Leaf images of the eight varieties (grapefruit, Moussami, Malta, Lemon, Kinow, Local lemon, Fuetrells, and Malta Shakri) were acquired with a digital camera in an open environment without any complex laboratory setup. The acquired image dataset was transformed into a multi-feature dataset combining binary, histogram, texture, spectral, and rotational, scalability and translational (RST) invariant features. For each citrus leaf image, a total of 57 multi-features were acquired on every non-overlapping region of interest (ROI) of size 32x32, 64x64, 128x128, and 256x256. Furthermore, 15 optimized features were obtained using the supervised correlation-based feature selection (CFS) technique. The optimized multi-feature dataset was fed to different MV classifiers, namely Multilayer Perceptron (MLP), Random Forest (RF), J48, and Naïve Bayes, using 10-fold cross-validation. MLP achieved an average overall accuracy of 98.14% on 256x256 ROIs, outperforming the other classifiers. The classification accuracy values by MLP on the eight citrus leaf varieties,
Diabetes mellitus is a chronic condition characterized by hyperglycemia. Given rising morbidity, the number of diabetic patients worldwide is estimated to exceed 642 million by 2040, meaning that one in ten adults will be affected by diabetes. Diabetes can also lead to other illnesses such as heart attacks, kidney damage, and even blindness. The need to predict diabetes in advance motivates us to develop a machine learning-based model. A dataset was obtained from an online repository for this work. The obtained dataset was imbalanced, which poses a challenge for prediction, so it was balanced using resampling techniques such as Tomek links and SMOTE. These techniques also remove outlying samples from the provided dataset; outliers are additionally managed using the interquartile range (IQR) method. This research employed a two-stage model selection methodology. In the first stage, Logistic Regression, Support Vector Machine, k-Nearest Neighbors, Gradient Boosting, Naive Bayes, and Random Forest were applied to determine the efficiency of prediction based on patients' preconditions. At this stage, Random Forest performed best, with an accuracy of 80.7% after applying the SMOTE oversampling technique to balance the dataset. In the second stage, the three best-performing models (Naive Bayes, Gradient Boosting, and Random Forest) were combined using a voting algorithm. The results were encouraging: the model obtained 82.0% accuracy with the default dataset and 81.7% accuracy with the balanced dataset.
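Two of the steps named above, IQR-based outlier handling and voting over several classifiers, can be illustrated with a small sketch. It assumes the conventional 1.5 × IQR fences and hard (majority) voting. The abstract specifies neither, and the function names are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at k * IQR beyond the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


def majority_vote(predictions_per_model):
    """Hard voting: for each sample, pick the label most models predicted.

    predictions_per_model holds one list of labels per model,
    all of the same length (one label per sample).
    """
    return [
        Counter(sample_votes).most_common(1)[0][0]
        for sample_votes in zip(*predictions_per_model)
    ]


# Toy feature column with one extreme value.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 100]
cleaned = drop_outliers(readings)

# Toy predictions from three hypothetical models on three patients.
votes = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```

With three models and binary labels a tie cannot occur. With an even number of models, a tie-breaking rule would need to be chosen explicitly.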
Handwritten text recognition in Urdu script is an active research area, and significant progress has been made in this interesting and challenging field in the last few years. In this study, the authors present a comprehensive survey of offline and online handwritten text recognition systems for Urdu script written in the Nastaliq font style from 2004 to 2019. The following features make their contribution worthwhile and unique among reviews of a similar kind: (i) the review classifies the existing studies based on the types of recognition systems used for Urdu handwritten text, (ii) it covers the recognition process of Urdu handwritten text at different granularity levels (e.g. character, word, ligature, or sentence level), (iii) it presents each of the surveyed articles along the following dimensions: the task performed, its granularity level, the dataset used, the results obtained, and future directions, and (iv) it summarizes the surveyed articles according to granularity levels, publishing years, related tasks or subtasks, and types of classifiers used. Finally, major challenges and tasks related to Urdu handwritten text recognition approaches are discussed in detail.