Jahiruddin scite author profile

Microblogging sites contain a huge amount of textual data, and classifying it is an imperative task in many applications, such as information filtering, user profiling, topical analysis, and content tagging. Traditional machine learning approaches mainly use bag-of-words or n-gram techniques to generate feature vectors as the text representation for training classifiers, and they perform considerably well on many text information processing tasks. Since short texts, such as tweets, contain a very limited number of words, these approaches suffer from data sparsity and the curse of dimensionality when features are represented using bag-of-words or n-gram techniques. Nowadays, the use of feature vectors such as word embeddings as input to neural networks for text classification and clustering has shown a remarkable performance gain. In this paper, we present different neural network models for multi-label classification of microblogging data. The proposed models are based on convolutional neural network (CNN) architectures, which utilize pre-trained word embeddings from generic and domain-specific textual data sources. The word embeddings are used individually and in various combinations through different channels of the CNN to predict class labels. We also present a comparative analysis of the proposed CNN models with traditional machine learning models and one of the existing CNN architectures. The proposed models are evaluated over a real Twitter dataset, and the experimental results establish their efficacy in classifying microblogging texts with improved accuracy in comparison with the traditional machine learning approaches and the existing CNN models.

Index Terms: Social network analysis, machine learning, deep learning, multi-label classification, word embedding, convolution neural network.

A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora

Jahiruddin, Abulaish, Dey. 2010. Journal of Biomedical Informatics.

A number of techniques, such as information extraction, document classification, document clustering, and information visualization, have been developed to ease the extraction and understanding of information embedded within text documents. However, knowledge that is embedded in natural language texts is difficult to extract using simple pattern matching techniques, and most of these methods do not help users directly understand key concepts and their semantic relationships in document corpora, which are critical for capturing their conceptual structures. The problem arises because most of the information is embedded within unstructured or semi-structured texts that computers cannot interpret very easily. In this paper, we present a novel Biomedical Knowledge Extraction and Visualization framework, BioKEVis, to identify key information components from biomedical text documents. The information components are centered on key concepts. BioKEVis applies linguistic analysis and Latent Semantic Analysis (LSA) to identify key concepts. The information component extraction principle is based on natural language processing techniques and semantic-based analysis. The system is also integrated with a biomedical named entity recognizer, ABNER, to tag genes, proteins, and other entity names in the text. We also present a method for collating information extracted from multiple sources to generate a semantic network. The network provides distinct user perspectives, allows navigation over documents with similar information components, and is also used to provide a comprehensive view of the collection. The system stores the extracted information components in a structured repository, which is integrated with a query-processing module to handle biomedical queries over text documents. We also propose a document ranking mechanism to present retrieved documents in order of their relevance to the user query.

Twitter Data Mining for Events Classification and Analysis

Azam, Jahiruddin, Abulaish, et al. 2015.

Jahiruddin

Feature and Opinion Mining for Customer Review Summarization

DiseaSE: A biomedical text analytics system for disease symptom extraction and characterization

Multi-Label Classification of Microblogging Texts Using Convolution Neural Network

A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora

Twitter Data Mining for Events Classification and Analysis
