Biomedical named entity recognition (Bio-NER) is a highly complex and time-consuming research domain within natural language processing (NLP). It is widely used in information retrieval, knowledge summarization, biomolecular event extraction, and discovery applications. This paper proposes a method for the recognition and classification of named entities in the biomedical domain using machine learning (ML) techniques. Support vector machine (SVM), decision trees (DT), K-nearest neighbor (KNN), and its kernel versions are used. Recent advancements in programmable, massively parallel graphics processing units (GPUs) hold promise of increased computational capacity at a lower cost, which helps address multi-dimensional data and time complexity. We implement a novel parallel version of KNN by porting the distance computation step to the GPU using the compute unified device architecture (CUDA) and compare the performance of all the algorithms on the BioNLP/NLPBA 2004 corpus. Results demonstrate that CUDA-KNN takes full advantage of the GPU's computational capacity and multi-leveled memory architecture, resulting in a 35× performance enhancement over the central processing unit (CPU). Compared with existing research, the proposed model provides an option for a faster NER system for higher dimensionality and larger datasets, as it offers balanced performance in terms of accuracy and speed-up, thus providing critical design insights into developing a robust BioNLP system.
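The abstract above singles out the distance computation as the KNN step that is moved to the GPU. The following is a minimal CPU-only sketch in plain Python of why that step parallelizes well: every distance depends only on the query and one training point. The function names, the toy feature vectors, and the labels are assumptions made for illustration; this is not the paper's CUDA implementation.

```python
from collections import Counter


def squared_distances(query, points):
    """Squared Euclidean distance from one query vector to every training point.

    Each distance is independent of all the others, so on a GPU this loop
    could be mapped to one thread per training point.
    """
    return [sum((q - p) ** 2 for q, p in zip(query, point)) for point in points]


def knn_predict(query, points, labels, k=3):
    """Label a query by majority vote among its k nearest training points."""
    dists = squared_distances(query, points)
    # Indices of the k smallest distances.
    nearest = sorted(range(len(points)), key=dists.__getitem__)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D feature vectors with hypothetical entity labels.
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_y = ["O", "O", "protein", "protein", "protein"]

prediction = knn_predict((0.95, 0.9), train_x, train_y, k=3)
```

In this sketch the sorting and voting stay sequential; only `squared_distances` corresponds to the step the abstract describes as ported to CUDA.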
Access to medical knowledge and healthcare costs are the two major impediments for the common man. Conversational agents such as medical chatbots, which are designed with medical applications in view, can potentially address these issues. Chatbots can be either generic or disease-specific. Diabetes is a non-communicable disease, and early detection can make people aware of the serious consequences of this disorder and help save lives. In this paper, we have developed a generic text-to-text 'Diabot', a DIAgnostic chatBOT that engages patients in conversation using advanced Natural Language Understanding (NLU) techniques to provide personalized prediction using the general health dataset, based on the various symptoms sought from the patient. The design is further extended as a DIAbetes chatBOT for specialized diabetes prediction using the Pima Indian diabetes dataset, in order to suggest proactive preventive measures. For prediction, there exist multiple classification algorithms in machine learning that can be chosen based on their accuracy. However, rather than considering only one model and hoping it is the best or most accurate predictor available, the novelty in this paper lies in ensemble learning, a meta-algorithm that combines a number of weaker models and averages them to produce one final balanced and accurate model. From literature reviews, it is observed that very little research has been done on ensemble methods to increase prediction accuracy. The paper presents a state-of-the-art Diabot design with an undemanding front-end interface for the common man using React UI, RASA NLU based text pre-processing, and a quantitative performance comparison of various machine learning algorithms, both as standalone classifiers and combined in a majority voting ensemble. It is observed that the chatbot is able to interact seamlessly with all patients based on the symptoms sought. The accuracy of the ensemble model is balanced for general health prediction and highest for diabetes prediction among all weak learners considered, which provides motivation for further exploring ensemble techniques in this domain.
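The majority voting ensemble mentioned above can be illustrated with a short plain-Python sketch. The three rule-based "classifiers" are hypothetical stand-ins for trained models, and their thresholds and dictionary keys are arbitrary values chosen for the example; they are not taken from the paper and carry no clinical meaning.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(classifiers, sample):
    """Ask every base classifier for a label and combine them by majority vote."""
    return majority_vote([clf(sample) for clf in classifiers])


# Hypothetical weak learners: each is a single threshold rule (illustrative only).
def glucose_rule(sample):
    return 1 if sample["glucose"] >= 140 else 0


def bmi_rule(sample):
    return 1 if sample["bmi"] >= 30 else 0


def age_rule(sample):
    return 1 if sample["age"] >= 50 else 0


# Made-up patient record; two of the three rules vote for class 1.
patient = {"glucose": 150, "bmi": 32.0, "age": 35}
result = ensemble_predict([glucose_rule, bmi_rule, age_rule], patient)
```

With real models, each callable would wrap a fitted classifier's prediction for one sample; the voting step itself would stay the same.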
Predicting stock market trends is an extremely complicated task that calls for extensive study and insight into the context at hand. The primary requirement for any investor is to assess this trend in order to invest for maximum returns. Advances in machine learning, and in data analytics in particular, have changed the way investors can approach this matter. Sentiment analysis, or opinion mining, can be carried out by taking into consideration public sentiment regarding stock market conditions in order to understand the ups and downs of this highly volatile sector. In this paper, public sentiment from Twitter, along with news feeds related to stock market conditions, is considered in order to analyse the stock market trend. The data is collected from Twitter and various news sites to generate a gross sentiment score regarding the market. The gross sentiment score is used to find a correlation between market price and sentiment, and to train the proposed prediction models using linear and robust regression techniques such as Ordinary Least Squares (OLS), RANSAC, the Theil-Sen estimator, Huber regression, and Ridge regression. An ensemble method is used instead of a single method to achieve more reliable prediction accuracy. The ensemble method combines the models and carries out majority voting among them to produce one final model. The results reveal that public opinion does make a significant impact on market behaviour, with prediction accuracy between 65% and 91% depending on the dataset.
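To show why robust estimators appear alongside OLS in the list above, here is a small plain-Python sketch comparing an ordinary least squares fit with a Theil-Sen fit (the median of pairwise slopes) on data containing one outlier. The sentiment scores and prices are made-up numbers, and the function names are assumptions for the example; this is not the paper's pipeline.

```python
from itertools import combinations
from statistics import mean, median


def ols_fit(xs, ys):
    """Ordinary least squares line: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def theil_sen_fit(xs, ys):
    """Theil-Sen line: median of all pairwise slopes, then median intercept."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Made-up data: price = 2 * sentiment + 1, except the last point is an outlier.
sentiment = [0.0, 1.0, 2.0, 3.0, 4.0]
price = [1.0, 3.0, 5.0, 7.0, 30.0]

ols_slope, ols_intercept = ols_fit(sentiment, price)
ts_slope, ts_intercept = theil_sen_fit(sentiment, price)
```

On this toy data the single outlier pulls the OLS slope well above 2, while the Theil-Sen fit recovers the underlying line with slope 2 and intercept 1.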
Data used by current biomedical named entity recognition (BioNER) systems has mostly been manually labelled for supervision. However, large amounts of annotated data can be difficult to find, especially in highly specialized fields such as biomedicine and bioinformatics. When domain-specific knowledge resources such as dictionaries and ontologies are available, automatically tagged, distantly supervised biomedical training data can be developed. However, any such distantly supervised NER result is normally noisy. The prevalence of false positives and false negatives in this type of automatically generated data is the main problem, and it directly affects efficiency. This research investigates distant supervision to detect false positive occurrences in the BioNER task. A reinforcement learning technique is employed, modelled as a graphics processing unit (GPU) accelerated Markov decision process (MDP) with a neural network policy. To deal with false negative cases, we employ a partial annotation conditional random field (CRF) technique. Results on two benchmark datasets show that this methodology can enhance the performance of the neural NER system. The work also shows how the proposed approach reduces the need for human-annotated data in BioNER tasks in natural language processing (NLP).