2017
DOI: 10.3390/app7080846

Learning Word Embeddings with Chi-Square Weights for Healthcare Tweet Classification

Abstract: Twitter is a popular source for the monitoring of healthcare information and public disease. However, there exists much noise in the tweets. Even though appropriate keywords appear in the tweets, they do not guarantee the identification of a truly health-related tweet. Thus, the traditional keyword-based classification task is largely ineffective. Algorithms for word embeddings have proved to be useful in many natural language processing (NLP) tasks. We introduce two algorithms based on an existing word embedding…
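The title pairs chi-square term weighting with word embeddings. As a rough illustration of the chi-square side only (not the paper's actual algorithm; the toy tweets, labels, and the use of scikit-learn's chi2 scorer are assumptions made for this sketch), terms can be scored by how strongly their occurrence associates with the health-related class:

```python
# Minimal sketch: chi-square scores as term weights for tweet classification.
# The toy corpus and labels are illustrative; how the paper folds these
# weights into embedding training is not reproduced here.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

tweets = [
    "flu season is hitting our hospital hard",   # health-related
    "got my flu shot at the clinic today",       # health-related
    "the new flu game update just dropped",      # noise: keyword, not health
    "concert tonight feeling great",             # noise
]
labels = [1, 1, 0, 0]  # 1 = truly health-related, 0 = noise

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(tweets)

# chi2 returns one score per vocabulary term: higher means the term's
# occurrence is more strongly associated with the class labels.
scores, _ = chi2(X, labels)
weights = dict(zip(vectorizer.get_feature_names_out(), scores))
for term, w in sorted(weights.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{term}: {w:.3f}")
```

Note that the keyword "flu" appears in both classes, echoing the abstract's point that keyword matching alone is a weak signal; the chi-square score instead captures how discriminative each term actually is.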

Cited by 28 publications (14 citation statements)
References 28 publications
“…Distributed representations of words are capable of successfully capturing meaningful syntactic and semantic properties of the language and it has been shown [33] that using word embeddings as features could improve many NLP tasks, such as information retrieval [34,35], part-of-speech tagging [36] or named entity recognition (NER) [37]; Kuang and Davidson [38] learned specific word embeddings from Twitter for classifying healthcare-related tweets. Since learning those word representations is a slow and non-trivial task, already trained models can be found in literature; state-of-the-art embeddings are mainly based on deep-learning [31,39], but other techniques have been previously explored, for instance spectral methods [40,41].…”
Section: Related Work
confidence: 99%
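The statement above notes that already-trained embedding models are available off the shelf. A minimal sketch of that reuse pattern with gensim's model downloader (the choice of the glove-twitter-25 model and mean pooling over tokens are illustrative assumptions, not details taken from the cited works):

```python
# Minimal sketch: pre-trained word embeddings as document features.
# "glove-twitter-25" is one of gensim's downloadable models; the model
# choice and the mean-pooling strategy are assumptions for illustration.
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-twitter-25")  # KeyedVectors, 25-dimensional

def embed(text: str) -> np.ndarray:
    """Average the vectors of in-vocabulary tokens; zeros if none match."""
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        return np.zeros(vectors.vector_size)
    return np.mean([vectors[t] for t in tokens], axis=0)

features = embed("flu outbreak reported at the local hospital")
print(features.shape)  # (25,)
```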
“…Xu et al [23] designed a document classification framework based on word embedding and conducted a series of experiments on a biomedical documents classification task, which leveraged the semantic features generated by the word embedding approach, achieving highly competitive results. Kuang [15] proposed two algorithms based on the CBOW model and evaluated word embeddings learned from these proposed algorithms for two healthcare-related datasets. The results showed that the proposed algorithms improved accuracy by more than 9% compared to existing techniques.…”
Section: Feature Extraction From Text
confidence: 99%
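Kuang's two algorithm variants are not reproduced here, but the CBOW baseline they build on can be sketched with gensim (the toy corpus and hyperparameters are assumptions; sg=0 selects CBOW rather than skip-gram):

```python
# Minimal sketch: training baseline CBOW embeddings with gensim.
# CBOW predicts a word from its averaged context window; the corpus
# and hyperparameters below are toy assumptions.
from gensim.models import Word2Vec

corpus = [
    ["flu", "vaccine", "available", "at", "clinic"],
    ["hospital", "reports", "rise", "in", "flu", "cases"],
    ["new", "game", "flu", "update", "released"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,   # embedding dimensionality
    window=2,         # context words considered on each side
    min_count=1,      # keep every token in this tiny corpus
    sg=0,             # 0 = CBOW, 1 = skip-gram
)
print(model.wv.most_similar("flu", topn=3))
```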
“…The CBOW algorithm is capable of learning the contexts of words and is commonly applied to text classifiers, as [30] used it for classifying healthcare tweets.…”
Section: Encoding-based Wave2vec Time Series Classifier
confidence: 99%
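As a hedged sketch of that common pattern, CBOW vectors can be mean-pooled per tweet and fed to an off-the-shelf classifier (all data and settings below are illustrative assumptions, not the cited paper's setup):

```python
# Minimal sketch: CBOW vectors as classifier features.
# Mean-pools each tweet's word vectors and fits logistic regression;
# data, model settings, and pooling are illustrative assumptions.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

tweets = [["flu", "shot", "at", "clinic"],
          ["hospital", "flu", "ward", "full"],
          ["flu", "game", "patch", "notes"],
          ["weekend", "concert", "plans"]]
labels = [1, 1, 0, 0]  # 1 = health-related

w2v = Word2Vec(tweets, vector_size=25, window=2, min_count=1, sg=0)

def pool(tokens):
    # Every token is in-vocabulary here because min_count=1 and the
    # classifier is trained on the same toy corpus.
    return np.mean([w2v.wv[t] for t in tokens], axis=0)

X = np.stack([pool(t) for t in tweets])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```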