2021
DOI: 10.1007/s42979-021-00807-1

Specialists, Scientists, and Sentiments: Word2Vec and Doc2Vec in Analysis of Scientific and Medical Texts

Abstract: We analyze the performance of unsupervised embedding algorithms in sentiment analysis of knowledge-rich data sets. We apply the state-of-the-art embedding algorithms Word2Vec and Doc2Vec as the learning techniques. The algorithms build word and document embeddings in an unsupervised manner. To assess the algorithms' performance, we define sentiment metrics and use the semantic lexicon SentiWordNet (SWN) to establish benchmark measures. Our empirical results are obtained on the Obesity data set from i2b2 clinical discha…
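The abstract mentions defining sentiment metrics over SentiWordNet scores. A minimal sketch of one such metric, assuming a toy lexicon and a simple mean-polarity score (both illustrative assumptions, not the paper's actual metrics):

```python
# Toy stand-in for SentiWordNet: word -> (positivity, negativity) scores.
# These entries and the mean-polarity metric are illustrative assumptions,
# not the metrics defined in the paper.
TOY_SWN = {
    "good": (0.75, 0.0),
    "effective": (0.5, 0.0),
    "poor": (0.0, 0.625),
    "risk": (0.0, 0.25),
}

def doc_sentiment(tokens):
    """Mean polarity (pos - neg) over the tokens found in the lexicon."""
    scores = [TOY_SWN[t][0] - TOY_SWN[t][1] for t in tokens if t in TOY_SWN]
    return sum(scores) / len(scores) if scores else 0.0

print(doc_sentiment(["good", "risk"]))  # (0.75 - 0.25) / 2 = 0.25
```

A document with no lexicon hits scores 0.0, so the metric stays defined on out-of-vocabulary clinical text.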

Cited by 17 publications (10 citation statements).
References 36 publications.
“…However, the performance of doc2vec on our medical chats dataset was significantly lower than that of word2vec. Previous studies also reported similar results, demonstrating the better performance of word2vec over doc2vec [31][32][33][34]. Accordingly, we proceed with the weighted word2vec embeddings in our numerical study. For XGBoost, while we include the message length in the triggering phase, we exclude it in the response generation phase.…”
Section: Machine Learning Models (supporting)
confidence: 58%
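The "weighted word2vec embeddings" above can be sketched as an IDF-weighted average of a message's word vectors — a common construction, though the cited study's exact weighting scheme is not specified here. The 2-d vectors and IDF values below are toy assumptions, not trained values:

```python
# Sketch of weighted word2vec document embeddings: average each message's
# word vectors, weighted by IDF. Vectors and IDF values are toy assumptions.
TOY_VEC = {"fever": [1.0, 0.0], "mild": [0.0, 1.0], "the": [0.5, 0.5]}
TOY_IDF = {"fever": 2.0, "mild": 1.5, "the": 0.1}

def weighted_doc_vector(tokens):
    acc, total_w = [0.0, 0.0], 0.0
    for t in tokens:
        if t in TOY_VEC:
            w = TOY_IDF.get(t, 1.0)
            acc = [a + w * v for a, v in zip(acc, TOY_VEC[t])]
            total_w += w
    return [a / total_w for a in acc] if total_w else acc

print(weighted_doc_vector(["mild", "fever"]))
```

Rare, informative words ("fever") pull the document vector harder than frequent function words ("the"), which is the usual motivation for IDF weighting over a plain average.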
“…Therefore, we use the state-of-the-art embedding algorithm Doc2Vec as the learning technique. The algorithm builds word and document embeddings in an unsupervised manner (Chen & Sokolova, 2021).…”
Section: Methods (mentioning)
confidence: 99%
“…This study uses the word2vec model and the GloVe model, two of the most popular algorithms for word embeddings. The first, the Word2Vec model, was introduced by [ 34 ] and is popular and widely used in learning word embeddings from raw text. Based on the idea of distributed representation of words, word2vec (word embeddings) uses a shallow neural network to learn word embeddings and predict the relation between every word and its context words.…”
Section: Proposed System (mentioning)
confidence: 99%
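The shallow-network idea described above can be sketched as a single CBOW-style forward pass: average the context words' input vectors, then score every vocabulary word. Random toy weights stand in for trained embeddings:

```python
import numpy as np

# Minimal forward pass of the shallow word2vec network (CBOW flavour).
# The tiny vocabulary and random weights are illustrative assumptions.
rng = np.random.default_rng(0)
vocab = ["patient", "has", "mild", "fever"]
idx = {w: i for i, w in enumerate(vocab)}
dim = 3
W_in = rng.normal(size=(len(vocab), dim))   # input embeddings (one row per word)
W_out = rng.normal(size=(dim, len(vocab)))  # output weights

def predict_center(context):
    """Probability distribution over the vocabulary for the center word."""
    h = W_in[[idx[w] for w in context]].mean(axis=0)  # averaged context = hidden layer
    logits = h @ W_out
    p = np.exp(logits - logits.max())
    return p / p.sum()  # softmax over the vocabulary

probs = predict_center(["patient", "mild"])
print(probs.shape, float(probs.sum()))
```

Training adjusts W_in and W_out so that true center words get high probability; the rows of W_in then serve as the word embeddings.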
“…In word2vec, SG (skip-gram) and CBOW (Continuous Bag-of-Words) algorithms are used to produce word vectors [ 34 ]. The SG model is used to store semantic and syntactic information about sentences.…”
Section: Proposed System (mentioning)
confidence: 99%
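The difference between SG and CBOW can be seen in how they slice the same sentence into training examples: SG predicts each context word from the center word, while CBOW predicts the center word from its full context (window size 1 here, for illustration):

```python
def skipgram_pairs(tokens, window=1):
    """SG: one (center, context_word) example per context position."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=1):
    """CBOW: one (context_tuple, center) example per position."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        if context:
            pairs.append((tuple(context), center))
    return pairs

sent = ["skip", "gram", "model"]
print(skipgram_pairs(sent))  # [('skip', 'gram'), ('gram', 'skip'), ('gram', 'model'), ('model', 'gram')]
print(cbow_pairs(sent))
```

SG produces more (and sparser) examples, which is often cited as the reason it captures rarer words and semantic regularities better than CBOW.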