JinYeong Bak scite author profile

Vector representation of words improves performance in various NLP tasks, but the high-dimensional word vectors are very difficult to interpret. We apply several rotation algorithms to the vector representation of words to improve the interpretability. Unlike previous approaches that induce sparsity, the rotated vectors are interpretable while preserving the expressive performance of the original vectors. Furthermore, any pre-built word vector representation can be rotated for improved interpretability. We apply rotation to skipgrams and glove and compare the expressive power and interpretability with the original vectors and the sparse overcomplete vectors. The results show that the rotated vectors outperform the original and the sparse overcomplete vectors for interpretability and expressiveness tasks.

show abstract

Self-disclosure topic model for classifying and analyzing Twitter conversations

Bak¹,

Lin

2014

View full text Add to dashboard Cite

Self-disclosure, the act of revealing oneself to others, is an important social behavior that strengthens interpersonal relationships and increases social support. Although there are many social science studies of self-disclosure, they are based on manual coding of small datasets and questionnaires. We conduct a computational analysis of self-disclosure with a large dataset of naturally-occurring conversations, a semi-supervised machine learning algorithm, and a computational analysis of the effects of self-disclosure on subsequent conversations. We use a longitudinal dataset of 17 million tweets, all of which occurred in conversations that consist of five or more tweets directly replying to the previous tweet, and from dyads with twenty of more conversations each. We develop self-disclosure topic model (SDTM), a variant of latent Dirichlet allocation (LDA) for automatically classifying the level of self-disclosure for each tweet. We take the results of SDTM and analyze the effects of self-disclosure on subsequent conversations. Our model significantly outperforms several comparable methods on classifying the level of selfdisclosure, and the analysis of the longitudinal data using SDTM uncovers significant and positive correlation between selfdisclosure and conversation frequency and length.

show abstract

Variational Hierarchical User-based Conversation Model

Bak¹,

Oh²

2019

View full text Add to dashboard Cite

Generating appropriate conversation responses requires careful modeling of the utterances and speakers together. Some recent approaches to response generation model both the utterances and the speakers, but these approaches tend to generate responses that are overly tailored to the speakers. To overcome this limitation, we propose a new model with a stochastic variable designed to capture the speaker information and deliver it to the conversational context. An important part of this model is the network of speakers in which each speaker is connected to one or more conversational partner, and this network is then used to model the speakers better. To test whether our model generates more appropriate conversation responses, we build a new conversation corpus containing approximately 27,000 speakers and 770,000 conversations. With this corpus, we run experiments of generating conversational responses and compare our model with other state-of-the-art models. By automatic evaluation metrics and human evaluation, we show that our model outperforms other models in generating appropriate responses. An additional advantage of our model is that it generates better responses for various new user scenarios, for example when one of the speakers is a known user in our corpus but the partner is a new user. For replicability, we make available all our code and data 1 .

show abstract

Self-disclosure topic model for Twitter conversations

Bak¹,

Lin²,

Oh³

2014

View full text Add to dashboard Cite

Self-disclosure, the act of revealing oneself to others, is an important social behavior that contributes positively to intimacy and social support from others. It is a natural behavior, and social scientists have carried out numerous quantitative analyses of it through manual tagging and survey questionnaires. Recently, the flood of data from online social networks (OSN) offers a practical way to observe and analyze self-disclosure behavior at an unprecedented scale. The challenge with such analysis is that OSN data come with no annotations, and it would be impossible to manually annotate the data for a quantitative analysis of self-disclosure. As a solution, we propose a semi-supervised machine learning approach, using a variant of latent Dirichlet allocation for automatically classifying self-disclosure in a massive dataset of Twitter conversations. For measuring the accuracy of our model, we manually annotate a small subset of our dataset, and we show that our model shows significantly higher accuracy and F-measure than various other methods.With the results our model, we uncover a positive and significant relationship between self-disclosure and online conversation frequency over time.

show abstract

Learning Sequential and Structural Information for Source Code Summarization

Choi¹,

Bak²,

Na³

et al. 2021

View full text Add to dashboard Cite

We propose a model that learns both the sequential and the structural features of code for source code summarization. We adopt the abstract syntax tree (AST) and graph convolution to model the structural information and the Transformer to model the sequential information. We convert code snippets into ASTs and apply graph convolution to obtain structurally-encoded node representations. Then, the sequences of the graphconvolutioned AST nodes are processed by the Transformer layers. Since structurallyneighboring nodes will have similar representations in graph-convolutioned trees, the Transformer layers can effectively capture not only the sequential information but also the structural information such as sentences or blocks of source code. We show that our model outperforms the state-of-the-art for source code summarization by experiments and human evaluations.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

JinYeong Bak

Rotated Word Vector Representations and their Interpretability

Self-disclosure topic model for classifying and analyzing Twitter conversations

Variational Hierarchical User-based Conversation Model

Self-disclosure topic model for Twitter conversations

Learning Sequential and Structural Information for Source Code Summarization

Contact Info

Product

Resources

About