Understanding Attack Trends from Security Blog Posts Using Guided-topic Model

Nagai, Tatsuo; Takita, Makoto; Furumoto, Keisuke; Shiraishi, Yoshiaki; Xia, Kelin; Takano, Yasuhiro; Mohri, Masami; Morii, Masakatu

doi:10.2197/ipsjjip.27.802

Cited by 2 publications

(4 citation statements)

References 8 publications


“… b The top 20 topic words associated with the highest coefficients are listed. Consistent with previous studies (e.g., Nagai et al, 2019; Ramesh et al, 2014; Toubia et al, 2019; Watanabe & Zhou, 2020), many words derived from guided LDA were the same as seed words, but relevant topic words were also identified. The percentage of each topic under product attributes was rounded, and the sum is based on the numbers before rounding up. c The “Other” category refers to unseeded topics in our model, to account for tweets that did not fall into any classified topic.…”

Section: Methods (supporting)

confidence: 84%

“…For example, by changing seed confidence, researchers can tune their guided LDA models to classify tweets based not only on the selected seed words, but also on words’ co-occurrence patterns. Following Nagai et al’s (2019) recommendations, we set our seed confidence at .7, but future studies could usefully explore the potential impact of other seed-confidence levels on model performance.…”

Section: Discussion (mentioning)

confidence: 99%
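The statement above describes seed confidence as a knob that balances seed words against co-occurrence patterns. A minimal sketch of one common reading of that idea follows, assuming seed confidence is the probability that a seed word's token starts out in its seeded topic, with every other token starting in a random topic. The function name, the toy documents, and the seed words are hypothetical illustrations; this is not the GuidedLDA package's code or the cited authors' pipeline.

```python
import random

def init_assignments(docs, seed_topics, n_topics, seed_confidence, rng):
    """Give every token an initial topic id.

    docs: list of token lists.
    seed_topics: dict mapping a seed word to its seeded topic id.
    seed_confidence: probability that a seed word is pinned to its topic.
    """
    assignments = []
    for doc in docs:
        doc_topics = []
        for word in doc:
            if word in seed_topics and rng.random() < seed_confidence:
                # Biased start: the seed word lands in its seeded topic.
                doc_topics.append(seed_topics[word])
            else:
                # Unbiased start: any topic, seeded or not.
                doc_topics.append(rng.randrange(n_topics))
        assignments.append(doc_topics)
    return assignments

# Hypothetical toy data: two seeded topics plus one unseeded topic.
docs = [["malware", "phishing", "email"], ["ransomware", "malware", "patch"]]
seed_topics = {"malware": 0, "ransomware": 0, "phishing": 1}
n_topics = 3

# 0.7 is the seed confidence reported in the quoted statement.
z = init_assignments(docs, seed_topics, n_topics, 0.7, random.Random(0))
```

Under this reading, a seed confidence of 1.0 pins every seed word to its topic at initialization, while lower values leave more room for later sampling steps to follow co-occurrence patterns.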

“…Following Nagai et al (2019), we set the parameters as .01 for α and .01 for η, which are respectively Dirichlet priors on the per-document-topic distribution and per-topic word distribution (Gangadharan & Gupta, 2020). Meanwhile, seed confidence, that is, the probability of biasing the selection of seed-word distribution, was set at .7 (Nagai et al, 2019). Researchers running guided LDA models have included topics in addition to the number of identified seeded topics (i.e., 12, in this case) to cover documents that did not fall under any of the latter (e.g., Li et al, 2019; Ramesh et al, 2014; Shanthakumar et al, 2020), and we followed this practice as well.…”

Section: Methods (mentioning)

confidence: 99%
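The statement above sets both Dirichlet priors to .01. A rough sketch of what such a small concentration value implies, using only Python's standard library: draws from a symmetric Dirichlet with a small parameter put most of their mass on a few components, so each document leans toward few topics and each topic toward few words. The helper names, the count of 13 topics (12 seeded plus one assumed extra topic), and the comparison value of 10.0 are illustrative assumptions, not values or code from the cited studies.

```python
import random

def sample_symmetric_dirichlet(k, concentration, rng):
    """Draw one k-dimensional sample from a symmetric Dirichlet prior."""
    while True:
        # Normalized independent gamma draws follow a Dirichlet distribution.
        draws = [rng.gammavariate(concentration, 1.0) for _ in range(k)]
        total = sum(draws)
        if total > 0.0:  # guard against every draw underflowing to zero
            return [d / total for d in draws]

def mean_largest_share(k, concentration, n_samples, rng):
    """Average weight of the largest component across many samples."""
    shares = [max(sample_symmetric_dirichlet(k, concentration, rng))
              for _ in range(n_samples)]
    return sum(shares) / n_samples

rng = random.Random(42)
K = 13  # assumption: 12 seeded topics plus one unseeded topic

sparse = mean_largest_share(K, 0.01, 500, rng)  # prior value from the statement
dense = mean_largest_share(K, 10.0, 500, rng)   # large value, for contrast only
print(f"mean largest topic share, concentration 0.01: {sparse:.2f}")
print(f"mean largest topic share, concentration 10.0: {dense:.2f}")
```

The first figure should come out well above the second, which is the usual motivation for small priors on short texts such as tweets.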

“…Step 4: Running the Guided LDA Model We used the GuidedLDA package in Python to run guided topic modeling. Following Nagai et al (2019), we set the parameters as .01 for α and .01 for η, which are respectively Dirichlet priors on the per-document-topic distribution and per-topic word distribution (Gangadharan & Gupta, 2020). Meanwhile, seed confidence, that is, the probability of biasing the selection of seed-word distribution, was set at .7 (Nagai et al, 2019).…”

Section: Methods (mentioning)

confidence: 99%


Integrating Human Insights Into Text Analysis: Semi-Supervised Topic Modeling of Emerging Food-Technology Businesses’ Brand Communication on Social Media

Chen et al. 2023

Social Science Computer Review


Textual social media data have become indispensable to researchers’ understanding of message strategies and other marketing practices. In a new departure for the field of brand communication, this study adopts and extends a semi-supervised machine-learning approach, guided latent Dirichlet allocation (LDA), which incorporates human insights into the discovery and classification of topics. We used it to analyze tweets from businesses involved with an emerging food technology, cultured meat, and delineated four key message strategies used by these brands: providing functional, educational, corporate social responsibility, and relational content. We further ascertained the relationships between brands and the key topics embedded in their Twitter data. A comparison of model performance suggests that guided LDA can be an advantageous alternative to traditional LDA, which is characterized by high efficiency and immense popularity among researchers, but—because of its unsupervised nature—yields findings that can be difficult to interpret. The present study therefore has critical theoretical and methodological implications for communication and marketing scholars.


What Are the Attackers Doing Now? Automating Cyberthreat Intelligence Extraction from Text on Pace with the Changing Threat Landscape: A Survey

2023


Cybersecurity researchers have contributed to the automated extraction of CTI from textual sources, such as threat reports and online articles describing cyberattack strategies, procedures, and tools. The goal of this article is to aid cybersecurity researchers in understanding the current techniques used for cyberthreat intelligence extraction from text through a survey of relevant studies in the literature. Our work finds eleven types of extraction purposes and seven types of textual sources for CTI extraction. We observe the technical challenges associated with obtaining available clean and labeled data for replication, validation, and further extension of the studies. We advocate for building upon the current CTI extraction work to help cybersecurity practitioners with proactive decision-making such as in threat prioritization and mitigation strategy formulation to utilize knowledge from past cybersecurity incidents.
