Background
Text mining methods such as topic modeling can offer valuable information on how and for whom internet-delivered cognitive behavioral therapies (iCBT) work. Although iCBT treatments provide convenient data for topic modeling, the method has rarely been used in this context.
Objective
Our aims were to apply topic modeling to written assignment texts from iCBT for generalized anxiety disorder and explore the resulting topics’ associations with treatment response. As predetermining the number of topics presents a considerable challenge in topic modeling, we also aimed to explore a novel method for topic number selection.
Methods
We defined 2 latent Dirichlet allocation (LDA) topic models, one using a novel data-driven approach to topic number selection and the other using a more commonly used interpretability-based approach. We used multilevel models to associate the topics with continuous-valued treatment response, defined as the rate of per-session change in GAD-7 sum scores throughout the treatment.
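To make the treatment response definition concrete, the following is a minimal sketch of a per-session rate of change, computed as an ordinary least-squares slope of one patient's GAD-7 sum scores against session index. This is a simplification for illustration only: the study estimated these rates with multilevel models, and the function name `per_session_slope` and the example scores are hypothetical, not taken from the study.

```python
def per_session_slope(scores):
    """Least-squares slope of GAD-7 sum scores against session index (0, 1, 2, ...).

    A negative slope means symptoms decrease over the course of treatment.
    Simplified per-patient illustration; not the multilevel model used in the study.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two sessions to estimate a slope")
    sessions = range(n)
    mean_x = sum(sessions) / n
    mean_y = sum(scores) / n
    # Covariance and variance terms of the usual OLS slope formula.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sessions, scores))
    sxx = sum((x - mean_x) ** 2 for x in sessions)
    return sxy / sxx


# Hypothetical patient whose GAD-7 sum score drops steadily across 5 sessions.
example_scores = [15, 13, 11, 9, 7]
slope = per_session_slope(example_scores)
print(slope)
```

Under this convention, a more negative slope corresponds to faster symptom reduction, which is why negative coefficients in the Results indicate a better than average treatment response.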
Results
Our analyses included 1686 patients. We observed 2 topics that were associated with better than average treatment response: “well-being of family, pets, and loved ones” from the data-driven LDA model (B=–0.10 SD/session/∆topic; 95% CI –0.16 to –0.03) and “children, family issues” from the interpretability-based model (B=–0.18 SD/session/∆topic; 95% CI –0.31 to –0.05). Two topics were associated with worse treatment response: “monitoring of thoughts and worries” from the data-driven model (B=0.06 SD/session/∆topic; 95% CI 0.01 to 0.11) and “internet therapy” from the interpretability-based model (B=0.27 SD/session/∆topic; 95% CI 0.07 to 0.46).
Conclusions
The 2 LDA models differed in their interpretability and in the breadth of their topics, but both contained topics that were associated with treatment response in an interpretable manner. Our work demonstrates that topic modeling is well suited for iCBT research and has the potential to expose clinically relevant information in vast text data.