Background
Online medical and health communities provide a platform for internet users to share experiences and ask questions about medical and health issues. However, these communities suffer from problems such as inaccurate classification of users’ questions and uneven health literacy among users, which reduce the accuracy of users’ retrieval and affect the professionalism of the medical personnel answering the questions. In this context, it is essential to study more effective methods for classifying users’ information needs.
Objective
Most online medical and health communities provide only disease-type labels, which do not give a comprehensive summary of users’ needs. This study aims to construct a multilevel classification framework, based on the graph convolutional network (GCN) model, for users’ needs in online medical and health communities so that users can perform more targeted information retrieval.
Methods
Using the Chinese online medical and health community “Qiuyi” as an example, we crawled questions posted by users in the “Cardiovascular Disease” section as the data source. First, the disease types involved in the question data were identified by manual coding to generate the first-level labels. Second, users’ information needs were identified by K-means clustering to generate the second-level labels. Finally, a GCN model was constructed to classify users’ questions automatically, thus realizing the multilevel classification of users’ needs.
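The abstract does not specify the GCN architecture or how the graph over questions was built. As a minimal sketch only, the following assumes the widely used propagation rule ReLU(D^(-1/2) (A + I) D^(-1/2) H W) and applies one such layer to a hypothetical 3-node toy graph; the function names, the graph, the features, and the weights are all illustrative assumptions, not the authors' implementation.

```python
import math


def normalize_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2) for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so that each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Node degrees computed on the self-looped graph.
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(norm_adj, features, weights):
    """One graph convolution: ReLU(norm_adj @ features @ weights)."""
    hidden = matmul(matmul(norm_adj, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Hypothetical toy graph: three nodes connected in a path 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
# Hypothetical 2-dimensional node features and a 2x2 weight matrix.
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
weights = [[1.0, -1.0],
           [0.5, 1.0]]

norm_adj = normalize_adjacency(adj)
out = gcn_layer(norm_adj, features, weights)
```

In a full classifier, a final layer would typically map each node's representation to label scores; in a two-level setting such as the one described here, those labels would correspond to the disease type and the information need.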
Results
Based on an empirical study of questions posted by users in the “Cardiovascular Disease” section of Qiuyi, the hierarchical classification of users’ questions was realized. The classification model designed in the study achieved accuracy, precision, recall, and F1-score of 0.6265, 0.6328, 0.5788, and 0.5912, respectively. Our classification model performed better than both the traditional machine learning method naïve Bayes and the deep learning method hierarchical text classification convolutional neural network. We also performed a single-level classification experiment on users’ needs, which in comparison with the multilevel classification model exhibited a great improvement.
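The abstract does not state how precision, recall, and F1-score were averaged across classes. As an illustration of how the four metrics relate, the sketch below assumes macro averaging (an unweighted mean of per-class scores); the function name `macro_metrics` and the toy labels are hypothetical and are not taken from the study's data.

```python
def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Guard against division by zero for classes never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical second-level need labels for four questions.
y_true = ["diagnosis", "treatment", "treatment", "lifestyle"]
y_pred = ["diagnosis", "treatment", "lifestyle", "lifestyle"]

accuracy, precision, recall, f1 = macro_metrics(y_true, y_pred)
```

Under macro averaging, the F1-score is the mean of the per-class F1 values, so it generally differs from the harmonic mean of the averaged precision and recall.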
Conclusions
A multilevel classification framework has been designed based on the GCN model. The results demonstrate that the method is effective in classifying users’ information needs in online medical and health communities. In addition, users with different diseases differ in the focus of their information needs, a finding that can help online medical and health communities provide diversified and targeted services. Our method is also applicable to similar classification tasks for other diseases.