Abstract. Our ongoing project aims to improve information access to narrow-domain collections of questions and answers. This poster demonstrates how out-of-the-box tools and domain dictionaries can be applied to community question answering (CQA) content in the health domain. This approach can be used to improve user interfaces and search over CQA data, as well as to evaluate content quality. The study is the first to use a sizable dataset from the Russian CQA site Otvety@Mail.Ru.

Keywords: community question answering, CQA, consumer health information, content analysis, latent Dirichlet allocation, LDA, Otvety@Mail.Ru
Introduction

According to a 2009 survey, 61% of American adults look for health information online [2]. A recent study reports that 55% of Russian adults do not see a doctor when they feel unwell; when treating themselves, 32% seek advice from friends and acquaintances or search for information on the Web [4]. Community question answering (CQA) sites are among the major destinations for health-related inquiries. The vast amounts of data collected by CQA sites make it possible to reuse the "wisdom of crowds" [3]. Our study focuses on questions and answers on health and medicine. This topic is a representative case for CQA: the search context (e.g. the age, gender, or weight of the person the information is sought for) is important; ideally, the answerer has practical experience with the topic; and users prefer a personalized answer. The quality of user-generated content (UGC) is especially important for answers in the Health category.

Recent studies on health-related CQA data have relied on manual processing of small samples [5], [7]. An approach close to ours is described in [6], where topic modeling is applied to health-related Twitter data. In our study we use latent Dirichlet allocation (LDA) and domain dictionaries, and exploit the question-answer structure of the pages to characterize the content. The approach can contribute to a better understanding and representation of CQA data, to improved focused search and user interfaces, and to content quality evaluation on a larger scale. The dataset used in this research comes from the popular Russian CQA site Otvety@Mail.Ru (http://otvet.mail.ru).
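To make the dictionary-based part of the approach concrete, the following is a minimal sketch of how domain dictionaries might be combined with the question-answer structure of a page. It is not the implementation used in the study: the dictionaries `DISEASES` and `DRUGS`, the function names, and the English sample thread are all hypothetical and chosen for illustration only. Real Russian-language data would additionally need lemmatization before dictionary lookup, which this sketch omits.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries (assumption); a real study would load
# curated domain lexicons of disease and drug names.
DISEASES = {"flu", "migraine", "gastritis"}
DRUGS = {"aspirin", "ibuprofen", "paracetamol"}

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def dictionary_hits(text, dictionary):
    """Count how often each dictionary term occurs in the text."""
    return Counter(tok for tok in tokenize(text) if tok in dictionary)


def annotate_thread(question, answers):
    """Tag the question with diseases and each answer with drugs it mentions."""
    return {
        "diseases_in_question": dictionary_hits(question, DISEASES),
        "drugs_in_answers": [dictionary_hits(a, DRUGS) for a in answers],
    }


# Illustrative thread (invented example, not taken from the dataset).
thread = annotate_thread(
    "How do I get rid of a migraine quickly?",
    ["Take ibuprofen and rest.", "Just sleep it off."],
)
```

Annotations of this kind could then be aggregated over many threads, for example to see which drugs are most often suggested for a given disease, or to flag answers that mention no recognized domain term at all.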