We present the system we built for participating in SemEval-2016 Task 3 on Community Question Answering. We achieved the best results on subtask C, and strong results on subtasks A and B, by combining a rich set of various types of features: semantic, lexical, metadata, and user-related. The most important group turned out to be the metadata for the question and for the comment, semantic vectors trained on QatarLiving data and similarities between the question and the comment for subtasks A and C, and between the original and the related question for Subtask B.
We describe our submission to SemEval-2016 Task 3 on Community Question Answering. We participated in subtask A, which asks to rerank the comments from the thread for a given forum question from good to bad. Our approach focuses on the generation and use of goodness polarity lexicons, similarly to the sentiment polarity lexicons, which are very popular in sentiment analysis. In particular, we use a combination of bootstrapping and pointwise mutual information to estimate the strength of association between a word (from a large unannotated set of question-answer threads) and the class of good/bad comments. We then use various features based on these lexicons to train a regression model, whose predictions we use to induce the final comment ranking. While our system was not very strong as it lacked important features, our lexicons contributed to the strong performance of another top-performing system.
We transfer a key idea from the field of sentiment analysis to a new domain: community question answering (cQA). e cQA task we are interested in is the following: given a question and a thread of comments, we want to re-rank the comments, so that the ones that are good answers to the question would be ranked higher than the bad ones. We notice that good vs. bad comments use specific vocabulary and that one can o en predict the goodness/badness of a comment even ignoring the question, based on the comment contents only. is leads us to the idea to build a good/bad polarity lexicon as an analogy to the positive/negative sentiment polarity lexicons, commonly used in sentiment analysis. In particular, we use pointwise mutual information in order to build large-scale goodness polarity lexicons in a semi-supervised manner starting with a small number of initial seeds. e evaluation results show an improvement of 0.7 MAP points absolute over a very strong baseline, and state-of-the art performance on SemEval-2016 Task 3.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.