We consider the problem of automatically assigning a category to a given question posted to a Community Question Answering (CQA) site, where the question contains not only text but also an image. For example, CQA users may post a photograph of a dress and ask the community "Is this appropriate for a wedding?" where the appropriate category for this question might be "Manners, Ceremonial occasions. " We tackle this problem using Convolutional Neural Networks with a DualNet architecture for combining the image and text representations. Our experiments with real data from Yahoo Chiebukuro and crowdsourced gold-standard categories show that the DualNet approach outperforms a text-only baseline (p = .0000), a sum-and-product baseline (p = .0000), Multimodal Compact Bilinear pooling (p = .0000), and a combination of sum-and-product and MCB (p = .0000), where the p-values are based on a randomised Tukey Honestly Significant Difference test with B = 5000 trials.
RSL19BD (Waseda University Sakai Laboratory) participated in the Fourth Dialogue Breakdown Detection Challenge (DBDC4) and submitted five runs to both English and Japanese subtasks. In these runs, we utilise the Decision Treebased model and the Long Short-Term Memory-based (LSTM-based) model following the approaches of RSL17BD and KTH in the Third Dialogue Breakdown Detection Challenge (DBDC3) respectively. The Decision Tree-based model follows the approach of RSL17BD but utilises RandomForestRegressor instead of Ex-traTreesRegressor. In addition, instead of predicting the mean and the variance of the probability distribution of the three breakdown labels, it predicts the probability of each label directly. The LSTM-based model follows the approach of KTH with some changes in the architecture and utilises Convolutional Neural Network (CNN) to perform text feature extraction. In addition, instead of targeting the single breakdown label and minimising the categorical cross entropy loss, it targets the probability distribution of the three breakdown labels and minimises the mean squared error. Run 1 utilises a Decision Tree-based model; Run 2 utilises an LSTM-based model; Run 3 performs an ensemble of 5 LSTM-based models; Run 4 performs an ensemble of Run 1 and Run 2; Run 5 performs an ensemble of Run 1 and Run 3. Run 5 statistically significantly outperformed all other runs in terms of MSE (NB, PB, B) for the English data and all other runs except Run 4 in terms of MSE (NB, PB, B) for the Japanese data (alpha level = 0.05).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.