First-order factoid question answering assumes that the question can be answered by a single fact in a knowledge base (KB). While this does not seem like a challenging task, many recent attempts that apply either complex linguistic reasoning or deep neural networks achieve 65%-76% accuracy on benchmark sets. Our approach formulates the task as two machine learning problems: detecting the entities in the question, and classifying the question as one of the relation types in the KB. We train a recurrent neural network to solve each problem. On the SimpleQuestions dataset, our approach yields substantial improvements over previously published results, even over neural networks with much more complex architectures. The simplicity of our approach also has practical advantages, such as efficiency and modularity, that are especially valuable in an industry setting. In fact, we present a preliminary analysis of the performance of our model on real queries from Comcast's X1 entertainment platform, which serves millions of users every day.
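To make the two-stage formulation concrete, here is a minimal sketch, assuming a bidirectional GRU for each sub-problem; the vocabulary size, hidden dimensions, relation inventory, and the kb_lookup helper are illustrative assumptions rather than the paper's exact architecture.

```python
# Hypothetical sketch of the two-stage pipeline described in the abstract:
# (1) tag question tokens as entity / not-entity, (2) classify the question
# into one of the KB relation types, then look up the single matching fact.
import torch
import torch.nn as nn

VOCAB_SIZE, EMB_DIM, HID_DIM, NUM_RELATIONS = 5000, 100, 128, 500  # toy sizes (assumed)

class EntityTagger(nn.Module):
    """BiGRU that labels each question token as part of the entity span or not."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        self.rnn = nn.GRU(EMB_DIM, HID_DIM, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * HID_DIM, 2)           # 2 tags: entity / other

    def forward(self, token_ids):                      # (batch, seq_len)
        hidden, _ = self.rnn(self.emb(token_ids))      # (batch, seq_len, 2*HID_DIM)
        return self.out(hidden)                        # per-token tag logits

class RelationClassifier(nn.Module):
    """BiGRU whose pooled states predict one of the KB relation types."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        self.rnn = nn.GRU(EMB_DIM, HID_DIM, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * HID_DIM, NUM_RELATIONS)

    def forward(self, token_ids):
        hidden, _ = self.rnn(self.emb(token_ids))
        return self.out(hidden.mean(dim=1))            # relation logits

def answer(tagger, classifier, token_ids, kb_lookup):
    """Combine both predictions into a single (entity, relation) KB query."""
    entity_mask = tagger(token_ids).argmax(dim=-1)     # which tokens name the entity
    relation = classifier(token_ids).argmax(dim=-1)    # which KB relation is asked
    return kb_lookup(entity_mask, relation)            # kb_lookup is a hypothetical helper
```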
In item-based collaborative filtering, a critical intermediate step to personalized recommendations is the definition of an item-similarity metric. Existing algorithms compute item similarity using the user-to-item ratings (cosine, Pearson, Jaccard, etc.). When computing the similarity between two items A and B, many of these algorithms divide the actual number of co-occurring users by some "difficulty" of co-occurrence. We refine this approach by defining item similarity as the ratio of the actual number of co-occurrences to the number of co-occurrences that would be expected if user choices were random. In the final step of our method to compute personalized recommendations, we apply the usage history of a user to the item-similarity matrix. The well-defined probabilistic meaning of our similarities allows us to further improve this final step. We measured the quality of our algorithm on a large real-world dataset. As part of Comcast's efforts to improve its personalized recommendations of movies and TV shows, several top recommender companies were invited to apply their algorithms to one year of Video-on-Demand usage data. Our algorithm tied for first place. This paper includes a MapReduce pseudocode implementation of our algorithm.
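A minimal sketch of this similarity follows, assuming that the expected co-occurrence count under random user choices for items consumed by n_A and n_B of N users is n_A * n_B / N; the abstract does not spell out the exact normalization or the MapReduce decomposition, so the dense NumPy version below is purely illustrative.

```python
# Illustrative reading of "actual over expected co-occurrences" item similarity,
# plus the final step of applying one user's history to the similarity matrix.
import numpy as np

def item_similarity(usage):
    """usage: binary user-by-item matrix (1 = the user consumed the item)."""
    num_users = usage.shape[0]
    per_item = usage.sum(axis=0)                        # n_A for every item
    actual = usage.T @ usage                            # observed co-occurrence counts
    expected = np.outer(per_item, per_item) / num_users # assumed random-choice baseline
    with np.errstate(divide="ignore", invalid="ignore"):
        sim = np.where(expected > 0, actual / expected, 0.0)
    np.fill_diagonal(sim, 0.0)                          # ignore self-similarity
    return sim

def recommend(usage, user_history, top_k=10):
    """Score items by applying one user's binary history to the similarity matrix."""
    scores = user_history @ item_similarity(usage)      # sum of similarities to seen items
    scores[user_history > 0] = -np.inf                  # do not re-recommend seen items
    return np.argsort(scores)[::-1][:top_k]
```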
No abstract
This paper presents a novel approach to language model adaptation for speech recognition. We define mutual information histograms, which account for different semantic and syntactic relations between words in text data. We introduce a novel word distance measure based on mutual information histograms. Using this measure, we were able to create linguistically meaningful word clusters composed of words obtained in first-pass speech recognition. Words included in the clusters were used to adapt language models, and the adapted language models were used for a second pass of speech recognition. We conducted experiments on the Fisher speech corpus of telephone conversations. Mutual information histograms for word pairs were estimated from the Fisher data as well as from data extracted from a corpus of New York Times articles. Results showed that word clusters conveyed significant information and could be helpful in improving speech recognition accuracy.
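The abstract does not give closed-form definitions, so the sketch below is an assumption-laden illustration only: pointwise mutual information is estimated from sentence-level co-occurrence counts, each word's PMI values are binned into a histogram, and the word distance is taken as the L1 gap between normalized histograms.

```python
# Hypothetical construction of mutual information histograms and a
# histogram-based word distance; all estimation choices here are assumptions.
import math
import numpy as np
from collections import Counter
from itertools import combinations

def mutual_information_histograms(sentences, bins=20, lo=-5.0, hi=5.0):
    """For each word, histogram the PMI values of the pairs it participates in."""
    word_counts, pair_counts = Counter(), Counter()
    for sent in sentences:
        words = set(sent)
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    total = sum(len(s) for s in sentences)      # crude normalizer; an assumption
    edges = np.linspace(lo, hi, bins + 1)
    hists = {w: np.zeros(bins) for w in word_counts}
    for pair, count in pair_counts.items():
        a, b = tuple(pair)
        pmi = math.log(count * total / (word_counts[a] * word_counts[b]))
        idx = int(np.clip(np.digitize(pmi, edges) - 1, 0, bins - 1))
        hists[a][idx] += 1
        hists[b][idx] += 1
    return hists

def word_distance(hists, w1, w2):
    """Distance between two words as the L1 gap of their normalized histograms."""
    h1 = hists[w1] / max(hists[w1].sum(), 1)
    h2 = hists[w2] / max(hists[w2].sum(), 1)
    return float(np.abs(h1 - h2).sum())
```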
No abstract