Topic models can yield insight into how depressed and non-depressed individuals use language differently. In this paper, we explore the use of supervised topic models in the analysis of linguistic signal for detecting depression, providing promising results using several models.
Trust between users is an important piece of knowledge that can be exploited in search and recommendation. Given that user-supplied trust relationships are usually very sparse, we study the prediction of trust relationships using user interaction features in an online user generated review application context. We show that trust relationship prediction can achieve better accuracy when one adopts personalized and cluster-based classification methods. The former trains one classifier for each user using user-specific training data. The cluster-based method first constructs user clusters before training one classifier for each user cluster. Our proposed methods have been evaluated in a series of experiments using two datasets from Epinions.com. It is shown that the personalized and cluster-based classification methods outperform the global classification method, particularly for the active users.
This paper analyzes the trustor and trustee factors that lead to inter-personal trust using a well studied Trust Antecedent framework in management science [10]. To apply these factors to trust ranking problem in online rating systems, we derive features that correspond to each factor and develop different trust ranking models. The advantage of this approach is that features relevant to trust can be systematically derived so as to achieve good prediction accuracy. Through a series of experiments on real data from Epinions, we show that even a simple model using the derived features yields good accuracy and outperforms MoleTrust, a trust propagation based model. SVM classifiers using these features also show improvements.
The Dirichlet process is used to model probability distributions that are mixtures of an unknown number of components. Amino acid frequencies at homologous positions within related proteins have been fruitfully modeled by Dirichlet mixtures, and we use the Dirichlet process to derive such mixtures with an unbounded number of components. This application of the method requires several technical innovations to sample an unbounded number of Dirichlet-mixture components. The resulting Dirichlet mixtures model multiple-alignment data substantially better than do previously derived ones. They consist of over 500 components, in contrast to fewer than 40 previously, and provide a novel perspective on the structure of proteins. Individual protein positions should be seen not as falling into one of several categories, but rather as arrayed near probability ridges winding through amino acid multinomial space.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.