ABSTRACT. A keyword query is the representation of the information need of a user, and is the result of a complex cognitive process which often results in under-specification. We propose an unsupervised method namely Latent Concept Modeling (LCM) for mining and modeling latent search concepts in order to recreate the conceptual view of the original information need. We use Latent Dirichlet Allocation (LDA) to exhibit highly-specific query-related topics from pseudo-relevant feedback documents. We define these topics as the latent concepts of the user query. We perform a thorough evaluation of our approach over two large ad-hoc TREC collections. Our findings reveal that the proposed method accurately models latent concepts, while being very effective in a query expansion retrieval setting.RÉSUMÉ. Une requête est la représentation du besoin d'information d'un utilisateur, et est le résultat d'un processus cognitif complexe qui mène souvent à un mauvais choix de mots-clés. Nous proposons une méthode non supervisée pour la modélisation de concepts implicites d'une requête, dans le but de recréer la représentation conceptuelle du besoin d'information initial. Nous utilisons l'allocation de Dirichlet latente (LDA) pour détecter les concepts implicites de la requête en utilisant des documents pseudo-pertinents. Nous évaluons cette méthode en profondeur en utilisant deux collections de test de TREC. Nous trouvons notamment que notre approche permet de modéliser précisément les concepts implicites de la requête, tout en obtenant de bonnes performances dans le cadre d'une recherche de documents.
This paper describes our sentiment analysis systems which have been built for SemEval-2015 Task 10 Subtask B and E. For subtask B, a Logistic Regression classifier has been trained after extracting several groups of features including lexical, syntactic, lexiconbased, Z score and semantic features. A weighting schema has been adapted for positive and negative labels in order to take into account the unbalanced distribution of tweets between the positive and negative classes. This system is ranked third over 40 participants, it achieves average F1 64.27 on Twitter data set 2015 just 0.57% less than the first system. We also present our participation in Subtask E in which our system has got the second rank with Kendall metric but the first one with Spearman for ranking twitter terms according to their association with the positive sentiment.
The variety and diversity of published content are currently expanding in all fields of scholarly communication. Yet, scientific knowledge graphs (SKG) provide only poor images of the varied directions of alternative scientific choices, and in particular scientific controversies, which are not currently identified and interpreted. We propose to use the rich variety of knowledge present in search histories to represent cliques modeling the main interpretable practices of information retrieval issued from the same “cognitive community”, identified by their use of keywords and by the search experience of the users sharing the same research question. Modeling typical cliques belonging to the same cognitive community is achieved through a new conceptual framework, based on user profiles, namely a bipartite geometric scientific knowledge graph, SKG GRAPHYP. Further studies of interpretation will test differences of documentary profiles and their meaning in various possible contexts which studies on “disagreements in scientific literature” have outlined. This final adjusted version of GRAPHYP optimizes the modeling of “Manifold Subnetworks of Cliques in Cognitive Communities” (MSCCC), captured from previous user experience in the same search domain. Cliques are built from graph grids of three parameters outlining the manifold of search experiences: mass of users; intensity of uses of items; and attention, identified as a ratio of “feature augmentation” by literature on information retrieval, its mean value allows calculation of an observed “steady” value of the user/item ratio or, conversely, a documentary behavior “deviating” from this mean value. An illustration of our approach is supplied in a positive first test, which stimulates further work on modeling subnetworks of users in search experience, that could help identify the varied alternative documentary sources of information retrieval, and in particular the scientific controversies and scholarly disputes.
This paper describes our contribution in Opinion Target Extraction OTE and Sentiment Polarity sub tasks of SemEval 2015 ABSA task. A CRF model with IOB notation has been adopted for OTE with several groups of features including syntactic, lexical, semantic, sentiment lexicon features. Our submission for OTE is ranked fifth over twenty submissions. A Logistic Regression model with a weighting schema of positive and negative labels have been used for sentiment polarity; several groups of features (lexical, syntactic, semantic, lexicon and Z score) are extracted. Our submission for Sentiment Polarity is ranked third over ten submissions on the restaurant data set, third over thirteen on the laptops data set, but the first over eleven on the hotel data set that is out-of-domain set.
In this paper, we present our contribution in SemEval2016 task7 1 : Determining Sentiment Intensity of English and Arabic Phrases, where we use web search engines for English and Arabic unsupervised sentiment intensity prediction. Our work is based, first, on a group of classic sentiment lexicons (e.g. Sen-timent140 Lexicon, SentiWordNet). Second, on web search engines' ability to find the cooccurrence of sentences with predefined negative and positive words. The use of web search engines (e.g. Google Search API) enhance the results on phrases built from opposite polarity terms.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.