This article presents eXenSa's contribution to the 2016 edition of the DÉfi Fouille de Textes (DEFT) shared task, which consists in indexing scientific documents (bibliographic records) with keywords previously selected by professional indexers. The proposed system is purely statistical and combines a graphical approach with a semantic one. The graphical approach searches the title and abstract of a document for words that are graphically close to keywords in the thesaurus. The semantic approach assigns to a new document the keywords associated with the training-corpus documents that are semantically closest to it. Both approaches rely on vector representations learned with NC-ISC, a stochastic matrix factorisation algorithm. Our system ranked first in terms of F-measure on two of the four specialised test corpora and second on the other two.
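The two approaches described above can be illustrated with a minimal sketch. This is not the eXenSa system: NC-ISC embeddings are not reproduced here, and plain character-trigram and bag-of-words count vectors with cosine similarity are swapped in as stand-ins. The function names, the 0.6 threshold, and the parameter `k` are hypothetical choices made for illustration only.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def graphical_keywords(text, thesaurus, threshold=0.6):
    """Thesaurus terms whose spelling is close to some word of the text.

    Assumption: spelling closeness is approximated by trigram cosine,
    not by the learned representations used in the paper.
    """
    words = [char_ngrams(w) for w in set(text.lower().split())]
    return {
        term for term in thesaurus
        if any(cosine(char_ngrams(term), w) >= threshold for w in words)
    }


def semantic_keywords(text, training, k=1):
    """Union of the keywords of the k training documents closest to the text.

    `training` is a list of (document_text, keyword_set) pairs.
    Assumption: document similarity is bag-of-words cosine.
    """
    query = Counter(text.lower().split())
    ranked = sorted(
        training,
        key=lambda doc: cosine(query, Counter(doc[0].lower().split())),
        reverse=True,
    )
    return {kw for _, kws in ranked[:k] for kw in kws}
```

A combined system could, for instance, take the union of both keyword sets for the title and abstract of a new document. How the two outputs are actually merged and ranked is not stated in the abstract, so that step is left out of the sketch.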
The expression of opinion depends on the domain. For instance, some words, called here multi-polarity words, have different polarities across domains. A classifier trained on one domain and tested on another will therefore not perform well without adaptation. This article presents a study of the influence of these multi-polarity words on domain adaptation for automatic opinion classification. We also suggest an exploratory method for detecting them without using any labels in the target domain, and we show how these multi-polarity words can improve opinion classification in an open-domain corpus.