Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1557
Towards Integration of Statistical Hypothesis Tests into Deep Neural Networks

Abstract: We report on ongoing work toward a new deep architecture that works in tandem with a statistical test procedure to jointly train on texts and their label descriptions for multi-label and multi-class classification tasks. A statistical hypothesis testing method is used to extract the most informative words for each given class. These words serve as a class description for more label-aware text classification. The intuition is to help the model concentrate on the more informative words rather than the more frequent ones…
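The abstract does not say which hypothesis test the authors use. As a minimal, hypothetical sketch of the idea, a 2x2 chi-squared test of word presence against class membership can rank candidate class-description words, favoring informative words over merely frequent ones. The function name, the toy corpus, and the choice of the chi-squared statistic are all assumptions for illustration, not the paper's method.

```python
def chi2_word_scores(docs, target):
    """Score each word by a 2x2 chi-squared statistic (word presence x class),
    keeping only words positively associated with the target class."""
    sets = [(set(text.split()), label) for text, label in docs]
    n = len(sets)
    n_pos = sum(label == target for _, label in sets)
    n_neg = n - n_pos
    vocab = set().union(*(words for words, _ in sets))
    scores = {}
    for w in vocab:
        a = sum(w in words for words, lab in sets if lab == target)  # present, target
        b = sum(w in words for words, lab in sets if lab != target)  # present, other
        c, d = n_pos - a, n_neg - b                                  # absent counts
        chi2 = 0.0
        for obs, row, col in ((a, a + b, n_pos), (b, a + b, n_neg),
                              (c, c + d, n_pos), (d, c + d, n_neg)):
            exp = row * col / n                 # expected count under independence
            if exp:
                chi2 += (obs - exp) ** 2 / exp
        # keep only words more frequent inside the class than outside it
        scores[w] = chi2 if a * n_neg > b * n_pos else 0.0
    return scores

docs = [
    ("win match goal team", "sports"),
    ("team score goal win", "sports"),
    ("election vote party win", "politics"),
    ("party vote policy law", "politics"),
]
scores = chi2_word_scores(docs, "sports")
# Top-scoring words act as the "class description" for sports.
description = sorted(scores, key=scores.get, reverse=True)[:2]
```

Note how "win", although frequent in both classes, scores lower than "goal" and "team", which occur only in sports documents: the test separates informative words from frequent ones.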

Cited by 6 publications (8 citation statements). References 22 publications.
“…The Artists dataset contains alternative names for music artists extracted from MusicBrainz [26]. Finally, the Patent Assignee dataset contains the aliases of assignees in patent documents 2 (Table 1). All entities in the training data, including true and false names, are used to generate a list of the most frequent character n-grams, limited to bi-, tri-, and four-grams.…”
Section: Results (mentioning)
confidence: 99%
“…Most advanced neural models today use CRFs for making inference but instead of doing feature engineering manually, they use a form of Deep Neural Network (DNN) such as Long Short Term Memory (LSTM), Convolutional Neural Network (CNN), or Gated Recurrent Unit (GRU) for automatic feature learning [2,18]. Designing a NED system by feature engineering is a highly time-consuming process, hence end-to-end neural systems capable of learning the features on their own [16,29] are more approachable systems.…”
Section: Related Work (mentioning)
confidence: 99%
“…We use an advanced model based on Deep Learning proposed by [10]. The model uses Gated Recurrent Units (GRU) [29] for encoding the input features.…”
Section: DL-based Classification (mentioning)
confidence: 99%
“…We automatically tagged the articles in MEDLINE with these ontologies to compile a dataset for Ontology Classification (Please see Section 3). To establish a baseline for Ontology Classification over this dataset, we opted for three models: a Naive Bayes classifier, a Deep Learning(DL)-based model proposed by [10], and a Transformer-based model called PubMedBERT [11]. Before providing our detailed methods and results, we review the state-of-the-art biomedical text classification for the sake of completeness.…”
Section: Introduction (mentioning)
confidence: 99%
“…Although bilingual corpora are very important for NMT, they are very scarce for many low-resource language pairs. Many language pairs and domains suffer from the scarcity of bilingual corpora for machine translation [6]. Therefore, recent works attempt to mine bilingual parallel sentences from the Internet, which holds sufficient monolingual corpora for many languages [7,8].…”
Section: Introduction (mentioning)
confidence: 99%