2017 International Conference on Asian Language Processing (IALP)
DOI: 10.1109/ialp.2017.8300603

Supervised learning for robust term extraction

Abstract: We propose a machine learning method to automatically classify the n-grams extracted from a corpus into terms and non-terms. We use 10 statistics common in previous term extraction literature as features for training. The proposed method, applicable to term recognition in multiple domains and languages, can help 1) avoid the laborious work in post-processing (e.g. subjective threshold setting); 2) handle the skewness and demonstrate noticeable resilience to the domain-shift issue of training data. Experiments a…
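As a rough illustration of the approach the abstract describes, the sketch below trains a classifier to separate term from non-term n-gram candidates using pre-computed statistical features. The feature values, labels, and the choice of scikit-learn's RandomForestClassifier (the classifier one citing paper reports for this method) are assumptions for illustration, not the authors' code.

```python
# Minimal sketch (not the authors' implementation): classify candidate
# n-grams as term / non-term from pre-computed statistical features,
# assuming each candidate already has a feature vector
# (e.g. frequency, TF-IDF, C-value, ...).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical data: rows are candidate n-grams, columns are the 10
# statistical features; labels are 1 = term, 0 = non-term from an
# annotated term list.
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# class_weight="balanced" is one way to handle the skew between the
# (few) terms and (many) non-terms among extracted candidates.
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```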

Cited by 15 publications (13 citation statements)
References 9 publications
“…4. In the table, [5] achieves a satisfying result with the Random Forest method in their paper, but the feature preparation is complex and time-consuming. The training data is also re-balanced on positive and negative instances.…”
Section: Results and Analysis
confidence: 97%
“…2. Yuan et al. [5] is a feature-based machine learning method that uses n-grams as term candidates, with 10 kinds of features pre-processed for each candidate. Our best models for testing are chosen by the loss on the development dataset. Note that decreasing the term ratio α will increase the precision but degrade the recall (Fig.…”
Section: Results and Analysis
confidence: 99%
“…Given training data, machine learning based methods [Astrakhantsev 2014; Conrado et al. 2013; Fedorenko et al. 2014; Maldonado and Lewis 2016] typically transform training instances into a feature space and train a classifier that can later be used for prediction. The features can be linguistic (e.g., PoS pattern, presence of special characters, etc.), statistical, or a combination of both, which often utilise scores calculated by statistical ATE metrics [Maldonado and Lewis 2016; Yuan et al. 2017]. However, one of the major problems in applying machine learning to ATE is the availability of reliable training data.…”
Section: Classic Unithood and Termhood Based Methods
confidence: 99%
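The passage above describes combining linguistic features (e.g. PoS pattern, presence of special characters) with scores from statistical ATE metrics. A minimal sketch of such feature construction follows; the function name, feature keys, and example scores are hypothetical placeholders, not taken from the cited papers.

```python
# Hedged sketch: build one feature dict per candidate term, mixing
# linguistic features with pre-computed statistical ATE scores.
import re

def candidate_features(candidate, pos_tags, ate_scores):
    """candidate  : the n-gram text, e.g. "neural network"
    pos_tags   : its PoS tag sequence, e.g. ("JJ", "NN")
    ate_scores : dict of pre-computed statistical metrics,
                 e.g. {"tfidf": 0.42, "cvalue": 3.1}
    """
    features = {
        "pos_pattern": " ".join(pos_tags),                          # linguistic
        "has_special_char": bool(re.search(r"[^A-Za-z\s-]", candidate)),
        "num_tokens": len(candidate.split()),
    }
    features.update(ate_scores)                                      # statistical
    return features

print(candidate_features("neural network", ("JJ", "NN"),
                         {"tfidf": 0.42, "cvalue": 3.1}))
```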