Abstract: We propose a machine learning method to automatically classify n-grams extracted from a corpus into terms and non-terms. We use 10 statistics common in the previous term extraction literature as training features. The proposed method, applicable to term recognition across multiple domains and languages, can help 1) avoid laborious post-processing work (e.g. subjective threshold setting); 2) handle class skewness and demonstrate noticeable resilience to domain shift in the training data. Experiments a…
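As a rough sketch of the pipeline this abstract describes, each candidate n-gram can be mapped to a vector of corpus statistics before classification. The three statistics below (raw frequency, document frequency, and a TF-IDF-style score) are illustrative stand-ins, not the paper's actual 10 features:

```python
from collections import Counter
from math import log

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def candidate_features(docs, n=2):
    """Map each n-gram candidate to a small statistical feature vector:
    (raw frequency, document frequency, TF-IDF-style score). These are
    illustrative stand-ins for the paper's 10 statistics."""
    tf = Counter()
    df = Counter()
    for doc in docs:
        grams = ngrams(doc, n)
        tf.update(grams)
        df.update(set(grams))  # count each gram once per document
    n_docs = len(docs)
    return {g: (tf[g], df[g], tf[g] * log(n_docs / df[g] + 1.0)) for g in tf}

docs = [
    "automatic term extraction from a domain corpus".split(),
    "term extraction uses statistical features".split(),
]
feats = candidate_features(docs)
print(feats[("term", "extraction")])
```

Feature vectors like these can then be fed to any off-the-shelf classifier; the abstract's point is that the classifier, rather than a hand-tuned threshold, separates terms from non-terms.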
“…4. In the table, [5] achieves a satisfying result with a Random Forest method in their paper, but the feature preparation is complex and time-consuming. The training data is also re-balanced across the positive and negative instances.…”
Section: Results and Analysis (mentioning)
confidence: 97%
“…2. Yuan et al [5] is a feature-based machine learning method using n-grams as term candidates, with 10 kinds of features pre-computed for each candidate. Our best models for testing are chosen by the loss on the development dataset. Note that decreasing the term ratio α will increase the precision but degrade the recall (Fig.…”
Section: Results and Analysis (mentioning)
confidence: 99%
“…Machine-learning based ATE [5,6,7,8] designs and learns different features from the raw text or from syntactic information, and then integrates these features into a machine learning method (such as a conditional random field or a support vector classifier). However, different domains, and especially different languages, exhibit different feature patterns, making such a method specific to one language or domain.…”
Section: Related Work (mentioning)
confidence: 99%
“…2 for more details about term span. [1,3], [1,4], [1,5], [2,2], [2,3], [2,4], [2,5], [3,3], [3,4], [3,5], [4,4], [4,5], [5,5]…”
In this paper, we propose a deep learning-based end-to-end method for domain-specific automatic term extraction (ATE): it considers all possible term spans within a fixed length in a sentence and predicts whether each span is a conceptual term. Compared with current ATE methods, the model supports nested term extraction and does not crucially need extra (extracted) features. Results show that it achieves high recall and comparable precision on the term extraction task with segmented raw text as input.
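The candidate set described above, every token span up to a fixed maximum length, can be sketched in a few lines. This is a minimal illustration; 1-based [start, end] indexing is assumed to match the span listing quoted earlier:

```python
def enumerate_spans(n_tokens, max_len):
    """All 1-based [start, end] token spans of length <= max_len.
    This is the candidate set a span-based term extractor would
    score, including nested spans such as [2,3] inside [1,4]."""
    return [(i, j)
            for i in range(1, n_tokens + 1)
            for j in range(i, min(i + max_len - 1, n_tokens) + 1)]

# a 5-token sentence with max span length 5 yields 15 candidates
spans = enumerate_spans(5, 5)
print(spans)
```

Because overlapping spans are scored independently, a prediction over this set naturally supports nested terms, which token-level BIO tagging cannot express.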
“…Given training data, machine learning based methods [Astrakhantsev 2014;Conrado et al 2013;Fedorenko et al 2014;Maldonado and Lewis 2016] typically transform training instances into a feature space and train a classifier that can be later used for prediction. The features can be linguistic (e.g., PoS pattern, presence of special characters, etc), or statistical or a combination of both, which often utilise scores calculated by statistical ATE metrics [Maldonado and Lewis 2016;Yuan et al 2017]. However, one of the major problems in applying machine learning to ATE is the availability of reliable training data.…”
Section: Classic Unithood and Termhood Based Methods (mentioning)
Automatic Term Extraction deals with the extraction of terminology from a domain-specific corpus, and has long been an established research area in data and knowledge acquisition. ATE remains a challenging task, as no existing ATE method can consistently outperform the others in every domain. This work adopts a refreshed perspective on this problem: instead of searching for a 'one-size-fits-all' solution that may never exist, we propose to develop generic methods to 'enhance' existing ATE methods. We introduce SemRe-Rank, the first method based on this principle, to incorporate semantic relatedness, an often overlooked avenue, into an existing ATE method to further improve its performance. SemRe-Rank incorporates word embeddings into a personalised PageRank process to compute 'semantic importance' scores for candidate terms from a graph of semantically related words (nodes), which are then used to revise the scores of candidate terms computed by a base ATE algorithm. Extensively evaluated with 13 state-of-the-art base ATE methods on four datasets of diverse nature, it is shown to achieve widespread improvement over all base methods and across all datasets, with gains of up to 15 percentage points when measured by Precision in the top-ranked K candidate terms (averaged over a set of K's), or up to 28 percentage points in F1 measured at a K equal to the expected number of real terms among the candidates (F1 in short). Compared to an alternative approach built on the well-known TextRank algorithm, SemRe-Rank can outperform it by up to 8 points in Precision at top K, or up to 17 points in F1.
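The personalised PageRank step at the core of SemRe-Rank can be illustrated with a plain power iteration on a toy word graph. The graph, seed words, and parameter values below are assumptions for illustration only; the paper builds its graph from word-embedding similarities:

```python
def personalised_pagerank(graph, seeds, damping=0.85, iters=50):
    """Power iteration for personalised PageRank on a word graph.
    `graph` maps node -> list of neighbours; `seeds` restrict the
    teleport distribution, biasing scores toward words related to
    the seeds. A minimal sketch, not SemRe-Rank's full pipeline."""
    nodes = list(graph)
    teleport = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        rank = {
            v: (1 - damping) * teleport[v]
               + damping * sum(rank[u] / len(graph[u])
                               for u in graph if v in graph[u])
            for v in nodes
        }
    return rank

# toy graph of "semantically related" words (edges are made up)
g = {
    "neural": ["network", "learning"],
    "network": ["neural", "learning"],
    "learning": ["neural", "network", "banana"],
    "banana": ["learning"],
}
scores = personalised_pagerank(g, seeds={"neural", "network"})
```

Words close to the seed set accumulate more score than unrelated ones; SemRe-Rank then uses such scores to revise the rankings produced by a base ATE algorithm.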