2015
DOI: 10.1186/1758-2946-7-s1-s4

CHEMDNER system with mixed conditional random fields and multi-scale word clustering

Abstract:
Background: Chemical compound and drug name recognition plays an important role in chemical text mining and is the basis for automatic relation extraction and event identification in chemical information processing. A high-performance named entity recognition system for chemical compound and drug names is therefore necessary.
Methods: We developed a CHEMDNER system based on mixed conditional random fields (CRF) with word clustering for chemical compound and drug name recognition. For the word clustering, we used B…

Cited by 46 publications (37 citation statements)
References 8 publications
“…Due to many difficulties inherent to chemical entity detection and normalisation [1], even manual annotation yields the interannotator agreement score of 91%, which can be regarded as the theoretical limit for any automatic system trained on this corpus. Twenty six teams have submitted their NER systems for the challenge, best of which have reached the F1 score of ∼72–88% [2,3,4,5,6,7,8,9] on two subtasks: chemical entity mention (CEM) and chemical document indexing (CDI).…”
Section: Content Background (mentioning)
confidence: 99%
“…Although tokenisation typically reduces the number of time-steps in the sequence, thus reducing the input complexity, it can introduce severe artefacts, e.g. merged/overlapping entities [5,9]. It makes it essential to use an adequate tokeniser with rules finely adjusted for the task at hand.…”
Section: Content Background (mentioning)
confidence: 99%