Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis 2018
DOI: 10.18653/v1/w18-5608

De-identifying Free Text of Japanese Dummy Electronic Health Records

Abstract: A new law was established in Japan to promote the use of EHRs for research and development, but EHRs must be de-identified before they can be used. However, research on automatic de-identification in the healthcare domain is not active for the Japanese language, and, as far as we know, no de-identification tool with practical performance is available for the Japanese medical domain. Previous work shows that rule-based methods are still effective, while deep learning methods have recently been reported to perform better. In order to implemen…

Cited by 10 publications (14 citation statements)
References 7 publications
“…According to Dernoncourt et al [33], the plain LSTM approach works best. However, from a consistency perspective, Kajiyama et al [39] recommend their hybrid method, but the highest F-measure score was recorded for their rule-based system. Trienes et al [47] are the only researchers whose results favor a combined LSTM-CRF approach.…”
Section: Discussion of Surveyed Work (mentioning)
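The statement above compares de-identification systems by their F-measure. As a hedged illustration only, and not the evaluation procedure of any of the cited papers, the sketch below computes an exact-match, entity-level F-measure over annotated spans of protected information. The span representation `(start, end, label)`, the labels, and the function name `span_f_measure` are hypothetical choices made for this example.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, label).
Span = Tuple[int, int, str]


def span_f_measure(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match, entity-level F1: a predicted span counts as correct
    only if its start, end, and label all match a gold span."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Also covers empty gold/predicted sets, avoiding division by zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {(0, 4, "DATE"), (10, 13, "NAME"), (20, 25, "HOSPITAL")}
pred = {(0, 4, "DATE"), (10, 13, "AGE")}  # one correct span, one wrong label
print(round(span_f_measure(gold, pred), 3))  # precision 1/2, recall 1/3 -> 0.4
```

Exact matching is the strictest choice; evaluations that give partial credit for overlapping spans or ignore labels would report different scores for the same predictions.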
“…We expected to encounter a moderate level of inconsistency since there is obviously no universal standard on reporting dataset information, but unfortunately, the issue borders on the extreme. On one hand, researchers such as Trienes et al [47] provide adequate information, and on the other hand, researchers such as Kajiyama et al [39] are not as forthcoming. The inconsistency makes the process of drawing comparisons between the surveyed works more challenging.…”
Section: Discussion of Surveyed Work (mentioning)
“…A major milestone was achieved in 2017, when the government legalized the use of big data (including cloud applications) in healthcare through the Medical Big Data Law, which detailed provisions for handling personal medical data and telemedicine, and covered online consultation within health insurance [84]. In the same year, the Personal Information Protection Act was passed, which established standards for careful handling of EHRs compared to other forms of personal information [85].…”
Section: Cases (mentioning)
“…We have observed the coherency of the original annotations of the datasets. Overall, this study differs from our earlier work [20] in that we added a new pathology dataset and its annotations, trained and evaluated our machine learning models using the new dataset, and evaluated the results using newly created annotations by three annotators to observe characteristics of the original and our own annotations.…”
(mentioning)
“…To evaluate the effectiveness of such different methods for the Japanese language, we implemented two EHR deidentification systems for the Japanese language in our earlier work [20]. We used the MedNLP shared task dataset and our own dummy EHR dataset, which was written as a virtual database by medical professionals who hold medical doctor certification.…”
(mentioning)