A Year of Papers Using Biomedical Texts: Findings from the Section on Natural Language Processing of the IMIA Yearbook

Grabar, Natalia; Grouin, Cyril

doi:10.1055/s-0039-1677937

Cited by 5 publications

(2 citation statements)

References 32 publications

“…Automatic text summarization is the creation of a shorter form of a text by means of a system run and operated on a computer [2]. Two examples of text summarizers in the medical field are the studies by [3] [1]. Many techniques are used in summarization, including statistical approaches and natural language analysis approaches. Some of the statistical approaches are as follows [2]:…”

Section: Literature Review (Tinjauan Pustaka), unclassified

Peringkas Teks Otomatis Dokuem Tunggal Dan Multi Bahasa Menggunakan Metode Tf-Idf

Abidin¹,

A²

2019

JTIK

A multilingual automatic text summarization system for health news lets readers summarize health news texts and their translations for various kinds of activities. The aim is to develop software that is suitable and easy for users to understand when reading news and translating it into international languages. The health news text is processed through punctuation removal, stopword removal, stemming, tokenizing, word weighting, and sentence weighting. After text processing, each sentence has its own weight, ranked from highest to lowest. Sentence K8 received a weight of 37.19723289, K9 a weight of 17.89999416, K10 a weight of 16.52464106, and K3 a weight of 14.77709596. The method used is term frequency-inverse document frequency (TF-IDF). The system accepts a news entry, translates it, splits the document into sentences, removes unwanted characters, splits the sentences into words, assigns a weight to each word, sums the weights, and computes the IDF and TF-IDF values, yielding a weight for each sentence from which the highest-scoring sentences are identified. The programming language is PHP and the DBMS is MySQL; the editor is Sublime and the tooling is XAMPP 3.2.2. With the automatic multilingual health news summarization system, reading and translating need not take much time. The system was tested by comparing manual and automatic sentence weighting, which gave the same results, and the system can capture the important content of the input news, with a respondent verification test result of 54.17%.
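The sentence-weighting pipeline described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than the authors' PHP, and it is not their implementation: the stopword list, the function names, and the choice to treat each sentence as a "document" when computing IDF are all assumptions made for illustration. Stemming and translation are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the abstract does not specify one.
STOPWORDS = {"the", "a", "of", "and", "is", "to", "in"}

def tokenize(sentence):
    """Lowercase, drop punctuation, split into words, remove stopwords."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]

def score_sentences(sentences):
    """Return one weight per sentence: the sum of its terms' TF-IDF values.

    Assumption: each sentence counts as one document for the IDF term.
    """
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    df = Counter()  # number of sentences containing each term
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(sum(count * math.log(n / df[term])
                          for term, count in tf.items()))
    return scores

def summarize(sentences, k=2):
    """Keep the k highest-weighted sentences, in their original order."""
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

if __name__ == "__main__":
    news = [
        "Flu vaccine reduces flu risk.",
        "The vaccine is safe.",
        "Exercise improves heart health and sleep.",
    ]
    for sentence in summarize(news, k=2):
        print(sentence)
```

With this unsmoothed IDF, a term that occurs in every sentence contributes a weight of zero, so sentences are ranked by how much distinctive vocabulary they carry.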

“…BioBERT, a biomedical-specific language model, was constructed using approximately 18 billion words from PubMed abstracts and PubMed Central full-text articles [4]. Although English is the main language being used in the field of medical NLP, multilingual approaches involving other languages (eg, Chinese, German, French, Italian, Japanese, Korean) are also being investigated [5, 6]. Technical validation of language embedding models is also highly important in the field of medical NLP.…”

Section: Introduction, mentioning

confidence: 99%

A Word Pair Dataset for Semantic Similarity and Relatedness in Korean Medical Vocabulary: Reference Development and Validation

Yum¹,

Lee²,

Jang³

et al. 2021

JMIR Med Inform

Background The fact that medical terms require special expertise and are becoming increasingly complex makes it difficult to employ natural language processing techniques in medical informatics. Several human-validated reference standards for medical terms have been developed to evaluate word embedding models using the semantic similarity and relatedness of medical word pairs. However, there are very few reference standards in non-English languages. In addition, because the existing reference standards were developed a long time ago, there is a need to develop an updated standard to represent recent findings in medical sciences. Objective We propose a new Korean word pair reference set to verify embedding models. Methods From January 2010 to December 2020, 518 medical textbooks, 72,844 health information news, and 15,698 medical research articles were collected, and the top 10,000 medical terms were selected to develop medical word pairs. Attending physicians (n=16) participated in the verification of the developed set with 607 word pairs. Results The proportion of word pairs answered by all participants was 90.8% (551/607) for the similarity task and 86.5% (525/605) for the relatedness task. The similarity and relatedness of the word pair showed a high correlation (ρ=0.70, P<.001). The intraclass correlation coefficients to assess the interrater agreements of the word pair sets were 0.47 on the similarity task and 0.53 on the relatedness task. The final reference standard was 604 word pairs for the similarity task and 599 word pairs for relatedness, excluding word pairs with answers corresponding to outliers and word pairs that were answered by less than 50% of all the respondents. 
When FastText models were applied to the final reference standard word pair sets, the embedding models learning medical documents had a higher correlation between the calculated cosine similarity scores compared to human-judged similarity and relatedness scores (namu, ρ=0.12 vs with medical text for the similarity task, ρ=0.47; namu, ρ=0.02 vs with medical text for the relatedness task, ρ=0.30). Conclusions Korean medical word pair reference standard sets for semantic similarity and relatedness were developed based on medical documents from the past 10 years. It is expected that our word pair reference sets will be actively utilized in the development of medical and multilingual natural language processing technology in the future.
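The validation step in this abstract, comparing embedding-based cosine similarity with human judgments, can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that ρ denotes Spearman's rank correlation (the abstract does not say so explicitly), and the toy vectors and ratings in the usage section are invented placeholders, not data from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

if __name__ == "__main__":
    # Hypothetical word-pair embeddings and human ratings, for illustration only.
    pairs = [
        ([1.0, 0.0], [0.9, 0.1]),
        ([1.0, 0.0], [0.5, 0.5]),
        ([1.0, 0.0], [0.0, 1.0]),
    ]
    human = [4.5, 3.0, 1.0]
    model = [cosine(u, v) for u, v in pairs]
    print(spearman(model, human))
```

A rank-based measure fits this setting because human similarity ratings and cosine scores sit on different scales; only the agreement in ordering is compared.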

Using text mining to retrieve information about circular economy

Spreafico

2021

Computers in Industry
