Hyesoo Kong scite author profile

Hyesoo Kong

5 Publications

16 Citation Statements Received

62 Citation Statements Given

Affiliations

Korea Institute of Science & Technology Information, Yonsei University

Publications

LAME: Layout-Aware Metadata Extraction Approach for Research Articles

Choi¹, Kong², Yoon³ et al. 2022

The volume of academic literature, such as academic conference papers and journals, has increased rapidly worldwide, and research on metadata extraction is ongoing. However, high-performing metadata extraction is still challenging due to the diverse layout formats used by different journal publishers. To accommodate the diversity of the layouts of academic journals, we propose a novel LAyout-aware Metadata Extraction (LAME) framework with three characteristics: design of an automatic layout analysis, construction of a large metadata training set, and implementation of a metadata extractor. In the framework, we designed an automatic layout analysis using PDFMiner. Based on the layout analysis, a large volume of metadata-separated training data, including the title, abstract, author name, author affiliated organization, and keywords, was automatically extracted. Moreover, we constructed a pre-trained model, Layout-MetaBERT, to extract the metadata from academic journals with varying layout formats. The experimental results with our metadata extractor exhibited robust performance (Macro-F1, 93.27%) in metadata extraction for unseen journals with different layout formats.

LAME: Layout Aware Metadata Extraction Approach for Research Articles

Choi¹, Kong², Yoon³ et al. 2021

Preprint

Generating summary sentences using Adversarially Regularized Autoencoders with conditional context

Kong, Kim 2019

Expert Systems with Applications

Optimizing energy consumption for a performance-aware cloud data center in the public sector

Chang, Park, Kong et al. 2018

Sustainable Computing: Informatics and Systems

Building an annotated corpus for automatic metadata extraction from multilingual journal article references

et al. 2023

Bibliographic references containing citation information of academic literature play an important role as a medium connecting earlier and recent studies. As references contain machine-readable metadata such as author name, title, or publication year, they have been widely used in the field of citation information services, including search services for scholarly information and research trend analysis. Many institutions around the world manually extract and continuously accumulate reference metadata to provide various scholarly services. However, the manual collection of reference metadata every year continues to be a burden because of the associated cost and time consumption. With the accumulation of a large volume of academic literature, several tools, including GROBID and CERMINE, that automatically extract reference metadata have been released. However, these tools have some limitations. For example, they are only applicable to references written in English, the types of extractable metadata are limited for each tool, and the performance of the tools is insufficient to replace the manual extraction of reference metadata. Therefore, in this study, we focused on constructing a high-quality corpus to automatically extract metadata from multilingual journal article references. Using our constructed corpus, we trained and evaluated a BERT-based transfer-learning model. Furthermore, we compared the performance of the BERT-based model with that of the existing model, GROBID. Currently, our corpus contains 3,815,987 multilingual references, mainly in English and Korean, with labels for 13 different metadata types. According to our experiment, the BERT-based model trained using our corpus showed excellent performance in extracting metadata not only from journal references written in English but also in other languages, particularly Korean. This corpus is available at http://doi.org/10.23057/47.
