Jón Friðrik Daðason scite author profile

Jón Friðrik Daðason

3 Publications

0 Citation Statements Received

21 Citation Statements Given

Affiliations

University of Iceland

Publications

IceSum: An Icelandic Text Summarization Corpus

Daðason, Loftsson, Sigurðardóttir et al. 2021

Automatic Text Summarization (ATS) is the task of generating concise and fluent summaries from one or more documents. In this paper, we present IceSum, the first Icelandic corpus annotated with human-generated summaries. IceSum consists of 1,000 online news articles and their extractive summaries. We train and evaluate several neural network-based models on this dataset, comparing them against a selection of baseline methods. The best model obtains a ROUGE-2 recall score of 71.06, outperforming all baseline methods. Furthermore, we evaluate how the amount of training data affects the quality of the generated summaries. Our results show that while the corpus is sufficiently large to train a well-performing model, there could still be significant gains from increasing the size of the training set. We release the corpus and the models with an open license.
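To make the reported metric concrete, the following is a minimal sketch of ROUGE-2 recall: the share of bigrams in a reference summary that also appear in a candidate summary. The function names, the lowercasing, and the whitespace tokenization are assumptions made for illustration; the abstract does not state the exact ROUGE configuration used for IceSum. The sketch returns a fraction between 0 and 1, whereas the abstract's 71.06 appears to be on a 0 to 100 scale.

```python
from collections import Counter


def bigrams(tokens):
    """Count adjacent token pairs in a token list."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(reference, candidate):
    """Fraction of reference bigrams that also occur in the candidate.

    Tokenization here is plain lowercased whitespace splitting, which is
    an illustrative assumption, not the setup used in the paper.
    """
    ref = bigrams(reference.lower().split())
    cand = bigrams(candidate.lower().split())
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each bigram's credit to how often it appears in the candidate.
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat sat on a mat"
    print(rouge2_recall(reference, candidate))
```

In this toy pair, three of the five reference bigrams are recovered by the candidate.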

Kvistur: Vélræn stofnhlutagreining samsettra orða [Kvistur: Automatic Constituent Analysis of Compound Words]

Daðason, Bjarnadóttir 2015

Compounding is extremely productive in Icelandic and multi-word compounds are common. The likelihood of finding previously unseen compounds in texts is thus very high, which makes out-of-vocabulary words a problem in the use of NLP tools. Kvistur, the decompounder described in this paper, splits Icelandic compounds and shows their binary constituent structure. The probability of a constituent in an unknown (or unanalysed) compound forming a combined constituent with either of its neighbours is estimated, with the use of data on the constituent structure of over 240 thousand compounds from the Database of Modern Icelandic Inflection (Kristín Bjarnadóttir 2012), and word frequencies from Íslenskur orðasjóður, a corpus of approx. 550 million words. Thus, the structure of an unknown compound is derived by comparison with compounds with partially the same constituents and similar structure in the training data. The granularity of the split returned by the decompounder is important in tasks such as semantic analysis or machine translation, where a flat (non-structured) sequence of constituents is insufficient.
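The difference between a flat split and a binary constituent structure can be illustrated with a small sketch. It assumes a compound has already been split into parts and greedily merges the adjacent pair with the highest score until one binary tree remains. This greedy procedure, the function names, and the toy scores are all hypothetical; they are not Kvistur's actual algorithm or data, which the abstract describes only as probability estimates drawn from known compounds and word frequencies.

```python
def surface(node):
    """Return the surface string of a constituent (a str or a nested pair)."""
    if isinstance(node, str):
        return node
    return surface(node[0]) + surface(node[1])


def build_binary_structure(parts, pair_score):
    """Greedily merge the best-scoring adjacent pair until one tree remains."""
    nodes = list(parts)
    while len(nodes) > 1:
        best = max(
            range(len(nodes) - 1),
            key=lambda i: pair_score(nodes[i], nodes[i + 1]),
        )
        nodes[best:best + 2] = [(nodes[best], nodes[best + 1])]
    return nodes[0]


# Toy scores, invented for illustration only (not taken from any corpus).
TOY_SCORES = {
    ("borð", "stofu"): 0.9,  # "borðstofu", a form of borðstofa (dining room)
    ("stofu", "borð"): 0.4,
}


def toy_score(left, right):
    """Look up a hypothetical attachment score for two neighbours."""
    return TOY_SCORES.get((surface(left), surface(right)), 0.0)


if __name__ == "__main__":
    # borðstofuborð (dining room table), pre-split into three parts.
    print(build_binary_structure(["borð", "stofu", "borð"], toy_score))
```

With these toy scores the first two parts are grouped before the third is attached, giving a left-branching structure rather than a flat three-item list.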

Nefnir: A high accuracy lemmatizer for Icelandic

Ingólfsdóttir, Loftsson, Daðason et al. 2019

Preprint

Lemmatization, finding the basic morphological form of a word in a corpus, is an important step in many natural language processing tasks when working with morphologically rich languages. We describe and evaluate Nefnir, a new open source lemmatizer for Icelandic. Nefnir uses suffix substitution rules, derived from a large morphological database, to lemmatize tagged text. Evaluation shows that for correctly tagged text, Nefnir obtains an accuracy of 99.55%, and for text tagged with a PoS tagger, the accuracy obtained is 96.88%.
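The idea of lemmatizing tagged text with suffix substitution rules can be sketched as follows. Each rule maps a (tag, suffix) pair to a replacement suffix, and the longest matching suffix wins. The rule table, the tag names, and the function name are hypothetical and chosen only for illustration; Nefnir's real rules are derived from a large morphological database and are not reproduced here.

```python
# Hypothetical rules: (tag, suffix to strip) -> suffix to append.
# The tag names are invented for this sketch, not an actual tagset.
RULES = {
    ("noun-dat-pl", "um"): "ur",  # hestum -> hestur
    ("noun-gen-sg", "s"): "ur",   # hests  -> hestur
}


def lemmatize(word, tag, rules=RULES):
    """Apply the longest matching suffix rule for this tag, if any."""
    for start in range(len(word)):  # start=0 tries the longest suffix first
        suffix = word[start:]
        replacement = rules.get((tag, suffix))
        if replacement is not None:
            return word[:start] + replacement
    # No rule matched: fall back to returning the word unchanged.
    return word


if __name__ == "__main__":
    print(lemmatize("hestum", "noun-dat-pl"))
    print(lemmatize("hests", "noun-gen-sg"))
```

Because the rules are keyed on the tag as well as the suffix, an incorrect tag can select the wrong rule or none at all, which is consistent with the lower accuracy the abstract reports for automatically tagged text.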
