This paper presents our submissions for the CoNLL 2017 UD Shared Task. Our parser, called UParse, builds on a neural graph-based dependency parser. The parser uses features from a bidirectional LSTM to produce a distribution over possible heads for each word in the sentence. To enable transfer learning for low-resource treebanks and surprise languages, we train several multilingual models for related languages, grouped by genus and language family. Out of 33 participants, our system ranks 9th in the main results, with 75.49 UAS and 68.87 LAS F1 scores (averaged across 81 treebanks).
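The head-selection step can be illustrated with a small sketch. It is a minimal pure-Python example that assumes an MLP arc scorer of the form v · tanh(W_dep h_i + W_head h_j) and uses random vectors in place of trained BiLSTM features. The names `head_distributions`, `greedy_heads`, `w_dep`, `w_head` and `v` are hypothetical and are not taken from UParse.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_distributions(
    features: List[Vector], w_dep: Matrix, w_head: Matrix, v: Vector
) -> Dict[int, Dict[int, float]]:
    """For each word i >= 1, return P(head = j | i) over every j != i.

    features[0] is an artificial ROOT token; features[1:] stand in for the
    BiLSTM output of each word. The scorer v . tanh(W_dep h_i + W_head h_j)
    is a hypothetical MLP arc scorer chosen for illustration.
    """
    dep_proj = [matvec(w_dep, h) for h in features]
    head_proj = [matvec(w_head, h) for h in features]
    dists: Dict[int, Dict[int, float]] = {}
    for i in range(1, len(features)):
        candidates = [j for j in range(len(features)) if j != i]  # no self-loops
        scores = [
            sum(vk * math.tanh(d + h) for vk, d, h in zip(v, dep_proj[i], head_proj[j]))
            for j in candidates
        ]
        dists[i] = dict(zip(candidates, softmax(scores)))
    return dists


def greedy_heads(dists: Dict[int, Dict[int, float]]) -> Dict[int, int]:
    """Pick the most probable head per word (not guaranteed to form a tree)."""
    return {i: max(p, key=p.get) for i, p in dists.items()}


def random_matrix(rows: int, cols: int) -> Matrix:
    """Random stand-in for learned parameters or BiLSTM features."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    dim, hidden, n_words = 4, 3, 5
    feats = random_matrix(n_words + 1, dim)  # ROOT + 5 words
    dists = head_distributions(
        feats, random_matrix(hidden, dim), random_matrix(hidden, dim),
        random_matrix(1, hidden)[0],
    )
    print(greedy_heads(dists))
```

Taking the per-word argmax can produce cycles. Graph-based parsers therefore commonly decode with a maximum spanning tree algorithm such as Chu-Liu/Edmonds to guarantee a well-formed tree.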
This paper presents a methodology for identifying and resolving various kinds of inconsistency that arise when merging dependency and multiword expression (MWE) annotations, with the goal of producing a dependency treebank with comprehensive MWE annotations. Candidates for correction are identified using a variety of heuristics, including an entirely novel one that detects violations of MWE constituency in the dependency tree, and are then resolved by arbitration with minimal human intervention. Using this technique, we identified and corrected several hundred errors across both the parse and the MWE annotations, changing well over 10% of the MWE instances in the joint corpus.
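One way to operationalize a constituency check is to test whether the tokens of each MWE form a single connected subtree. In a well-formed dependency tree, this holds exactly when one MWE token has its head outside the MWE. The sketch below assumes that criterion and a head map that uses 0 for ROOT. The function names are hypothetical, and the criterion is not necessarily the paper's exact definition.

```python
from typing import Dict, Iterable, List, Set


def mwe_is_connected(heads: Dict[int, int], mwe: Set[int]) -> bool:
    """Return True if the MWE tokens form one connected subtree.

    heads maps each 1-based token index to its head (0 = ROOT). In a tree,
    the number of MWE tokens whose head lies outside the MWE equals the
    number of connected pieces the MWE is split into.
    """
    external = [t for t in mwe if heads[t] not in mwe]
    return len(external) == 1


def flag_mwe_violations(
    heads: Dict[int, int], mwes: Iterable[Set[int]]
) -> List[List[int]]:
    """Return the sorted token lists of MWEs that fail the constituency check."""
    return [sorted(m) for m in mwes if not mwe_is_connected(heads, set(m))]


if __name__ == "__main__":
    # "He kicked the bucket yesterday": tokens 1..5, MWE = {2, 3, 4}
    good = {1: 2, 2: 0, 3: 4, 4: 2, 5: 2}
    bad = {1: 2, 2: 0, 3: 4, 4: 5, 5: 2}  # "bucket" wrongly attached to "yesterday"
    print(flag_mwe_violations(good, [{2, 3, 4}]))  # []
    print(flag_mwe_violations(bad, [{2, 3, 4}]))   # [[2, 3, 4]]
```

A flagged MWE indicates that either the parse or the MWE annotation is probably wrong, so it is a natural candidate for the kind of arbitration step described above.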
Scientific data sharing services are becoming an essential tool in data-driven science. They can significantly improve the scientific process by making reliable and trustworthy data available, reducing duplicated work, and providing insights into related research and recent advances. To be useful in the scientific process, data sharing services must fulfill a number of requirements: they must support discovery of and access to data, and they must also ensure the integrity and reliability of published data. B2SHARE, developed by the EUDAT [1] project, provides such a data sharing service to scientific communities. For communities that wish to download, install and maintain their own service, B2SHARE is also available as software. B2SHARE is developed with a focus on user-friendliness, customizability, reliability, and trustworthiness, and it can be customized for different organizations and use cases. In this paper we discuss the design, architecture, and implementation of B2SHARE and demonstrate its usefulness in the scientific process with case studies from the biodiversity field.
The authors address the legal issues relating to the creation and use of language models. The article begins with an explanation of how language technologies have developed. The authors then analyse the technological process within the framework of copyright, related rights and personal data protection law, and also cover the commercial use of language models. Their main argument is that the legal restrictions applicable to language data containing copyrighted material and personal data usually do not apply to language models, which are generally not considered derivative works. Given the wide variety of language models, however, this position is not absolute.