Large-scale pretrained language models have become ubiquitous in Natural Language Processing. However, most of these models are available either only for high-resource languages, in particular English, or as multilingual models that trade performance on individual languages for coverage. This paper introduces Romanian BERT, the first purely Romanian transformer-based language model, pretrained on a large text corpus. We discuss corpus composition and cleaning, the model training process, and an extensive evaluation of the model on various Romanian datasets. We open-source not only the model itself, but also a repository that explains how to obtain the corpus, how to fine-tune and use the model in production (with practical examples), and how to fully replicate the evaluation process.
Developing an efficient system that manages distributed multimedia content requires minimizing resource consumption while providing the most relevant results for a user's query in the shortest time. This paper presents LINDO, a generic architecture framework for distributed systems that achieves efficiency in multimedia indexing and retrieval. Three characteristics distinguish it: (1) it differentiates between implicit algorithms, executed over all multimedia content at acquisition time, and explicit algorithms, executed on demand to answer a specific need; (2) it stores and processes multimedia content and metadata locally, instead of transferring them to a central server for indexing; (3) it selects a set of relevant servers for query execution based on semantic processing of the user query and on system knowledge, including descriptions of the distributed servers, the multimedia content and the indexing algorithms. The paper validates this contribution through a concrete implementation of the LINDO framework.
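The server selection step in characteristic (3) can be illustrated with a minimal Python sketch. It is not the LINDO implementation: the `Server` class, the `select_servers` function, and the reduction of semantic query processing to plain set matching over concept labels are all assumptions made for illustration. The sketch only shows how a central node might use server descriptions to skip irrelevant servers and to tell apart servers whose existing metadata already answers a query from those that would first need an explicit algorithm to run.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical description of one remote server held by the central node."""
    name: str
    # Concepts already extracted by implicit algorithms at acquisition time.
    indexed_concepts: set[str] = field(default_factory=set)
    # Concepts that installed explicit algorithms could extract on demand.
    on_demand_concepts: set[str] = field(default_factory=set)


def select_servers(query_concepts: set[str], servers: list[Server]) -> list[tuple[str, str]]:
    """Return (server name, mode) pairs for the servers worth querying.

    mode is "indexed" when existing metadata covers every query concept,
    and "explicit" when at least one concept needs an on-demand algorithm.
    Servers that cannot cover the query at all are skipped.
    """
    selected = []
    for server in servers:
        if query_concepts <= server.indexed_concepts:
            selected.append((server.name, "indexed"))
        elif query_concepts <= server.indexed_concepts | server.on_demand_concepts:
            selected.append((server.name, "explicit"))
    return selected


if __name__ == "__main__":
    # Illustrative server descriptions; names and concepts are made up.
    servers = [
        Server("station-a", {"person", "car"}, {"red"}),
        Server("station-b", {"person"}),
        Server("station-c", {"car"}, {"person", "red"}),
    ]
    for name, mode in select_servers({"person", "car"}, servers):
        print(name, mode)
```

In this toy setup only two of the three servers receive the query, and only one of them has to run additional indexing, which is the kind of resource saving the architecture aims for.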