Anais Do III Dataset Showcase Workshop (DSW 2021) 2021
DOI: 10.5753/dsw.2021.17416

PPORTAL: Public Domain Portuguese-language Literature Dataset

Abstract: Combining human expertise with book-consumer data may provide what is needed to keep pace with the constant changes in the book publishing market. Building and making available datasets that fully cover the key elements of the book industry ecosystem is therefore essential. However, little has been done in this context for non-English languages such as Portuguese. Hence, we introduce PPORTAL, a public domain Portuguese-language literature dataset composed of book-related metadata. After an overv…

Cited by 6 publications (6 citation statements)
References 11 publications
“…This article extends a previous paper from the Dataset Showcase Workshop of Brazilian Symposium on Databases 2021 [Silva et al 2021a]. Specifically, the related work is updated; we handle missing data by also considering a new data source, the isbntools Python library; the dataset considers new features generated by sentiment analysis tools based on online reviews; we also introduce three examples of real-world applications (book genre classification, sentiment analysis on book reviews) and two social network analyses.…”
Section: Original Text In Portuguese
Citation type: mentioning
confidence: 90%
“…With so much digital information available, ML-based solutions have been developed to predict success and recommend items in such industries as well. Both applications may directly use PPORTAL, and we refer to the original work for more discussion on the subject [Silva et al 2021a].…”
Section: Other Scenarios
Citation type: mentioning
confidence: 99%
“…In this work, we use PPORTAL, which stores metadata related to over 80,000 public domain works in the Portuguese language [Silva et al 2021]. In particular, PPORTAL consists of three digital libraries of public domain works, mainly from Brazil and Portugal: Domínio Público, Projeto Adamastor,…”
Section: Dataset
Citation type: mentioning
confidence: 99%
“…In this work, we use PPORTAL (Public Domain Portuguese-language Literature Dataset) [13], a cross-collection dataset of public domain Portuguese-language books. PPORTAL is primarily composed of well-known digital libraries for public domain works mainly from Brazil and Portugal: Domínio Público, Projecto Adamastor, and Biblioteca Digital de Literatura de Países Lusófonos (BLPL), all integrated with additional data obtained from Goodreads platform.…”
Section: Dataset
Citation type: mentioning
confidence: 99%