2020
DOI: 10.1007/s10796-020-09999-y

TextBenDS: a Generic Textual Data Benchmark for Distributed Systems

Abstract: Extracting top-k keywords and documents using weighting schemes are popular techniques employed in text mining and machine learning for different analysis and retrieval tasks. The weights are usually computed in the data preprocessing step, as they are costly to update and keep track of all the modifications performed on the dataset. Furthermore, computation errors are introduced when analyzing only subsets of the dataset. Therefore, in a Big Data context, it is crucial to lower the runtime of computing weight…

Cited by 15 publications (7 citation statements)
References 59 publications

“…Admittedly, BigDataBench's Grep and TextBenDS's Top-K documents operation are relevant for data search. Similarly, Top-K keywords and WordCount are relevant to assess documents aggregation [9,22]. However, other operations such as finding most similar documents or clustering documents should also be considered.…”
Section: Discussion
Type: mentioning
confidence: 99%
“…DLBench's data model also differs from most big data benchmarks as it provides raw tabular files, inducing an additional data integration challenge. Moreover, DLBench includes a set of long textual documents that induces a different challenge than short texts such as tweets [22] and Wikipedia articles [23]. Finally, DLBench is data-centric, unlike big data benchmarks that focus on a particular technology, e.g., TPCx-HS and HiBench.…”
Section: Discussion
Type: mentioning
confidence: 99%
“…efficient data models, data processing pipelines and architectures to integrate standard and big data sources (Jovanovic et al 2020) as well as to improve resource utilization and aggregate performance in shared environments (Michiardi et al 2020); predictive analytics to forecast product demand in the fashion industry (Gardino et al 2020) and techniques to deal with the lack of annotated data for sensor-based human activity recognition (Prabono et al 2020); text data processing to assess the performance of text storage systems through a generic benchmark (Truicȃ et al 2020) and innovative solutions to deal with specific use cases such as the legal domain (Bordino et al 2020); novel approaches for mining social media to support intelligent transportation systems (Vallejos et al 2020) and digging deep the IoT scenario (Ustek-Spilda et al 2020); -solutions to deal with privacy issues in distance learning systems (Preuveneers et al 2020).…”
Section: Special Issue Content
Type: mentioning
confidence: 99%
“…Paper (Truicȃ et al 2020) proposes a generic benchmark, called TextBenDS, for assessing the performance of text storage systems. At the conceptual level, the benchmark models text data (documents) as a cube.…”
Section: Text Mining
Type: mentioning
confidence: 99%