2014
DOI: 10.1007/978-3-319-10596-3_11

BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking

Abstract: Data generation is a key issue in big data benchmarking that aims to generate application-specific data sets to meet the 4V requirements of big data. Specifically, big data generators need to generate scalable data (Volume) of different types (Variety) under controllable generation rates (Velocity) while keeping the important characteristics of raw data (Veracity). This gives rise to various new challenges about how we design generators efficiently and successfully. To date, most existing techniques can only g…

Cited by 68 publications (50 citation statements)
References 20 publications
“…Spark SQL queries from BigDataBench have been reprogrammed to use DataFrame API. Big Data Generator Suite (BDGS), an open source tool is used to generate synthetic data sets based on raw data sets [23].…”
Section: Workloads
Citation type: mentioning
confidence: 99%
“…Wu et al [14] name several data mining approaches with need for research to address these issues, including: Feature selection and unsupervised learning methods for sparsity, error-aware data mining for uncertainty, and data imputation methods for incompleteness. Another research direction in connection with the quality of input data is the generation of authentic synthetic data, which are needed to benchmark different Big Data solutions, for instance [15]. 10) Do Big Data lead to better results?…”
Section: How Can Manipulations Of Input Data Be Detected?
Citation type: mentioning
confidence: 99%
“…Beyond the benchmarks for transactional processing systems, dozens of benchmarks have been proposed for big analytical processing systems, such as CloudSuite [21], BigBench [22], DCBench [23], Hibench [24], GraySort [25], CloudRank-D [26], and several other research work in [27]- [30], and so on. These benchmarks supply a set of analytical jobs or OLAP queries to test the performance to MapReduce-style systems.…”
Section: A Big Data Systems Benchmarks
Citation type: mentioning
confidence: 99%