Advanced Information Systems Engineering

2020

DOI: 10.1007/978-3-030-49435-3_30

|View full text |Cite

|

Sign up to set email alerts

|

Mutation Operators for Large Scale Data Processing Programs in Spark

João Batista de Souza Neto¹,

Anamaria Martins Moreira²,

Genoveva Vargas-Solar³

et al.

Abstract: This paper proposes a mutation testing approach for big data processing programs that follow a data flow model, such as those implemented on top of Apache Spark. Mutation testing is a fault-based technique that relies on fault simulation by modifying programs, to create faulty versions called mutants. Mutant creation is carried on by operators able to simulate specific and well identified faults. A testing process must be able to signal faults within mutants and thereby avoid having ill behaviours within a pro… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Applications Of the Model3

Discussion1

Introduction1

Citation Types

Supporting

3

Mentioning

29

Contrasting

0

Year Published

2020

2020

2022

2022

Publication Types

Select...

Article2

Book1

Relationship

Self Cite3

Independent0

Authors

Journals

Cited by 3 publications

(32 citation statements)

References 15 publications

Supporting

3

Mentioning

29

Contrasting

0

Order By: Relevance

“…A natural extension to this work would be to instantiate the tool for other systems of the data flow family (DryadLINQ, Apache Beam, Apache Flink ). This can be done by adapting TRANSMUT-Spark's front and back ends so that a program originally written in any of them can be tested with the mutation testing proposed in [21].…”

Section: Discussionmentioning

confidence: 99%

“…We first applied the model to formalize the mutation operators presented in [21], where we explored the application of mutation testing in Spark programs, and in the tool TRANSMUT-Spark 4 that we developed to automate this process.…”

Section: Applications Of the Modelmentioning

confidence: 99%

“…Faults are simulated by applying mutation operators, which are rules with modification patterns for programs (a modified program is called a mutant). In [21], we presented a set of mutation operators designed for Spark programs that are divided into two groups: mutation operators for the data flow and mutation operators for transformations.…”

Section: Applications Of the Modelmentioning

confidence: 99%

See 2 more Smart Citations

Modeling Big Data Processing Programs

¹

,

²

,

³

et al. 2020

Lecture Notes in Computer Science

Self Cite

View full text Add to dashboard Cite

We propose a new model for data processing programs. Our model generalizes the data flow programming style implemented by systems such as Apache Spark, DryadLINQ, Apache Beam and Apache Flink. The model uses directed acyclic graphs (DAGs) to represent the main aspects of data flow-based systems, namely Operations over data (filtering, aggregation, join) and Program execution defined by data dependence between operations. We use Monoid Algebra to model operations over distributed, partitioned datasets and Petri Nets to represent the data/control flow. This allows the specification of a data processing program to be agnostic of the target Big Data processing system. Our model has been used to design mutation test operators for big data processing programs. These operators have been implemented by the testing environment TRANSMUT-Spark.

“…A natural extension to this work would be to instantiate the tool for other systems of the data flow family (DryadLINQ, Apache Beam, Apache Flink ). This can be done by adapting TRANSMUT-Spark's front and back ends so that a program originally written in any of them can be tested with the mutation testing proposed in [21].…”

Section: Discussionmentioning

confidence: 99%

“…We first applied the model to formalize the mutation operators presented in [21], where we explored the application of mutation testing in Spark programs, and in the tool TRANSMUT-Spark 4 that we developed to automate this process.…”

Section: Applications Of the Modelmentioning

confidence: 99%

“…Faults are simulated by applying mutation operators, which are rules with modification patterns for programs (a modified program is called a mutant). In [21], we presented a set of mutation operators designed for Spark programs that are divided into two groups: mutation operators for the data flow and mutation operators for transformations.…”

Section: Applications Of the Modelmentioning

confidence: 99%

See 1 more Smart Citation

Modeling Big Data Processing Programs

¹

,

²

,

³

et al. 2020

Lecture Notes in Computer Science

Self Cite

View full text Add to dashboard Cite

We propose a new model for data processing programs. Our model generalizes the data flow programming style implemented by systems such as Apache Spark, DryadLINQ, Apache Beam and Apache Flink. The model uses directed acyclic graphs (DAGs) to represent the main aspects of data flow-based systems, namely Operations over data (filtering, aggregation, join) and Program execution defined by data dependence between operations. We use Monoid Algebra to model operations over distributed, partitioned datasets and Petri Nets to represent the data/control flow. This allows the specification of a data processing program to be agnostic of the target Big Data processing system. Our model has been used to design mutation test operators for big data processing programs. These operators have been implemented by the testing environment TRANSMUT-Spark.

“…The abstract and formal concepts provided by the model make it suitable for the automation of software development processes, such as those done by IDE tools. Consequently, we first applied the model to formalize the mutation operators presented in [13], where we explored the application of mutation testing in Spark programs, and in the tool TRANSMUT-Spark 1 [25] that we devel- oped to automate this process. Mutation testing is a fault-based technique that simulates faults to design and evaluate test sets [26].…”

Section: Applications Of the Modelmentioning

confidence: 99%

“…We have used the model to define mutation operators that can be instantiated for different systems. In particular, specifications in our model have been used as an intermediate representation of programs in a mutation testing tool of Apache Spark programs [13].…”

Section: Introductionmentioning

confidence: 99%

A two-level formal model for Big Data processing programs

¹

,

²

,

³

et al. 2022

Science of Computer Programming

Self Cite

View full text Add to dashboard Cite

No abstract

TRANSMUT‐Spark: Transformation mutation for Apache Spark

¹

,

²

,

³

et al. 2022

Software Testing Verif & Rel

Self Cite

View full text Add to dashboard Cite

This paper proposes TRANSMUT-SPARK for automating mutation testing of Big Data processing code within Spark programs. Apache Spark is an engine for Big Data Analytics/Processing that hides the inherent complexity of parallel Big Data programming. Nonetheless, programmers must cleverly combine Spark built-in functions within programs and guide the engine to use the right data management strategies to exploit the computational resources required by Big Data processing and avoid substantial production losses. Many programming details in Spark data processing code are prone to false statements that must be correctly and automatically tested. This paper explores the application of mutation testing in Spark programs, a faultbased testing technique that relies on fault simulation to evaluate and design test sets. The paper introduces TRANSMUT-SPARK for testing Spark programs by automating the most laborious steps of the process and fully executing the mutation testing process. The paper describes how the TRANSMUT-SPARK automates the mutants generation, test execution, and adequacy analysis phases of mutation testing. It also discusses the results of experiments to validate the tool and argues its scope and limitations.

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Product

Browser Extension Assistant by scite Citation Statement Search Reference Check Visualizations Dashboards Explore Journals Explore Organizations Explore Funders Embedding Badge Embedding Citation Search Pricing

Resources

Blog Help & FAQ Accessibility Statement API Terms For Universities & Governments For Researchers For Publishers For Corporate, Pharma & Enterprise Author Marketing Become an Affiliate Get an organization trial or quote scite Data & Services

About

News & Press Careers Read our Paper Coverage

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Copyright © 2024 scite LLC. All rights reserved.

Made with 💙 for researchers

Part of the Research Solutions Family.