Massive datasets are quickly becoming a concern for many industries. For example, many web-based applications must handle petabytes' worth of transactions daily and act quickly and efficiently on the data contained in each transaction. As a result, testing such applications becomes a challenge of scale. We argue that existing approaches, such as automated test suite generation, may not scale without assistance. To this end, we discuss open issues and possible solutions specific to testing big data applications.
CCS Concepts: • Software and its engineering → Software testing and debugging; Search-based software engineering; Software system structures
Keywords: big data, search-based software testing, test suite generation
OVERVIEW
Many techniques are currently being developed for generating massive datasets (i.e., big data) for use in validating applications [1]. However, little published research addresses testing applications that already interact with big data [9], and even fewer publications explore how search-based software testing (SBST) techniques can be used to optimize testing strategies [6,8]. Research is therefore needed to determine both the feasibility and the applicability of existing testing techniques to big data applications. For example, consider a nationwide healthcare network that centralizes the medical records of all patients. Such a system deals with an enormous amount of data as well as an amalgam of heterogeneous systems and devices. It can enable a patient to visit their primary care physician, be referred to a specialist in another state for treatment, and then allow that specialist to instantly retrieve the patient's entire medical history. Specialized applications will need to be developed to handle such a dataset, including optimizations for querying and retrieving specific data. However, existing testing strategies may not test these applications effectively, given the wide range of values that may appear in the data. This position paper therefore argues for an examination of how big data can impact existing testing strategies, focusing on automated test suite generation.
Traditionally, software testing has been considered an ideal field for applying search-based heuristics, such as genetic algorithms [7]. Notable systems include EvoSuite [5] and Nighthawk [2], for automated generation of test suites and instantiation of unit tests, respectively.
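To make the search-based framing concrete, the following is a minimal sketch of a genetic algorithm that evolves a whole test suite toward branch coverage, loosely in the spirit of whole-suite generation as popularized by tools such as EvoSuite. It is not EvoSuite's implementation. The function under test `classify_record`, its branch labels, the input ranges, and every GA parameter (suite size, population size, binary tournament selection, single-point crossover, a 0.3 mutation rate) are hypothetical choices made for illustration only.

```python
import random
from typing import List, Set, Tuple

TestCase = Tuple[int, int]  # hypothetical (age, visits) inputs for the function under test
Suite = List[TestCase]

# Hypothetical branch outcomes of the function under test.
ALL_BRANCHES = {"minor", "adult", "frequent", "occasional"}


def classify_record(age: int, visits: int, covered: Set[str]) -> str:
    """Hypothetical function under test; records each branch outcome it takes."""
    if age < 18:
        covered.add("minor")
        category = "pediatric"
    else:
        covered.add("adult")
        category = "general"
    if visits > 90:  # rarely hit by uniform inputs in 0..100, so search helps
        covered.add("frequent")
        category += "-frequent"
    else:
        covered.add("occasional")
    return category


def fitness(suite: Suite) -> int:
    """Whole-suite fitness: number of distinct branch outcomes the suite covers."""
    covered: Set[str] = set()
    for age, visits in suite:
        classify_record(age, visits, covered)
    return len(covered)


def random_test(rng: random.Random) -> TestCase:
    return (rng.randint(0, 100), rng.randint(0, 100))


def evolve_suite(suite_size: int = 4, pop_size: int = 20, generations: int = 50,
                 mutation_rate: float = 0.3, seed: int = 0) -> Suite:
    """Evolve a fixed-size test suite with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    population = [[random_test(rng) for _ in range(suite_size)] for _ in range(pop_size)]
    for _ in range(generations):
        best = max(population, key=fitness)
        if fitness(best) == len(ALL_BRANCHES):
            break  # every branch outcome is covered; stop early
        next_gen = [list(best)]  # elitism: the best suite survives unchanged
        while len(next_gen) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            # Single-point crossover over the list of test cases.
            cut = rng.randint(1, suite_size - 1)
            child = p1[:cut] + p2[cut:]
            # Mutation: replace one test case with a fresh random input.
            if rng.random() < mutation_rate:
                child[rng.randrange(suite_size)] = random_test(rng)
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


if __name__ == "__main__":
    suite = evolve_suite(seed=42)
    print(suite, fitness(suite), "of", len(ALL_BRANCHES), "branch outcomes covered")
```

Because the best suite is always carried into the next generation, the best fitness never decreases from one generation to the next. Note that every fitness evaluation executes every test in every candidate suite; in this toy example that is cheap, but when each test must query or process a very large dataset, this repeated evaluation is where the cost of search-based generation would accumulate.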
Given the optimization problems that typically comprise a software testing strategy (e.g., test suite generation, test case prioritization, and test case selection), search-based heuristics have been shown to converge quickly and efficiently to optimal or near-optimal solutions. However, many industries are moving toward the big data paradigm, in which petabytes of data must be considered at run time. As such, a strategy such as test suite generation may be cost-prohibitive, given t...