In today's knowledge-, service-, and cloud-based economy, an overwhelming amount of business-related data is generated daily, at a fast rate, from a wide range of sources. These data increasingly show all the typical properties of big data: wide physical distribution, diversity of formats, nonstandard data models, and independently managed and heterogeneous semantics. In this context, there is a need for new scalable and process-aware services for querying, exploring, and analyzing process data in the enterprise because (1) process data analysis services must process and query large amounts of data effectively and efficiently and, therefore, must scale well with the infrastructure and (2) querying services need to let users express their data analysis and querying needs using process-aware abstractions rather than lower-level abstractions. In this paper, we introduce ProcessAtlas, ie, an extensible large-scale platform for querying and analyzing process data in the enterprise. The ProcessAtlas platform offers an extensible architecture by adopting a service-based model so that new analytical services can be plugged into the platform. In ProcessAtlas, we present a domain-specific model for representing process knowledge, ie, process-level entities, abstractions, and the relationships among them, modeled as graphs. We provide services for discovering, extracting, and analyzing process data. We also provide efficient mapping of process-level queries into graph-level queries, and efficient execution of the resulting queries, by using scalable process query services that cope with the growth in process data size and with the infrastructure's scale. We have implemented ProcessAtlas as a MapReduce-based prototype and report on experiments performed on both synthetic and real-world datasets.

... goal is to understand how a BP is performed and to identify opportunities for improvement.² However, wide-scale automation in enterprises has led to BPs being implemented across many systems. Therefore, answering questions such as "What are the dependencies between 2 different cases? What is the typical path for a case, and how much time and how many resources can be spent in processing it? What is the story behind the file number #756.1: what is the origin of this file? How did it evolve over time? Who was involved in updating this file?" becomes difficult at best.

The main barrier to answering such questions is that, in today's knowledge-, service-, and cloud-based economy, the information about process execution is scattered across several systems and data sources. Consequently, process logs increasingly show all the typical properties of big data³: wide physical distribution, diversity of formats, nonstandard data models, and independently managed and heterogeneous semantics. We use the term Process Data to refer to such large hybrid collections of heterogeneous and partially unstructured process-related execution data. Digitalization of business artifacts (eg, doc...