Abstract

The continuing development of organic-rich, extremely low-permeability shale reservoirs in the United States has the potential to positively impact the future of carbon storage. Owing to the unique characteristics of shale reservoirs, CO₂ can not only be stored safely but can also be preferentially adsorbed, displacing methane and leading to enhanced gas recovery.

However, CO₂ storage in depleted or nearly depleted shale formations is not completely risk free. Thus, before making the economic commitment to a full-field CO₂ sequestration project, the complete set of variables must be analyzed systematically when planning a shale CO₂ storage initiative. Numerical modeling and simulation is a robust tool that provides insight into how the system may operate, helping to assess feasibility, to assist in the design and operation of such a project, and to predict changes that may occur.

A comprehensive uncertainty analysis requires a large number of simulation runs. Designing and running simulation cases to model enhanced gas recovery and storage in shale with the Explicit Hydraulic Fracture (EHF) modeling technique is long and laborious, and its implementation is computationally expensive.

In this paper, a data-driven approach with pattern recognition algorithms is used to develop a new generation of shale proxy model at the hydraulic fracture cluster level, as a replica of a reservoir simulation model. For more accurate analysis, instead of the commonly used mechanistic models, a history-matched, hydraulically fractured Marcellus shale pad with multiple stages/clusters is used as the base case. The detailed procedure for developing the data-driven proxy model is explained, and the model is validated using blind simulation runs.
The developed data-driven proxy model accurately reproduces the CO₂ injection profiles, CO₂/CH₄ production profiles, and CO₂ breakthrough times calculated by the numerical simulation model for each cluster/stage and horizontal lateral. Joint use of the deterministic reservoir model and the data-driven proxy model can serve as a novel screening and optimization tool for the techno-economic evaluation of the CO₂-Enhanced Gas Recovery (EGR) and storage process in shale systems.
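
The proxy workflow summarized above (train on a set of simulation runs, then validate on blind runs the proxy has never seen) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_simulator` is a hypothetical closed-form stand-in for an expensive reservoir simulation, the three scaled inputs are invented, and a quadratic least-squares fit is swapped in for the pattern recognition algorithms used in this work, which the abstract does not specify.

```python
import numpy as np


def toy_simulator(x):
    """Hypothetical stand-in for one expensive reservoir simulation run.

    Columns of x (all scaled to [0, 1], assumed for illustration only):
    injection rate, fracture half-length, matrix permeability.
    Returns a single scalar response per run.
    """
    rate, half_len, perm = x[:, 0], x[:, 1], x[:, 2]
    return 2.0 * rate + 1.5 * half_len * perm + 0.5 * rate ** 2


def quadratic_features(x):
    """Build constant, linear, and all second-order product terms."""
    n, d = x.shape
    cols = [np.ones(n)]
    cols += [x[:, i] for i in range(d)]
    cols += [x[:, i] * x[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)


def fit_proxy(x_train, y_train):
    """Fit the proxy by ordinary least squares on quadratic features."""
    coef, *_ = np.linalg.lstsq(quadratic_features(x_train), y_train, rcond=None)
    return coef


def predict_proxy(coef, x):
    """Evaluate the fitted proxy at new input points."""
    return quadratic_features(x) @ coef


def r_squared(y_true, y_pred):
    """Coefficient of determination between simulator and proxy outputs."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Sample 40 hypothetical simulation cases; hold the last 10 out as blind runs.
rng = np.random.default_rng(0)
x_all = rng.uniform(0.0, 1.0, size=(40, 3))
y_all = toy_simulator(x_all)
x_train, y_train = x_all[:30], y_all[:30]
x_blind, y_blind = x_all[30:], y_all[30:]

coef = fit_proxy(x_train, y_train)
blind_r2 = r_squared(y_blind, predict_proxy(coef, x_blind))
print(f"R^2 on blind runs: {blind_r2:.4f}")
```

Because the toy response is itself quadratic, the fit here is essentially exact. A real simulator response would not be, and the blind-run error is what indicates whether the proxy can be trusted for screening.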