With the development of storage, transmission, editing, and sharing tools, forged digital images are propagating rapidly, and the need for image provenance analysis has never been greater. Typical applications include content tracking, copyright enforcement, and forensic reasoning. However, large-scale image provenance datasets that contain diverse manipulation history graphs, a variety of manipulation operations, and rich metadata are still lacking, and this shortage is one of the major factors hindering the development of image provenance analysis techniques. To address this issue, we introduce large-scale benchmark datasets for provenance analysis, namely the Media Forensics Challenge-Provenance (MFC-Prov) datasets. We design two provenance tasks along with their evaluation metrics. Furthermore, we conduct an extensive analysis of system performance, in terms of accuracy, on our datasets.