GPS loggers and cameras aboard connected vehicles can produce vast amounts of data. Analysts can mine such data to decipher patterns in vehicle trajectories and driver–vehicle interactions. The ability to process such large-scale data in real time can inform strategies to reduce crashes, improve traffic flow, enhance system operational efficiency, and reduce environmental impacts. However, connected vehicle technologies are in the very early phases of deployment. Related datasets are therefore extremely scarce, and the utility of such emerging datasets is largely unknown. This paper provides a comprehensive review of studies that used large-scale connected vehicle data from the United States Department of Transportation Connected Vehicle Safety Pilot Model Deployment program, the first and only such dataset available to the public. The data contain real-world information about the operation of connected vehicles that organizations are testing. The paper first summarizes the available datasets, their organization, and the overall structure and other characteristics of the data captured during pilot deployments. It then classifies usage of the data into three categories: driving pattern identification, development of surrogate safety measures, and improvements in the operation of signalized intersections. Finally, it identifies some limitations experienced with the existing datasets.