Software repositories provide rich information about the construction and evolution of software systems. While static data that can be mined directly from version control systems has been extensively studied, dynamic metrics concerning the execution of the software have received much less attention, due to the inherent difficulty of running and monitoring a large number of software versions.In this paper, we present Covrig, a flexible infrastructure that can be used to run each version of a system in isolation and collect static and dynamic software metrics, using a lightweight virtual machine environment that can be deployed on a cluster of local or cloud machines.We use Covrig to conduct an empirical study examining how code and tests co-evolve in six popular open-source systems. We report the main characteristics of software patches, analyse the evolution of program and patch coverage, assess the impact of nondeterminism on the execution of test suites, and investigate whether the coverage of code containing bugs and bug fixes is higher than average.