The design of cyber-physical systems is a complex process and relies on the simulation of the system behavior before its deployment. Such is the case, for instance, of joint simulation of the different subsystems that constitute a hybrid automotive powertrain. Co-simulation allows system designers to simulate a whole system composed of a number of interconnected subsystem simulators. Traditionally, these subsystems are modeled by experts of different fields using different tools, and then integrated into one tool to perform simulation at the system-level. This results in complex and compute-intensive co-simulations and requires the parallelization of these co-simulations in order to accelerate their execution. The simulators composing a co-simulation are characterized by the rates of data exchange between the simulators, defined by the engineers who set the communication steps. The RCOSIM approach allows the parallelization on multi-core processors of co-simulations using the FMI standard. This approach is based on the exploitation of the co-simulation parallelism where dependent functions perform different computation tasks. In this paper, we extend RCOSIM to handle additional co-simulation requirements. First, we extend the co-simulation to multi-rate, i.e. where simulators are assigned different communication steps. We present graph transformation rules and an algorithm that allow the execution of each simulator at its respective rate while ensuring correct and efficient data exchange between simulators. Second, we present an approach based on acyclic orientation of mixed graphs for handling mutual exclusion constraints between functions that belong to the same simulator due to the non-thread-safe implementation of FMI. We propose an exact algorithm and a heuristic for performing the acyclic orientation. The final stage of the proposed approach consists in scheduling the co-simulation on a multi-core architecture. We propose an algorithm and a heuristic for computing a schedule which minimizes the total execution time of the co-simulation. We evaluate the performance of our proposed approach in terms of the obtained execution speed. By applying our approach on an industrial use case, we obtained a maximum speedup of 2.91 on four cores.