Parallelization is crucial for the efficient execution of large-scale network simulations. Today's computing clusters commonly used for this purpose are built from a large number of multiprocessor machines. The traditional approach to utilizing all CPU cores in such a system is to partition the network and distribute the partitions across the cores. This approach, however, does not take the presence of shared memory into account: messages between partitions on the same computing node still have to be serialized, and synchronization becomes more complex. In this paper, we present an approach that combines the shared-memory parallelization scheme Horizon [9] with the standard approach to distributed simulation in order to leverage the strengths of today's computing clusters. To further reduce the synchronization overhead, we introduce a novel synchronization algorithm that takes domain knowledge into account to reduce the number of synchronization points. In a case study with a UMTS LTE model, we show that the two contributions combined enable much higher scalability, achieving almost linear speedup when simulating 1,536 LTE cells on 1,536 CPU cores.