The prevailing use of multicores in Embedded Critical Systems (ECS) is multi-application workloads in which independent applications run in different cores with data sharing restricted to the communication between applications and the real-time operating system. However, thread-level parallelism is increasingly used, e.g., OpenMP, in ECS to improve individual applications' performance. At the hardware level, we are witnessing increased research efforts to master and improve multicore cache coherence that plays a key role enabling efficient data sharing among threads. Despite these efforts, the limited information provided by performance monitoring counters on cache coherence limits the understanding of coherence's impact on tasks execution time and hence, poses severe constraints to estimate tight worst-case execution time bounds. In this line, this work contributes with an analysis of the impact that cache coherence can have on application timing behavior, and a new set of low-overhead performance monitoring counters that can be used to track the coherence-related contention that different threads can cause on each other when sharing data. Our results show that the proposed performance monitoring counters effectively capture all coherence-related contention that tasks can suffer and hence are key for parallel software timing validation and verification in ECS. Furthermore, they help application optimization by providing key information about data sharing among the application threads.