In memory hierarchies, caches perform an important role in reducing average memory access latency. Minimizing cache misses can yield significant performance gains. As set-associative caches are widely used in modern architectures, capacity and conflict cache misses co-exist. These two types of cache misses require different optimization strategies. While cache misses are commonly studied using cache simulators, state-of-the-art simulators usually incur hundreds to thousands of times a program's execution runtime. Moreover, a simulator has difficulty in simulating complex real hardware. To overcome these limitations, measurement methods are proposed to directly monitor program execution on real hardware via performance monitoring units. However, existing measurement-based tools either focus on capacity cache misses or do not distinguish capacity and conflict cache misses. In this paper, we design and implement CCProf, a lightweight measurement-based profiler that identifies conflict cache misses and associates them with program source code and data structures. CCProf incurs moderate runtime overhead that is at least an order of magnitude lower than simulators. With the evaluation on a number of representative programs, CCProf is able to guide optimizations on cache conflict misses and obtain nontrivial speedups.