Writing concurrent programs is a challenge because developers must consider both functional correctness and performance requirements. Numerous program analyses and testing techniques have been proposed to detect functional faults, e.g., caused by incorrect synchronization. However, little work has been done to help developers address performance problems in concurrent programs, e.g., because of inefficient synchronization. This paper presents SyncProf, a concurrency-focused profiling approach that helps in detecting, localizing, and optimizing synchronization bottlenecks. In contrast to traditional profilers, SyncProf repeatedly executes a program with various inputs and summarizes the observed performance behavior. A key novelty is a graph-based representation of relations between critical sections, which is the basis for computing the performance impact of critical sections, for identifying the root cause of a bottleneck, and for suggesting optimization strategies to the developer. We evaluate SyncProf on 19 versions of eight C/C++ projects with both known and previously unknown synchronization bottlenecks. The results show that SyncProf effectively localizes the root causes of these bottlenecks with higher precision than a state of the art lock contention profiler and that it suggests valuable strategies to avoid the bottlenecks.