Many concurrent data-structure implementations, both blocking and non-blocking, use the well-known compare-and-swap (CAS) operation, supported in hardware by most modern multiprocessor architectures, for inter-thread synchronization. A key weakness of the CAS operation is its performance in the presence of memory contention. When multiple threads concurrently attempt to apply CAS operations to the same shared variable, at most a single thread will succeed in changing the shared variable's value and the CAS operations of all other threads will fail. Moreover, performance degrades significantly when variables manipulated by CAS become contention "hot spots", because failed CAS operations congest the interconnect and memory devices and slow down successful CAS operations. In this work, we study the following question: can software-based contention management improve the efficiency of hardware-provided CAS operations? In other words, can a software contention management layer, encapsulating invocations of hardware CAS instructions, improve the performance of CAS-based concurrent data structures? To address this question, we conduct what is, to the best of our knowledge, the first study on the impact of contention management algorithms on the efficiency of the CAS operation. We implemented several Java classes that extend Java's AtomicReference class and encapsulate calls to the native CAS instruction with simple contention management mechanisms tuned for different hardware platforms. A key property of our algorithms is their support for an almost-transparent interchange with Java's AtomicReference objects, which are used in implementations of concurrent data structures. We evaluate the impact of these algorithms on both a synthetic micro-benchmark and CAS-based concurrent implementations of widely-used data structures such as stacks and queues.
Our performance evaluation establishes that lightweight software-based contention management support can greatly improve performance under medium and high contention levels while typically incurring only small overhead under low contention. In some cases, applying efficient contention management for CAS operations used by a simpler data-structure implementation yields better results than highly optimized implementations of the same data structure that use native CAS operations directly.
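
To make the idea of a contention management layer around CAS concrete, the following is a minimal sketch and not one of the algorithms evaluated in this work. It assumes the simplest possible policy: a thread whose CAS fails waits for a fixed interval before returning, so that it does not immediately retry on the contended variable. The class name `BackoffAtomicReference`, the method name `casWithBackoff`, and the `backoffNanos` parameter are all hypothetical. Because `AtomicReference.compareAndSet` is declared `final`, the sketch adds a separate method rather than overriding it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration: an AtomicReference subclass whose CAS method
// applies a constant backoff after a failed attempt.
class BackoffAtomicReference<V> extends AtomicReference<V> {
    // Assumed tuning knob: how long a thread waits after a failed CAS.
    private final long backoffNanos;

    BackoffAtomicReference(V initialValue, long backoffNanos) {
        super(initialValue);
        this.backoffNanos = backoffNanos;
    }

    // Attempts a single hardware CAS. On failure, busy-waits for
    // backoffNanos before reporting the failure to the caller.
    boolean casWithBackoff(V expected, V update) {
        if (compareAndSet(expected, update)) {
            return true;
        }
        long deadline = System.nanoTime() + backoffNanos;
        while (System.nanoTime() < deadline) {
            // Spin without touching the contended variable.
        }
        return false;
    }
}
```

A data structure that already reads the reference with `get()` and retries in a loop could, under these assumptions, switch to this class by replacing its `compareAndSet` calls with `casWithBackoff`. A suitable backoff interval would have to be tuned per platform.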