Counter-based algorithms for busy-wait barrier synchronization execute in time linear in the number of synchronizing processes. This time can be made logarithmic in the number of processes by adopting algorithms based on trees or FFT-like synchronization patterns. As an additional improvement, Gupta and Hill [5] have proposed an adaptive combining tree barrier that exploits non-uniformity in inter-barrier computation times: processes begin to leave the barrier in time logarithmic in the number of processes when all processes arrive at once, but in constant time after the arrival of the last process when arrival times are skewed. Building on earlier work [4], Gupta and Hill present both regular and fuzzy versions of their barrier. The fuzzy version allows a process to perform useful work between the point at which it notifies other processes of its arrival at the barrier and the point at which it waits for all other processes to arrive.
Unfortunately, like many forms of busy-wait synchronization, adaptive combining tree barriers as originally devised can induce large amounts of memory and interconnect contention in shared-memory multiprocessors, seriously degrading performance. They also perform a comparatively large amount of work at every tree node, raising the possibility that the constant factors in their execution time may be unacceptably high on machines of reasonable size. To address these problems, we present a new adaptive combining tree barrier, with fuzzy variant, that achieves significant speed improvements by spinning only on locally-accessible locations, and by using atomic fetch_and_store operations to avoid explicit locking of tree nodes. We also present a version of this barrier (again with fuzzy variant) that employs breadth-first wakeup of processes to reduce context switching when processors are multiprogrammed.
We compare the performance of these new algorithms to that of other fast barriers on a 64-node BBN Butterfly 1 multiprocessor and on a 35-node BBN TC2000. Results suggest that adaptation is of little benefit, but that the combination of fuzziness with tree-style synchronization is of significant practical importance: fuzzy combining tree barriers with local-only spinning outperform all known alternatives on the TC2000 when the amount of fuzzy computation exceeds about 10% of the time between barriers.
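To make the notion of fuzziness concrete, the following is a minimal sketch of a split-phase barrier, in which announcing arrival and waiting for the other processes are separate operations. It is not one of the tree-based algorithms described here: it assumes a single central counter (so its cost is linear in the number of threads) and blocks on a condition variable instead of busy-waiting. The names `FuzzyCounterBarrier`, `arrive`, `wait`, and `run_demo` are hypothetical and chosen only for illustration.

```python
import threading


class FuzzyCounterBarrier:
    """Illustrative split-phase (fuzzy) barrier built on one shared counter."""

    def __init__(self, n):
        self.n = n                  # number of participating threads
        self.count = 0              # arrivals seen in the current episode
        self.episode = 0            # number of completed barrier episodes
        self.cond = threading.Condition()

    def arrive(self):
        """Announce arrival and return a ticket naming the current episode."""
        with self.cond:
            ticket = self.episode
            self.count += 1
            if self.count == self.n:
                # Last arrival: reset the counter and release all waiters.
                self.count = 0
                self.episode += 1
                self.cond.notify_all()
            return ticket

    def wait(self, ticket):
        """Block until the episode named by the ticket has completed."""
        with self.cond:
            while self.episode == ticket:
                self.cond.wait()


def run_demo(n_threads=4, rounds=3):
    """Run several barrier episodes and record what each thread observed."""
    barrier = FuzzyCounterBarrier(n_threads)
    lock = threading.Lock()
    arrivals = [0] * rounds
    fuzzy_done = [0] * rounds
    violations = []

    def worker():
        for r in range(rounds):
            with lock:
                arrivals[r] += 1
            ticket = barrier.arrive()
            # Fuzzy region: work that does not depend on the other threads
            # having arrived (here, just a counter update).
            with lock:
                fuzzy_done[r] += 1
            barrier.wait(ticket)
            # Past the barrier, every thread must have arrived for round r.
            with lock:
                if arrivals[r] != n_threads:
                    violations.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return arrivals, fuzzy_done, violations
```

In this sketch, the work placed between `arrive()` and `wait(ticket)` plays the role of the fuzzy computation: the longer it runs, the more likely it is that the last thread has already arrived by the time `wait` is called, so the waiting thread does not block at all.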