Traditional crossbar switches use centralized scheduling algorithms with high time complexity. In contrast, buffered crossbar switches can use distributed scheduling because their crosspoint buffers decouple inputs from outputs. However, crosspoint buffers are expensive on-chip memories. To reduce the hardware cost of buffered crossbar switches and make them scalable, we consider partially buffered crossbar switches, whose crosspoint buffers can be arbitrarily small and store only part of a packet instead of the entire packet. In this paper, we propose the Packet-mode Asynchronous Scheduling Algorithm (PASA) for partially buffered crossbar switches. PASA combines features of both distributed and centralized scheduling algorithms. It works in an asynchronous mode and can directly handle variable-length packets without Segmentation And Reassembly (SAR). We theoretically prove that, with a speedup of two, PASA achieves 100% throughput for any admissible traffic. We also show that outputs in PASA have a high probability of avoiding the more time-consuming centralized scheduling process, and can thus make fast scheduling decisions. Finally, we present simulation data to verify the analytical results and evaluate the performance of PASA.