A distribution matcher (DM) maps a binary input sequence into a block of nonuniformly distributed symbols. To facilitate the implementation of shaped signaling, fast DM solutions with high throughput and low serialism are required. We propose a novel DM architecture with parallel amplitudes (PA-DM), in which m−1 component DMs, each with a different binary output alphabet, operate in parallel to generate a shaped sequence with m amplitudes. With negligible rate loss compared to a single nonbinary DM, PA-DM has a parallelization factor that grows linearly with m, and its component DMs have reduced output lengths. For such binary-output DMs, we propose a novel constant-composition DM (CCDM) algorithm based on subset ranking (SR). The resulting SR-CCDM algorithms are fully parallel in demapping, while their mapping is serial only in the minimum number of occurrences of either binary symbol. For distributions optimized for the additive white Gaussian noise (AWGN) channel, we show numerically that PA-DM combined with SR-CCDM can reduce the number of sequential processing steps by more than an order of magnitude, with a rate loss comparable to that of conventional nonbinary CCDM with arithmetic coding.
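To illustrate why subset ranking gives the stated serial/parallel split, the following is a minimal sketch of a binary-output CCDM built on the standard combinatorial number system (enumerative indexing of k-subsets). It is an illustration under assumptions, not necessarily the exact SR-CCDM algorithm of this work: the function names (`sr_unrank`, `sr_rank`, `ccdm_map`, `ccdm_demap`, `ccdm_num_bits`) and the index ordering are hypothetical. The positions of the less frequent binary symbol form a k-subset with k = min(n1, n − n1). Unranking (mapping) runs one outer step per such position, so it is serial in k. Ranking (demapping) is a sum of binomial terms that each depend on a single position, so they can all be computed in parallel.

```python
from math import comb


def ccdm_num_bits(n, n_ones):
    """Hypothetical helper: input bits per block, floor(log2 C(n, n_ones))."""
    return comb(n, n_ones).bit_length() - 1


def sr_unrank(index, n, k):
    """Map index in [0, C(n, k)) to k sorted positions (combinatorial number system).

    The outer loop has exactly k sequential steps; the inner search is a plain
    linear scan, written that way only for readability.
    """
    positions = []
    r = index
    c = n
    for i in range(k, 0, -1):
        c -= 1
        # Largest c with C(c, i) <= r; math.comb returns 0 when c < i.
        while comb(c, i) > r:
            c -= 1
        positions.append(c)
        r -= comb(c, i)
    return positions[::-1]


def sr_rank(positions):
    """Inverse of sr_unrank: every binomial term is independent (parallelizable)."""
    return sum(comb(c, i + 1) for i, c in enumerate(sorted(positions)))


def _minority_symbol(n, n_ones):
    # Rank the positions of whichever binary symbol occurs less often.
    return 1 if n_ones <= n - n_ones else 0


def ccdm_map(bits, n, n_ones):
    """Map ccdm_num_bits(n, n_ones) input bits to a length-n block with n_ones ones."""
    minority = _minority_symbol(n, n_ones)
    k = min(n_ones, n - n_ones)
    index = int("".join(map(str, bits)), 2) if bits else 0
    marked = set(sr_unrank(index, n, k))
    return [minority if j in marked else 1 - minority for j in range(n)]


def ccdm_demap(symbols, n_ones, num_bits):
    """Recover the input bits from a constant-composition block."""
    minority = _minority_symbol(len(symbols), n_ones)
    positions = [j for j, s in enumerate(symbols) if s == minority]
    index = sr_rank(positions)
    return [int(b) for b in format(index, f"0{num_bits}b")] if num_bits else []


if __name__ == "__main__":
    n, n_ones = 10, 3
    num_bits = ccdm_num_bits(n, n_ones)  # floor(log2 C(10, 3)) = 6 input bits
    bits = [1, 0, 1, 1, 0, 1]
    x = ccdm_map(bits, n, n_ones)
    print(x, ccdm_demap(x, n_ones, num_bits) == bits)
```

Taking k = min(n1, n − n1) is the design choice that bounds the number of sequential mapping steps by the count of the less frequent symbol, which is small for the strongly skewed binary distributions that component DMs typically target.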