This paper addresses the separation of audio sources from convolutive mixtures captured by a microphone array. We approach the problem using complex-valued non-negative matrix factorization (CNMF) and extend previous work by tailoring advanced single-channel NMF models, such as the deconvolutive NMF, to the multichannel factorization setup. Further, we propose a sparsity-promoting scheme so that the estimated parameters better fit the time-frequency properties inherent in some audio sources. The proposed parameter estimation framework is compatible with previous related work and can be thought of as a step toward a more general method. We evaluate the resulting separation accuracy in a simulated acoustic scenario, and the tests confirm that the proposed algorithm provides superior separation quality compared to a state-of-the-art benchmark. Finally, an analysis of the effects of the introduced regularization term shows that the solution is indeed steered toward a sparser representation.