We propose a methodology for joint feature learning and clustering of multichannel extracellular electrophysiological data across multiple recording periods, for action potential detection and classification (sorting). Our methodology improves over the previous state of the art principally in four ways. First, by sharing information across channels, we can better distinguish between single-unit spikes and artifacts. Second, our proposed "focused mixture model" (FMM) handles units appearing, disappearing, or reappearing over multiple recording days, an important consideration for any chronic experiment. Third, by jointly learning features and clusters, we improve performance over previous attempts that proceeded via a two-stage learning process. Fourth, by directly modeling spike rate, we improve the detection of sparsely firing neurons. Moreover, our Bayesian methodology seamlessly handles missing data. We demonstrate state-of-the-art performance without manual hyperparameter tuning, on both a public dataset with partial ground truth and a new experimental dataset.