We motivate and present biVI, which combines the variational autoencoder framework of scVI with biophysically motivated, bivariate models for nascent and mature RNA distributions. In benchmarks on simulated data, biVI accurately recapitulates key properties of interest, including cell type structure, parameter values, and copy number distributions. In biological datasets, biVI provides a route to identifying the biophysical mechanisms underlying differential expression. This analytical approach outlines a generalizable strategy for representing multimodal datasets generated by single-cell RNA sequencing.