Existing specific emitter identification (SEI) methods based on hand-crafted features suffer from two drawbacks: they lose feature information and they involve multiple processing stages, which reduces identification accuracy and complicates the identification procedure. In this paper, we propose a deep SEI approach based on multidimensional feature extraction for radio frequency fingerprints (RFFs), namely RFFsNet-SEI. Specifically, we extract multidimensional physical RFFs from the received signal by means of variational mode decomposition (VMD) and the Hilbert transform (HT). The physical RFFs and I-Q data are combined into balanced-RFFs, which are then used to train RFFsNet-SEI. By introducing model-aided RFFs into the neural network, we construct a hybrid-driven scheme that combines physical features with I-Q data, which improves the physical interpretability of RFFsNet-SEI. Meanwhile, because RFFsNet-SEI identifies individual emitters from the received raw data in an end-to-end manner, it accelerates SEI implementation and simplifies the identification procedure. Moreover, because RFFsNet-SEI extracts both the temporal and spectral features of the received signal, identification accuracy is improved. Finally, we compare RFFsNet-SEI with its counterparts in terms of identification accuracy, computational complexity, and prediction speed. Experimental results show that the proposed method outperforms its counterparts on both a simulated dataset and a real dataset collected in an anechoic chamber.
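To make the HT step concrete, the following is a minimal sketch of how instantaneous amplitude and frequency could be obtained from a single real-valued mode. It is an illustration under stated assumptions, not the RFFsNet-SEI pipeline: the VMD step is omitted, a synthetic 50 Hz tone stands in for one decomposed mode, and the names `analytic_signal`, `hilbert_features`, `fs`, and `mode` are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real 1-D sequence via the FFT.

    The imaginary part of the result is the discrete Hilbert transform of x.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                        # keep the DC bin unchanged
    if n % 2 == 0:
        gain[n // 2] = 1.0               # keep the Nyquist bin unchanged
        gain[1:n // 2] = 2.0             # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0       # double the positive frequencies
    # Negative-frequency bins keep gain 0, which makes the result analytic.
    return np.fft.ifft(spectrum * gain)


def hilbert_features(mode, fs):
    """Instantaneous amplitude and frequency (Hz) of one real-valued mode."""
    z = analytic_signal(mode)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    # Phase difference between neighbouring samples, converted to Hz.
    frequency = np.diff(phase) * fs / (2.0 * np.pi)
    return amplitude, frequency


# Toy stand-in for one VMD mode (assumption): a 50 Hz tone sampled at 1 kHz,
# spanning an integer number of cycles.
fs = 1000.0
t = np.arange(200) / fs
mode = np.cos(2.0 * np.pi * 50.0 * t)

amplitude, frequency = hilbert_features(mode, fs)
```

In a fuller pipeline, per-mode features of this kind would presumably be computed for each VMD mode and arranged alongside the I and Q samples before training; the exact layout of the balanced-RFFs is not specified here, so that arrangement is left open.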