The intensification of population aging has brought pressure on public medical care. In order to reduce this pressure, we combined the image classification method with computer vision and used audio data that is easy to collect in nursing homes. Based on MelGAN, transfer learning, and Vision Transformer, we propose an application called Risevi (A Disease Risk Prediction Model Based on Vision Transformer), a disease risk prediction model for nursing homes. We first design a sample generation method based on MelGAN, then refer to the Mel frequency cepstral coefficient and the Wav2vec2 model to design the sample feature extraction method, perform floating-point operations on the tensor of the extracted features, and then convert it into a waveform. We then design a sample feature classification method based on transfer learning and Vision Transformer. Finally, we obtain the Risevi model. In this paper, we use public datasets and subject data as sample data. The experimental results show that the Risevi model has achieved an accuracy rate of 98.5%, a precision rate of 96.38%, a recall rate of 98.17%, and an F1 score of 97.15%. The experimental results show that the Risevi model can provide practical support for reducing public medical pressure.