This study develops an end‐to‐end deep learning framework that learns and analyzes ground motions (GMs) through their latent features to achieve reliable GM classification, selection, and simulation. The framework comprises an analysis workflow that transforms and reconstructs GMs through the short‐time Fourier transform (STFT), encodes and decodes their latent features through a convolutional variational autoencoder (CVAE), and classifies and generates GMs by grouping and interpolating latent variables. A benchmark study confirms that the reconstructed accelerograms differ only slightly from the original GMs. The encoded latent space reveals that certain latent variables are directly linked to the dominant physical features of GMs. As a result, clustering the latent variables with the k‐means algorithm successfully classifies GMs into groups that differ in earthquake magnitude, soil type, field distance, and fault mechanism. Linearly interpolating between two parent latent variables generates simulated GMs with consistent class information and matching response spectra. Furthermore, seismic fragility models are developed for a steel frame building and a concrete bridge using different sets of GMs. Using five classified, top‐ranked motions, whether recorded or simulated, yields reasonable and efficient fragility estimates compared with the case that adopts 230 GMs. The proposed deep learning framework addresses two compelling questions in seismic fragility assessment: how many GMs are sufficient, and what types of motions should be selected.
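
The generation step, linear interpolation between two parent latent variables, can be illustrated with a minimal sketch. It covers only the interpolation itself. In the framework described above, each interpolated vector would then presumably be passed through the CVAE decoder and an inverse STFT to obtain an accelerogram, and those steps are omitted here. The function name `interpolate_latent`, the three-dimensional latent vectors, and their values are assumptions made for illustration and are not taken from the study.

```python
from typing import List, Sequence


def interpolate_latent(
    z_a: Sequence[float], z_b: Sequence[float], n_points: int
) -> List[List[float]]:
    """Return n_points latent vectors evenly spaced on the line from z_a to z_b.

    Both endpoints are included, so the first and last vectors reproduce
    the two parents.
    """
    if len(z_a) != len(z_b):
        raise ValueError("parent latent vectors must have the same dimension")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")

    samples = []
    for i in range(n_points):
        alpha = i / (n_points - 1)  # interpolation weight, from 0 to 1
        samples.append(
            [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]
        )
    return samples


# Hypothetical parent latent variables (illustrative values only).
z_parent_1 = [0.0, 1.0, -2.0]
z_parent_2 = [1.0, 3.0, 2.0]

# Five latent vectors: the two parents plus three interpolated "children".
children = interpolate_latent(z_parent_1, z_parent_2, n_points=5)
```

If the two parents are drawn from the same k‐means cluster, the interpolated vectors stay on the segment joining them, which is one plausible reading of why the simulated motions retain consistent class information.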