Data-driven deep learning is effective in diagnosing known faults but performs poorly when new or unknown faults occur. Treating unknown faults as a zero-shot learning problem, this article proposes a method for detecting and isolating unknown faults based on knowledge distillation within a teacher–student framework. Because process data and image data are equivalent in their spatiotemporal dimensions, a convolutional neural network pretrained on image data is selected as the teacher model. The well-trained teacher model then effectively extracts information from process data under both normal and fault conditions. Subsequently, knowledge distillation transfers only the knowledge of normal conditions from the teacher model to the student model. When an unknown fault arises, the information extracted by the teacher model differs from that extracted by the student model. The contributions of individual variables to the fault are calculated by quantifying these differences through gradients, thereby isolating the unknown fault. Finally, compared with a series of baseline methods and two state-of-the-art methods, the proposed method improves fault diagnosis accuracy by 3.08% to 26.13% on the Tennessee Eastman process and by 3.48% to 41.45% on the sour water treatment process. Additionally, the physical consistency of the fault isolation results is assessed visually.
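The teacher–student mechanism summarized above can be illustrated with a short sketch. The PyTorch-style code below is a minimal illustration only, assuming a small CNN teacher, an MSE feature-matching distillation loss trained on normal-condition data, and gradient-times-input variable contributions; the class names, layer sizes, and loss choice are hypothetical and do not reproduce the paper's exact models or objective.

```python
# Minimal sketch of the teacher-student distillation and gradient-based
# fault isolation described above. All architectures and the MSE
# feature-matching loss are illustrative assumptions.
import torch
import torch.nn as nn

class Teacher(nn.Module):
    """Stand-in for a CNN pretrained on image data, applied here to
    process variables arranged as a 1-channel 'image' (1-D sequence)."""
    def __init__(self, n_vars: int, n_feat: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, n_feat, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )

    def forward(self, x):                 # x: (batch, n_vars)
        return self.net(x.unsqueeze(1))   # -> (batch, n_feat)

class Student(nn.Module):
    """Lightweight network distilled only on normal-condition data."""
    def __init__(self, n_vars: int, n_feat: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_vars, 64), nn.ReLU(),
                                 nn.Linear(64, n_feat))

    def forward(self, x):
        return self.net(x)

def distill_on_normal(teacher, student, normal_loader, epochs=10, lr=1e-3):
    """Transfer only normal-condition knowledge: the student mimics the
    teacher's features on normal data (assumed MSE feature-matching loss)."""
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for (x,) in normal_loader:
            with torch.no_grad():
                t_feat = teacher(x)
            loss = nn.functional.mse_loss(student(x), t_feat)
            opt.zero_grad()
            loss.backward()
            opt.step()

def variable_contributions(teacher, student, x):
    """Gradient-based contribution of each variable to the teacher-student
    discrepancy; large values flag variables implicated in an unknown fault."""
    x = x.clone().requires_grad_(True)
    discrepancy = nn.functional.mse_loss(student(x), teacher(x))
    discrepancy.backward()
    return (x.grad * x).abs().squeeze(0)  # gradient-times-input attribution

if __name__ == "__main__":
    n_vars = 52                           # e.g. Tennessee Eastman variables
    teacher, student = Teacher(n_vars), Student(n_vars)
    normal = torch.utils.data.TensorDataset(torch.randn(256, n_vars))
    loader = torch.utils.data.DataLoader(normal, batch_size=32, shuffle=True)
    distill_on_normal(teacher, student, loader, epochs=2)
    contrib = variable_contributions(teacher, student, torch.randn(1, n_vars))
    print("top suspect variables:", torch.topk(contrib, 5).indices.tolist())
```

In this sketch, a fault would be detected by thresholding the teacher-student discrepancy on incoming samples, and the per-variable contributions would then rank the variables most responsible, mirroring the isolation step described in the abstract.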