We introduce a theory-driven mechanism for learning a neural network model that performs generative topology design in one shot given a problem setting, circumventing the conventional iterative process that computational design tasks usually entail. The proposed mechanism can lead to machines that quickly respond to new design requirements based on knowledge accumulated through past experience of design generation. Achieving such a mechanism through supervised learning would require an impractically large number of problem-solution pairs for training, due to the known limitation of deep neural networks in knowledge generalization. To this end, we introduce an interaction between a student (the neural network) and a teacher (the optimality conditions underlying topology optimization): the student learns from existing data and is tested on unseen problems. The deviation of the student's solutions from the optimality conditions is quantified and used to choose new data points to learn from. We call this learning mechanism "theory-driven", as it explicitly uses domain-specific theories to guide the learning, thus distinguishing itself from purely data-driven supervised learning. We show through a compliance minimization problem that the proposed learning mechanism leads to topology generation with near-optimal structural compliance, much improved over standard supervised learning under the same computational budget.

For example, the design of vehicle body-in-white is often done by experienced structural engineers, since topology optimization (TO) on full-scale crash simulation is not yet fast enough to respond to requests from higher-level design tasks, e.g., geometry design with style and aerodynamic considerations, and thus may slow down the entire design process. Research exists in developing deep neural network models that learn to create structured solutions in a one-shot fashion, circumventing the need for iterations (e.g., in solving systems of equations [1], simulating dynamical systems [2], or searching for optimal solutions [3,4,5]). Learning such models from data, however, is often criticized for its limited generalization capability, especially when highly nonlinear input-output relations or high-dimensional output spaces are involved [6,7,8]. In the context of TO, this means that the network may create structures with unreasonably poor physical properties when it responds to new problem settings. More concretely, consider a topology with a tiny crack in one of its trusses. This design would be far from optimal if the goal is to lower compliance, yet standard data-driven learning mechanisms do not prevent this from happening, i.e., they don't know that they don't know (physics).

Our goal is to create a learning mechanism that knows what it does not know, and thus can self-improve in an effective way. Specifically, we are curious about how physics-based knowledge, e.g., in the forms of dynamical models, theoretical bounds, and optimality conditions, can be directly injected into the learning of networks that ...
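
The student-teacher interaction described in the abstract can be read as an active-learning loop: train the student, measure how far its one-shot topologies deviate from the optimality conditions on unseen problems, and add the worst-scoring problems (labeled by a conventional TO run) to the training set. The sketch below illustrates this loop under stated assumptions; the helper names (`train_student`, `optimality_residual`, `solve_with_to`) and the selection details are illustrative placeholders, not the paper's implementation.

```python
import numpy as np

def theory_driven_learning(initial_problems, candidate_problems,
                           train_student, optimality_residual, solve_with_to,
                           rounds=10, batch_size=8):
    """Sketch of a theory-driven student-teacher learning loop.

    Assumed (hypothetical) helpers:
      train_student(dataset)          -> trained student network
      student.predict(problem)        -> one-shot topology for a problem
      optimality_residual(problem, x) -> scalar deviation of x from the
                                         optimality conditions (the "teacher")
      solve_with_to(problem)          -> reference solution from a
                                         conventional iterative TO run
    """
    # Start from a small set of problem-solution pairs.
    dataset = [(p, solve_with_to(p)) for p in initial_problems]
    student = train_student(dataset)

    for _ in range(rounds):
        # Teacher step: quantify how far the student's solutions to
        # unseen problems deviate from the optimality conditions.
        scores = [optimality_residual(p, student.predict(p))
                  for p in candidate_problems]

        # Select the problems where the student deviates most ...
        worst = np.argsort(scores)[-batch_size:]

        # ... label them with the conventional optimizer, enlarge the
        # dataset, and retrain the student.
        dataset += [(candidate_problems[i], solve_with_to(candidate_problems[i]))
                    for i in worst]
        student = train_student(dataset)

    return student
```

The point of the sketch is the selection rule: unlike standard supervised learning, new training problems are chosen by the size of the physics-based optimality residual rather than sampled blindly, so the computational budget for generating labels is spent where the student is demonstrably wrong.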