In education, the development of educational big data has become an important strategic choice for promoting digital campus construction and educational reform, and it is now a driving force that cannot be ignored. Grounded in the output-driven hypothesis and neural networks, and drawing on the media spanning of contemporary art and the cross-media association effect, this study moves beyond the status quo of English teaching, which relies on traditional methods such as the grammar-translation method and the deductive method, and constructs a new cross-media university English teaching model. Through the study of deep learning theory, we focus on the structure and characteristics of convolutional neural networks, discuss three classical models (AlexNet, VGG, and GoogLeNet), and propose a convolutional neural network model, based on the output-driven hypothesis, that is applicable to cross-media teaching in college English classrooms and is validated experimentally. To address the key problem of the semantic gap, which is difficult to cross in cross-media semantic learning, a cross-media supervised adversarial hashing model based on two-way attentional features is proposed.
Building on the existing two-way attention-based feature learning model, we combine techniques such as generative adversarial networks and semantic hashing to mine the semantic associations between different media data in depth, and we integrate feature learning with adversarial learning and hash learning to build a unified semantic space for different media data. The results show that the proposed neural network model for cross-media teaching in college English classrooms, based on the output-driven hypothesis, not only improves students’ English literacy skills but also has a certain positive effect on their overall performance.