Optical neural networks are emerging as a promising type of machine learning hardware capable of energy-efficient, parallel computation. Today's optical neural networks are mainly developed to perform optical inference after in silico training on digital simulators. However, various physical imperfections that cannot be accurately modelled may lead to the notorious "reality gap" between the digital simulator and the physical system. To address this challenge, we demonstrate hybrid training of optical neural networks where the weight matrix is trained with neuron activation functions computed optically via forward propagation through the network. We examine the efficacy of hybrid training with three different networks: an optical linear classifier, a hybrid opto-electronic network, and a complex-valued optical network. We perform a comparative study to in silico training, and our results show that hybrid training is robust against different kinds of static noise. Our platform-agnostic hybrid training scheme can be applied to a wide variety of optical neural networks, and this work paves the way towards advanced all-optical training in machine intelligence.