For the benefit of designing scalable, fault-tolerant optical neural networks (ONNs), we investigate the effects architectural design has on ONNs' robustness to imprecise components. We train two ONNs, one with a more tunable design (GridNet) and one with better fault tolerance (FFTNet), to classify handwritten digits. When simulated without any imperfections, GridNet yields better accuracy (∼98%) than FFTNet (∼95%). However, under a small amount of error in their photonic components, the more fault-tolerant FFTNet overtakes GridNet. We further provide thorough quantitative and qualitative analyses of ONNs' sensitivity to varying levels and types of imprecision. Our results offer guidelines for the principled design of fault-tolerant ONNs as well as a foundation for further research.

Second, rather than optimization toward a specific matrix, the linear operations learned for the classification task are not known a priori. As such, our primary figure of merit is the classification accuracy rather than the fidelity between the target unitary matrix and the learned one.

Lastly, the aforementioned studies mainly focused on optimization of the networks after fabrication. The imprecisions introduced generally reduced the expressivity of the network, i.e., how well the network can represent arbitrary transformations. Evaluations of this reduction in tunability, along with mitigating strategies, were provided. However, such post-fabrication optimization requires the characterization of every MZI, the number of which scales with the dimension N of the network as N^2. Protocols for self-configuration of imprecise photonic networks have been demonstrated [17,18]. While measurement of individual MZIs was not necessary in such protocols, each MZI needed to be configured progressively and sequentially, so the same N^2 scaling problem remained. Furthermore, if multiple ONN devices are fabricated, each device, with its unique imperfections, has to be optimized separately.
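As an illustrative sketch (not the authors' code), the N^2 scaling can be made concrete: a rectangular mesh realizing an arbitrary N x N unitary uses N(N-1)/2 MZIs, each a 2x2 unitary parameterized by two phases. The particular 2x2 parameterization below is one common convention; actual device conventions vary.

```python
import numpy as np

def mzi(theta, phi):
    # One 2x2 MZI transfer matrix (one common convention; conventions vary).
    # Unitary for any real theta, phi.
    return np.array([[np.exp(1j * phi) * np.cos(theta), -np.sin(theta)],
                     [np.exp(1j * phi) * np.sin(theta),  np.cos(theta)]])

def num_mzis(n):
    # A rectangular mesh implementing an arbitrary n x n unitary needs
    # n(n-1)/2 MZIs, hence the ~N^2 cost of characterizing every MZI.
    return n * (n - 1) // 2

# Example: an 8-mode network already has 28 MZIs; a 64-mode one has 2016.
print(num_mzis(8), num_mzis(64))
```

This quadratic growth in tunable elements is what makes per-device, post-fabrication characterization increasingly expensive as the network dimension grows.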
The total computational power required therefore scales with the number of devices produced.

In contrast, we consider the effects of imprecisions introduced after software training of ONNs (Code 1, Ref. [19]), details of which we present in Sec. 3. This pre-fabrication training is more scalable, both in network size and in fabrication volume. An ideal ONN (i.e., one with no imprecisions) is trained in software only once, and the parameters are transferred to multiple fabricated instances of the network with imprecise components. No subsequent characterization or tuning of the devices is necessary. In addition to better scalability, fabrication of static MZIs can be made more precise and cost-effective than that of reconfigurable ones.

We evaluate the degradation of ONNs from their ideal performance with increasing imprecision. To understand how such effects can be minimized, we investigate the role that architectural design plays in ONNs' sensitivity to imprecision. The results are presented in Sec. 4.1. Specifically, we study the performance of two ONNs i...
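The transfer-without-retuning setting described above can be sketched numerically: train (here, just fix) ideal phases, then perturb them with Gaussian noise of standard deviation sigma to mimic fabrication imprecision, and measure how far the realized unitary drifts from the ideal one. This is a minimal toy model, not the authors' simulation; the mesh layout and fidelity metric here are illustrative assumptions.

```python
import numpy as np

def mzi(theta, phi):
    # One 2x2 MZI transfer matrix (one common convention; conventions vary).
    return np.array([[np.exp(1j * phi) * np.cos(theta), -np.sin(theta)],
                     [np.exp(1j * phi) * np.sin(theta),  np.cos(theta)]])

def embed(T, i, n):
    # Embed a 2x2 block acting on modes (i, i+1) into an n-mode identity.
    U = np.eye(n, dtype=complex)
    U[i:i + 2, i:i + 2] = T
    return U

def mesh_unitary(thetas, phis, positions, n):
    # Compose a chain of MZIs (a toy stand-in for a structured mesh).
    U = np.eye(n, dtype=complex)
    for th, ph, i in zip(thetas, phis, positions):
        U = embed(mzi(th, ph), i, n) @ U
    return U

def fidelity(U, V):
    # |tr(U^dagger V)| / n: 1 when V == U (up to global phase), < 1 otherwise.
    return abs(np.trace(U.conj().T @ V)) / U.shape[0]

rng = np.random.default_rng(0)
n = 8
m = n * (n - 1) // 2
thetas = rng.uniform(0, np.pi, m)          # "trained" ideal phases
phis = rng.uniform(0, 2 * np.pi, m)
positions = rng.integers(0, n - 1, m)      # which mode pair each MZI acts on
U_ideal = mesh_unitary(thetas, phis, positions, n)

fids = {}
for sigma in [0.0, 0.05, 0.2]:
    # Fabricated instance: same design, phases perturbed by Gaussian error.
    U_noisy = mesh_unitary(thetas + rng.normal(0, sigma, m),
                           phis + rng.normal(0, sigma, m),
                           positions, n)
    fids[sigma] = fidelity(U_ideal, U_noisy)
print(fids)
```

With sigma = 0 the fidelity is exactly 1; it degrades as the phase noise grows, which is the qualitative behavior whose effect on classification accuracy (rather than fidelity) the paper quantifies for GridNet and FFTNet.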