The FE² computational homogenization method is a predictive multi-scale method that requires no constitutive assumptions or postulated potential functions at the macroscopic engineering scale. Instead, the effective micro-structural response is extracted directly from a representative volume element (RVE) underlying each macroscopic point. However, the FE² method remains computationally too expensive for most practical uses, since the micro-macro FE coupling is performed at every loading step and iteration over the entire domain. To address this, machine learning has been used in the literature to train, offline, a surrogate model that predicts the homogenized RVE response under general loading conditions. In this contribution, a neural network (NN) is incorporated into the macroscopic finite element framework in a non-intrusive manner. This is termed the FE-NN framework, in analogy to the FE² method. In general, online simulations with the FE-NN method are very efficient, with predictions closely matching those obtained from reference direct numerical simulations (DNS). A bottleneck of the FE-NN framework, however, is the high computational cost of generating the data for the offline NN model setup. In this paper, focusing on the FE-NN multi-scale framework for non-linear elastic deformation of heterogeneous materials, a sequential training strategy with knowledge transfer is proposed to enable an efficient offline setup of the microscopic NN model. For a given target RVE, we first consider a simplified source RVE, for which data can be generated rapidly, to pre-train the NN surrogate model. The pre-trained network parameters are then transferred to initialize the target NN surrogate model, which is subsequently fine-tuned using only a small dataset generated from the computationally expensive high-fidelity RVE.
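The pre-train and fine-tune sequence described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the one-dimensional functions `source_response` and `target_response` are hypothetical stand-ins for the homogenized responses of the simplified and high-fidelity RVEs, and the network size, learning rate, step counts, and sample counts are assumed values chosen only for illustration.

```python
import numpy as np

def init_params(rng, n_hidden=8):
    # One-hidden-layer network mapping a scalar strain to a scalar stress.
    return {
        "W1": rng.normal(0.0, 0.5, size=(1, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def predict(params, x):
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def loss_and_grads(params, x, y):
    # Half mean-squared error and its gradients via manual backpropagation.
    h = np.tanh(x @ params["W1"] + params["b1"])
    err = h @ params["W2"] + params["b2"] - y
    loss = 0.5 * np.mean(err ** 2)
    d_pred = err / x.shape[0]
    d_h = (d_pred @ params["W2"].T) * (1.0 - h ** 2)
    grads = {
        "W1": x.T @ d_h,
        "b1": d_h.sum(axis=0),
        "W2": h.T @ d_pred,
        "b2": d_pred.sum(axis=0),
    }
    return loss, grads

def train(params, x, y, lr=0.05, steps=2000):
    # Full-batch gradient descent; the input parameters are copied, not modified.
    params = {k: v.copy() for k, v in params.items()}
    history = []
    for _ in range(steps):
        loss, grads = loss_and_grads(params, x, y)
        history.append(loss)  # loss before this update
        for k in params:
            params[k] -= lr * grads[k]
    return params, history

# Hypothetical stand-ins for RVE solves (assumptions, not from the paper).
def source_response(eps):
    return eps + 0.5 * eps ** 3        # cheap, simplified source RVE

def target_response(eps):
    return 1.2 * eps + 0.8 * eps ** 3  # expensive, high-fidelity target RVE

rng = np.random.default_rng(0)
eps_src = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)  # large, cheap dataset
eps_tgt = np.linspace(-1.0, 1.0, 8).reshape(-1, 1)    # small, expensive dataset

# Step 1: pre-train on the source data.
pre_params, pre_hist = train(init_params(rng), eps_src, source_response(eps_src))

# Step 2: initialize the target model with the pre-trained parameters
# and fine-tune on the small target dataset.
ft_params, ft_hist = train(pre_params, eps_tgt, target_response(eps_tgt), steps=500)
```

Here `train` is reused for both stages; the only difference in the second stage is that the starting parameters come from the pre-trained network rather than from a random initialization, which is the essence of the knowledge transfer step.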
The efficiency of the proposed sequential learning method over conventional NN training, as well as its excellent predictive capability for multi-scale analyses, is demonstrated for a multi-phase composite material. The proposed FE-NN-KT approach can be implemented easily without complicated pre-processing procedures, since the

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.