Estimation of the average treatment effect (ATE), a causal parameter, is typically carried out in two steps: in the first step, the treatment and the outcome are modeled as functions of the potential confounders, and in the second step, the resulting predictions are plugged into an ATE estimator such as the augmented inverse probability weighting (AIPW) estimator. Because the relationships between the confounders and the treatment and outcome may be non-linear or unknown, there has been interest in using non-parametric methods such as machine learning (ML) algorithms in the first step. Some of the literature proposes using two separate neural networks (NNs) with no regularization on the network parameters other than that induced by stochastic gradient descent (SGD) during optimization. Our simulations indicate that the AIPW estimator deteriorates substantially when no regularization is used. We propose a normalized version of AIPW (referred to as nAIPW), which can be helpful in some scenarios. nAIPW provably retains the double-robustness and orthogonality properties of AIPW. Further, if the first-step algorithms converge fast enough, then under regularity conditions nAIPW is asymptotically normal. We also compare the bias and variance of AIPW and nAIPW when small to moderate L1 regularization is imposed on the NNs.
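To make the second step concrete, the following is a minimal sketch of how first-step predictions might be plugged into the AIPW estimator and into a normalized variant. The abstract does not state the nAIPW formula, so the normalization shown here (dividing each inverse-probability-weighted residual sum by the sum of its weights, in the style of a Hájek estimator) is an assumption. The function names `aipw` and `naipw` and the argument names (`y` outcome, `a` binary treatment, `g` predicted propensity, `m1` and `m0` predicted outcomes under treatment and control) are hypothetical and chosen for illustration only.

```python
def _weights(a, g):
    """Inverse-probability weights for the treated and control groups."""
    w1 = [ai / gi for ai, gi in zip(a, g)]
    w0 = [(1 - ai) / (1 - gi) for ai, gi in zip(a, g)]
    return w1, w0


def aipw(y, a, g, m1, m0):
    """Standard AIPW estimate of the ATE.

    The weighted residual corrections are divided by the sample size n.
    """
    n = len(y)
    w1, w0 = _weights(a, g)
    plug_in = sum(p1 - p0 for p1, p0 in zip(m1, m0)) / n
    corr1 = sum(w * (yi - p) for w, yi, p in zip(w1, y, m1)) / n
    corr0 = sum(w * (yi - p) for w, yi, p in zip(w0, y, m0)) / n
    return plug_in + corr1 - corr0


def naipw(y, a, g, m1, m0):
    """Normalized AIPW (assumed form, not taken from the source).

    Each weighted residual correction is divided by the sum of its own
    weights instead of by n. Assumes at least one treated and one
    control unit, so that neither weight sum is zero.
    """
    n = len(y)
    w1, w0 = _weights(a, g)
    plug_in = sum(p1 - p0 for p1, p0 in zip(m1, m0)) / n
    corr1 = sum(w * (yi - p) for w, yi, p in zip(w1, y, m1)) / sum(w1)
    corr0 = sum(w * (yi - p) for w, yi, p in zip(w0, y, m0)) / sum(w0)
    return plug_in + corr1 - corr0
```

Under this assumed form, the two estimators coincide whenever the weights in each arm sum to the sample size (for example, a constant propensity of 0.5 with balanced arms). They diverge when some predicted propensities are close to 0 or 1: a single very large weight can dominate the AIPW correction term, whereas the normalized correction remains a weighted average of the residuals.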