Training deep neural networks is an important problem, with open questions about convergence and the quality of the learned representations. Gradient-based optimization methods are the standard choice in practice, but the conditions under which they succeed or fail are still not well understood. In this context, we study the convergence properties of different optimization strategies under different parameter settings. Our results show that (i) feature embeddings are affected by the optimization settings, (ii) default parameters lead to suboptimal results, (iii) educated parameter choices yield significant improvements, and (iv) learning rate decay should always be considered. These findings offer guidelines for training and deploying deep networks.