This work investigates how to define Neural Network (NN) architectures with a data-driven approach, in which clustering is used both to create sub-labels that facilitate the learning process and to determine the number of neurons in each layer. We also increase the depth of the model, aiming to represent the samples better the deeper they flow through it. We hypothesize that the clustering process identifies sub-regions of the feature space in which samples belonging to the same cluster are strongly similar. To validate this hypothesis, we used seven benchmark datasets with 10-fold cross-validation repeated three times. The proposed model improved performance and never decreased it, with statistical significance (p-value $< 0.05$), in comparison with a Multi-Layer Perceptron with a single hidden layer and approximately the same number of parameters as the architectures found by our approach.
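The sub-labelling idea can be illustrated with a minimal sketch, which is not necessarily the exact procedure used in this work. It assumes that each class is clustered separately with a plain k-means, that every (class, cluster) pair becomes one sub-label, and that the number of sub-labels is taken as the width of a layer. The function names `kmeans` and `make_sublabels`, the fixed `k_per_class` parameter, and the toy data are all hypothetical and introduced only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centers))
        centers.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        assign = [min(range(k), key=lambda j: _dist2(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def make_sublabels(X: List[Point], y: List[int], k_per_class: int) -> Tuple[List[int], int]:
    """Cluster each class on its own; return per-sample sub-labels and their count.

    Assumption: k_per_class is fixed by hand here, whereas a data-driven
    method would choose the number of clusters from the data.
    """
    sub = [-1] * len(X)
    offset = 0
    for cls in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == cls]
        k = min(k_per_class, len(idx))
        for i, a in zip(idx, kmeans([X[i] for i in idx], k)):
            sub[i] = offset + a  # unique id per (class, cluster) pair
        offset += k
    return sub, offset


# Toy XOR-like data: each class occupies two separate regions of the feature space.
X = [(0, 0), (0, 1), (10, 10), (10, 11), (0, 10), (1, 10), (10, 0), (11, 0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
sub_labels, n_sub = make_sublabels(X, y, k_per_class=2)
hidden_width = n_sub  # assumption: one neuron per discovered sub-label
```

In this toy setting each class splits into two sub-regions, so the sketch yields four sub-labels. A layer trained to predict them would then have four units, followed by a layer that maps the sub-labels back to the two original classes.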