Software effort estimation remains a largely unresolved problem, owing to the heterogeneous nature of software data and its complex structure. For processing nonlinear data, the long short-term memory (LSTM) model is widely used to solve both discriminative and generative problems. However, the LSTM network suffers from low computational efficiency because a large number of hyperparameters must be set, and training deep learning models is expensive in terms of finding suitable hyperparameter configurations. The grid search (GS) optimization method is used to find the best hyperparameter values for deep learning networks, whose many hyperparameters strongly affect how well the network architecture performs. In this paper, we propose grid search as a straightforward step toward optimizing the hyperparameters of the LSTM model. An empirical study was conducted on five datasets. Our results show that the LSTM-grid search model consistently achieves lower mean absolute error (MAE) and root mean squared error (RMSE) across datasets compared to existing machine learning approaches for software effort estimation. The main advantage of training the LSTM in this way is that grid search finds hyperparameters that yield faster convergence, better generalization, and a higher coefficient of determination in estimating software effort.
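
The procedure described above, an exhaustive grid search over LSTM hyperparameters scored by MAE and RMSE on held-out data, can be sketched as follows. This is a minimal illustration using Keras on synthetic data, not the authors' implementation; the grid values (units, learning rate, batch size), data shapes, and epoch count are assumptions made only for the example.

```python
# Minimal sketch: grid search over LSTM hyperparameters, scored by MAE/RMSE.
# Grid values, data shapes, and epochs below are illustrative assumptions.
import itertools
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
# Synthetic stand-in for a software-effort dataset: 200 projects, each
# described by a sequence of 5 timesteps with 3 features; target is effort.
X = rng.normal(size=(200, 5, 3)).astype("float32")
y = (X.sum(axis=(1, 2)) + rng.normal(scale=0.1, size=200)).astype("float32").reshape(-1, 1)
X_train, X_val = X[:160], X[160:]
y_train, y_val = y[:160], y[160:]

# Hypothetical hyperparameter grid.
grid = {
    "units": [16, 32, 64],
    "learning_rate": [1e-2, 1e-3],
    "batch_size": [8, 16],
}

def build_model(units, learning_rate):
    """Single-layer LSTM regressor compiled with MSE loss and MAE metric."""
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(units, input_shape=X.shape[1:]),
        tf.keras.layers.Dense(1),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mse",
        metrics=["mae"],
    )
    return model

best = None
for units, lr, batch in itertools.product(*grid.values()):
    model = build_model(units, lr)
    model.fit(X_train, y_train, epochs=30, batch_size=batch,
              validation_data=(X_val, y_val), verbose=0)
    mse, mae = model.evaluate(X_val, y_val, verbose=0)
    rmse = float(np.sqrt(mse))
    # Keep the configuration with the lowest validation RMSE.
    if best is None or rmse < best["rmse"]:
        best = {"units": units, "learning_rate": lr, "batch_size": batch,
                "mae": float(mae), "rmse": rmse}

print("Best configuration:", best)
```

In this sketch the entire Cartesian product of candidate values is trained and evaluated, which is what makes grid search exhaustive but also computationally expensive as the grid grows; the selected configuration is simply the one with the lowest validation RMSE.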