The rapid increase in nontechnical loss (NTL) has become a principal concern for distribution system operators (DSOs) in recent years. Electricity theft accounts for a major part of NTL; it causes financial losses for DSOs and degrades the quality of the electricity supply. The introduction of advanced metering infrastructure, along with the upgrade of traditional grids to smart grids (SGs), has enabled electric utilities to collect the electricity consumption (EC) readings of consumers, which in turn allows machine learning (ML) algorithms to be exploited for efficient electricity theft detection (ETD). However, existing ML‐based theft classification schemes still suffer from shortcomings that limit their performance, such as class imbalance, the curse of dimensionality, and the absence of automated hyperparameter tuning. Therefore, it is essential to develop a novel approach that addresses these problems and efficiently detects electricity theft in SGs. In this work, an effective ETD model named SSA–GCAE–CSLSTM is proposed, which combines the salp swarm algorithm (SSA), a gate convolutional autoencoder (GCAE), and cost‐sensitive learning with long short‐term memory (CSLSTM). The hybrid GCAE model is developed by combining a gated recurrent unit with a convolutional autoencoder. The proposed model comprises five submodules: (1) data preparation, (2) data balancing, (3) dimensionality reduction, (4) hyperparameter optimization, and (5) electricity theft classification. Real‐time EC data provided by the State Grid Corporation of China are used for performance evaluation via extensive simulations. The proposed model is compared with two basic models, CSLSTM and GCAE–CSLSTM, and with seven benchmarks: support vector machine, decision tree, extra trees, random forest, adaptive boosting, extreme gradient boosting, and convolutional neural network. 
The results show that SSA–GCAE–CSLSTM achieves 99.45% precision, a 95.93% F1 score, 92.25% accuracy, and a 71.13% area under the receiver operating characteristic curve score, and that it surpasses the other models in ETD.
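
The cost‐sensitive learning idea mentioned above can be illustrated with a minimal sketch. It is not the loss used in SSA–GCAE–CSLSTM, whose exact formulation is not given here; it assumes a binary cross‐entropy loss whose per‐class weights follow the common inverse‐frequency heuristic, so that errors on the rare theft class (label 1) cost more than errors on the normal class (label 0). The function names `class_weights` and `weighted_bce` and the toy labels are hypothetical.

```python
import math


def class_weights(labels):
    """Inverse-frequency weights, n / (2 * count_c), for binary labels.

    Assumption: this 'balanced' heuristic stands in for whatever cost
    matrix a real cost-sensitive model would use.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}


def weighted_bce(labels, probs, weights, eps=1e-12):
    """Mean binary cross-entropy with each sample scaled by its class weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(labels)


# Toy, made-up data: nine normal consumers and one theft case.
labels = [0] * 9 + [1]
weights = class_weights(labels)

# Equally confident mistakes on each class, to compare their cost.
missed_theft = weighted_bce([1], [0.1], weights)
false_alarm = weighted_bce([0], [0.9], weights)
```

With these toy labels, the minority class receives the larger weight, so `missed_theft` exceeds `false_alarm` even though both predictions are equally wrong in probability terms. This is the general mechanism by which cost‐sensitive training counteracts class imbalance.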