One of the basic conditions for the successful implementation of energy demand-side management (EDM) in smart grids is the monitoring of different loads with an electrical load monitoring system. Energy and sustainability concerns present a multitude of issues that can be addressed using approaches of data mining and machine learning. However, resolving such problems due to the lack of publicly available datasets is cumbersome. In this study, we first designed an efficient energy disaggregation (ED) model and evaluated it on the basis of publicly available benchmark data from the Residential Energy Disaggregation Dataset (REDD), and then we aimed to advance ED research in smart grids using the Turkey Electrical Appliances Dataset (TEAD) containing household electricity usage data. In addition, the TEAD was evaluated using the proposed ED model tested with benchmark REDD data. The Internet of things (IoT) architecture with sensors and Node-Red software installations were established to collect data in the research. In the context of smart metering, a nonintrusive load monitoring (NILM) model was designed to classify household appliances according to TEAD data. A highly accurate supervised ED is introduced, which was designed to raise awareness to customers and generate feedback by demand without the need for smart sensors. It is also cost-effective, maintainable, and easy to install, it does not require much space, and it can be trained to monitor multiple devices. We propose an efficient BERT-NILM tuned by new adaptive gradient descent with exponential long-term memory (Adax), using a deep learning (DL) architecture based on bidirectional encoder representations from transformers (BERT). In this paper, an improved training function was designed specifically for tuning of NILM neural networks. We adapted the Adax optimization technique to the ED field and learned the sequence-to-sequence patterns. With the updated training function, BERT-NILM outperformed state-of-the-art adaptive moment estimation (Adam) optimization across various metrics on REDD datasets; lastly, we evaluated the TEAD dataset using BERT-NILM training.