In this paper, we develop a low-cost cellular Internet of Medical Things (IoMT)-based electrocardiogram (ECG) recorder for monitoring heart conditions and use it in practical cases. To remove noise from signals recorded by such non-clinical devices, we propose a cloud-based, two-stage denoising approach that applies deep neural network techniques in the time-frequency domain. Specifically, we use the fractional Stockwell transform (FrST) to map the ECG signal into the time-frequency domain and apply the deep robust two-stage network (DeepRTSNet) for noise cancellation. Because the recorder targets practical use, the method must be robust to a wide range of heart physiologies and to noise of varying amplitudes and frequencies under real conditions. We therefore use the MIT-BIH Apnea-ECG database (APNEA-ECG), which covers several different heart physiologies. Next, different noises are added to these signals: muscle artifact (MA), baseline wander (BW), and electrode motion (EM) noise from the MIT-BIH Noise Stress Test Database (NSTDB), together with random noise. The noise generation is designed so that the fast Fourier transform (FFT) of the simulated noisy signal has maximum cross-correlation with that of a practical noisy signal, which gives the prepared datasets a closer morphological resemblance to realistic recordings. The results show that DeepRTSNet outperforms prior learning-based methods and conventional non-learning approaches in terms of signal-to-noise ratio (SNR), root mean square error (RMSE), and percent root mean square difference (PRD). Moreover, DeepRTSNet achieves markedly better performance than the other methods at the cost of somewhat higher computational complexity.
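The evaluation metrics and the noise-mixing step named above can be illustrated with a minimal Python/NumPy sketch. It assumes the common textbook definitions of SNR, RMSE, and PRD (the paper may normalize differently, for example with a mean-removed PRD) and assumes that each NSTDB noise record is scaled to a target input SNR before being added to a clean APNEA-ECG segment. The function names `add_noise_at_snr`, `snr_db`, `rmse`, `prd`, and `spectral_similarity` are hypothetical, and `spectral_similarity` is only one illustrative way to score how closely the FFT magnitude spectrum of a simulated noisy signal matches a practical one; it is not the authors' procedure.

```python
import numpy as np


def add_noise_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db_target: float) -> np.ndarray:
    """Scale `noise` so the mixture has the requested input SNR in dB (assumed mixing scheme).

    `noise` must be at least as long as `clean`; it is truncated to match.
    """
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Pick gain g so that 10*log10(p_clean / (g^2 * p_noise)) == snr_db_target.
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db_target / 10)))
    return clean + gain * noise


def snr_db(clean: np.ndarray, denoised: np.ndarray) -> float:
    """Output SNR in dB: clean-signal energy over residual energy."""
    residual = clean - denoised
    return float(10 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2)))


def rmse(clean: np.ndarray, denoised: np.ndarray) -> float:
    """Root mean square error between reference and estimate."""
    return float(np.sqrt(np.mean((clean - denoised) ** 2)))


def prd(clean: np.ndarray, denoised: np.ndarray) -> float:
    """Percent root mean square difference (non-mean-removed variant)."""
    return float(100 * np.sqrt(np.sum((clean - denoised) ** 2) / np.sum(clean ** 2)))


def spectral_similarity(simulated: np.ndarray, practical: np.ndarray) -> float:
    """Hypothetical score: peak normalized cross-correlation of FFT magnitude spectra.

    Returns a value <= 1; identical spectra give 1.0 at zero lag.
    """
    n = min(len(simulated), len(practical))
    a = np.abs(np.fft.rfft(simulated[:n]))
    b = np.abs(np.fft.rfft(practical[:n]))
    # z-score both spectra; dividing one by its length makes the zero-lag value a Pearson r.
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    return float(np.max(np.correlate(a, b, mode="full")))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(0, 10, 0.01)                 # 10 s at an assumed 100 Hz
    clean = np.sin(2 * np.pi * 1.2 * t)        # toy stand-in for an ECG segment
    noise = rng.standard_normal(t.size)        # toy stand-in for an NSTDB record
    noisy = add_noise_at_snr(clean, noise, 5.0)
    print(round(snr_db(clean, noisy), 6))      # 5.0 (up to floating-point rounding)
    print(rmse(clean, noisy), prd(clean, noisy))
    print(spectral_similarity(noisy, clean))
```

Under this assumed criterion, a candidate noise configuration with a higher `spectral_similarity` score against a real recording would be preferred. With these definitions, PRD and output SNR are tied by PRD = 100 · 10^(−SNR/20), so the two metrics rank methods identically on a given record, while RMSE additionally depends on the signal's absolute amplitude.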