“…As described before, the dataset types that have been used in Indonesian HSAL detection research include text, speech, and images. However, for the speech dataset, [18], [33], and [50] do not use audio preprocessing (e.g., compression, filtering, panning and lengthening, or noise cancellation) and instead proceed directly to feature extraction.…”