(1) Background: The stethoscope is one of the main accessory tools in the diagnosis of temporomandibular joint disorders (TMD). However, clinical auscultation of the masticatory system still lacks computer-aided support, which would decrease the time needed for each diagnosis. Such support can be achieved with digital signal processing and classification algorithms. The segmentation of acoustic signals is usually the first step in sound processing methodologies. We postulate that it is possible to automatically segment the acoustic signals of the temporomandibular joint (TMJ), which can contribute to the development of advanced TMD classification algorithms. (2) Methods: In this paper, we compare two methods for the segmentation of TMJ sounds used in the diagnosis of the masticatory system. The first method is based solely on digital signal processing (DSP) and includes filtering and envelope calculation. The second method uses a deep learning approach based on a U-Net neural network combined with a long short-term memory (LSTM) architecture. (3) Results: Both methods were validated against our own TMJ sound database, created from signals recorded with an electronic stethoscope during a clinical diagnostic trial of the TMJ. The DSP method achieved a Dice score of 0.86 and a sensitivity of 0.91; the deep learning approach achieved a Dice score of 0.85 and a sensitivity of 0.98. (4) Conclusions: The presented results indicate that, with the use of signal processing and deep learning, it is possible to automatically segment TMJ sounds into sections of diagnostic value. Such methods can provide representative data for the development of TMD classification algorithms.
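
The two reported metrics can be illustrated with a minimal sketch. It assumes that segmentations are compared sample by sample as binary masks (1 marks a sample inside a diagnostically relevant section); the abstract does not state the evaluation granularity. The function name `dice_and_sensitivity` and the example masks are hypothetical and are not taken from the study.

```python
def dice_and_sensitivity(reference, predicted):
    """Return (dice, sensitivity) for two equal-length binary masks.

    Assumption: sample-wise comparison of a reference annotation
    against an automatic segmentation; not the authors' code.
    """
    if len(reference) != len(predicted):
        raise ValueError("masks must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)      # true positives
    fp = sum(1 for r, p in pairs if not r and p)  # false positives
    fn = sum(1 for r, p in pairs if r and not p)  # false negatives
    # Dice = 2*TP / (2*TP + FP + FN); two empty masks count as perfect overlap.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    # Sensitivity = TP / (TP + FN); defined as 1.0 if the reference is empty.
    sensitivity = tp / (tp + fn) if (tp + fn) else 1.0
    return dice, sensitivity


# Hypothetical masks, for illustration only.
reference = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
predicted = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
dice, sensitivity = dice_and_sensitivity(reference, predicted)
print(f"Dice = {dice:.2f}, sensitivity = {sensitivity:.2f}")
```

Under these definitions, a method that marks more samples as relevant can raise its sensitivity while leaving the Dice score almost unchanged, because Dice also penalizes false positives. This is consistent with the pattern of the reported values (0.91 versus 0.98 sensitivity at nearly equal Dice scores), although the abstract itself does not analyze the cause.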