Automated recognition of fundamental heart sound segments (FHSS) from Phonocardiogram (PCG) is the preliminary step before clinical parameters extraction to detect the presence of abnormality if any. PCG acquisition systems are usually based on microphones. These microphones apart from cardiac sounds will also pick up non-cardiac sounds like lung sounds and speech. The recognition of FHSS is challenging in the presence of non-cardiac events. Deep learning techniques like convolutional neural network (CNN) and recurrent neural network (RNN) are suitable for automated FHSS. However, it will be shown that their performance is degraded in the presence of interference like lung sounds, and speech. Hence in this work, a combination of conventional signal processing technique with deep neural network (DNN) is proposed to enhance the accuracy of automated FHSS. The conventional signal processing technique is based on EWT which can adaptively design the filter banks based on the type of interference. For DNN, U-Net is considered. The method involves the segmentation of PCG using EWT and recognition of FHSS using U-Net based DNN. Envelope features are extracted from the EWT based reconstructed signal and used for training the U-Net based DNN to recognize FHSS. To further improve the recognition accuracy of FHSS, delineation parameters obtained from EWT are incorporated for temporal modeling with the outcomes of U-Net based DNN. The performance of the proposed method is analyzed using both real-time signals and signals taken from standard databases like the Physionet database, and Littmann's lung sound library. Realtime PCG is acquired using an in-house developed PCG acquisition system. The proposed U-Net based DNN with the EWT method achieves FHSS recognition accuracy of 91.17% for PCG with lung sound interference and 90.78% for PCG with speech interference. The proposed method significantly improves the accuracy of FHSS recognition compared to long short term memory (LSTM), and gated recurrent unit (GRU). INDEX TERMS Recognition, segmentation, fundamental heart sound, lung sound, speech, empirical wavelet transform (EWT), deep neural network (DNN).