Ransom calls are common in kidnapping and abduction incidents, where the life of the victim is extremely vulnerable. Law enforcement authorities often analyze these phone calls in real time to identify suspects and obtain crucial information for quick action. However, manual analysis is often difficult because of poor sound quality and the presence of several background noises. Even with high-end software at their disposal, accurately cleaning up incoming calls is impractical, because separating the different layers of noise in a call takes a huge amount of time. This paper proposes a model based on deep convolutional neural networks and signal processing for the automatic classification of crucial sounds in ransom-related phone calls. We propose customized LSTM and 2D CNN models and compare their outputs with those of VGG16 and AlexNet. This paper also presents a unique dataset of 17,650 audio clips collected from verified online sources. The dataset covers voices, such as male or female, and environmental sounds from the place where the victim might be held, which can provide probable clues for investigation. Finally, the models achieved very high classification accuracy, with the LSTM reaching around 93.4%.