The recording device and the acoustic environment both play a major role in digital audio forensics. In this paper, we propose an acoustic source identification system that identifies both the recording device and the environment in which the audio was recorded. A hybrid Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) is used to automatically extract environment and microphone features from the speech signal. In the experiments, we investigated the effect of using the voiced and unvoiced segments of speech on the accuracy of environment and microphone classification. We also studied the effect of background noise on microphone classification in three different environments: very quiet, quiet, and noisy. The proposed system utilizes a subset of the KSU-DB corpus containing 3 environments, 4 classes of recording devices, 136 speakers (68 male and 68 female), and 3600 recordings of words, sentences, and continuous speech. The proposed model, a convolutional recurrent neural network (CRNN), combines the advantages of both CNN and RNN (in particular, bidirectional LSTM) architectures. The speech signals were represented as spectrograms and fed to the CRNN model as 2D images. The proposed method achieved accuracies of 98% and 98.57% for environment and microphone classification, respectively, using unvoiced speech segments.
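To make the CRNN idea concrete, the sketch below shows a minimal PyTorch model of the general kind described here: convolutional layers extract local time-frequency features from a spectrogram "image", and a bidirectional LSTM models the resulting frame sequence before a classification head. All layer sizes, the input dimensions, and the use of PyTorch are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal CNN + bidirectional LSTM over spectrogram inputs.

    Hypothetical sketch: layer counts and sizes are assumptions,
    not the architecture reported in the paper.
    """
    def __init__(self, n_classes: int, n_mels: int = 128):
        super().__init__()
        # CNN front end: learn local time-frequency patterns.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),              # halve frequency and time
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        freq_out = n_mels // 4            # after two 2x poolings
        # BiLSTM: model longer-range temporal structure across frames.
        self.rnn = nn.LSTM(
            input_size=64 * freq_out,
            hidden_size=128,
            batch_first=True,
            bidirectional=True,
        )
        self.fc = nn.Linear(2 * 128, n_classes)

    def forward(self, x):                 # x: (batch, 1, n_mels, time)
        f = self.cnn(x)                   # (batch, 64, n_mels/4, time/4)
        f = f.permute(0, 3, 1, 2)         # time axis becomes the sequence
        f = f.flatten(2)                  # (batch, time/4, 64 * n_mels/4)
        out, _ = self.rnn(f)
        return self.fc(out[:, -1])        # last frame -> class logits

# Example: 4 recording-device classes, a batch of 8 spectrograms.
model = CRNN(n_classes=4)
logits = model(torch.randn(8, 1, 128, 128))
print(logits.shape)                       # torch.Size([8, 4])
```

In this arrangement the CNN acts as a learned feature extractor and the BiLSTM reads the pooled frames in both directions, which is one common way to combine the two model families for audio classification.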