Ultrasonic sensors are inexpensive and can take highly accurate measurements even with simple hardware configurations, so they are used in many fields. When multiple ultrasonic sensors share a measurement space, signals from other nodes cause crosstalk, which leads to measurement errors. Crosstalk includes not only the reception of homogeneous signals from other nodes but also overlap with other signals and interference from heterogeneous signals. This paper proposes a method that uses frequency sweep keying modulation for robustness against overlap, together with a demodulator based on Faster R-CNN (faster region-based convolutional neural network) to reduce the interference caused by heterogeneous signals. The demodulator is built by training Faster R-CNN on spectrograms of various received signals and then using the trained network to classify incoming signals. Experiments in an ultrasonic crosstalk environment showed that, compared to on-off keying (OOK), phase-shift keying (PSK), and frequency-shift keying (FSK), the proposed method can implement code-division multiple access (CDMA) even with shorter codes and is robust against overlap. Compared to correlation-based frequency sweep keying, the time-of-flight error was reduced by approximately 75%. Whereas existing demodulators did not consider heterogeneous signals, the proposed method ignored approximately 99% of the OOK and PSK signals and approximately 79% of the FSK signals. The proposed method outperforms the existing methods on these measures and is expected to be useful in a range of multi-sensor applications.
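To make the modulation concrete, the following is a minimal sketch of frequency sweep keying with a correlation-based demodulator, i.e. the kind of baseline the abstract compares against, not the proposed Faster R-CNN demodulator. Every specific here is an assumption made for illustration and is not taken from the paper: the bit mapping (1 as an up-sweep, 0 as a down-sweep), the 38-42 kHz band, the 1 ms symbol length, the 400 kHz sample rate, and all function names.

```python
import math

# All parameters below are illustrative assumptions, not values from the paper.
FS = 400_000                      # sample rate in Hz
F_LO, F_HI = 38_000.0, 42_000.0   # sweep band in Hz
SYMBOL_S = 0.001                  # symbol duration in seconds
N = int(FS * SYMBOL_S)            # samples per symbol


def chirp(f_start, f_end):
    """Return one symbol: a linear frequency sweep from f_start to f_end."""
    rate = (f_end - f_start) / SYMBOL_S  # sweep rate in Hz per second
    samples = []
    for n in range(N):
        t = n / FS
        phase = 2.0 * math.pi * (f_start * t + 0.5 * rate * t * t)
        samples.append(math.sin(phase))
    return samples


UP = chirp(F_LO, F_HI)    # assumed encoding of bit 1
DOWN = chirp(F_HI, F_LO)  # assumed encoding of bit 0


def modulate(bits):
    """Concatenate one up- or down-sweep per code bit."""
    signal = []
    for bit in bits:
        signal.extend(UP if bit else DOWN)
    return signal


def correlate(a, b):
    """Zero-lag correlation (inner product) of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def demodulate(signal):
    """Decide each symbol by comparing its correlation with both templates."""
    bits = []
    for start in range(0, len(signal) - N + 1, N):
        symbol = signal[start:start + N]
        up_score = abs(correlate(symbol, UP))
        down_score = abs(correlate(symbol, DOWN))
        bits.append(1 if up_score > down_score else 0)
    return bits


if __name__ == "__main__":
    code = [1, 0, 1, 1, 0]
    print(demodulate(modulate(code)) == code)
```

This sketch assumes a clean, symbol-aligned signal. It does nothing about heterogeneous OOK, PSK, or FSK interference, which is the gap the spectrogram-based demodulator described in the abstract is meant to address.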