For the Remotely Piloted Aircraft Systems (RPAS) market to sustain its current growth rate, cost-effective "Detect and Avoid" systems that enable safe beyond visual line of sight (BVLOS) operations are critical. We propose an audio-based "Detect and Avoid" system, composed of microphones and an embedded computer, that performs real-time inference using a sound event detection (SED) deep learning model. Two state-of-the-art SED models, YAMNet and VGGish, are fine-tuned on our dataset of aircraft sounds, and their performance is compared across a wide range of configurations. YAMNet, whose MobileNet architecture is designed for embedded applications, outperformed VGGish in both aircraft detection and computational performance. YAMNet's optimal configuration, achieving a true positive rate and precision above 70%, combines data augmentation and undersampling with the highest available inference frequency (i.e., 10 Hz). While our proposed "Detect and Avoid" system already enables real-time detection of small aircraft from sound, additional testing with multiple aircraft types is required. Finally, a larger training dataset, sensor fusion, or offloading computation to cloud-based services could further improve system performance.
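To make the described pipeline concrete, the following is a minimal sketch, not the authors' implementation, of how the publicly released YAMNet model can be loaded from TensorFlow Hub and its per-frame embeddings routed to a binary aircraft/background classifier head; the head and its dimensions are illustrative assumptions, and the fine-tuning loop and the 10 Hz sliding-window scheduling on the embedded computer are omitted.

```python
# Sketch only: YAMNet inference via TensorFlow Hub, with a hypothetical
# binary classifier head standing in for the fine-tuned aircraft detector.
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Load the pretrained YAMNet SED model (MobileNet-based).
yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

# One second of placeholder audio at YAMNet's expected 16 kHz mono rate,
# as float32 samples in [-1.0, 1.0].
waveform = np.zeros(16000, dtype=np.float32)

# YAMNet returns per-frame AudioSet class scores, 1024-d embeddings,
# and the log-mel spectrogram it computed internally.
scores, embeddings, spectrogram = yamnet(waveform)

# Hypothetical fine-tuned head: a single sigmoid unit over the embeddings
# giving a per-frame aircraft probability (shown untrained, for illustration).
head = tf.keras.Sequential([
    tf.keras.layers.Dense(1, activation="sigmoid")
])
aircraft_prob = head(embeddings)
```

In a deployment like the one the abstract outlines, a loop would feed the most recent audio window to this model at the chosen inference frequency and raise an avoidance alert when the aircraft probability crosses a threshold; the threshold and window length are tuning choices not specified here.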