A bowing gesture in violin playing refers to the motion of the violinist's bowing arm. Violinists use different types of bow strokes to express musical phrases, produced by the movement of the right arm holding the bow. Although the sound produced by each bow stroke is distinct, novice violinists can find it difficult to distinguish and recognize these bowing techniques. This paper therefore presents a novel ensemble of multimodal deep learning models, consisting of one Convolutional Neural Network (CNN) and two Long Short-Term Memory (LSTM) models, to classify a performance into one of five bowing classes: détaché, legato, martelé, spiccato, and staccato. The dataset consists of audio samples performed by 8 violinists, together with the motion of their forearms captured by a Myo sensor device, which provides 8 channels of electromyogram (EMG) data and 13 channels of inertial measurement unit (IMU) data. Audio features are extracted from the audio excerpts, and time-domain features are extracted from the EMG and IMU motion signals. These features are passed into the ensemble of deep learning models, which makes the final prediction using weighted voting. The proposed ensemble classifier achieved an overall accuracy of 99.5%, outperforming previous studies that considered only audio or only motion data.
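As a rough illustration of the weighted-voting step described above, the sketch below combines per-model class probabilities from the three models into a final prediction. It assumes soft voting over probability vectors; the weights and probability values are illustrative placeholders, not the values used in the paper.

```python
import numpy as np

# Five bowing classes considered in the paper.
classes = ["detache", "legato", "martele", "spiccato", "staccato"]

# Hypothetical class-probability outputs for one sample from each model.
p_cnn_audio = np.array([0.70, 0.10, 0.05, 0.10, 0.05])  # CNN on audio features
p_lstm_emg  = np.array([0.55, 0.20, 0.10, 0.10, 0.05])  # LSTM on EMG features
p_lstm_imu  = np.array([0.60, 0.15, 0.10, 0.05, 0.10])  # LSTM on IMU features

# Assumed ensemble weights (one per model); not taken from the paper.
weights = np.array([0.4, 0.3, 0.3])

# Weighted voting: sum the weighted probability vectors and take the argmax.
combined = (weights[0] * p_cnn_audio
            + weights[1] * p_lstm_emg
            + weights[2] * p_lstm_imu)
prediction = classes[int(np.argmax(combined))]
print(prediction)  # -> "detache" for these illustrative values
```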