Abstract. This work presents a method for extracting instrumental controls during guitar performance. The method is based on the analysis of multimodal data combining motion capture, audio analysis, and the musical score. A marker-based system of high-speed video cameras is used to track the positions of finger bones and joints, and audio is recorded with a transducer measuring vibration on the guitar body. The extracted parameters are divided into left-hand controls, i.e. fingering (which string and fret is pressed by a left-hand finger), and right-hand controls, i.e. the plucked string, the plucking finger, and the characteristics of the pluck (position, velocity, and angles with respect to the string). Controls are estimated from probability functions of low-level features, namely the plucking instants (i.e. note onsets), the pitch, and the distances of the fingers of both hands to strings and frets. Note onsets are detected via audio analysis, the pitch is extracted from the score, and distances are computed using 3D Euclidean geometry. Results show that, by combining multimodal information, it is possible to estimate this comprehensive set of control features, with especially high performance for fingering and plucked-string estimation. The accuracy for the plucking finger and the pluck characteristics is lower, but improvements are foreseen through the inclusion of a hand model and the use of high-speed cameras for calibration and evaluation.
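As a minimal sketch of the geometric step mentioned above (finger-to-string distances computed with 3D Euclidean geometry), the snippet below estimates the shortest distance from a fingertip marker to a string modeled as a line segment between its nut and bridge anchor points. This is an illustration, not the paper's implementation; all names, coordinates, and the segment model of the string are assumptions.

```python
import numpy as np

def point_to_segment_distance(p, a, b):
    """Shortest 3D Euclidean distance from point p to segment [a, b].

    p: fingertip marker position; a, b: string endpoints (nut and bridge).
    All inputs are length-3 coordinates in the same reference frame.
    """
    p, a, b = np.asarray(p, float), np.asarray(a, float), np.asarray(b, float)
    ab = b - a
    # Project p onto the line through a and b, then clamp to the segment.
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    closest = a + t * ab
    return np.linalg.norm(p - closest)

# Hypothetical example: index-fingertip marker vs. one string (meters).
fingertip = [0.12, 0.03, 0.41]
nut_end, bridge_end = [0.00, 0.00, 0.40], [0.65, 0.00, 0.42]
print(point_to_segment_distance(fingertip, nut_end, bridge_end))
```

In a setup like the one described, such distances would be evaluated per finger against every string (and analogously against fret positions) at each motion-capture frame, providing the low-level features that feed the probability functions for fingering and plucked-string estimation.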