This paper presents a novel method for transcription of folk music that exploits its specifics to improve transcription accuracy. In contrast to most commercial music, folk music recordings may contain various inaccuracies as they are usually performed by amateur musicians and recorded in the field. If we use standard approaches for transcription, these inaccuracies are reflected in erroneous pitch estimates. On the other hand, the structure of western folk music is usually simple as songs are often composed of repeated melodic parts. In our approach we make use of these repetitions to increase transcription robustness and improve its accuracy. The proposed method fuses three sources of information: (1) frame-based multiple F0 estimates, (2) song structure, and (3) pitch drift estimates. It first selects a representative segment of the analyzed song and aligns all the other segments to it considering temporal as well as frequency deviations. Information from all segments is summarized and used in a two-layer probabilistic model based on explicit duration HMMs, to segment frame-based information into notes. The method is evaluated with state-of-the-art transcription methods where we show that significant improvement in accuracy can be achieved.