To investigate the heterogeneity in merging behavior on freeways, a novel data mining tool, two-step cluster analysis, is applied to three merging maneuver variables (initial speed, merging speed, and merging position). The merging maneuvers of 370 drivers, taken from the NGSIM dataset, are automatically and optimally segmented by the two-step cluster analysis into four clusters: Early Merging Drivers at High Speed, Early Merging Drivers at Low Speed, Late Merging Drivers at Low Speed, and Late Merging Drivers at High Speed. Hypothesis tests confirm significant differences in merging maneuvers between the clusters. The clustered data are then used to identify the best-fitting distributions. Seven distributions (Normal, Log-normal, Student's t, Logistic, Log-Logistic, Gamma, and Weibull) are considered for each cluster, and the Kolmogorov-Smirnov test statistics are used to select the best-fitting distribution. It is found that merging drivers may merge either early or late, under both congested and uncongested traffic conditions. Further analysis of merging durations shows that Late Merging Drivers take significantly less time than Early Merging Drivers to complete the merging maneuver, whether they travel at high or low speed. Hypothesis tests of accepted lead gaps and lag gaps indicate that merging drivers are more sensitive to lag gaps under congestion. The proposed method can automatically identify the heterogeneity among merging drivers, and the results obtained in this paper can be used to improve the accuracy of merge behavior models in microscopic simulation software.
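
The distribution-selection step described above (fit several candidates to one cluster, then keep the one with the smallest Kolmogorov-Smirnov statistic) can be sketched as follows. This is only an illustration under stated assumptions: it covers three of the seven candidates, fits them by simple moment matching rather than whatever estimation procedure the paper uses, and runs on synthetic values rather than NGSIM data. The names `ks_statistic`, `fit_candidates`, `best_fit`, and `speeds` are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic: largest gap between
    the empirical CDF of the sample and the candidate model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def fit_candidates(sample):
    """Fit three candidate distributions by moment matching.
    ASSUMPTION: moment matching is used for brevity; it is not
    necessarily the fitting method used in the paper."""
    m, s = mean(sample), stdev(sample)
    logs = [math.log(x) for x in sample]  # requires strictly positive data
    log_normal = NormalDist(mean(logs), stdev(logs))
    normal = NormalDist(m, s)
    scale = s * math.sqrt(3) / math.pi  # logistic scale from the std. dev.
    return {
        "Normal": normal.cdf,
        "Log-normal": lambda x: log_normal.cdf(math.log(x)),
        "Logistic": lambda x: 1.0 / (1.0 + math.exp(-(x - m) / scale)),
    }


def best_fit(sample):
    """Return the candidate with the smallest KS statistic, plus all statistics."""
    stats = {name: ks_statistic(sample, cdf)
             for name, cdf in fit_candidates(sample).items()}
    return min(stats, key=stats.get), stats


# Synthetic stand-in for one cluster's merging speeds (NOT NGSIM data).
random.seed(0)
speeds = [random.lognormvariate(2.5, 0.6) for _ in range(300)]
name, stats = best_fit(speeds)
print(name, {k: round(v, 3) for k, v in stats.items()})
```

In practice the same loop would run once per cluster and per variable, with all seven candidates fitted by a statistics package; only the selection rule (smallest Kolmogorov-Smirnov statistic) is taken from the text.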