Background: Heart sound (HS) analysis, which classifies features extracted from the four-stage sequence consisting of the first heart sound (S1), the second heart sound (S2), the interval from S1 to S2, and the interval from S2 to S1, is an efficient and widely used method for diagnosing heart disease and evaluating heart function. However, the features are difficult to extract accurately because the four stages are segmented from HSs with low accuracy. In addition, fixed classifiers trained on features from old samples fit features from new samples poorly because they are not adjusted as features are added incrementally. Thus, a novel intelligent diagnostic system, whose innovations lie primarily in automatic feature extraction and adjustable classifier models, is proposed to diagnose heart diseases with higher accuracy. Methods: The proposed system has three stages. In stage 1, a short time modified Hilbert transform (STMHT)-based curve is used to segment and extract the first complex sound (CS1) and second complex sound (CS2). In stage 2, the envelopes CS1FE and CS2FE for the periods CS1 and CS2 are obtained via a novel method, and frequency features are automatically extracted from CS1FE and CS2FE by setting different threshold value (Thv) lines. The first three principal components γ1, γ2, and γ3 obtained by principal component analysis (PCA) are then taken as the diagnostic features. In stage 3, a Gaussian mixture model (GMM)-based objective function fet(x) is generated. Then, the χ² distribution for component k is determined by calculating the Mahalanobis distance from x to the class mean µk of component k, and the confidence region of component k, determined by adjusting the optimal confidence level βk, is used as the criterion to diagnose HSs. Results: The performance of the system was evaluated using sounds from online HS databases and clinical heart databases.
The accuracy of the proposed method was compared with that of other well-known classifiers. The highest classification accuracies of 99.43%, 98.93%, 99.13%, 99.85%, 98.62%, 99.67% and 99.91% in the detection of MR, MS, ASD, NM, AS, AR and VSD sounds, respectively, were achieved by setting βk (k = 1, 2, ..., 7) to 0.87, 0.65, 0.67, 0.65, 0.67, 0.79 and 0.87, respectively. Conclusions: The proposed intelligent diagnosis system provides an efficient way to diagnose seven types of heart diseases. In future work, methods for handling sounds (such as some ASD sounds) for which CS1 and CS2 cannot be segmented and extracted via the STMHT method will be explored, with the aims of characterizing the physical meanings of the frequency components and building a model of the secondary curve in the frequency domain.
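
The stage 3 criterion can be illustrated with a minimal sketch. It is not the implementation used in this work; it only assumes that each GMM component k is summarized by a mean µk, a 3 × 3 covariance matrix and a confidence level βk, and that the squared Mahalanobis distance of a three-dimensional feature vector (γ1, γ2, γ3) follows a χ² distribution with 3 degrees of freedom. A point then lies inside the βk confidence region exactly when the χ² cumulative distribution function evaluated at that distance does not exceed βk. The function names, the means and the covariances below are hypothetical; only the two β values (0.87 for MR and 0.65 for MS) are taken from the Results.

```python
import math


def solve(a, b):
    """Solve the linear system a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (m[i][n] - s) / m[i][i]
    return y


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    y = solve(cov, d)  # y = cov^{-1} d, without forming the inverse
    return sum(di * yi for di, yi in zip(d, y))


def chi2_cdf_3dof(d2):
    """Closed-form CDF of the chi-square distribution with 3 degrees of freedom."""
    if d2 <= 0.0:
        return 0.0
    return (math.erf(math.sqrt(d2 / 2.0))
            - math.sqrt(2.0 * d2 / math.pi) * math.exp(-d2 / 2.0))


def in_confidence_region(x, mean, cov, beta):
    """True if x lies inside the beta confidence region of the component."""
    return chi2_cdf_3dof(mahalanobis_sq(x, mean, cov)) <= beta


def matching_classes(x, components):
    """Labels of all components whose confidence region contains x."""
    return [label for label, (mean, cov, beta) in components.items()
            if in_confidence_region(x, mean, cov, beta)]


# Hypothetical component parameters (means and covariances are made up;
# only the beta values 0.87 and 0.65 come from the abstract).
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
COMPONENTS = {
    "MR": ([0.0, 0.0, 0.0], IDENTITY, 0.87),
    "MS": ([4.0, 0.0, 0.0], IDENTITY, 0.65),
}

print(matching_classes([1.0, 0.0, 0.0], COMPONENTS))
```

Comparing the CDF value with βk avoids inverting the χ² distribution to obtain a distance threshold. Raising βk enlarges the region of component k, which is the adjustment the system uses to tune each class.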