This paper describes the use of data analytics tools for predicting the fatigue strength of steels. Several physics-based and data-driven approaches have been used to correlate the properties of alloys with their compositions and manufacturing process parameters. Data-driven approaches are of significant interest to materials engineers, especially for extreme-value properties such as cyclic fatigue, where current state-of-the-art physics-based models have severe limitations. Unfortunately, documented success in these efforts is limited. In this paper, we explore the application of different data science techniques, including feature selection and predictive modeling, to the fatigue properties of steels, using data from the National Institute for Materials Science (NIMS) public domain database, and present a systematic end-to-end framework for exploring materials informatics. Results demonstrate that several advanced data analytics techniques, such as neural networks, decision trees, and multivariate polynomial regression, can achieve a significant improvement in prediction accuracy over previous efforts, with R² values over 0.97. The results demonstrate the utility of such data mining tools both for ranking the composition and process parameters by their potential for predicting the fatigue strength of steels and for developing predictive models of that strength.
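The R² figure quoted above is the coefficient of determination, 1 - SS_res / SS_tot. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `r_squared` and the strength values are illustrative assumptions and are not taken from the NIMS database.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical fatigue strengths in MPa (illustrative values only).
measured = [400.0, 500.0, 600.0, 700.0]
predicted = [410.0, 490.0, 610.0, 690.0]
score = r_squared(measured, predicted)
```

A value close to 1 means the predictions explain nearly all of the variance in the measured strengths; a model that always predicts the mean scores 0.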
Identifying pulmonary diseases involves accurate auscultation as well as elaborate and expensive pulmonary function tests. Prior work has shown that pulmonary diseases lead to abnormal lung sounds such as wheezes and crackles. This paper introduces novel spectral and spectrogram features, which are further refined using the Maximal Information Coefficient, to classify healthy and abnormal lung sounds. A balanced lung sound dataset, consisting of publicly available data and data collected with a low-cost in-house digital stethoscope, is used. The performance of the classifier is validated over several randomly selected non-overlapping training and validation samples and tested on separate subjects for two test cases: (a) overlapping and (b) non-overlapping data sources in training and testing. The results reveal that the proposed method sustains an accuracy of 80% even for non-overlapping data sources in training and testing.
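Testing on separate subjects means that no subject contributes recordings to both the training and the test set. The sketch below shows one way such a split might be done; the record layout (a dict with a `subject` key), the function name `split_by_subject`, and the 25% test fraction are assumptions for illustration, not details from the paper.

```python
import random


def split_by_subject(records, test_fraction=0.25, seed=0):
    """Split recordings so that each subject lands in exactly one partition."""
    subjects = sorted({r["subject"] for r in records})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [r for r in records if r["subject"] not in test_subjects]
    test = [r for r in records if r["subject"] in test_subjects]
    return train, test


# Hypothetical dataset: 8 subjects with 3 recordings each.
records = [
    {"subject": f"S{s}", "recording": k} for s in range(8) for k in range(3)
]
train, test = split_by_subject(records)
train_ids = {r["subject"] for r in train}
test_ids = {r["subject"] for r in test}
```

Splitting at the subject level rather than the recording level avoids rewarding a classifier for recognising an individual instead of a condition.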
Analysis of heart sounds is a popular research area for non-invasive identification of several heart diseases. This paper proposes a set of 88 time-frequency features along with five different methodologies for classifying normal and abnormal heart sounds. A state-of-the-art approach was applied to segment the fundamental heart sounds. Apart from a baseline two-class classifier, separate classifiers for long and short heart sounds were also explored to remove the dependency of the features on the duration of the recordings. Finally, a three-class classifier was explored to deal with the noisy data present in the dataset. Both balanced and unbalanced sets were considered for creating the training models. A comparative analysis showed that, of all the methodologies, the three-class classifier approach performs best, simultaneously yielding high values of both sensitivity and specificity.
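Sensitivity is the fraction of abnormal recordings that are correctly flagged, TP / (TP + FN), and specificity is the fraction of normal recordings that are correctly passed, TN / (TN + FP). The following is a minimal sketch of both metrics; the label encoding (1 for abnormal, 0 for normal) and the example predictions are assumptions for illustration, not results from the paper.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # abnormal, correctly flagged
            else:
                fn += 1  # abnormal, missed
        else:
            if pred == positive:
                fp += 1  # normal, wrongly flagged
            else:
                tn += 1  # normal, correctly passed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: 1 = abnormal heart sound, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Reporting the two together matters on unbalanced sets, where a classifier can reach high accuracy by favouring the majority class while one of the two metrics collapses.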
Phonocardiogram (PCG) recording, or auscultation via a stethoscope, forms the basis of preliminary medical screening. However, a PCG recorded in an uncontrolled environment is inherently noisy. In this paper, we derive novel features from the spectral domain and from autocorrelation waveforms. These features are used to assess the quality of a PCG recording and to accept only recordings of diagnosable quality for further analysis. The features proved to be robust irrespective of variations in devices and in the data collection protocols employed to ensure consistent data quality. A freely available, large, diverse, medical-grade PCG dataset was used for creating the training models. Results show that the proposed methodology yields an accuracy of approximately 75% on our in-house PCG dataset, collected using a low-cost smartphone-based digital stethoscope.
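The abstract does not specify the autocorrelation features. As an illustration of the general idea only, the sketch below scores how periodic a signal is by taking the peak of its normalized autocorrelation over a range of lags: a clean, regularly repeating signal scores high and broadband noise scores low. The function names, the lag range, and the synthetic signals are all assumptions, not the paper's method.

```python
import math
import random


def normalized_autocorrelation(signal, lag):
    """Autocorrelation of the mean-removed signal at `lag`, divided by its energy."""
    n = len(signal)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(signal)")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    energy = sum(c * c for c in centered)
    if energy == 0:
        return 0.0  # constant signal carries no periodicity information
    return sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy


def periodicity_score(signal, min_lag, max_lag):
    """Peak normalized autocorrelation over the lag range [min_lag, max_lag]."""
    return max(
        normalized_autocorrelation(signal, lag)
        for lag in range(min_lag, max_lag + 1)
    )


# Synthetic stand-ins: a signal repeating every 50 samples, and Gaussian noise.
periodic = [math.sin(2 * math.pi * i / 50) for i in range(500)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]

periodic_score = periodicity_score(periodic, 40, 60)
noise_score = periodicity_score(noise, 40, 60)
```

In a real quality check the lag range would be chosen to cover plausible heart-cycle durations at the recording's sampling rate, and the score would be one input among several to a trained classifier rather than a threshold on its own.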