Data-driven machine learning is widely employed in the analysis of materials structure-activity relationships, performance optimization, and materials design because of its ability to reveal latent data patterns and make accurate predictions. However, because materials data acquisition is laborious, machine learning models often face a mismatch between the high dimensionality of the feature space and the small sample size (for traditional machine learning models) or between the number of model parameters and the sample size (for deep learning models), which usually results in poor performance. Here, we review efforts to tackle this issue via feature reduction, sample augmentation, and specific machine learning approaches, and show that the balance between the number of samples and the number of features or model parameters deserves close attention during data quantity governance. Following this, we propose a synergistic data quantity governance flow that incorporates materials domain knowledge. After summarizing approaches to incorporating materials domain knowledge into the machine learning process, we provide examples of incorporating domain knowledge into governance schemes to demonstrate the advantages and applications of the approach. This work paves the way for obtaining the high-quality data required to accelerate machine-learning-based materials design and discovery.
Accurate extraction of aquaculture areas is important for aquaculture management, post-disaster evaluation, and aquatic environment protection. However, little attention has been paid to aquaculture area extraction in highly turbid coastal water. In this study, based on the spectral and geospatial features of aquaculture cages in complex coastal water with varying turbidity, we propose a new aquaculture area extraction method using a Gaofen-2 (GF-2) satellite image with 0.8 m spatial resolution. The water was classified into clear, medium-turbidity, and high-turbidity categories according to the suspended sediment concentration (SSC) derived from inversion of the GF-2 image. Different extraction rules were developed for these three categories of water body: (1) a Normalized Difference Water Index (NDWI) threshold was set for clear water, (2) a ratio index (R = Green/NIR) was established for medium-turbidity water, and (3) feature analysis with a specified classification rule was established for high-turbidity water. The experimental results demonstrated that the proposed method worked well, achieving an overall accuracy of 87.33%, even for high-turbidity water. The Kappa coefficient was 0.7375, which was much better than the Kappa coefficients of the three conventional classification methods presented in this paper. This study provides effective information support and auxiliary decision analysis for management departments to plan coastal aquaculture areas scientifically and manage them in an environmentally sound way.
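The first two rules can be illustrated with a minimal Python sketch. The abstract does not give the NDWI formulation, the threshold values, or the direction of the comparisons, so everything below is an assumption: NDWI is taken as the common McFeeters form, (Green - NIR) / (Green + NIR), the thresholds `ndwi_max` and `ratio_max` are placeholder values, and cage pixels are assumed to fall below the thresholds. The function names and the integer turbidity codes are hypothetical. The rule for high-turbidity water is not specified in the abstract and is left unimplemented.

```python
import numpy as np

def ndwi(green, nir, eps=1e-6):
    """McFeeters NDWI, (Green - NIR) / (Green + NIR); assumed formulation."""
    return (green - nir) / (green + nir + eps)

def green_nir_ratio(green, nir, eps=1e-6):
    """Ratio index R = Green / NIR, as named in the abstract."""
    return green / (nir + eps)

def extract_cages(green, nir, turbidity, ndwi_max=0.3, ratio_max=2.0):
    """Return a boolean mask of candidate aquaculture pixels.

    turbidity: integer array, 0 = clear, 1 = medium, 2 = high (hypothetical codes).
    ndwi_max, ratio_max: placeholder thresholds, not values from the paper.
    """
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    turbidity = np.asarray(turbidity)

    mask = np.zeros(green.shape, dtype=bool)
    clear = turbidity == 0
    medium = turbidity == 1

    # Rule 1 (clear water): assume cages have a lower NDWI than open water.
    mask[clear] = ndwi(green, nir)[clear] < ndwi_max
    # Rule 2 (medium turbidity): assume cages have a lower Green/NIR ratio.
    mask[medium] = green_nir_ratio(green, nir)[medium] < ratio_max
    # Rule 3 (high turbidity): feature analysis, not described in the abstract,
    # so those pixels stay False here.
    return mask

# Toy reflectance values, one row per turbidity class.
green = np.array([[0.30, 0.10], [0.30, 0.12], [0.10, 0.09]])
nir = np.array([[0.05, 0.09], [0.05, 0.10], [0.09, 0.09]])
turbidity = np.array([[0, 0], [1, 1], [2, 2]])
mask = extract_cages(green, nir, turbidity)
```

Dispatching on a per-pixel turbidity class keeps each rule independent, so a threshold tuned for one water type does not affect the others.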