Fat deposition in pigs is not only closely related to production
efficiency and pork quality but also serves as an ideal model for human obesity.
Transcriptome sequencing is widely used to study fat deposition. However,
because of small sample sizes, high false-positive rates, and poor consistency
of results across studies, new analysis strategies are urgently needed.
Machine learning, a relatively new analysis approach, can effectively fit complex
data and accurately classify samples and identify genes. In this study, 36
samples of adipose, muscle, and liver tissue were collected
from Songliao black pigs and Landrace pigs, and the mRNA of all the
samples was sequenced. In addition, we obtained transcriptome data
for 64 samples from four different sources in the GEO database. After
standardization and imputation of missing values in the combined data set of
100 samples, traditional differential expression analysis was carried
out, and varying numbers of expressed genes were selected as features
for training eight machine learning models. Across 1000
replications of fourfold cross-validation on the 100 samples, AdaBoost
performed best among the eight methods, with an average prediction accuracy
greater than 93% and the highest mean area under the curve for
discriminating the high- and low-fat content groups. According to
the performance-based gene ranks inferred by AdaBoost, 12 genes related
to fat deposition were identified. Among them, FASN and APOD were
specifically expressed in adipose tissue and APOA1 was specifically
expressed in the liver; these genes could be important candidate
biomarkers affecting fat deposition.
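
The evaluation scheme described above, repeated fourfold cross-validation over 100 samples in two fat-content groups, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the helper names (`stratified_folds`, `repeated_cv_accuracy`) are hypothetical, the folds are stratified by group although the abstract does not say whether stratification was used, and a simple nearest-centroid classifier stands in for AdaBoost so that the example needs only the Python standard library. It is not the study's pipeline.

```python
import random
from statistics import mean


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin, continuing where the last class stopped.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds


def centroid_fit(X, y):
    """Stand-in classifier (NOT AdaBoost): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def repeated_cv_accuracy(X, y, k=4, repeats=10, seed=0):
    """Mean accuracy over `repeats` independent k-fold cross-validations."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train = [i for i in range(len(X)) if i not in held_out]
            model = centroid_fit([X[i] for i in train], [y[i] for i in train])
            correct = sum(centroid_predict(model, X[i]) == y[i] for i in test_idx)
            scores.append(correct / len(test_idx))
    return mean(scores)


# Synthetic stand-in data: 100 samples, 5 "genes", two well-separated groups.
data_rng = random.Random(42)
y = [0] * 50 + [1] * 50
X = [[data_rng.gauss(3.0 * label, 1.0) for _ in range(5)] for label in y]

accuracy = repeated_cv_accuracy(X, y, k=4, repeats=10)
print(f"mean accuracy over 10 x 4-fold CV: {accuracy:.3f}")
```

Raising `repeats` to 1000 would match the number of replications reported above; replacing the stand-in classifier with an AdaBoost implementation would additionally allow genes to be ranked by their contribution to the fitted model.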