“…); (d) there is much empirical evidence that the larger the dataset, the better the performance of DL models, which offers many opportunities to design specific topologies (deep neural networks) to deal with any type of data in a better way than current models used in GS, because DL models with topologies like CNN can very efficiently capture the correlation (special structure) between adjacent input variables, that is, linkage disequilibrium between nearby SNPs; (f) some DL topologies like CNN can significantly reduce the number of parameters (number of operations) that need to be estimated, because CNN allows sharing parameters and performing data compression (using the pooling operation) without the need to estimate more parameters; and (g) the modeling paradigm of DL is closer to the complex systems that give rise to the observed phenotypic values of some traits. For these reasons, the incorporation of DL into classical breeding pipelines is in progress, and some uses of DL are given next: 1) for the prediction of parental combinations, which is critical for choosing superior combinational homozygous parental lines in F1-hybrid breeding programs [84]; 2) for modelling and predicting quantitative characteristics, for example, to perform image-based ear counting of wheat with a high level of robustness, without considering variables such as growth stage and weather conditions [85]; 3) for genetic diversity and genotype classification, for example, in Cinnamomum osmophloeum Kanehira (Lauraceae), where DL was applied to differentiate between morphologically similar species [86]; and 4) for genomic selection (see Table 1).…”
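The point about parameter sharing and pooling in CNNs can be made concrete with a small sketch. The NumPy example below is only an illustration under stated assumptions: the marker count (`n_snps = 1000`), the 0/1/2 genotype coding, the kernel width, the pool size, and the helper names `conv1d_valid` and `max_pool1d` are all hypothetical and do not come from the quoted text. It slides one shared kernel along a vector of SNP codes, so each output depends only on a few neighbouring markers, applies max pooling (which has no trainable parameters), and compares the parameter count with that of a fully connected layer producing an output of the same length.

```python
import numpy as np


def conv1d_valid(x, kernel, bias=0.0):
    """Slide one shared kernel along x (no padding, stride 1).

    As in most DL frameworks, this is a cross-correlation: the kernel
    is not flipped. Every output position reuses the same weights.
    """
    windows = np.lib.stride_tricks.sliding_window_view(x, len(kernel))
    return windows @ kernel + bias


def max_pool1d(x, size):
    """Non-overlapping max pooling; it has no trainable parameters."""
    n = (len(x) // size) * size  # drop a trailing partial window, if any
    return x[:n].reshape(-1, size).max(axis=1)


# Hypothetical sizes, chosen only for illustration.
n_snps = 1000
kernel_size = 5
pool_size = 4

rng = np.random.default_rng(0)
# Assumed genotype coding: 0, 1 or 2 copies of an allele per marker.
snps = rng.integers(0, 3, size=n_snps).astype(float)
kernel = rng.normal(size=kernel_size)

feature_map = conv1d_valid(snps, kernel)    # local patterns over adjacent SNPs
pooled = max_pool1d(feature_map, pool_size)  # compressed representation

# One filter: kernel weights plus one bias, regardless of input length.
conv_params = kernel_size + 1
# A dense layer mapping all SNPs to an output of the same length
# needs one weight per (input, output) pair plus one bias per output.
dense_params = n_snps * len(feature_map) + len(feature_map)

print(f"feature map length: {len(feature_map)}")
print(f"pooled length:      {len(pooled)}")
print(f"conv parameters:    {conv_params}")
print(f"dense parameters:   {dense_params}")
```

With these assumed sizes the single convolutional filter needs 6 parameters, whereas the dense layer with the same output length needs 996,996, and pooling shortens the feature map from 996 to 249 values without adding any parameters. A real genomic selection model would use many filters and layers, so the absolute numbers differ, but the scaling argument is the same.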