“…In the particular problem of pre-miRNA prediction, the big challenge is that there are only tens or hundreds of well-known pre-miRNAs (the positive class), versus millions of unknown (unlabeled) sequences across the rest of the genome, most of which are truly negative, although they include some not-yet-discovered pre-miRNAs. For example, the Anopheles gambiae genome has only 66 well-known pre-miRNAs but more than 4 million hairpin-like sequences, giving an imbalance of about 1:60,000 (Bugnon et al., 2019). In viruses, the imbalance ranges from approximately 1:30 in the bovine leukemia virus and 1:130 in the Epstein-Barr virus up to 1:400 in the Herpes virus of turkeys, which has only 8 known pre-miRNAs and a genome of 159 kb.…”
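
The imbalance figures quoted above are simply the number of unlabeled sequences per known pre-miRNA. As a minimal sketch of that arithmetic, the following Python snippet reproduces the Anopheles gambiae figure. The function names `imbalance_ratio` and `format_ratio` are hypothetical, and 4,000,000 is used as an assumed lower bound for "more than 4 million" hairpin-like sequences.

```python
def imbalance_ratio(n_positive: int, n_unlabeled: int) -> float:
    """Return the number of unlabeled sequences per known positive example."""
    if n_positive <= 0:
        raise ValueError("n_positive must be a positive integer")
    return n_unlabeled / n_positive


def format_ratio(n_positive: int, n_unlabeled: int) -> str:
    """Format the imbalance as a '1:N' string, with N rounded to an integer."""
    return f"1:{round(imbalance_ratio(n_positive, n_unlabeled)):,}"


if __name__ == "__main__":
    # Anopheles gambiae: 66 well-known pre-miRNAs (from the text) versus
    # an assumed lower bound of 4,000,000 hairpin-like sequences.
    print(format_ratio(66, 4_000_000))
```

With these inputs the ratio comes out slightly above 1:60,000, consistent with the rounded value given in the text.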