Abbreviations: BGCs, biosynthetic gene clusters; antiSMASH, antibiotics and secondary metabolite analysis shell; iChip, isolation chip; PK(S), polyketide (synthase); NRP(S), nonribosomal peptide (synthase)
Introduction
Drug discovery, the first critical step in identifying rational lead candidates for novel drug development, has always been a challenging, time-consuming and laborious scientific task that requires expertise and experience. Historically, natural products and their derivatives were the main source of approved drugs: more than 50% of the clinical drugs approved between 1981 and 2014 were derived from natural products.1 Even in recent years, natural products have continued to enter clinical trials or to be approved for market, including Trabectedin (ET-743), 2 Halaven (eribulin mesylate), 3,4 Bryostatin 5 and so on. It is believed that the vast structural diversity and the biodiversity of natural products contribute most to this success. However, the rise of combinatorial chemistry and high-throughput screening in drug discovery over the past decades has dimmed the prospects of natural products to some extent, 6 which has prompted a growing number of studies on novel drug discovery methods in the post-genomic era.
Drug discovery in the post-genomic era
With the rapid development of DNA sequencing, synthetic biology and related technologies (e.g. proteomics, metabolomics, bioinformatics), interest in natural product discovery has been renewed.7 It is now accepted that natural products are synthesized by specific metabolic pathways encoded by biosynthetic gene clusters (BGCs); in theory, therefore, virtually all natural products could be identified by DNA sequencing and metagenomic analysis. On this view, far more of chemical space could be covered and many more novel chemical structures might be discovered, such as those encoded by silent BGCs and uncultured microorganisms.
Genomics-driven natural product discovery usually includes three main steps:
a. Identification of BGCs. Antibiotics from microbes or plants are directly linked to BGCs coding for proteins associated with biosynthesis, resistance, regulation and transport. How to prioritize and characterize orphan BGCs from the growing body of genome sequences is therefore the first crucial step. 8,9 Genome mining 10,11 is a technology developed for the identification and characterization of orphan BGCs: it analyzes the sequenced genome of a specific organism to identify gene clusters and determine whether they are involved in the production of new antibiotics. Several bioinformatics tools and resources have been proposed for this purpose, such as HMMER, GOLD, NORINE, SBSPKS, SEARCHPKS, NRPSpredictor2, plantiSMASH and so on, among which antiSMASH (antibiotics and secondary metabolite analysis shell) 12 might be the most widely used. It is a comprehensive bioinformatic tool for automated genome mining, including the annotation of entire gene clusters.
b...