Summary
1. Hierarchical clustering of molecular data is commonly used to estimate species diversity in all forms of life. Parameters appropriate for species-level clustering are usually derived from reference data and then applied to delineate sequences of unknown species membership, although it is not clear how this should be carried out in a multilocus scenario.
2. We introduce a novel means of concurrent clustering parameter optimization and delineation for multilocus data. A simulated annealing heuristic search is performed, whereby clustering thresholds are varied independently for each locus but optimized according to the recovery of expected taxonomic species globally over loci. At each iteration of the search, one or more loci are randomly selected and a different threshold is proposed separately to cluster each; the loci are then linked to form global species units. Where a set of thresholds groups the reference (species-labelled) data with high taxonomic congruence, it is adopted for clustering the subject (nonlabelled) sequences into global molecular operational taxonomic units (global MOTU). Four mined test data sets composed of both reference and subject sequences are combined with a newly sequenced three-gene Apoidea data set and subjected to the proposed method.
3. Even when optimizing four loci and thousands of sequences, the approach rapidly converges on a set of parameters with maximal optimality score, although the method masks a high degree of incongruence and does not always converge on a single set of thresholds. For example, from the 476 Apoidea sequences, 70 global MOTU were inferred over the heuristic search, although the number of single-gene MOTU was much lower for the 28S RNA locus, and a range of equally optimal clustering thresholds was observed for the CytB gene.
4. We demonstrate the approach as a scalable species delineation solution for heterogeneous data sets composed of incompletely and inconsistently labelled data from public DNA databases, for newly sequenced multilocus data, or both. Delineation over a heuristic search of clustering parameters facilitates the estimation of species diversity in multilocus data, giving species estimates that take into account uncertainty regarding the choice of clustering thresholds.
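The search described in point 2 of the summary can be illustrated with a small Python sketch. It is not the authors' implementation: the toy distances, the specimen names, the threshold grid, the single-linkage clustering, the rule that specimens joined at any locus share a global unit, the congruence score (number of reference species recovered exactly) and the linear cooling schedule are all assumptions made for illustration.

```python
import math
import random

# Hypothetical toy data: pairwise distances per locus, keyed by specimen pair.
# The unlabelled specimen u1 is missing from the 28S locus (incomplete sampling).
DIST = {
    "COI": {("a1", "a2"): 0.01, ("b1", "b2"): 0.02, ("a1", "b1"): 0.10,
            ("a2", "b2"): 0.12, ("u1", "a1"): 0.015, ("u1", "b1"): 0.10},
    "28S": {("a1", "a2"): 0.0, ("b1", "b2"): 0.0, ("a1", "b1"): 0.01},
}
SPECIMENS = ["a1", "a2", "b1", "b2", "u1"]
LABELS = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}  # reference data only
GRID = [0.0, 0.005, 0.01, 0.02, 0.03, 0.05, 0.15]      # candidate thresholds

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def global_units(dist_by_locus, thresholds, specimens):
    """Single-linkage cluster each locus at its own threshold, then link loci:
    specimens joined at any locus fall into the same global unit (assumption)."""
    parent = {s: s for s in specimens}
    for locus, dist in dist_by_locus.items():
        for (a, b), d in dist.items():
            if d <= thresholds[locus]:
                parent[find(parent, a)] = find(parent, b)
    units = {}
    for s in specimens:
        units.setdefault(find(parent, s), set()).add(s)
    return list(units.values())

def congruence(units, labels):
    """Count reference species whose labelled members form exactly one unit."""
    species = {}
    for s, sp in labels.items():
        species.setdefault(sp, set()).add(s)
    restricted = {frozenset(s for s in u if s in labels) for u in units}
    return sum(frozenset(members) in restricted for members in species.values())

def anneal(dist_by_locus, labels, specimens, grid, steps=500, t0=1.0, seed=0):
    """Simulated annealing over per-locus thresholds, scored globally over loci."""
    rng = random.Random(seed)
    loci = list(dist_by_locus)
    current = {l: rng.choice(grid) for l in loci}
    cur_score = congruence(global_units(dist_by_locus, current, specimens), labels)
    best, best_score = dict(current), cur_score
    for i in range(steps):
        temp = t0 * (1 - i / steps) + 1e-9  # linear cooling (assumption)
        proposal = dict(current)
        # Pick one or more loci at random and propose a new threshold for each.
        for l in rng.sample(loci, rng.randint(1, len(loci))):
            proposal[l] = rng.choice(grid)
        score = congruence(global_units(dist_by_locus, proposal, specimens), labels)
        # Accept improvements and ties; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if score >= cur_score or rng.random() < math.exp((score - cur_score) / temp):
            current, cur_score = proposal, score
            if score > best_score:
                best, best_score = dict(proposal), score
    return best, best_score

best, best_score = anneal(DIST, LABELS, SPECIMENS, GRID)
print(best, best_score)
print(global_units(DIST, best, SPECIMENS))
```

In this toy example several threshold combinations recover both reference species, so the thresholds returned depend on the random seed. This mirrors the observation in point 3 that the search does not always converge on a single set of thresholds.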
Oomyzus spiraculus Song, Fei & Cao sp. nov. (Hymenoptera, Eulophidae) is described and illustrated as a gregarious larval-pupal endoparasitoid of Coccinella septempunctata L. (Coleoptera, Coccinellidae). Differences between O. spiraculus and similar species are discussed, and a key to the females and males of these species is provided. DNA barcodes of O. spiraculus and O. scaposus are analyzed and compared.