The names of plant species, including algae and fungi, are based on type specimens to which each name is permanently attached. Applying a scientific name to any specimen therefore requires demonstrating that the specimen corresponds to the type. Traditionally, identifications have been based on morpho-anatomical characters, but systematists increasingly use DNA sequence data. Such studies are flawed if the DNA is isolated from misidentified modern specimens. We propose a genome-based solution. Using 4 × 4 mm pieces (16 mm²) of material from type specimens, we assembled 14 plastid and 15 mitochondrial genomes attributed to the red algae Pyropia perforata, Py. fucicola, and Py. kanakaensis. The plastid genomes were fairly conserved, but the mitochondrial genomes differed significantly among populations in content and length. Complete genomes are attainable from 19th- and early 20th-century type specimens; this validates the effort and cost of curating them and supports the practice of the type method.
Data annotation bias occurs in many situations. Often it can be ignored as just another component of the noise floor, but it is especially prevalent in crowdsourcing tasks and must be actively managed. Annotation bias on single data items has been studied with regard to data difficulty, annotator bias, and similar factors, whereas annotation bias on batches of multiple data items presented to annotators simultaneously has not been studied. In this paper, we verify the existence of "in-batch annotation bias" between data items in the same batch. We propose a factor-graph-based batch annotation model to quantitatively capture in-batch annotation bias, and we measure the bias during a crowdsourced annotation of inappropriate comments on LinkedIn. We find that, in our task, annotators tend to make polarized annotations for the entire batch of data items. We further leverage the batch annotation model to propose a novel batch active learning algorithm. We test the algorithm on a real crowdsourcing platform and find that it outperforms algorithms that do not account for in-batch bias.
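The polarization finding can be illustrated with a much simpler check than the factor graph model described above. The sketch below is not the authors' model. It is a hypothetical diagnostic that assumes binary labels (1 = inappropriate, 0 = acceptable) grouped by the batch in which they were shown. It compares the observed fraction of unanimous batches with the fraction expected if each label were drawn independently at the pooled positive rate p. For a batch of size k, that expected rate is p^k + (1 − p)^k.

```python
from typing import Sequence, Tuple


def unanimity_rates(batches: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Hypothetical diagnostic: (observed, expected) share of unanimous batches.

    Each batch is a list of binary labels (assumed 1 = inappropriate,
    0 = acceptable) that one annotator gave to items shown together.
    "Expected" assumes labels are independent with the pooled positive rate p.
    """
    labels = [y for batch in batches for y in batch]
    p = sum(labels) / len(labels)  # pooled positive rate across all batches

    # A batch is unanimous when every label in it is identical.
    observed = sum(1 for batch in batches if len(set(batch)) == 1) / len(batches)

    # Under independence, a batch of size k is unanimous with prob p^k + (1-p)^k.
    expected = sum(p ** len(b) + (1 - p) ** len(b) for b in batches) / len(batches)
    return observed, expected


# Toy data: every batch is labeled all-1 or all-0.
obs, exp = unanimity_rates([[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]])
print(obs, exp)
```

If the observed rate is well above the expected one, labels cluster by batch more than independence predicts. That is the kind of effect a batch-aware model is meant to capture.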
The dramatic growth of storage capacity and network bandwidth is making it increasingly difficult for forensic examiners to report what is present on a piece of subject media. Instead, analysts are focusing on which characteristics of the media have changed between two snapshots in time. To date, different algorithms have been implemented for performing differential analysis of computer media, memory, digital documents, network traces, and other kinds of digital evidence. This paper presents an abstract differencing strategy and applies it to all of these problem domains. Using an abstract strategy allows lessons learned in one problem domain to be applied directly to the others. Published by Elsevier Ltd.
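A domain-independent differencing step can be sketched as follows. Each snapshot is reduced to a mapping from a stable identifier to an observed value, and the two mappings are compared. This is only an illustration of the general idea, not the paper's strategy. The `Delta` type, the `diff_snapshots` function, and the choice of file paths mapped to hashes are hypothetical. The same function would work for, say, process IDs mapped to process names taken from two memory images.

```python
from dataclasses import dataclass
from typing import Any, Dict, Hashable, Tuple


@dataclass
class Delta:
    """Hypothetical result of comparing two snapshots."""
    added: Dict[Hashable, Any]                  # present only in the later snapshot
    removed: Dict[Hashable, Any]                # present only in the earlier snapshot
    changed: Dict[Hashable, Tuple[Any, Any]]    # present in both, value differs: (old, new)


def diff_snapshots(before: Dict[Hashable, Any], after: Dict[Hashable, Any]) -> Delta:
    """Compare two snapshots, each a mapping of identifier -> observed value."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return Delta(added, removed, changed)


# Example with made-up file paths mapped to made-up content hashes.
t0 = {"/etc/passwd": "a1", "/tmp/x": "b2"}
t1 = {"/etc/passwd": "c3", "/home/u/new.txt": "d4"}
print(diff_snapshots(t0, t1))
```

Only the feature extraction (what counts as an identifier and a value) is specific to each evidence type. The comparison itself is shared, which reflects the paper's point that lessons carry across domains.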