Shifting from the analysis of individual neurons to whole-brain neural network analysis opens up new research opportunities for Caenorhabditis elegans (C. elegans). An automated data processing pipeline, including neuron detection, segmentation, tracking, and annotation, would significantly improve the efficiency of analyzing whole-brain C. elegans imaging data. The resulting large data sets may in turn enable new scientific discoveries by exploiting the many promising analysis tools developed for big data. In this study, we focus on the development of an automated annotation procedure. Although the central nervous system of C. elegans contains only around 180 neurons, annotating each individual neuron remains a major challenge because of the high spatial density of the neurons, the similarity of their shapes, unpredictable distortion of the worm's head during motion, intrinsic variations during worm development, and other factors. We use an ensemble learning approach to achieve an error rate of around 25% in a test based on real experimental data. We also demonstrate the importance of exploiting sources of information for annotation beyond the neuron positions.

Key words: C. elegans, Whole-brain imaging, Automated annotation, Ensemble learning
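As a rough illustration of the kind of ensemble approach the abstract refers to, the sketch below combines several generic classifiers over neuron positions plus additional cues. It is not the method described in this paper: the feature set, base classifiers, toy data, and train/test split are all hypothetical assumptions introduced only to make the idea concrete.

```python
# Illustrative sketch only: a generic ensemble classifier for neuron annotation.
# Features, labels, and the ensemble composition are hypothetical and do not
# reproduce the authors' actual pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: each detected neuron is described by its 3-D position
# plus extra cues (e.g., intensity, size) and carries an identity label.
n_neurons, n_extra = 180, 2
X = np.hstack([rng.normal(size=(n_neurons, 3)),         # x, y, z positions
               rng.normal(size=(n_neurons, n_extra))])  # extra features
y = rng.integers(0, 10, size=n_neurons)                 # toy identity labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Soft-voting ensemble of heterogeneous base classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print("annotation error:", 1.0 - ensemble.score(X_test, y_test))
```

The point of the sketch is simply that an ensemble can fuse positional features with other sources of information, which is the direction the abstract emphasizes.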
Background

Caenorhabditis elegans (C. elegans) is the first living organism to have had its genome completely sequenced and its neural circuit fully mapped, with all neurons named [28], thanks to its transparent body and relatively simple nervous system. Since the introduction of C. elegans as a model organism for neurobiology by Dr. Sydney Brenner [2], many important scientific discoveries have been made in neuroscience as well as in biomedical research [7]. However, previous studies have focused on individual neurons only [24,4,11], which is not sufficient for understanding the mechanisms of neural activity at the level of dynamic networks. Recent advances in bioimaging technology allow researchers to efficiently obtain high-quality 4D whole-brain images of C. elegans [19]. For example, our laboratory can produce 15 sets of 4D images per week. Nevertheless, the average number of analyzed samples reported in the literature is only around 4 to 5 sets of 4D images [8,18,27]. The main bottleneck is the intensive human effort required to complete tracking and annotation for a single set of images. Therefore, there have been many studies on automating these data processing steps.