Abstract. This study evaluates the application of artificial intelligence (AI) to the automatic
classification of radiolarians, using as an example eight distinct
morphospecies of the Eocene radiolarian genus Podocyrtis, which belong to three
different evolutionary lineages and are useful in biostratigraphy. The
samples used in this study were recovered from the equatorial Atlantic (ODP
Leg 207) and were supplemented with some samples from the North
Atlantic and Indian Oceans. To create an automatic classification tool,
numerous images of the investigated species were needed to train a
MobileNet convolutional neural network entirely coded in Python. Three
different datasets were obtained. The first one consists of a mixture of
broken and complete specimens, some of which appear blurry. The
second and third datasets were refined in two further steps that
excluded broken and blurry specimens, thereby increasing image quality. The
convolutional neural network randomly selected 85 % of all specimens for
training, while the remaining 15 % were used for validation. The MobileNet
architecture achieved an overall accuracy of about 91 % on all datasets.
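The random 85 %/15 % split and the overall accuracy figure can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `split_specimens` and `overall_accuracy`, the fixed seed, and the placeholder file names are hypothetical, and the original pipeline may have performed the split inside its deep-learning framework instead.

```python
import random


def split_specimens(image_paths, train_fraction=0.85, seed=0):
    """Shuffle specimen images and split them into training and
    validation lists (85 % / 15 % by default)."""
    shuffled = list(image_paths)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of validation images whose predicted species equals the true species."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical file names; the real dataset layout is not given in the abstract.
paths = [f"specimen_{i:03d}.png" for i in range(200)]
train, val = split_specimens(paths)
print(len(train), len(val))  # 170 30
```

Because the split is purely random, a species with few images may end up with very few validation examples; a split stratified by species is a common alternative, although the abstract does not say one was used.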
Three predictive models were then created, each trained on one of the
datasets, and they performed well in classifying Podocyrtis specimens from the
Indian Ocean (Madingley Rise, ODP Leg 115, Hole 711A) and the western North
Atlantic Ocean (New Jersey slope, DSDP Leg 95, Hole 612; Blake Nose, ODP
Leg 171B, Hole 1051A). These samples also provided clearer images since they
were mounted with Canada balsam rather than Norland epoxy. Despite some
morphological differences among specimens from different parts of the world's
oceans and differences in image quality, most species were either correctly
classified or at least assigned to a neighboring species within the same
lineage. Classification improved slightly for some species when images
that did not segment properly during image processing were cropped and/or
cleaned of background particles. However, irrespective of cropping or
background removal, the best result came from the predictive model trained on
the normal stacked dataset consisting of a mixture of broken and complete
specimens.
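Treating a prediction as acceptable when it falls on a neighboring species of the same lineage can be made explicit with a small scoring function. This is a hypothetical sketch: the lineage groupings and the placeholder species labels (A1 to C2) are invented for illustration and do not reproduce the actual Podocyrtis lineages, and the abstract does not state that such a score was computed.

```python
# Hypothetical lineages: each list orders placeholder morphospecies from
# ancestor to descendant. These are NOT the real Podocyrtis lineages.
LINEAGES = [
    ["A1", "A2", "A3"],
    ["B1", "B2", "B3"],
    ["C1", "C2"],
]


def position_index(lineages):
    """Map each species label to (lineage number, position in that lineage)."""
    index = {}
    for lineage_no, lineage in enumerate(lineages):
        for position, species in enumerate(lineage):
            index[species] = (lineage_no, position)
    return index


def lineage_tolerant_accuracy(true_labels, predicted_labels, lineages=LINEAGES):
    """Fraction of predictions that are either the true species or an
    immediately adjacent species in the same lineage."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    index = position_index(lineages)
    acceptable = 0
    for actual, predicted in zip(true_labels, predicted_labels):
        actual_lineage, actual_pos = index[actual]
        predicted_lineage, predicted_pos = index[predicted]
        if actual_lineage == predicted_lineage and abs(actual_pos - predicted_pos) <= 1:
            acceptable += 1
    return acceptable / len(true_labels)


truth = ["A1", "A2", "B3", "C1"]
preds = ["A1", "A3", "B1", "A1"]
strict = sum(t == p for t, p in zip(truth, preds)) / len(truth)
print(strict, lineage_tolerant_accuracy(truth, preds))  # 0.25 0.5
```

In this toy example, A2 predicted as A3 counts as a near miss along the lineage, whereas B3 predicted as B1 (two steps apart) and C1 predicted as A1 (a different lineage) do not.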