We present a novel natural language processing (NLP) approach to deriving plain English descriptors for science cases otherwise restricted by obfuscating technical terminology. We apply this approach to address the limitations of common radio galaxy morphology classifications, experimentally deriving a set of semantic tags for the Radio Galaxy Zoo EMU (Evolutionary Map of the Universe) project and the wider astronomical community. We collect 8,486 plain English annotations of radio galaxy morphology, from which we derive a taxonomy of plain English tags. The result is an extensible framework that is more flexible, more easily communicated, and more sensitive to rare feature combinations that cannot be described using the current framework of radio astronomy classifications.
The volume of data that will be produced by the next generation of astrophysical instruments represents a significant opportunity for making unplanned and unexpected discoveries. Conversely, finding unexpected objects or phenomena within such large volumes of data presents a challenge that may best be solved using computational and statistical approaches. We present the application of a coarse-grained complexity measure for identifying interesting observations in large astronomical datasets. This measure, which has been termed apparent complexity, has been shown to model human intuition and perceptions of complexity. Apparent complexity is computationally efficient to derive and can be used to segment and identify interesting observations in very large datasets based on their morphological complexity. We show, using data from the Australia Telescope Large Area Survey (ATLAS), that apparent complexity can be combined with clustering methods to provide an automated process for distinguishing between images of galaxies that have been classified as having simple and complex morphologies. The approach generalises well when applied to new data after being calibrated on a smaller dataset, where it performs better than the tested classification methods that use pixel data. This generalisability positions apparent complexity as a suitable machine learning feature for identifying complex observations with unanticipated features.
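As a rough illustration of the idea of a coarse-grained complexity measure, the sketch below block-averages an image, quantises it to a few grey levels, and uses the compressed size of the result as a complexity proxy. This is a minimal sketch under stated assumptions, not the authors' pipeline: the block size, the number of grey levels, the choice of zlib, the helper names (`coarse_grain`, `apparent_complexity`, `blob_image`), and the synthetic Gaussian-blob images are all hypothetical choices made for illustration. The clustering step mentioned in the abstract is not shown.

```python
import math
import random
import zlib


def coarse_grain(image, block=4, levels=8):
    """Block-average a 2D list of floats in [0, 1], then quantise to `levels` grey levels."""
    rows, cols = len(image), len(image[0])
    coarse = []
    for r in range(0, rows - block + 1, block):
        out_row = []
        for c in range(0, cols - block + 1, block):
            total = sum(image[r + i][c + j] for i in range(block) for j in range(block))
            mean = total / (block * block)
            out_row.append(min(levels - 1, int(mean * levels)))
        coarse.append(out_row)
    return coarse


def apparent_complexity(image, block=4, levels=8):
    """Compressed size in bytes of the coarse-grained image (an assumed proxy)."""
    coarse = coarse_grain(image, block, levels)
    raw = bytes(v for row in coarse for v in row)
    return len(zlib.compress(raw, 9))


def blob_image(size, blobs):
    """Synthetic test image: sum of Gaussian blobs (row, col, width, amplitude), clipped to [0, 1]."""
    image = []
    for r in range(size):
        row = []
        for c in range(size):
            value = sum(
                a * math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * w * w))
                for br, bc, w, a in blobs
            )
            row.append(min(1.0, value))
        image.append(row)
    return image


# Hypothetical stand-ins for "simple" and "complex" source morphologies.
rng = random.Random(0)
blank = [[0.0] * 64 for _ in range(64)]
simple = blob_image(64, [(32, 32, 6.0, 1.0)])
complex_image = blob_image(
    64,
    [(rng.uniform(4, 60), rng.uniform(4, 60), rng.uniform(3, 7), rng.uniform(0.5, 1.0))
     for _ in range(10)],
)

for name, img in [("blank", blank), ("simple", simple), ("complex", complex_image)]:
    print(name, apparent_complexity(img))
```

In this toy setup the score rises from the blank image to the single blob to the multi-blob image, so a threshold or a clustering step on the score could separate the groups. How well such a proxy tracks the published measure on real survey images is not something this sketch establishes.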