“…The first option, which this work highlights, is to create a large, diverse, and descriptive molecular data set of various chemical species and structures and to train a versatile model with the aim of predicting the structure of almost any small organic molecule. This approach has also been chosen in most previous sample characterization efforts. Recently, however, another method has been used in ice structure discovery: instead of a diverse data set, a tailored data set is constructed and refined to make very accurate predictions possible in a constrained problem domain. For example, if the goal were to predict only the geometries of different hydrocarbons or triangulene-based molecules, the model would benefit from a tailored data set with a heavy emphasis on such structures.…”