A series of 2-arylbenzofurans and 2-arylbenzothiophenes carrying three different side chains in position five was synthesized. The synthesized compounds were tested for NF-κB inhibition to establish a structure-activity relationship. Both the side chain in position five and the substitution pattern of the aryl moiety in position two were found to have a significant influence on the inhibitory activity.
The Helmholtz Association (Anonymous 2022d), the largest association of large-scale research centres in Germany, covers a wide range of research fields and employs more than 43,000 researchers. In 2019, the Helmholtz Metadata Collaboration (HMC) Platform (Anonymous 2022f) was started as a joint endeavor across all research areas of the Helmholtz Association to make the depth and breadth of research data produced by Helmholtz Centres findable, accessible, interoperable, and reusable (FAIR) for the whole science community. To reach this goal, the concept of FAIR Digital Objects (FAIR DOs) has been chosen as the top-level commonality for existing and future infrastructures of all research fields. In doing so, HMC follows the original approach of realizing FAIR DOs based on globally unique Persistent Identifiers (PIDs), e.g., provided by https://handle.net/, machine-actionable PID records, and strong typing using Data Types like https://dtr-test.pidconsortium.eu/#objects/21.T11148/1c699a5d1b4ad3ba4956 registered in a Data Type Registry, e.g., http://dtr-test.pidconsortium.eu/. In all these areas, HMC can build on the groundwork of the Research Data Alliance and the FAIR DO Forum. However, when it comes to realization, there are still some gaps that will have to be addressed during our work and that will be raised in this presentation. For single FAIR DO components like PIDs and Data Types, infrastructures are already available. Here, the Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG) (Anonymous 2022e) provides strong support with its many years of experience in this field.
Within the framework of the ePIC consortium (Anonymous 2022c), the GWDG on the one hand offers PID prefixes based on a sustainable business model and on the other hand is very active in providing base services required for realizing FAIR DOs, e.g., different instances of Data Type Registries for accessing, creating, and managing the Data Types required by FAIR DOs. Besides that, in the context of HMC we developed several technical components to support the creation and management of FAIR DOs: the Typed PID Maker (Pfeil 2022b), which provides machine-actionable interfaces for creating, validating, and managing PIDs with machine-actionable metadata stored in their PID record, and the FAIR DO testbed, currently evolving into the FAIR DO Lab (Pfeil 2022a), which serves as a reference implementation for setting up a FAIR DO ecosystem. However, introducing FAIR DOs is not only about providing technical services, but also requires the definition of and agreement on interfaces, policies, and processes. A first step in this direction was made in the context of HMC by agreeing on a Helmholtz Kernel Information Profile (http://dtr-test.pidconsortium.eu/#objects/21.T11148/b9b76f887845e32d29f7). In the concept of FAIR DOs, PID Kernel Information as defined by Weigel et al. (2018) is key to machine actionability of digital content. Relying strongly on Data Types and stored in the PID record directly at the PID resolution service, PID Kernel Information can be used by machines for fast decision making. The Helmholtz Kernel Information Profile is an attempt to introduce a top-level commonality across all digital assets produced within the Helmholtz Association and beyond to establish a basis for FAIR research data based on FAIR DOs. To this end, the Helmholtz Kernel Information Profile integrates the recommendations of the RDA PID Kernel Information Working Group (Anonymous 2022b) as far as possible. By extending the Draft Kernel Information Profile (Weigel et al. 2018) with additional, mostly optional attributes, the Helmholtz Kernel Information Profile allows contextual information, e.g., research topic or contact information, to be added to FAIR DOs, where it is then available for machine decisions. Furthermore, additional properties for representing relationships between FAIR DOs, e.g., hasMetadata and isMetadataFor, were introduced to allow mutual relations between FAIR DOs.
Currently, a demonstrator is being implemented that integrates the above components and services, i.e., PID Service, Data Type Registry, and Typed PID Maker. Fig. 1 outlines the architecture of the first version of the demonstrator. In this first version, in a semi-automatic workflow, a user enters a Zenodo (Anonymous 2022a) PID in a graphical Web frontend. A mapping component tries to automatically fill at least the properties required by the Helmholtz Kernel Information Profile using the obtained Zenodo metadata record. In a manual validation loop, the user may add or update certain properties before they are sent to an instance of the Typed PID Maker, validated against the Helmholtz Kernel Information Profile, and stored in the record of a newly registered PID using the services of the ePIC consortium. In addition, registered PID records are made searchable via the graphical frontend on top of a search index, e.g., realized using https://www.elastic.co/. After this generic workflow has been implemented, additional mappers supporting other repository platforms will be implemented based on the lessons learned. This will lead to a growing number of FAIR DOs and holds potential for providing significant benefits to scientists, e.g., a central point of contact for research data sets stored in different repositories, machine-actionable identification of relevant data sets, and creation of knowledge graphs representing relationships between data sets, repository platforms, researchers, and research organizations.
Furthermore, the gathered experience and its documentation will help others to apply the FAIR DO concept more easily, which will lead to an ever-growing collection of available FAIR DOs with an increasing quality and level of automation at creation time.
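The semi-automatic workflow described above (map a repository metadata record to profile attributes, let the user complete it, then validate it against the profile) can be illustrated with a minimal Python sketch. It is not the Typed PID Maker interface. The attribute names, the input field names (`resource_type`, `url`, `publication_date`, `keywords`, `contact`), and the profile PID `21.T00000/profile` are assumptions made for illustration; in the actual profile, attributes are identified by Data Type PIDs registered in a Data Type Registry.

```python
from typing import Dict, List

# Hypothetical attribute names standing in for the typed attributes of a
# kernel information profile. They are illustrative assumptions only.
REQUIRED = (
    "kernelInformationProfile",
    "digitalObjectType",
    "digitalObjectLocation",
    "dateCreated",
)
OPTIONAL = ("topic", "contact", "hasMetadata", "isMetadataFor")


def map_repository_record(meta: Dict, profile_pid: str) -> Dict:
    """Fill as many profile attributes as possible from a metadata dict.

    The input keys are assumed field names, not a real repository schema.
    """
    candidate = {
        "kernelInformationProfile": profile_pid,
        "digitalObjectType": meta.get("resource_type"),
        "digitalObjectLocation": meta.get("url"),
        "dateCreated": meta.get("publication_date"),
        "topic": meta.get("keywords"),
        "contact": meta.get("contact"),
    }
    # Drop attributes the source record could not provide.
    return {k: v for k, v in candidate.items() if v not in (None, "", [])}


def validate(record: Dict) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = [
        f"missing required attribute: {name}"
        for name in REQUIRED
        if name not in record
    ]
    allowed = set(REQUIRED) | set(OPTIONAL)
    problems += [
        f"unknown attribute: {name}"
        for name in sorted(record)
        if name not in allowed
    ]
    return problems


if __name__ == "__main__":
    meta = {"resource_type": "dataset", "url": "https://example.org/record/1"}
    record = map_repository_record(meta, "21.T00000/profile")
    print(validate(record))            # the creation date is still missing
    record["dateCreated"] = "2022-01-01"  # manual completion by the user
    print(validate(record))
```

In this sketch the manual validation loop is reduced to one added attribute; only a record with no reported problems would be passed on for PID registration.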
The application case for implementing and using the FAIR Digital Object (FAIR DO) concept (Schultes and Wittenburg 2019) aims to simplify access to label information for composing Machine Learning (ML) (Awad and Khanna 2015) training data. Data sets curated by different domain experts usually have non-identical label terms. This prevents images with similar labels from being easily assigned to the same category. Therefore, using them collectively as training data in ML comes at the cost of laborious relabeling. To automate this process, the data needs to be machine-interpretable and machine-actionable. This is enabled by applying the FAIR DO concept. A FAIR DO is a representation of scientific data and requires at least a globally unique Persistent Identifier (PID) (Schultes and Wittenburg 2019), mandatory metadata, and a digital object type. Storing typed information in the PID record demands a prior selection of that information. This includes mandatory metadata and a digital object type to enable machine interpretability and subsequent actionability. The information provided in the PID record refers to its PID Kernel Information Profile (PIDKIP), defined or selected by the creator of the FAIR DO. A PIDKIP is a standard that facilitates the definition and validation of the mandatory metadata attributes in the PID record. This information acts as a basis for a machine to decide if the digital object is reusable for a particular application. Part of that is also the digital object type, which enables a machine to work with the data represented by the FAIR DO. If more information is required, the data itself or other associated FAIR DOs need to be accessed through references in the PID record. Specifying the granularity of the data representation and of the metadata in the information record is not a fixed task but depends on the objective. Here, the FAIR DO concept is used for representing image data sets with their label metadata.
Each data set contains multiple images, which refer to the same label term. One data set associated with a particular label is represented as one FAIR DO. A type that provides information about this entity covers the packaged format of the images and the image format itself. Further information about the label term and other metadata associated with the data set is provided or accessed through references in the PID record. For the PIDKIP, the Helmholtz KIP was chosen, following the RDA Working Group recommendations on PID Kernel Information (RDA 2013). This profile includes mandatory metadata attributes, used for machine-actionable decisions required for relabeling. Information about the data labels is not directly provided in the PID record of the data set FAIR DO, but in another PID record of an associated image label FAIR DO. The latter represents a metadata document containing label information about the data set. Its PID record is based on the same PIDKIP, i.e., the Helmholtz KIP. Both FAIR DOs point to each other. Thus, the image label FAIR DO is accessed via the reference in the PID record of the data set FAIR DO and vice versa. The PID record of the image label FAIR DO contains the information about the labels that is relevant to the relabeling task. Accessing data label information that way means the user does not have to look up each data set, analyze its content, and search for its labels. (Fig. 1) The automated procedure for relabeling then looks as follows: A specialized client that can work with PIDs resolves the PID of a FAIR DO that represents an image data set and fetches its record. Analyzing its type, the client validates the data usability for composing an ML training data set. Furthermore, the referenced PID of the image label FAIR DO in the record is resolved the same way. By analyzing its PID record, the client identifies that it is relevant for getting information about the labels. The document represented by the image label FAIR DO is accessed via its location path provided in the PID record.
To work with its content, a specialized tool is required that is compatible with its format and schema, i.e., its type. This tool identifies and analyzes the label term of the data set in order to map it to corresponding label terms of other image data sets. This specification of FAIR DOs enables the relabeling of entire image data sets for application in ML. However, the current granularity of data representation is insufficient for other machine-based decisions and actions on single images. Another aspect in this regard is to increase the information in the PID record to enable more machine-actionable decisions. This requires reconsidering the granularity of metadata in the PID record and needs to be balanced with the aim of fast record processing. Changing the content of the PID record also requires deriving a new PIDKIP or extending existing ones. Metadata tools that are applied in conjunction with the FAIR DO concept and use the label information in the document of the metadata FAIR DOs need further specification. One requirement for their implementation is a standardized data description for the metadata document, using schemas and vocabularies. Using the machine actionability of FAIR DOs described above enables automated relabeling of data sets. This leaves more time for the ML user to concentrate on model training and optimization. Software development of FAIR DO-specific clients and metadata mapping tools is the subject of current research. The next step is to implement such software in order to carry out the proposed concept on a large scale. This work has been supported by the research program 'Engineering Digital Futures' of the Helmholtz Association of German Research Centers and the Helmholtz Metadata Collaboration Platform (Helmholtz-Gemeinschaft Deutscher Forschungszentren 1995).
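The automated relabeling procedure described above (resolve the data set PID, check its type, follow the reference to the image label FAIR DO, read the label document, and map the label term) can be sketched in Python as follows. This is a toy illustration under stated assumptions, not an existing client: the PID resolution service and the document store are replaced by in-memory dictionaries, and all PIDs, locations, type strings, attribute names other than hasMetadata and isMetadataFor, and label terms are made up.

```python
# In-memory stand-in for a PID resolution service.
# Every PID, location, and type value here is hypothetical.
PID_RECORDS = {
    "21.T00000/dataset-a": {
        "digitalObjectType": "application/zip;image/png",
        "digitalObjectLocation": "https://example.org/data/a.zip",
        "hasMetadata": "21.T00000/labels-a",
    },
    "21.T00000/labels-a": {
        "digitalObjectType": "application/json",
        "digitalObjectLocation": "https://example.org/meta/a.json",
        "isMetadataFor": "21.T00000/dataset-a",
    },
}

# Stand-in for metadata documents fetched from their location paths.
DOCUMENTS = {"https://example.org/meta/a.json": {"label": "kitty"}}

# Hypothetical mapping from source label terms to a target vocabulary.
LABEL_MAP = {"kitty": "cat", "feline": "cat", "puppy": "dog"}

# Types the client accepts for composing a training data set (assumed).
USABLE_TYPES = {"application/zip;image/png"}


def resolve(pid):
    """Return the PID record for a PID (a dictionary lookup in this sketch)."""
    return PID_RECORDS[pid]


def relabel(dataset_pid):
    """Return the target label for a data set FAIR DO, or None if unusable."""
    record = resolve(dataset_pid)
    # Step 1: decide from the type alone whether the data is usable.
    if record["digitalObjectType"] not in USABLE_TYPES:
        return None
    # Step 2: follow the reference to the image label FAIR DO.
    label_record = resolve(record["hasMetadata"])
    # Step 3: check that both FAIR DOs point to each other.
    if label_record.get("isMetadataFor") != dataset_pid:
        raise ValueError("label FAIR DO does not reference the data set")
    # Step 4: access the metadata document via its location path.
    document = DOCUMENTS[label_record["digitalObjectLocation"]]
    # Step 5: map the source label term; unknown terms are kept unchanged.
    source_label = document["label"]
    return LABEL_MAP.get(source_label, source_label)


if __name__ == "__main__":
    print(relabel("21.T00000/dataset-a"))
```

The decisions in steps 1 to 3 use only the PID records, which reflects the aim of fast record processing; the metadata document itself is read only once the records indicate that it is relevant.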