A compact yet robust structure-based identifier or representation
system is a key enabler of the efficient sharing and dissemination
of research results within the chemistry community, and such systems
lay the essential foundations for future informatics and data-driven
research. While substantial advances have been made for small molecules,
the polymer community has struggled to develop an efficient
representation system. This is because, unlike in other areas of
chemistry, the basic premise that each distinct chemical species corresponds
to a well-defined chemical structure does not hold for polymers. Polymers
are intrinsically stochastic molecules that are often ensembles with
a distribution of chemical structures. This stochasticity limits the
applicability of all deterministic representations developed for small
molecules. In this work, a new representation system that is capable
of handling the stochastic nature of polymers is proposed. The new
system is based on the popular “simplified molecular-input
line-entry system” (SMILES), and it aims to provide representations
that can be used as indexing identifiers for entries in polymer databases.
As a pilot test, the entries of the standard data set of the glass
transition temperature of linear polymers (Bicerano, 2002) were converted
into the new BigSMILES language. Furthermore, we hope that the
proposed system will provide a more effective language for communication
within the polymer community and increase cohesion among its
researchers.