Gas chromatography/olfactometry (GC/O) has been used in various fields as a valuable method to identify odor-active components from a complex mixture. Since human assessors are employed as detectors to obtain the olfactory perception of separated odorants, the GC/O technique is limited by its subjectivity, its variability, and the high cost of trained panelists. Here, we present a proof-of-concept model by which odor information can be obtained by machine-learning-based prediction from molecular parameters (MPs) of odorant molecules. The odor prediction models were established using a database of flavors and fragrances including 1026 odorants and corresponding verbal odor descriptors (ODs). Physicochemical parameters of the odorant molecules were acquired using molecular calculation software (DRAGON). Ten representative ODs were selected to build the prediction models based on their high frequency of occurrence in the database. The features of the MPs were extracted via either unsupervised (principal component analysis) or supervised (Boruta, BR) approaches and then used as input to calibrate machine-learning models. Predictions were performed by various machine-learning approaches such as support vector machine (SVM), random forest, and extreme learning machine. All models were optimized via parameter tuning and their prediction accuracies were compared. An SVM model combined with feature extraction by BR-C (confirmed only) was found to afford the best results with an accuracy of 97.08%. The models were validated by comparing the predicted results with measured GC/O data for an apple sample. The prediction models can be used as an auxiliary tool in the existing GC/O by suggesting possible OD candidates to the panelists and thus helping them reach more objective and accurate judgments.
In addition, a machine-based GC/O in which the panelist is no longer needed might be expected after further development of the proposed odor prediction technique.
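The descriptor-selection step mentioned above (choosing the most frequent ODs in the database) can be illustrated with a minimal sketch. The toy database, the function names `top_descriptors` and `binary_labels`, and the choice to treat each selected OD as a separate binary target are all assumptions made for illustration; the abstract does not state how the targets were encoded.

```python
from collections import Counter

# Hypothetical toy database: odorant name -> verbal odor descriptors (ODs).
# These entries are illustrative and are not taken from the paper's database.
odor_db = {
    "ethyl butanoate": ["fruity", "sweet"],
    "hexanal": ["green", "fatty"],
    "linalool": ["floral", "sweet", "citrus"],
    "cis-3-hexenol": ["green", "fresh"],
    "vanillin": ["sweet", "vanilla"],
}


def top_descriptors(db, k):
    """Return the k most frequent descriptors, counting each odorant once."""
    counts = Counter(od for ods in db.values() for od in set(ods))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [od for od, _ in ranked[:k]]


def binary_labels(db, descriptor):
    """Build one 0/1 target per odorant (assumed encoding, see lead-in)."""
    return {name: int(descriptor in ods) for name, ods in db.items()}


selected = top_descriptors(odor_db, k=2)
labels = binary_labels(odor_db, selected[0])
print(selected)
print(labels)
```

With a real database, `k` would be 10 to match the abstract, and each label dictionary could serve as the target for one classifier trained on the extracted MP features.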
This paper describes a probabilistic latent variable model designed to detect human values, such as justice or freedom, that a writer has sought to reflect or appeal to when participating in a public debate. The proposed model treats the words in a sentence as having been chosen based on specific values; the values reflected by each sentence are then estimated by aggregating the values associated with each word. The model can determine the human values for a word in light of the influence of the previous word. This design choice was motivated by syntactic structures such as noun+noun, adjective+noun, and verb+adjective. The classifier based on the model was evaluated on a test collection containing 102 manually annotated documents focusing on one contentious political issue, net neutrality, achieving the highest reported classification effectiveness for this task. We also compared our proposed classifier with a second human annotator and found that its effectiveness is statistically comparable with that of human annotators.
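The idea of aggregating word-level values into sentence-level values, with each word read in light of the previous word, can be illustrated roughly as follows. This is a simple additive stand-in and not the probabilistic latent variable model the abstract describes; the weight tables `UNIGRAM` and `BIGRAM`, the threshold, and the function name `sentence_values` are hypothetical.

```python
from collections import defaultdict

# Hypothetical word-level value weights; not taken from the paper.
UNIGRAM = {
    "freedom": {"freedom": 0.9},
    "fair": {"justice": 0.7},
    "access": {"freedom": 0.4, "justice": 0.2},
}

# Hypothetical weights used when the previous word is known,
# mimicking patterns such as adjective+noun.
BIGRAM = {
    ("fair", "access"): {"justice": 0.8},
}


def sentence_values(tokens, threshold=0.5):
    """Sum per-word value weights and keep values that reach the threshold."""
    totals = defaultdict(float)
    prev = None
    for word in tokens:
        # Prefer weights conditioned on the previous word, else fall back
        # to the context-free weights for the word alone.
        weights = BIGRAM.get((prev, word)) or UNIGRAM.get(word, {})
        for value, weight in weights.items():
            totals[value] += weight
        prev = word
    return {value for value, score in totals.items() if score >= threshold}


result = sentence_values(["fair", "access", "protects", "freedom"])
print(sorted(result))
```

In this toy case "access" contributes to justice rather than freedom because it follows "fair", which is the kind of previous-word effect the abstract motivates.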
The relationships between molecular structures and their properties are subtle and complex, and the properties of odor are no exception. Molecules with similar structures, such as a molecule and its optical isomer, may have completely different odors, whereas molecules with completely distinct structures may have similar odors. Many works have attempted to explain the molecular structure-odor relationship from chemical and data-driven perspectives. The Transformer model is widely used in natural language processing and computer vision, and the attention mechanism included in the Transformer model can identify relationships between inputs and outputs. In this paper, we describe the construction of a Transformer model for predicting molecular properties and interpreting the prediction results. The SMILES data of 100,000 molecules are collected and used to predict the existence of molecular substructures, and our proposed model achieves an F1 value of 0.98. The attention matrix is visualized to investigate the substructure annotation performance of the attention mechanism, and we find that certain atoms in the target substructures are accurately annotated. Finally, we collect 4462 molecules and their odor descriptors and use the proposed model to infer 98 odor descriptors, obtaining an average F1 value of 0.33. For the 19 odor descriptors that achieved F1 values greater than 0.45, we also attempt to summarize the relationship between the molecular substructures and odor quality through the attention matrix.
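As a reminder of how per-descriptor and average F1 values of the kind reported above are computed, here is a minimal sketch. It assumes the average is an unweighted (macro) mean over descriptors, which the abstract does not specify; the toy labels and the names `f1_score` and `macro_f1` are illustrative only.

```python
def f1_score(true_labels, predicted_labels):
    """F1 for one binary descriptor: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # No true positives: define F1 as zero.
    return 2 * tp / (2 * tp + fp + fn)


def macro_f1(y_true, y_pred):
    """Return (unweighted mean F1, per-descriptor F1 scores)."""
    scores = {d: f1_score(y_true[d], y_pred[d]) for d in y_true}
    return sum(scores.values()) / len(scores), scores


# Toy labels: descriptor -> one 0/1 entry per molecule (made-up data).
y_true = {"fruity": [1, 1, 0, 0], "green": [0, 1, 1, 0]}
y_pred = {"fruity": [1, 0, 0, 0], "green": [0, 1, 1, 1]}

average, per_descriptor = macro_f1(y_true, y_pred)
# Keep descriptors above a cutoff, analogous to the 0.45 cutoff in the text.
well_predicted = [d for d, s in per_descriptor.items() if s > 0.45]
print(average, per_descriptor, well_predicted)
```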
In scientific research, laboratory (lab) notebooks have traditionally been used to record experiments and their results. Lab notebooks also act as an important record of the generation, processing and analysis of data over the research data lifecycle. As research data becomes increasingly digitized and voluminous, it becomes harder to record and describe research data, especially in paper lab notebooks (PLNs). This paper addresses the challenges of recording contemporary research data in lab notebooks and discusses the requirements in lab notebooks for describing research data. Two basic requirements of lab notebooks are the completeness of the data description and the ability to link the experiment records with their corresponding research datasets. Descriptions of research data should also document the provenance of the research data so that the original data can be retrieved. Guidelines for consistent file naming and systematic directory structures for saved research data can support efficient research data retrieval.
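One way to make consistent file naming and record-to-dataset linking concrete is sketched below. The naming pattern (date, experiment ID, short description), the directory layout, and the helper names `dataset_path` and `notebook_entry` are hypothetical choices for illustration; the abstract does not prescribe a specific convention.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def dataset_path(project, experiment_id, run_date, description, extension):
    """Build a path of the form project/data/<experiment>/<date>_<id>_<slug>.<ext>."""
    # Lowercase the description and replace anything non-alphanumeric with "-".
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    filename = f"{run_date.isoformat()}_{experiment_id}_{slug}.{extension}"
    return PurePosixPath(project) / "data" / experiment_id / filename


def notebook_entry(experiment_id, summary, path, provenance):
    """Create a notebook record that links an experiment to its dataset."""
    return {
        "experiment_id": experiment_id,
        "summary": summary,
        "dataset": str(path),
        "provenance": provenance,  # where the original data came from
    }


path = dataset_path("odor-study", "EXP-007", date(2024, 3, 5), "Raw spectra, run 2", "csv")
entry = notebook_entry("EXP-007", "Second acquisition run", path, "instrument export")
print(entry["dataset"])
```

Because the experiment ID appears in both the notebook record and the file path, a reader of the notebook can locate the dataset, and a reader of the directory can locate the record.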
The semantic similarity (or distance) between words is a basic form of knowledge in Natural Language Processing. Several previous studies have measured this similarity (or distance) based on word vectors in a multi-dimensional space. In those studies, high-dimensional feature vectors of words are built from the words' co-occurrence in a corpus or from reference relations in a dictionary, and the word vectors are then calculated from the feature vectors through a method such as principal component analysis. This paper proposes a new method for placing nouns into a multi-dimensional space based on words' co-occurrence in a corpus. The proposed method does not use high-dimensional feature vectors of words, but is based on the idea that "vectors corresponding to nouns which cooccur with a word w in a relation f constitute a group in the multi-dimensional space". Although the whole meaning of nouns is not reflected in the word vectors obtained by the proposed method, the semantic similarity (or distance) between nouns defined with the word vectors is suitable for an example-based disambiguation method.
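For readers unfamiliar with vector-based similarity, the sketch below shows two common measures (cosine similarity and Euclidean distance) and a nearest-example lookup of the kind an example-based disambiguation method might use. The abstract does not say which measure the authors use, and the two-dimensional vectors and the names `cosine_similarity`, `euclidean_distance`, and `nearest_example` are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_example(noun, examples, vectors):
    """Pick the example noun whose vector is closest to the given noun."""
    return min(examples, key=lambda e: euclidean_distance(vectors[noun], vectors[e]))


# Hypothetical 2-D noun vectors, chosen only to keep the arithmetic simple.
vectors = {"dog": (3.0, 4.0), "cat": (4.0, 3.0), "car": (-4.0, 3.0)}

sim_dog_cat = cosine_similarity(vectors["dog"], vectors["cat"])
sim_dog_car = cosine_similarity(vectors["dog"], vectors["car"])
closest = nearest_example("dog", ["cat", "car"], vectors)
print(sim_dog_cat, sim_dog_car, closest)
```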