Although ingredients are important items of information in recipes, it is difficult to process them, especially for computers, because they are user-generated informal text. To normalize ingredients, we can use a character-based encoder-decoder model that takes the character sequence of an ingredient as an input and outputs its canonical form. However, the model still has two problems: The first is that the model often generates unnatural sequences as outputs. The second problem is that the generated sequences are sometimes unrelated to the original ingredient. Therefore, we propose a two-step validation to generate better normalizations. In the first validation step, we use a trie to limit the normalization candidates to existing sequences. In the second validation step, we rerank the normalization candidates based on their similarity to the original ingredient. We conducted experiments using a corpus that includes approximately 10 thousand pairs of ingredients and their canonical forms and showed that our proposed validation improved the performance of encoder-decoder models. CCS CONCEPTS • Computing methodologies → Natural language processing;
Cooking recipes play an important role in promoting a healthy lifestyle, and a vast number of user-generated recipes are currently available on the Internet. Allied to this growth in the amount of information is an increase in the number of studies on the use of such data for recipe analysis, recipe generation, and recipe search. However, there have been few attempts to estimate the number of calories per serving in a recipe. This study considers this task and introduces two challenging subtasks: ingredient normalization and serving estimation. The ingredient normalization task aims to convert the ingredients written in a recipe (e.g.,), which says “sesame oil (for finishing)” in Japanese) into their canonical forms (e.g., , sesame oil) so that their calorific content can be looked up in an ingredient dictionary. The serving estimation task aims to convert the amount written in the recipe (e.g., N, N pieces) into the number of servings (e.g., M, M people), thus enabling the calories per serving to be calculated. We apply machine learning-based methods to these tasks and describe their practical deployment in Cookpad, the largest recipe service in the world. A series of experiments demonstrate that the performance of our methods is sufficient for use in real-world services.
Most relevance feedback methods re-rank search results using only the information of surface words in texts. We present a method that uses not only the information of surface words but also that of latent words that are inferred from texts. We infer latent word distribution in each document in the search results using latent Dirichlet allocation (LDA). When feedback is given, we also infer the latent word distribution in the feedback using LDA. We calculate the similarities between the user feedback and each document in the search results using both the surface and latent word distributions and re-rank the search results on the basis of the similarities. Evaluation results show that when user feedback consisting of two documents (3, 589 words) is given, the proposed method improves the initial search results by 27.6% in precision at 10 (P@10). Additionally, it proves that the proposed method can perform well even when only a small amount of user feedback is available. For example, an improvement of 5.3% in P@10 was achieved when user feedback constituted only 57 words.
Recently, the number of user-generated recipes on the Internet has increased. In such recipes, users are generally supposed to write a title, an ingredient list, and steps to create a dish. However, some items in an ingredient list in a user-generated recipe are not actually edible ingredients. For example, headings, comments, and kitchenware sometimes appear in an ingredient list because users can freely write the list in their recipes. Such noise makes it difficult for computers to use recipes for a variety of tasks, such as calorie estimation. To address this issue, we propose a non-ingredient detection method inspired by a neural sequence tagging model. In our experiment, we annotated 6, 675 ingredients in 600 user-generated recipes and showed that our proposed method achieved a 93.3 F1 score.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.