Topological Data Analysis (TDA) refers to a collection of methods that find the structure of shapes in data. Although TDA methods have recently been used in many areas of data mining, they have not been widely applied to text mining tasks. Most text processing algorithms discard the order in which different entities appear or co-appear. If these lost orders are informative features of the data, TDA may help fill the resulting gap in the state of the art of text processing. Once extracted, the topology of different entities throughout a textual document may reveal additional information about the document that is not reflected in any features from conventional text processing methods. In this paper, we introduce a novel approach that employs TDA in text processing to capture and use the topology of same-type entities in textual documents. First, we show how to extract topological signatures from text using persistent homology, i.e., a TDA tool that captures the topological signature of a data cloud. Then we show how to use these signatures for text classification.
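The abstract does not say how entity occurrences are turned into a point cloud or which homology dimensions are used. As a minimal sketch, assuming each occurrence of a same-type entity has already been mapped to a point (the 2-D coordinates in the usage note are hypothetical), the code below computes 0-dimensional persistent homology of a Vietoris-Rips filtration, i.e., the scales at which connected components merge, using Kruskal's algorithm with a union-find structure. `persistence_features` is a hypothetical summary (number of finite bars, total persistence, maximum persistence) that turns a diagram into a fixed-length vector a classifier could consume; it is not necessarily the authors' vectorization. Loops and higher-dimensional features would need a dedicated TDA library.

```python
import itertools
import math


def h0_persistence(points):
    """0-dimensional persistence pairs (birth, death) of a Vietoris-Rips filtration.

    Every point is born at scale 0. When two components merge at edge length r,
    one of them dies at r; the last surviving component never dies (death = inf).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges in increasing length order (Kruskal's order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components: one bar ends here
            parent[ri] = rj
            pairs.append((0.0, length))
    if n:
        pairs.append((0.0, math.inf))
    return pairs


def persistence_features(pairs):
    """Hypothetical fixed-length summary of a diagram for a downstream classifier."""
    finite = [d - b for b, d in pairs if math.isfinite(d)]
    if not finite:
        return [0, 0.0, 0.0]
    return [len(finite), sum(finite), max(finite)]
```

For four points forming two tight pairs 5 units apart, for example `[(0, 0), (0, 1), (5, 0), (5, 1)]`, the diagram has two bars that die at 1.0, one bar that dies at 5.0 when the pairs merge, and one infinite bar.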
Background Individuals may adopt unhealthy coping mechanisms such as consuming alcohol, tobacco, and unhealthy snacks. The purpose of this study was to assess how neighborhood disadvantage is associated with sales of alcohol, tobacco, and unhealthy snacks at stores of a discount variety store chain. Methods Alcohol, tobacco, and unhealthy snack sales were measured monthly for 20 months, 2017–2018, in 16 discount variety stores in the United States. Mixed effects linear regressions, adjusted for population size and including store-specific random effects, were used to examine the relationship between neighborhood disadvantage, measured using the Area Deprivation Index (ADI), and weekly unit sales for each of the three outcome variables. Results The discount variety stores were located in neighborhoods where the median ADI percentile was 87 (interquartile range 83–89), compared with a median ADI percentile of 50 for all US communities, indicating that the stores were located in substantially disadvantaged neighborhoods. For every 1% increase in ADI, weekly unit sales of unhealthy snack foods increased by 43 (95% confidence interval [CI] 28–57) and weekly unit sales of tobacco products increased by 11.5 (95% CI 5–18) per store. No significant relationship between neighborhood disadvantage and weekly unit sales of alcohol products was identified. Conclusions The positive relationship between neighborhood disadvantage and the sale of tobacco and snack foods may help explain the pathway between neighborhood disadvantage and poor health outcomes. Future research should examine how neighborhood disadvantage influences residents' health-related behaviors.
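The abstract does not specify the model beyond store-specific random effects and an adjustment for population size. As a minimal sketch, assuming a balanced panel (every store observed for the same number of weeks) and store-level covariates (ADI and population constant within a store), the fixed-effect point estimates of a random-intercept model coincide with ordinary least squares on the store-level mean outcomes, which the code below computes with the standard library alone. Entering population as a linear covariate, the row layout, and the function names (`store_means`, `solve`, `adi_effect`) are hypothetical. The sketch yields no standard errors or confidence intervals; those require estimating the variance components, for example with statsmodels' `mixedlm` and `groups` set to the store ID.

```python
from collections import defaultdict


def store_means(rows):
    """Collapse (store_id, adi, population, weekly_sales) rows to one row per store."""
    totals = defaultdict(lambda: [0.0, 0])
    covariates = {}
    for store, adi, pop, sales in rows:
        totals[store][0] += sales
        totals[store][1] += 1
        covariates[store] = (adi, pop)  # assumed constant within a store
    return [(covariates[s][0], covariates[s][1], t / k) for s, (t, k) in totals.items()]


def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def adi_effect(rows):
    """(intercept, ADI slope, population slope) via OLS on store means.

    Equals the random-intercept GLS fixed effects only under the balanced-panel,
    store-level-covariate assumption stated above.
    """
    means = store_means(rows)
    X = [[1.0, adi, pop] for adi, pop, _ in means]
    y = [ybar for _, _, ybar in means]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    return solve(xtx, xty)
```

The identity holds because, with equal weeks per store, every store mean has the same variance (between-store plus within-store component), so weighting store means equally is already the GLS weighting.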
Universities typically offer residential students a variety of fast-food dining options as part of the student meal plan. When residential students make fast-food purchases on campus, a digital record of the transaction is created, which can be used to study food purchasing behavior. This study examines the association between student demographic, economic, and behavioral factors and the healthfulness of student fast-food purchases. The 3781 fast-food items sold at the University of North Carolina at Charlotte from fall 2016 to spring 2019 were each given a Fast-Food Health Score. Each student participating in the university meal plan was given a Student Average Fast-Food Health Score, calculated by averaging the Fast-Food Health Scores of every food and beverage item the student purchased at a fast-food vendor, concession stand, or convenience store over a semester. This analysis included 14,367 students who generated 1,593,235 transactions valued at $10,757,110. Multivariate analyses were used to examine demographic, economic, and behavioral factors associated with Student Average Fast-Food Health Scores. Having a low income, spending more money on fast-food items, and having a lower GPA were associated with lower Student Average Fast-Food Health Scores. Future research using institutional food transaction data to study healthy food choices is warranted.
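The Student Average Fast-Food Health Score described above reduces to a grouped mean over item-level scores. The sketch below assumes a hypothetical transaction layout `(student_id, semester, venue_type, item_id, quantity)`, a hypothetical `item_scores` lookup (the Fast-Food Health Score itself is not defined in the abstract), placeholder venue labels, and that each unit purchased counts once toward the average.

```python
from collections import defaultdict

# Hypothetical labels for the venue types named in the abstract.
ELIGIBLE_VENUES = {"fast_food", "concession", "convenience"}


def student_average_scores(transactions, item_scores):
    """Average Fast-Food Health Score per (student, semester).

    transactions: iterable of (student_id, semester, venue_type, item_id, quantity).
    item_scores:  dict mapping item_id -> Fast-Food Health Score.
    Each purchased unit is weighted equally (an assumption).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for student, semester, venue, item, qty in transactions:
        if venue not in ELIGIBLE_VENUES or qty <= 0:
            continue  # only fast-food, concession, and convenience purchases count
        key = (student, semester)
        sums[key] += item_scores[item] * qty
        counts[key] += qty
    return {key: sums[key] / counts[key] for key in sums}
```

Weighting by quantity means a student who buys three of the same item moves their average three times as much as a single purchase would; averaging over distinct items instead would be an equally plausible reading of the abstract.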
Background Many lower-income communities in the United States lack a full-line grocery store. There is evidence that the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) increases the availability of healthy foods in stores. One national discount variety store (DVS) chain, often located in low-income neighborhoods, became an authorized WIC vendor in 8 pilot stores. Objectives The objective of this study was to evaluate how implementing WIC in DVS pilot stores affected sales of healthy, WIC-eligible foods. Methods We used DVS sales data and difference-in-differences regression to evaluate how WIC authorization affected sales of WIC-eligible foods in 8 DVS pilot stores, compared with 8 matched comparison stores. Results DVS added 18 new WIC-approved foods to become an authorized vendor. Results indicate that becoming a WIC vendor significantly increased sales of the healthy, WIC-eligible foods that DVS carried before authorization. WIC implementation in DVS led to an average increase of 31 units per week in sales of the original WIC foods (P < 0.01). Lower socioeconomic status, assessed using a summary measure, was associated with increased sales of WIC foods. Sales of non-WIC-eligible foods (e.g., salty snack foods, candy bars, soda, and processed meats), however, were not affected by WIC authorization. Conclusions Encouraging DVS stores to become WIC-authorized vendors has the potential to modestly increase DVS sales and the availability of healthy foods in low-income neighborhoods. If WIC authorization is financially viable for small-format variety stores, encouraging similar stores to become WIC-authorized has the potential to improve food access.
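In the simplest two-group, two-period case with no covariates, the coefficient on the treated-by-post interaction in the regression y = b0 + b1·pilot + b2·post + b3·pilot·post + error equals a difference of four group means. The sketch below computes that unadjusted 2x2 estimate from weekly unit sales; the dictionary layout and the group/period labels are hypothetical, and the study's actual regression may include store fixed effects or other covariates that this sketch omits.

```python
from statistics import mean


def did_estimate(sales):
    """Unadjusted 2x2 difference-in-differences estimate of weekly unit sales.

    sales: dict mapping (group, period) -> list of weekly unit sales, where
    group is "pilot" or "comparison" and period is "pre" or "post".
    Returns (pilot_post - pilot_pre) - (comparison_post - comparison_pre).
    """
    m = {key: mean(values) for key, values in sales.items()}
    pilot_change = m[("pilot", "post")] - m[("pilot", "pre")]
    comparison_change = m[("comparison", "post")] - m[("comparison", "pre")]
    # The comparison stores' change stands in for what the pilot stores
    # would have experienced without WIC authorization (parallel trends).
    return pilot_change - comparison_change
```

With toy numbers, pilot stores rising from a mean of 105 to 145 units per week while comparison stores rise from 95 to 105 gives an estimate of (145 − 105) − (105 − 95) = 30 units per week attributable to authorization under the parallel-trends assumption.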
Machine learning (ML) model explainability has received growing attention, especially in areas related to model risk and regulation. In this paper, we reviewed and compared several popular ML model explainability methodologies, especially those related to Natural Language Processing (NLP) models. We then applied one NLP explainability method, Layer-wise Relevance Propagation (LRP), to an NLP classification model. We used LRP to derive a relevance score for each word in an instance, which constitutes a local explanation. The relevance scores were then aggregated to obtain the global variable importance of the model. Through the case study, we also demonstrated how to apply the local explainability method to false positive and false negative instances to discover the weaknesses of an NLP model. These analyses can help us understand NLP models better and reduce the risk arising from their black-box nature. We also identified some common issues due to the special nature of NLP models and discussed how explainability analysis can act as a control to detect these issues after the model has been trained.
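The abstract does not describe the classifier's architecture or the exact aggregation used for global importance. As a minimal sketch, assuming a bias-free two-layer ReLU network on bag-of-words counts (one input per vocabulary word), the code below applies the LRP-ε rule layer by layer to obtain one relevance score per word, then takes the mean absolute relevance over documents as one possible global importance measure. The network shape, weights, and function names are hypothetical and not the authors' model.

```python
def lrp_epsilon(x, w1, w2, eps=1e-9):
    """LRP-epsilon relevance of each input word for a bias-free ReLU network.

    Network: h_j = max(0, sum_i x_i * w1[i][j]);  f = sum_j h_j * w2[j].
    x[i] is the count of vocabulary word i in the document.
    Returns (f, relevances) with one relevance score per word.
    """
    n_in, n_hid = len(w1), len(w2)
    z_hid = [sum(x[i] * w1[i][j] for i in range(n_in)) for j in range(n_hid)]
    h = [max(0.0, z) for z in z_hid]
    f = sum(h[j] * w2[j] for j in range(n_hid))

    def stab(z):
        # Epsilon stabilizer carrying the sign of z; avoids division by zero.
        return z + (eps if z >= 0 else -eps)

    # Output -> hidden: R_j = h_j * w2_j / (f + eps) * f
    r_hid = [h[j] * w2[j] / stab(f) * f for j in range(n_hid)]
    # Hidden -> input: R_i = sum_j x_i * w1_ij / (z_j + eps) * R_j
    r_in = [
        sum(x[i] * w1[i][j] / stab(z_hid[j]) * r_hid[j] for j in range(n_hid))
        for i in range(n_in)
    ]
    return f, r_in


def global_importance(docs, w1, w2):
    """Mean absolute word relevance over documents (one possible aggregation)."""
    totals = [0.0] * len(w1)
    for x in docs:
        _, rel = lrp_epsilon(x, w1, w2)
        for i, r in enumerate(rel):
            totals[i] += abs(r)
    return [t / len(docs) for t in totals]
```

Because the network has no biases, the word relevances approximately sum to the output score f (the conservation property of LRP), so each score reads as that word's share of the prediction. Running `lrp_epsilon` only on misclassified documents is one way to inspect false positives and false negatives as the abstract describes.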