We develop a general approach to valid inference after model selection. At
the core of our framework is a result that characterizes the distribution of a
post-selection estimator conditioned on the selection event. We specialize the
approach to model selection by the lasso to form valid confidence intervals for
the selected coefficients and test whether all relevant variables have been
included in the model.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1371 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
Supervised and semi-supervised source separation algorithms based on non-negative matrix factorization have been shown to be quite effective. However, they require isolated training examples of one or more sources, which are often difficult to obtain. This limits the practical applicability of these algorithms. We examine the problem of efficiently utilizing general training data in the absence of specific training examples. Specifically, we propose a method to learn a universal speech model from a general corpus of speech and show how to use this model to separate speech from other sound sources. This model is used in lieu of a speech model trained on speaker-dependent training examples, and thus circumvents the aforementioned problem. Our experimental results show that our method achieves nearly the same performance as when speaker-dependent training examples are used. Furthermore, we show that our method improves performance when training data for the non-speech source is available.
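To make the semi-supervised setting concrete, the following is a minimal sketch of generic NMF-based separation with a fixed, pre-trained speech dictionary. It is not the universal speech model construction proposed here: it only illustrates how a speech model learned in advance can be held fixed while the bases of the other source and all activations are fitted to the mixture. It assumes a Euclidean cost with standard multiplicative updates, and the names `separate`, `w_speech`, and `n_noise`, as well as the toy matrices, are hypothetical.

```python
import random

EPS = 1e-9  # guards against division by zero


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def multiplicative_update(target, numer, denom):
    """Element-wise update: target * numer / denom."""
    return [[t * n / (d + EPS) for t, n, d in zip(tr, nr, dr)]
            for tr, nr, dr in zip(target, numer, denom)]


def squared_error(v, approx):
    return sum((x - y) ** 2 for vr, ar in zip(v, approx) for x, y in zip(vr, ar))


def separate(v, w_speech, n_noise=1, n_iter=200, seed=0):
    """Fit V ~ [W_speech | W_noise] H with W_speech held fixed.

    Returns the masked speech estimate and the squared error at
    initialization and after every iteration.
    """
    rng = random.Random(seed)
    n_freq, n_frames = len(v), len(v[0])
    k_s = len(w_speech[0])
    w_noise = [[rng.random() + 0.1 for _ in range(n_noise)] for _ in range(n_freq)]
    h = [[rng.random() + 0.1 for _ in range(n_frames)] for _ in range(k_s + n_noise)]
    w = [ws + wn for ws, wn in zip(w_speech, w_noise)]
    errors = [squared_error(v, matmul(w, h))]
    for _ in range(n_iter):
        # Update all activations with the full dictionary fixed.
        wt = transpose(w)
        h = multiplicative_update(h, matmul(wt, v), matmul(wt, matmul(w, h)))
        # Update only the noise bases; the speech bases never change.
        h_noise_t = transpose(h[k_s:])
        w_noise = multiplicative_update(
            w_noise, matmul(v, h_noise_t), matmul(matmul(w, h), h_noise_t))
        w = [ws + wn for ws, wn in zip(w_speech, w_noise)]
        errors.append(squared_error(v, matmul(w, h)))
    # Soft mask: share of each time-frequency bin explained by speech.
    full = matmul(w, h)
    speech_part = matmul(w_speech, h[:k_s])
    speech = [[x * s / (f + EPS) for x, s, f in zip(vr, sr, fr)]
              for vr, sr, fr in zip(v, speech_part, full)]
    return speech, errors


# Toy, made-up magnitude spectrogram (4 frequency bins, 6 frames).
w_speech = [[1.0, 0.0], [0.5, 0.2], [0.0, 1.0], [0.1, 0.1]]  # hypothetical speech bases
rng = random.Random(1)
true_speech = matmul(w_speech, [[rng.random() for _ in range(6)] for _ in range(2)])
true_noise = matmul([[0.2], [0.9], [0.3], [1.0]], [[rng.random() for _ in range(6)]])
mixture = [[s + n for s, n in zip(sr, nr)] for sr, nr in zip(true_speech, true_noise)]

speech_estimate, errors = separate(mixture, w_speech, n_noise=1)
```

Holding the speech dictionary fixed is what makes the pre-trained model do the work: only the unknown source and the activations adapt to the mixture.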
BACKGROUND
Evidence of racial/ethnic inequalities in tobacco outlet density is limited by: (1) reliance on studies from single counties or states, (2) limited attention to spatial dependence, and (3) an unclear theory-based relationship between neighborhood composition and tobacco outlet density.
METHODS
In 97 counties from the contiguous US, we calculated the 2012 density of likely tobacco outlets (N=90,407), defined as tobacco outlets per 1,000 population in census tracts (n=17,667). We used two spatial regression techniques: (1) a spatial errors approach in GeoDa software and (2) a covariance function fitted to the errors using a distance matrix of all tract centroids. We examined density as a function of race, ethnicity, income, and two indicators identified from city planning literature to indicate neighborhood stability (vacant housing, renter-occupied housing).
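The density measure defined above can be sketched as follows. This is a minimal illustration only: the tract names and counts are made up rather than taken from the study data, and the function name `outlet_density` is hypothetical.

```python
def outlet_density(outlets: int, population: int) -> float:
    """Tobacco outlets per 1,000 residents in one census tract."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * outlets / population


# Hypothetical tracts as (outlet count, population); not data from the study.
tracts = {
    "tract_a": (6, 4000),
    "tract_b": (2, 2500),
    "tract_c": (0, 3100),
}

densities = {name: outlet_density(o, p) for name, (o, p) in tracts.items()}

# Unweighted mean across tracts, one simple way to summarize density.
mean_density = sum(densities.values()) / len(densities)
```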
RESULTS
The average density was 1.3 tobacco outlets per 1,000 persons. Both spatial regression approaches yielded similar results. In unadjusted models, tobacco outlet density was positively associated with the proportion of Black residents and negatively associated with the proportions of Asian and White residents and with median household income. There was no association with the proportion of Hispanic residents. Indicators of neighborhood stability explained the disproportionate density associated with Black residential composition, but inequalities by income persisted in multivariable models.
CONCLUSIONS
Data from a large sample of US counties and results from two techniques to address spatial dependence strengthen evidence of inequalities in tobacco outlet density by race and income. Further research is needed to understand the underlying mechanisms in order to strengthen interventions.