With current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets, even on the most powerful machines. Our new algorithm runs on a sample drawn from the dataset, decreasing runtime by reducing the amount of data analyzed. We perform a simulation study comparing our sampling-based k-means to the standard k-means algorithm in terms of both speed and accuracy. Results show that our algorithm is significantly more efficient than the existing algorithm while achieving comparable accuracy.
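The abstract does not spell out the sampling scheme, so the following is only a minimal sketch of one way a sampling-based k-means could look, assuming a uniform random sample without replacement, plain Lloyd iterations on that sample, and a final single pass that assigns every point in the full dataset to its nearest centroid. The function names (`assign`, `lloyd_kmeans`, `sampled_kmeans`) and the `sample_size` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

def assign(X, centers):
    """Return, for each row of X, the index of its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def lloyd_kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def sampled_kmeans(X, k, sample_size, seed=0):
    """Fit centroids on a uniform random sample, then label the full data.

    Assumption: uniform sampling without replacement; the paper may use
    a different sampling design.
    """
    rng = np.random.default_rng(seed)
    n = min(sample_size, len(X))
    idx = rng.choice(len(X), size=n, replace=False)
    centers = lloyd_kmeans(X[idx], k, seed=seed)
    return centers, assign(X, centers)
```

In this sketch the iterative part of the cost scales with `sample_size` rather than with the full dataset size; only the final assignment touches every point once.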
We characterize and demonstrate a solution method for an optimal commodity (sales) tax problem with multiple goods, heterogeneous agents, and a nonconvex optimization problem for the policy maker. Our approach allows for more dimensions of heterogeneity than was previously possible, incorporates potential model uncertainty and policy objective uncertainty, and relaxes some of the assumptions that the previous literature needed to generate a convex optimization problem for the policy maker. Our solution technique involves creating a large database of optimal responses by different individuals to different policy parameters and using "Big Data" techniques to compute policy maker objective values over these individuals. We calibrate our model to the United States and test the effects of a differentiated optimal commodity tax versus a flat tax, as well as the effect of exempting a broad class of goods (services) from commodity taxation. We find that, for a given societal welfare level, only a potentially small amount of tax revenue is lost by moving from an optimal differentiated sales tax schedule to a uniform flat tax, and that there is only a small loss in revenue from exempting a class of goods such as services in the United States.
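As a rough illustration of the "database of responses" idea, the toy sketch below precomputes every simulated agent's response to each tax schedule on a grid, stores the aggregate revenue and welfare per schedule, and then searches the stored values directly, which does not require the policy maker's problem to be convex. Everything specific here is an assumption made for illustration and is not the paper's model or calibration: Cobb-Douglas preferences, producer prices normalized to 1, a utilitarian (mean log-utility) welfare measure, the synthetic population, and the names `responses`, `evaluate`, `database`, and `best_schedule`.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_goods = 1000, 3

# Hypothetical heterogeneous population: incomes and Cobb-Douglas budget shares.
income = rng.lognormal(mean=0.0, sigma=0.5, size=n_agents)
shares = rng.dirichlet(2.0 * np.ones(n_goods), size=n_agents)

def responses(tax):
    """Each agent's optimal demand and utility under ad valorem rates `tax`."""
    price = 1.0 + np.asarray(tax)              # consumer price = 1 + tax rate
    demand = shares * income[:, None] / price  # Cobb-Douglas closed form
    utility = (shares * np.log(demand)).sum(axis=1)
    return demand, utility

def evaluate(tax):
    """Aggregate tax revenue and mean utility for one tax schedule."""
    demand, utility = responses(tax)
    revenue = (np.asarray(tax) * demand).sum()
    return revenue, utility.mean()

# "Database": one (revenue, welfare) entry per tax schedule on a coarse grid.
grid = np.linspace(0.0, 0.5, 6)
database = {tax: evaluate(tax) for tax in itertools.product(grid, repeat=n_goods)}

def best_schedule(revenue_target, candidates):
    """Highest-welfare candidate raising at least `revenue_target`, or None."""
    feasible = [(database[t][1], t) for t in candidates
                if database[t][0] >= revenue_target]
    return max(feasible) if feasible else None

# Compare differentiated schedules against uniform (flat) ones.
target = database[(grid[2],) * n_goods][0]   # revenue raised by a flat tax at grid[2]
uniform = [t for t in database if len(set(t)) == 1]
w_diff, t_diff = best_schedule(target, list(database))
w_flat, t_flat = best_schedule(target, uniform)
```

Because the flat schedules are a subset of the grid, `w_diff` can never be lower than `w_flat` in this sketch; how large the gap is depends entirely on the assumed preferences and population.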