In 2010 Zenga introduced a new three-parameter model for distributions by size, which can be used to represent income, wealth, financial and actuarial variables. In this paper a summary of its main properties is given, after which the article focuses on the interpretation of the parameters in terms of inequality. The scale parameter µ is equal to the expectation and does not affect inequality, while the two shape parameters α and θ are an inverse and a direct inequality indicator, respectively. This result is obtained through stochastic orders based on inequality curves. A procedure to generate random samples from the Zenga distribution is also proposed. The second part of the article concerns parameter estimation. An analytical solution for the method-of-moments estimators is obtained; this result is used as the starting point of numerical procedures for obtaining maximum likelihood estimates on both ungrouped and grouped data. In the application, three empirical income distributions are considered and the aforementioned estimates are evaluated.
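As a rough illustration of the estimation workflow summarised above, the sketch below seeds a numerical maximum-likelihood search with method-of-moments starting values. Since the Zenga density itself is not reproduced in this summary, a two-parameter gamma model and synthetic data are used as hypothetical stand-ins for the likelihood being maximised; this is not the paper's actual procedure.

```python
# Minimal sketch: moment-based starting values refined by numerical MLE.
# The gamma model below is a placeholder, NOT the Zenga density.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
income = rng.gamma(shape=2.0, scale=1.5, size=5_000)  # synthetic "income" data

# Step 1: method-of-moments starting values (for the gamma stand-in:
# shape = mean^2 / var, scale = var / mean).
m, v = income.mean(), income.var()
start = np.array([m**2 / v, v / m])

# Step 2: numerical maximum likelihood, seeded by the moment estimates.
def negloglik(params, x):
    shape, scale = params
    if shape <= 0 or scale <= 0:
        return np.inf
    return -stats.gamma.logpdf(x, a=shape, scale=scale).sum()

res = optimize.minimize(negloglik, start, args=(income,), method="Nelder-Mead")
print("MoM start:", start, "MLE:", res.x)
```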
The aim of this paper is to establish an ordering related to inequality for the recently introduced Zenga distribution. In addition to the well-known order based on the Lorenz curve, the order based on the I(p) curve is considered. Since the Zenga distribution seems suitable for modelling wealth, financial, actuarial and, especially, income distributions, these findings are fundamental to understanding how parameter values are related to inequality. This investigation shows that, for the Zenga distribution, two of the three parameters are inequality indicators.
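For readers unfamiliar with the curves involved, the following sketch (not taken from the paper) computes the empirical Lorenz curve L(p) and Zenga's inequality curve I(p) = 1 - [L(p)(1-p)]/[p(1-L(p))] from a sample, and checks the pointwise dominance that underlies the corresponding stochastic orders. The lognormal samples are purely illustrative assumptions.

```python
# Minimal sketch: empirical Lorenz curve and Zenga I(p) curve, compared
# pointwise for two samples with different levels of inequality.
import numpy as np

def lorenz_curve(x, grid):
    """Empirical Lorenz curve L(p) evaluated on the points in `grid` (0 < p < 1)."""
    x = np.sort(np.asarray(x, dtype=float))
    cum = np.cumsum(x) / x.sum()
    p = np.arange(1, len(x) + 1) / len(x)
    return np.interp(grid, p, cum)

def zenga_I_curve(x, grid):
    """Zenga's inequality curve I(p) = 1 - [L(p)(1-p)] / [p(1-L(p))]."""
    L = lorenz_curve(x, grid)
    return 1.0 - (L * (1.0 - grid)) / (grid * (1.0 - L))

rng = np.random.default_rng(1)
grid = np.linspace(0.05, 0.95, 19)
a = rng.lognormal(mean=0.0, sigma=0.5, size=20_000)  # lower inequality
b = rng.lognormal(mean=0.0, sigma=1.0, size=20_000)  # higher inequality

# If a dominates b in the I(p) order, then I_a(p) <= I_b(p) for every p.
print(np.all(zenga_I_curve(a, grid) <= zenga_I_curve(b, grid)))
```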
We review recent literature that proposes to adapt ideas from classical model-based optimal design of experiments to problems of data selection from large datasets. Special attention is given to bias reduction and to protection against confounders. Some new results are presented, and theoretical and computational comparisons are made.

KEYWORDS: confounders, large datasets, model bias, optimal experimental design

INTRODUCTION

For the analysis of big datasets, statistical methods have been developed which use the full available dataset. For example, new methodologies developed in the context of Big Data and focussed on a 'divide-and-recombine' approach are summarised in Wang et al. 19 Other major methods address the scalability of Big Data through Bayesian inference based on a Consensus Monte Carlo algorithm 13 and sparsity assumptions. 16 In contrast, other authors argue for the advantages of inference statements based on a well-chosen subset of the large dataset.

Big datasets are characterised by a few key factors. While data in scientific studies can usually be collected via active or passive observation, Big Data are often collected in a passive way; rarely is their collection the result of a designed process. This generates sources of bias which either we do not know at all or are too costly to control, but which nevertheless affect the overall distribution of the observed variables. 3,11 The authors in Ref. 15 argue that the analysis of big datasets is affected by issues of bias and confounding, selection bias and other sampling problems (see, for example, Sharpes 14 for electronic health records). Often the causal effect of interest can only be measured on average, and great care has to be taken about the background population: for example, it is possible to consider and analyse every message on Twitter and use it to draw conclusions about public opinion, but it is known that Twitter users are not representative of the whole population.

The analysis of the full dataset might be prohibitive because of computational and time constraints. Indeed, in some cases, the analysis of the full dataset might also be inadvisable. 4,6 To recall just one example, the sample proportion from a self-reported big dataset of 2,300,000 units has the same mean squared error as the sample proportion from a suitable simple random sample (SRS) of size 400, and a Law of Large Populations has been defined in order to qualify this (see Meng 9 ).

Recently, some researchers have argued for the usefulness of utilising methods and ideas from design of experiments (DoE), more specifically from model-based optimal experimental design, for the analysis of big datasets. They argue that special
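As a concrete, if simplified, illustration of using an optimal-design criterion for data selection, the sketch below greedily picks a subsample of a large dataset that approximately maximises the determinant of the information matrix X'X of a linear model (a D-optimality-style criterion). The greedy forward-selection rule, the linear model and the synthetic data are assumptions for illustration only, not the procedures compared in the reviewed literature.

```python
# Minimal sketch: greedy D-optimality-style subsampling of a large dataset.
import numpy as np

def greedy_d_optimal(X, k):
    """Greedily pick k rows of X that approximately maximise det(X_s' X_s)."""
    n, p = X.shape
    chosen = []
    M = 1e-6 * np.eye(p)            # small ridge keeps M invertible early on
    for _ in range(k):
        Minv = np.linalg.inv(M)
        # gain of adding row x is 1 + x' M^{-1} x (matrix determinant lemma)
        gains = 1.0 + np.einsum("ij,jk,ik->i", X, Minv, X)
        gains[chosen] = -np.inf     # do not pick the same row twice
        i = int(np.argmax(gains))
        chosen.append(i)
        M += np.outer(X[i], X[i])
    return np.array(chosen)

rng = np.random.default_rng(2)
X = rng.normal(size=(100_000, 5))           # large "dataset" of covariates
idx = greedy_d_optimal(X, k=200)            # design-based subsample
beta = rng.normal(size=5)
y = X @ beta + rng.normal(size=len(X))      # synthetic responses
beta_hat, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
print(np.round(beta_hat - beta, 2))         # estimate from the subsample only
```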