Shunpu Zhang scite author profile

Health indices provide information to the general public on the health condition of the community. They can also be used to inform the government's policy making, to evaluate the effect of a current policy or healthcare program, or for program planning and priority setting. It is a common practice that the health indices across different geographic units are ranked and the ranks are reported as fixed values. We argue that the ranks should be viewed as random and hence should be accompanied by an indication of precision (i.e., the confidence intervals). A technical difficulty in doing so is how to account for the dependence among the ranks in the construction of confidence intervals. In this paper, we propose a novel Monte Carlo method for constructing the individual and simultaneous confidence intervals of ranks for age-adjusted rates. The proposed method uses as input age-specific counts (of cases of disease or deaths) and their associated populations. We have further extended it to the case in which only the age-adjusted rates and confidence intervals are available. Finally, we demonstrate the proposed method to analyze US age-adjusted cancer incidence rates and mortality rates for cancer and other diseases by states and counties within a state using a website that will be publicly available. The results show that for rare or relatively rare disease (especially at the county level), ranks are essentially meaningless because of their large variability, while for more common disease in larger geographic units, ranks can be effectively utilized. Copyright

show abstract

Next generation analytic tools for large scale genetic epidemiology studies of complex diseases

Mechanic

Chen

Amos

et al. 2011

Genetic Epidemiology

View full text Add to dashboard Cite

Over the past several years, genome-wide association studies (GWAS) have succeeded in identifying hundreds of genetic markers associated with common diseases. However, most of these markers confer relatively small increments of risk and explain only a small proportion of familial clustering. To identify obstacles to future progress in genetic epidemiology research and provide recommendations to NIH for overcoming these barriers, the National Cancer Institute sponsored a workshop entitled “Next Generation Analytic Tools for Large-Scale Genetic Epidemiology Studies of Complex Diseases” on September 15–16, 2010. The goal of the workshop was to facilitate discussions on (1) statistical strategies and methods to efficiently identify genetic and environmental factors contributing to the risk of complex disease; and (2) how to develop, apply, and evaluate these strategies for the design, analysis, and interpretation of large-scale complex disease association studies in order to guide NIH in setting the future agenda in this area of research. The workshop was organized as a series of short presentations covering scientific (gene-gene and gene-environment interaction, complex phenotypes, and rare variants and next generation sequencing) and methodological (simulation modeling and computational resources and data management) topic areas. Specific needs to advance the field were identified during each session and are summarized.

show abstract

A comprehensive evaluation of SAM, the SAM R-package and a simple modification to improve its performance

Zhang

2007

BMC Bioinformatics

View full text Add to dashboard Cite

Background: The Significance Analysis of Microarrays (SAM) is a popular method for detecting significantly expressed genes and controlling the false discovery rate (FDR). Recently, it has been reported in the literature that the FDR is not well controlled by SAM. Due to the vast application of SAM in microarray data analysis, it is of great importance to have an extensive evaluation of SAM and its associated R-package (sam2.20).

show abstract

On kernel density estimation near endpoints

Zhang¹,

Karunamuni²

1998

Journal of Statistical Planning and Inference

View full text Add to dashboard Cite

An improved string composition method for sequence comparison

Zhang

Fang

2008

BMC Bioinformatics

View full text Add to dashboard Cite

BackgroundHistorically, two categories of computational algorithms (alignment-based and alignment-free) have been applied to sequence comparison–one of the most fundamental issues in bioinformatics. Multiple sequence alignment, although dominantly used by biologists, possesses both fundamental as well as computational limitations. Consequently, alignment-free methods have been explored as important alternatives in estimating sequence similarity. Of the alignment-free methods, the string composition vector (CV) methods, which use the frequencies of nucleotide or amino acid strings to represent sequence information, show promising results in genome sequence comparison of prokaryotes. The existing CV-based methods, however, suffer certain statistical problems, thereby underestimating the amount of evolutionary information in genetic sequences.ResultsWe show that the existing string composition based methods have two problems, one related to the Markov model assumption and the other associated with the denominator of the frequency normalization equation. We propose an improved complete composition vector method under the assumption of a uniform and independent model to estimate sequence information contributing to selection for sequence comparison. Phylogenetic analyses using both simulated and experimental data sets demonstrate that our new method is more robust compared with existing counterparts and comparable in robustness with alignment-based methods.ConclusionWe observed two problems existing in the currently used string composition methods and proposed a new robust method for the estimation of evolutionary information of genetic sequences. In addition, we discussed that it might not be necessary to use relatively long strings to build a complete composition vector (CCV), due to the overlapping nature of vector strings with a variable length. We suggested a practical approach for the choice of an optimal string length to construct the CCV.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Shunpu Zhang

Confidence intervals for ranks of age-adjusted rates across states or counties

Next generation analytic tools for large scale genetic epidemiology studies of complex diseases

A comprehensive evaluation of SAM, the SAM R-package and a simple modification to improve its performance

On kernel density estimation near endpoints

An improved string composition method for sequence comparison

Contact Info

Product

Resources

About