Due to its low computational cost, Lasso is an attractive regularization method for high-dimensional statistical settings. In this paper, we consider multivariate counting processes depending on an unknown function parameter to be estimated by linear combinations of a fixed dictionary. To select coefficients, we propose an adaptive $\ell_1$-penalization methodology, where data-driven weights of the penalty are derived from new Bernstein type inequalities for martingales. Oracle inequalities are established under assumptions on the Gram matrix of the dictionary. Nonasymptotic probabilistic results for multivariate Hawkes processes are proven, which allows us to check these assumptions by considering general dictionaries based on histograms, Fourier or wavelet bases. Motivated by problems of neuronal activity inference, we finally carry out a simulation study for multivariate Hawkes processes and compare our methodology with the adaptive Lasso procedure proposed by Zou in (J. Amer. Statist. Assoc. 101 (2006) 1418-1429). We observe an excellent behavior of our procedure. We rely on theoretical aspects for the essential question of tuning our methodology. Unlike adaptive Lasso of (J. Amer. Statist. Assoc. 101 (2006) 1418-1429), our tuning procedure is proven to be robust with respect to all the parameters of the problem, revealing its potential for concrete purposes, in particular in neuroscience.Comment: Published at http://dx.doi.org/10.3150/13-BEJ562 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
Summary Background Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced, DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer but also engage a community of international data scientists to study this disease. Methods Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere, comprising 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial, and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly to be used as training data to predict the outcome of interest—namely, overall survival. Clinical data were also released for ENTHUSE 33, but data for outcome variables (overall survival and event status) were hidden from the challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance. Further validation was done using data from a fifth trial—ENTHUSE M1—in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone. Findings 50 independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0·791; Bayes factor >5) and surpassed the reference model (iAUC 0·743; Bayes factor >20). Both the ePCR model and reference models stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3·32, 95% CI 2·39–4·62, p<0·0001; reference model: 2·56, 1·85–3·53, p<0·0001). The new model was validated further on the ENTHUSE M1 cohort with similarly high performance (iAUC 0·768). Meta-analysis across all methods confirmed previously identified...
Transcripts from spacer sequences within chromosomal repeat clusters [CRISPRs (clusters of regularly interspaced palindromic repeats)] from archaea have been implicated in inhibiting or regulating the propagation of archaeal viruses and plasmids. For the crenarchaeal thermoacidophiles, the chromosomal spacers show a high level of matches ( approximately 30%) with viral or plasmid genomes. Moreover, their distribution along the virus/plasmid genomes, as well as their DNA strand specificity, appear to be random. This is consistent with the hypothesis that chromosomal spacers are taken up directly and randomly from virus and plasmid DNA and that the spacer transcripts target the genomic DNA of the extrachromosomal elements and not their transcripts.
Common Chromosomal Fragile Sites (CFSs) are specific genomic regions prone to form breaks on metaphase chromosomes in response to replication stress. Moreover, CFSs are mutational hotspots in cancer genomes, showing that the mutational mechanisms that operate at CFSs are highly active in cancer cells. Orthologs of human CFSs are found in a number of other mammals, but the extent of CFS conservation beyond the mammalian lineage is unclear. Characterization of CFSs from distantly related organisms can provide new insight into the biology underlying CFSs. Here, we have mapped CFSs in an avian cell line. We find that, overall the most significant CFSs coincide with extremely large conserved genes, from which very long transcripts are produced. However, no significant correlation between any sequence characteristics and CFSs is found. Moreover, we identified putative early replicating fragile sites (ERFSs), which is a distinct class of fragile sites and we developed a fluctuation analysis revealing high mutation rates at the CFS gene PARK2, with deletions as the most prevalent mutation. Finally, we show that avian homologs of the human CFS genes despite their fragility have resisted the general intron size reduction observed in birds suggesting that CFSs have a conserved biological function.
The sparse group lasso optimization problem is solved using a coordinate gradient descent algorithm. The algorithm is applicable to a broad class of convex loss functions. Convergence of the algorithm is established, and the algorithm is used to investigate the performance of the multinomial sparse group lasso classifier. On three different real data examples the multinomial group lasso clearly outperforms multinomial lasso in terms of achieved classification error rate and in terms of including fewer features for the classification. The run-time of our sparse group lasso implementation is of the same order of magnitude as the multinomial lasso algorithm implemented in the R package glmnet. Our implementation scales well with the problem size. One of the high dimensional examples considered is a 50 class classification problem with 10k features, which amounts to estimating 500k parameters. The implementation is available as the R package msgl. In the middle loop, Algorithm 2, the block gradient (10) is computed for all m blocks. To determine if a block is zero it is, in fact, sufficient to compute an upper bound on the block gradient. Since the gradient, q, is already computed we focus on the term involving the Hessian. That is, for J = 1, . . . , m, we compute a b J ∈ R such thatwhen K J (λαξ (J) , |q (J) |) ≤ λ(1 − α)γ J and otherwise let t J = 0. When
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.