Background: The TNM staging system is based on three anatomic prognostic factors: Tumor, Lymph Node and Metastasis. However, cancer is no longer considered an anatomic disease. Therefore, the TNM system should be expanded to accommodate new prognostic factors in order to increase the accuracy of estimating cancer patient outcome. The ensemble algorithm for clustering cancer data (EACCD) by Chen et al. reflects an effort to expand the TNM system without changing its basic definitions. Although results obtained with EACCD have been reported, the algorithm itself has not been analyzed. In this report, we examine various aspects of EACCD using a large breast cancer patient dataset. We compared the output of EACCD with the corresponding survival curves, investigated the effect of different settings in EACCD, and compared EACCD with alternative clustering approaches.
Results: Using the basic T and N definitions, EACCD generated a dendrogram that shows a graphic relationship among the survival curves of the breast cancer patients. The dendrograms from EACCD are robust for large values of m (the number of runs in the learning step). When m is large, the dendrograms depend on the linkage functions. The statistical tests employed in the learning step, however, have minimal effect on the dendrogram for large m. In addition, if the step for learning dissimilarity is omitted from EACCD, the resulting approaches can perform worse. Furthermore, clustering based only on prognostic factors could generate misleading dendrograms, and direct use of partitioning techniques could lead to misleading assignments to clusters.
Conclusions: When only the Partitioning Around Medoids (PAM) algorithm is involved in the step of learning dissimilarity, large values of m are required to obtain robust dendrograms, and for a large m EACCD can effectively cluster cancer patient data.
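The learning step described above (m partitioning runs that are combined into a learned dissimilarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the random choice of the number of clusters per run, and the toy initial dissimilarity are all assumptions made for illustration, and a simplified alternating k-medoids stands in for the full PAM swap search. In EACCD the initial dissimilarity would come from a statistical test comparing survival between patient groups; here it is just the distance between points on a line.

```python
import random


def k_medoids(dis, k, rng, max_iter=50):
    """Simplified alternating k-medoids (a stand-in for PAM, not its full swap search)."""
    n = len(dis)
    medoids = rng.sample(range(n), k)
    for _ in range(max_iter):
        # Assign every item to its nearest medoid.
        labels = [min(range(k), key=lambda c: dis[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # New medoid: the member with the smallest total dissimilarity to its cluster.
            new_medoids.append(min(members, key=lambda j: sum(dis[i][j] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def learned_dissimilarity(dis, m, seed=0):
    """Fraction of the m runs in which two items fall into different clusters."""
    rng = random.Random(seed)
    n = len(dis)
    apart = [[0] * n for _ in range(n)]
    for _ in range(m):
        k = rng.randint(2, n - 1)  # assumption: a random number of clusters per run
        labels = k_medoids(dis, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] != labels[j]:
                    apart[i][j] += 1
    return [[apart[i][j] / m for j in range(n)] for i in range(n)]


# Toy initial dissimilarity (hypothetical): six groups forming two well-separated sets.
positions = [0, 1, 2, 10, 11, 12]
initial = [[abs(a - b) for b in positions] for a in positions]
learned = learned_dissimilarity(initial, m=200)
```

The resulting matrix could then be passed to a hierarchical clustering routine with a chosen linkage function to draw a dendrogram. Because each entry is an average over m runs, a small m gives a coarse, run-dependent estimate, which is consistent with the abstract's observation that large values of m are needed for robust dendrograms.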
The grouping algorithm developed for complex, large-scale data improves the prediction of 5-year costs. The prediction accuracy could be improved by utilization of a richer set of prognostic factors and refinement of categorical specifications.
A133 hospitalized for ≥ 2 days (hospitalized patients; HPs), or using emergency room (ER) or observation for 1 day (emergency room patients; ERPs). Reimbursements were based on claims and inflated to 2010 USD; costs were derived from 2010 Premier data. Net reimbursement was analyzed by MS-DRG and length of stay (LOS). The risk of all-cause hospitalization and factors correlated with LOS were determined using regression modeling. Results: Across all study years, the median age was 71 for HPs and 65 for ERPs. Median Charlson Comorbidity Index (CCI) was 4 for HPs and 2 for ERPs. HPs had more cellulitis on the leg or surgical infection; ERPs had more cellulitis on the face, trunk, or arm. Median HP LOS was 4 days; 33% of patients had LOS > 6 days. Age, race, and history of bacterial infection were correlated with LOS.