Although machine learning (ML) has shown promise across disciplines, out-of-sample generalizability remains a concern. This is currently addressed by sharing multi-site data, but such centralization is challenging or infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML, by only sharing numerical model updates. Here we present the largest FL study to date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6,314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate our study to: 1) enable more healthcare studies informed by large diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further analyses for glioblastoma by releasing our consensus model, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
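The idea of sharing only numerical model updates can be illustrated with a sample-size-weighted average of per-site parameters, a common aggregation rule usually called federated averaging. The abstract does not state which aggregation rule the study used, so the following is only a minimal sketch under that assumption; the function name `federated_average` and its arguments are hypothetical.

```python
from typing import List, Sequence


def federated_average(
    site_weights: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Combine per-site parameter vectors into one consensus vector.

    Assumption: each site contributes in proportion to its number of
    local samples (federated averaging). Only the parameter vectors and
    the sample counts are exchanged, never the underlying data.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(site_weights[0])
    consensus = [0.0] * dim
    for weights, n_samples in zip(site_weights, site_sizes):
        if len(weights) != dim:
            raise ValueError("all sites must share the same model shape")
        for i, w in enumerate(weights):
            consensus[i] += w * n_samples / total
    return consensus


if __name__ == "__main__":
    # Two hypothetical sites with 1 and 3 local cases.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In a real deployment this step would be repeated over many rounds, with each site retraining locally from the latest consensus parameters before sending its next update.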
Chlorophyll, pheophytin, and their proportions are critical factors in evaluating the sensory quality of green tea. This research aims to establish an effective method to quantify chlorophyll and pheophytin in green tea, based on Fourier transform infrared (FT–IR) spectroscopy. First, five brands of tea were collected for spectral acquisition, and the chlorophyll and pheophytin were measured using the reference method. Then, a relation between these two pigments and FT–IR spectroscopy was developed based on chemometrics. Additionally, the characteristic IR wavenumbers of these pigments were extracted and proved to be effective for quantitative determination. Subsequently, non-linear models were built based on these characteristic wavenumbers, obtaining coefficients of determination of 0.87, 0.80, 0.85 and 0.89, and relative predictive deviations of 2.77, 2.62, 2.26 and 3.07 for the four pigments, respectively. These results demonstrate the feasibility of FT–IR spectroscopy for the determination of chlorophyll and pheophytin.
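The two figures of merit quoted above can be computed from reference and predicted values as sketched below. The abstract does not define them, so this assumes the usual chemometric conventions: the coefficient of determination is 1 minus the ratio of residual to total sum of squares, and the relative predictive deviation (RPD) is the sample standard deviation of the reference values divided by the root-mean-square error of prediction. The function name `r2_and_rpd` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def r2_and_rpd(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float]:
    """Return (R^2, RPD) for a prediction set.

    Assumed definitions:
      R^2 = 1 - SS_res / SS_tot
      RPD = SD(y_true) / RMSEP, with SD using n - 1 in the denominator
    """
    n = len(y_true)
    if n < 2 or n != len(y_pred):
        raise ValueError("need at least two paired values")

    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0 or ss_res == 0:
        raise ValueError("metrics undefined for constant references or a perfect fit")

    rmsep = math.sqrt(ss_res / n)          # root-mean-square error of prediction
    sd_ref = math.sqrt(ss_tot / (n - 1))   # spread of the reference values
    return 1.0 - ss_res / ss_tot, sd_ref / rmsep


if __name__ == "__main__":
    # Made-up values for illustration only, not data from the study.
    reference = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
    print(r2_and_rpd(reference, predicted))
```

A larger RPD means the prediction error is small relative to the natural spread of the reference values, which is why it is often reported alongside the coefficient of determination.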
The Long Short-Term Memory (LSTM) network is widely used in modeling sequential observations in fields ranging from natural language processing to medical imaging. The LSTM has shown promise for interpreting computed tomography (CT) in lung screening protocols. Yet, traditional image-based LSTM models ignore interval differences, while recently proposed interval-modeled LSTM variants are limited in their ability to interpret temporal proximity. Meanwhile, clinical imaging acquisition may be irregularly sampled, and such sampling patterns may be commingled with clinical usages. In this paper, we propose the Distanced LSTM (DLSTM) by introducing time-distanced (i.e., time distance to the last scan) gates with a temporal emphasis model (TEM) targeting lung cancer diagnosis (i.e., evaluating the malignancy of pulmonary nodules). Briefly, (1) the time distance of every scan to the last scan is modeled explicitly, (2) time-distanced input and forget gates in DLSTM are introduced across regular and irregular sampling sequences, and (3) the newer scan in serial data is emphasized by the TEM. The DLSTM algorithm is evaluated with both simulated data and real CT images (from 1794 National Lung Screening Trial (NLST) patients with longitudinal scans and 1420 clinically studied patients). Experimental results on simulated data indicate the DLSTM can capture families of temporal relationships that cannot be detected with traditional LSTM. Cross-validation on empirical CT datasets demonstrates that DLSTM achieves leading performance on both regularly and irregularly sampled data (e.g., improving LSTM from 0.6785 to 0.7085 on F1 score in NLST). In external validation on irregularly acquired data, the benchmarks achieved 0.8350 (CNN feature) and 0.8380 (with LSTM) on AUC score, while the proposed DLSTM achieves 0.8905.
In conclusion, the DLSTM approach is shown to be compatible with families of linear, quadratic, exponential,
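The three ingredients listed in the DLSTM abstract (time distance to the last scan, time-distanced gates, and emphasis on newer scans) can be illustrated with a small sketch. The abstract does not give the form of the temporal emphasis model, so this assumes an exponential decay in the time distance; the `decay` parameter and all function names are hypothetical, and the gate scaling shown is only one plausible reading, not the published formulation.

```python
import math
from typing import List, Sequence


def time_distances(scan_times: Sequence[float]) -> List[float]:
    """Time distance of every scan to the last scan.

    Assumes scan_times is in chronological order, so the last entry is
    the most recent scan and gets distance 0.
    """
    if not scan_times:
        raise ValueError("need at least one scan")
    last = scan_times[-1]
    return [last - t for t in scan_times]


def temporal_emphasis(distances: Sequence[float], decay: float = 1.0) -> List[float]:
    """Assumed emphasis weight exp(-decay * d): newer scans weigh more."""
    return [math.exp(-decay * d) for d in distances]


def distanced_gate(gate_activation: float, distance: float, decay: float = 1.0) -> float:
    """Scale an ordinary gate activation (in [0, 1]) by the emphasis weight."""
    return gate_activation * math.exp(-decay * distance)


if __name__ == "__main__":
    # Hypothetical, irregularly spaced scan times in years.
    distances = time_distances([0.0, 1.0, 2.5])
    print(distances)
    print(temporal_emphasis(distances, decay=0.5))
```

Under this assumption the most recent scan keeps its full gate activation, while older scans are attenuated smoothly regardless of whether the sampling is regular or irregular.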
The potential of Fourier transform infrared (FT-IR) transmission spectroscopy for the determination of lead chrome green in green tea was investigated based on chemometric methods. First, a qualitative analysis of lead chrome green in tea was performed based on partial least squares discriminant analysis (PLS-DA), and the correct classification rate was 100%. Then, a hybrid method combining interval partial least squares (iPLS) regression and the successive projections algorithm (SPA) was proposed to select characteristic wavenumbers for the quantitative analysis of lead chrome green in green tea, yielding 19 wavenumbers. Among these, 1384 (C=C), 1456, 1438, 1419 (C=N), and 1506 (CNH) cm⁻¹ were the characteristic wavenumbers of lead chrome green. These 19 wavenumbers were then used to build determination models. The best model was achieved by the least squares support vector machine (LS-SVM) algorithm, with a high coefficient of determination and a low root-mean-square error for the prediction set (Rp² = 0.864 and RMSEP = 0.291). All these results indicate the feasibility of IR spectra for detecting lead chrome green in green tea.