A neural network-based method, CANYON (CArbonate system and Nutrients concentration from hYdrological properties and Oxygen using a Neural-network), was developed to estimate biogeochemically relevant variables throughout the water column (i.e., from the surface to 8,000 m depth) of the Global Ocean. These variables are the concentrations of three nutrients [nitrate (NO₃⁻), phosphate (PO₄³⁻), and silicate (Si(OH)₄)] and four carbonate system parameters [total alkalinity (A_T), dissolved inorganic carbon (C_T), pH (pH_T), and partial pressure of CO₂ (pCO₂)], all estimated from concurrent in situ measurements of temperature, salinity, hydrostatic pressure, and oxygen (O₂), together with the sampling latitude, longitude, and date. Seven neural networks were developed using the GLODAPv2 database, which is largely representative of the diversity of open-ocean conditions, making CANYON potentially applicable to most oceanic environments. For each variable, CANYON was trained on a randomly chosen 80% of the data from the whole database (after eight 10° × 10° zones had been removed to provide an "independent dataset" for additional validation); the remaining 20% were used as the neural-network validation test. Overall, CANYON retrieved the variables with high accuracy (RMSE): 1.04 µmol kg⁻¹ (NO₃⁻), 0.074 µmol kg⁻¹ (PO₄³⁻), 3.2 µmol kg⁻¹ (Si(OH)₄), 0.020 (pH_T), 9 µmol kg⁻¹ (A_T), 11 µmol kg⁻¹ (C_T), and 7.6% (pCO₂), i.e., about 30 µatm at 400 µatm. This accuracy was confirmed for the eight independent zones not included in the training process. CANYON was also applied to the Hawaiian Time Series site to produce a 22-year-long simulated time series of the seven variables. The modeled and measured data agreed well there too, with RMSE of the same order of magnitude as in the validation test. CANYON is thus a promising method to derive distributions of key biogeochemical variables.
It could be used for a variety of global and regional applications, ranging from data quality control to the production of datasets of variables that are difficult to obtain yet required for the initialization and validation of biogeochemical models. In particular, combining the increased coverage of the global Biogeochemical-Argo program, in which O₂ is one of the core variables and is now very accurately measured, with the CANYON approach offers the exciting prospect of obtaining large-scale estimates of key biogeochemical variables with unprecedented spatial and temporal resolution. The Matlab and R codes of the proposed algorithms are provided as Supplementary Material.
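The data partition and error metrics summarized above can be illustrated with a minimal Python sketch. It is not the authors' Matlab or R code: the sample positions are synthetic, the zone corners are hypothetical, and the helper names (`in_zone`, `partition`, `rmse`, `relative_to_absolute`) are introduced here purely for illustration. It only shows the bookkeeping: set aside the samples that fall in the held-out 10° × 10° zones, split the rest 80/20 at random, and express errors as RMSE or as an absolute value at a given level.

```python
import math
import random


def in_zone(lat, lon, zone):
    """True if (lat, lon) lies inside a 10 x 10 degree box.

    `zone` is the (lat, lon) of the box's south-west corner (assumed convention).
    """
    lat0, lon0 = zone
    return lat0 <= lat < lat0 + 10 and lon0 <= lon < lon0 + 10


def partition(samples, zones, train_fraction=0.8, seed=0):
    """Return (train, validation, independent) subsets of `samples`."""
    independent, remaining = [], []
    for s in samples:
        held_out = any(in_zone(s["lat"], s["lon"], z) for z in zones)
        (independent if held_out else remaining).append(s)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(remaining)
    n_train = round(train_fraction * len(remaining))
    return remaining[:n_train], remaining[n_train:], independent


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def relative_to_absolute(percent, level):
    """Convert a relative error in percent to an absolute error at `level`."""
    return percent / 100.0 * level


# Synthetic sample positions (made up; not GLODAPv2 data).
_rng = random.Random(42)
samples = [
    {"lat": _rng.uniform(-80, 80), "lon": _rng.uniform(-180, 180)}
    for _ in range(1000)
]
# Hypothetical zone corners; the abstract does not list the eight zones.
zones = [(-10, -30), (40, -40)]

train, valid, indep = partition(samples, zones)
pco2_abs = relative_to_absolute(7.6, 400.0)  # about 30 µatm, as quoted above
```

Filtering out the held-out zones before the random split keeps the "independent dataset" entirely unseen during training, which is what lets it serve as an additional check beyond the 20% validation subset.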