As surface temperatures are expected to rise in the future, ice-rich permafrost may thaw, altering soil topography and hydrology and creating a mosaic of wet and dry soil surfaces in the Arctic. Arctic wetlands are large sources of CH4, and investigating effects of soil hydrology on CH4 fluxes is of great importance for predicting ecosystem feedback in response to climate change. In this study, we investigate how a decade-long drying manipulation on an Arctic floodplain influences CH4-associated microorganisms, soil thermal regimes, and plant communities. Moreover, we examine how these drainage-induced changes may then modify CH4 fluxes in the growing and nongrowing seasons. This study shows that drainage substantially lowered the abundance of methanogens along with methanotrophic bacteria, which may have reduced CH4 cycling. Soil temperatures in the drained areas were lower in deep, anoxic soil layers (below 30 cm) but higher in oxic topsoil layers (0-15 cm) compared to the control wet areas. This pattern of soil temperatures may have reduced the rates of methanogenesis while elevating those of CH4 oxidation, thereby decreasing net CH4 fluxes. The abundance of Eriophorum angustifolium, an aerenchymatous plant species, diminished significantly in the drained areas. Due to this decrease, a higher fraction of CH4 was instead emitted to the atmosphere by diffusion, possibly increasing the potential for CH4 oxidation and leading to a decrease in net CH4 fluxes compared to the control site. Drainage lowered CH4 fluxes by a factor of 20 during the growing season, with postdrainage changes in microbial communities, soil temperatures, and plant communities also contributing to this reduction. In contrast, we observed that CH4 emissions increased by 10% in the drained areas during the nongrowing season, although this difference was insignificant given the small magnitudes of the fluxes.
This study showed that long-term drainage considerably reduced CH4 fluxes through modified ecosystem properties.
The time series classification literature has expanded rapidly over the last decade, with many new classification approaches published each year. Prior research has mostly focused on improving the accuracy and efficiency of classifiers, with interpretability being somewhat neglected. This aspect of classifiers has become critical for many application domains, and the introduction of the EU GDPR legislation in 2018 is likely to further emphasize the importance of interpretable learning algorithms. Currently, state-of-the-art classification accuracy is achieved with very complex models based on large ensembles (COTE) or deep neural networks (FCN). These approaches are not efficient with regard to either time or space, are difficult to interpret, and cannot be applied to variable-length time series, requiring pre-processing of the original series to a set fixed length. In this paper we propose new time series classification algorithms to address these gaps. Our approach is based on symbolic representations of time series, efficient sequence mining algorithms, and linear classification models. Our linear models are as accurate as deep learning models but are more efficient regarding running time and memory, can work with variable-length time series, and can be interpreted by highlighting the discriminative symbolic features on the original time series.
We advance the state-of-the-art in time series classification by proposing new algorithms built using the following three key ideas: (1) Multiple resolutions of symbolic representations: we combine symbolic representations obtained using different parameters, rather than one fixed representation (e.g., multiple SAX representations); (2) Multiple domain representations: we combine symbolic representations in time (e.g., SAX) and frequency (e.g., SFA) domains, to be more robust across problem types; (3) Efficient navigation in a huge symbolic-words space: we extend a symbolic sequence classifier (SEQL) to work with multiple symbolic representations and use its greedy feature selection strategy to effectively filter the best features for each representation. We show that our multi-resolution multi-domain linear classifier (mtSS-SEQL+LR) achieves a similar accuracy to the state-of-the-art COTE ensemble, and to recent deep learning methods (FCN, ResNet), but uses a fraction of the time and memory required by either COTE or deep models. To further analyse the interpretability of our classifier, we present a case study on a human motion dataset collected by the authors. We discuss the accuracy, efficiency and interpretability of our proposed algorithms and release all the results, source code and data to encourage reproducibility.
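The first key idea, multiple resolutions of symbolic representations, can be illustrated with a minimal SAX-style transform: z-normalise the series, reduce it with piecewise aggregate approximation (segment means), and map each mean to a letter via standard Gaussian breakpoints. This is a sketch of the general SAX technique applied at several resolutions, not the paper's implementation; the function name and parameter choices are illustrative.

```python
import numpy as np

# Standard SAX breakpoints: split N(0, 1) into equal-probability bins
BREAKPOINTS = {3: [-0.43, 0.43],
               4: [-0.67, 0.0, 0.67],
               5: [-0.84, -0.25, 0.25, 0.84]}

def sax_word(series, n_segments, alphabet_size):
    """Z-normalise, reduce with PAA (segment means), then map each
    segment mean to a letter via the Gaussian breakpoints."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)
    paa = np.array([seg.mean() for seg in np.array_split(x, n_segments)])
    idx = np.searchsorted(BREAKPOINTS[alphabet_size], paa)
    return "".join(chr(ord("a") + i) for i in idx)

series = np.sin(np.linspace(0, 4 * np.pi, 64))
# Multiple resolutions: vary word length and alphabet size,
# then keep all resulting words as candidate features
for n_seg, alpha in [(4, 3), (8, 4), (16, 5)]:
    print(n_seg, alpha, sax_word(series, n_seg, alpha))
```

Each parameter pair yields a different symbolic word for the same series; combining them is what gives the classifier robustness to the unknown "right" resolution for a given problem.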
Accurate modelling of land-atmosphere carbon fluxes is essential for future climate projections. However, the exact responses of carbon cycle processes to climatic drivers often remain uncertain. Presently, knowledge derived from experiments, complemented with a steadily evolving body of mechanistic theory, provides the main basis for developing the respective models. The strongly increasing availability of measurements may complicate the traditional hypothesis-driven path to developing mechanistic models, but it may also facilitate new ways of identifying suitable model structures using machine learning. Here we explore the potential to derive model formulations automatically from data using gene expression programming (GEP). GEP automatically (re)combines various mathematical operators into model formulations that are further evolved, eventually identifying the most suitable structures. In contrast to most other machine learning regression techniques, the GEP approach generates models that allow for prediction and possibly for interpretation. Our study is based on two cases: artificially generated data and real observations. Simulations based on artificial data show that GEP is successful in identifying prescribed functions, with the prediction capacity of the models comparable to four state-of-the-art machine learning methods (Random Forests, Support Vector Machines, Artificial Neural Networks, and Kernel Ridge Regression). The case of real observations explores different components of terrestrial respiration at an oak forest in south-east England. We find that GEP-retrieved models are often better in prediction than established respiration models. Furthermore, the structure of the GEP models offers new insights into driver selection and interactions. We find previously unconsidered exponential dependencies of respiration on seasonal ecosystem carbon assimilation and water dynamics.
However, we also noticed that the GEP models are only partly portable across respiration components, with equifinality issues possibly preventing the identification of a "general" terrestrial respiration model. Overall, GEP is a promising tool for uncovering new model structures for terrestrial ecology in the data-rich era, complementing the traditional approach of model building.
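The core idea behind GEP, assembling candidate formulas from a set of mathematical operators and scoring them against data, can be sketched as follows. This toy replaces GEP's linear chromosomes and genetic operators with plain random search over small expression trees, so it illustrates only the formula-assembly-and-scoring loop; the operator set, depth limit, and target function are all illustrative, not from the study.

```python
import math
import random

# A small operator set; real GEP draws from a richer function set
OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "exp": lambda a: math.exp(min(a, 50.0)),  # guard against overflow
}

def random_expr(depth=3):
    """Build a random expression tree over the variable 'x' and constants."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", round(random.uniform(-2, 2), 2)])
    op = random.choice(list(OPS))
    arity = OPS[op].__code__.co_argcount
    return (op, *[random_expr(depth - 1) for _ in range(arity)])

def evaluate(node, x):
    if node == "x":
        return x
    if isinstance(node, float):
        return node
    op, *args = node
    return OPS[op](*[evaluate(a, x) for a in args])

def fit(xs, ys, trials=2000, seed=0):
    """Random search over formulas, keeping the lowest-MSE candidate
    (stand-in for GEP's evolutionary search)."""
    random.seed(seed)
    best, best_err = None, float("inf")
    for _ in range(trials):
        expr = random_expr()
        err = sum((evaluate(expr, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        if err < best_err:
            best, best_err = expr, err
    return best, best_err

xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 for x in xs]  # a simple prescribed target function
expr, err = fit(xs, ys)
print(expr, err)
```

Because the winning candidate is an explicit formula rather than opaque weights, the result can be read and interpreted, which is the property the abstract highlights over other regression techniques.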
Observing, recording, and modeling the dynamics of atmospheric pollutants are active areas of study, given the effects of pollution on the population and ecosystems. The existence of aberrant values may influence reports on air quality when those reports are based on average values over a period. It may also affect the quality of models that are further used in forecasting. Therefore, correct data collection and analysis are necessary before modeling. This study aimed to detect aberrant values in a nitrogen oxide concentration series recorded in the interval 1 January–8 June 2016 in Timisoara, Romania, retrieved from the official reports of the National Network for Monitoring the Air Quality, Romania. Four methods were utilized: the interquartile range (IQR), isolation forest, and local outlier factor (LOF) methods, and the generalized extreme studentized deviate (GESD) test. Autoregressive integrated moving average (ARIMA), generalized regression neural network (GRNN), and hybrid ARIMA–GRNN models were built for the series before and after the removal of aberrant values. The results show that the first approach provided a good model (from a statistical viewpoint) for the series after the removal of anomalies. The best model was obtained by the hybrid ARIMA–GRNN. For example, for the raw NO2 series, the ARIMA model was not statistically validated, whereas, for the series without outliers, the ARIMA(1,1,1) model was validated. The GRNN model for the raw series was able to learn the data well: R2 = 76.135%, the correlation between the actual and predicted values (rap) was 0.8778, the mean squared error (MSE) was 0.177, the mean absolute error (MAE) was 0.2839, and the mean absolute percentage error (MAPE) was 9.9786. Still, on the test set, the results were worse: MSE = 1.5101, MAE = 0.8175, rap = 0.4482.
For the series without outliers, the model learned the training data better than for the raw series (R2 = 0.996), whereas, on the test set, the results were not very good (R2 = 0.473). The performance of the hybrid ARIMA–GRNN on the initial series was not satisfactory on the test set (the pattern of the computed values was almost linear) but was very good on the series without outliers (the correlation between the predicted and actual values on the test set was very close to 1). The same was true for the models built for O3.
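Of the four detection methods mentioned, the IQR rule (Tukey's fences) is the simplest and can be sketched in a few lines. The data below are synthetic with two injected spikes, not the Timisoara NO2 series, and the fence multiplier k = 1.5 is the conventional default, not necessarily the study's setting.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return (v < lo) | (v > hi)

# Synthetic pollutant-like series with two injected aberrant values
rng = np.random.default_rng(42)
series = rng.normal(20.0, 3.0, 200)
series[[50, 120]] = [80.0, -40.0]

mask = iqr_outliers(series)
print("flagged indices:", np.flatnonzero(mask))
```

Removing the flagged values before fitting ARIMA or GRNN models is exactly the pre-modeling cleaning step the abstract argues for.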
The time series classification literature has expanded rapidly over the last decade, with many new classification approaches published each year. The research focus has mostly been on improving the accuracy and efficiency of classifiers, while their interpretability has been somewhat neglected. Classifier interpretability has become a critical constraint for many application domains, and the introduction of the 'right to explanation' in the EU GDPR legislation in May 2018 is likely to further emphasize the importance of explainable learning algorithms. In this work we analyse the state-of-the-art for time series classification and propose new algorithms that aim to maintain classifier accuracy and efficiency, but keep interpretability as a key design constraint. We present new time series classification algorithms that advance the state-of-the-art by implementing the following three key ideas: (1) Multiple resolutions of symbolic approximations: we combine symbolic representations obtained using different parameters, rather than one fixed representation (e.g., multiple SAX representations); (2) Multiple domain representations: we combine symbolic approximations in the time (e.g., SAX) and frequency (e.g., SFA) domains, to be more robust across problem domains; (3) Efficient navigation of a huge symbolic-words space: we adapt a symbolic sequence classifier named SEQL to work with multiple domain representations (e.g., SAX-SEQL, SFA-SEQL), and use its greedy feature selection strategy to effectively filter the best features for each representation. We show that a multi-resolution multi-domain linear classifier, SAX-SFA-SEQL, achieves a similar accuracy to the state-of-the-art COTE ensemble and to a recent deep learning method (FCN), but uses a fraction of the time required by either COTE or FCN. We discuss the accuracy, efficiency and interpretability of our proposed algorithms.
To further analyse the interpretability aspect of our classifiers, we present a case study on an ecology benchmark.
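The interpretability claim, that a linear model over symbolic subwords can be read by mapping weights back to the subwords, can be illustrated with a deliberately small stand-in. SEQL itself performs greedy coordinate descent over an unbounded subsequence space; the toy below uses fixed-length character k-grams and a perceptron instead, and every word and label here is hypothetical.

```python
from collections import Counter

def kgram_features(word, k=2):
    """Bag of character k-grams from a symbolic (e.g. SAX) word."""
    return Counter(word[i:i + k] for i in range(len(word) - k + 1))

def train_perceptron(samples, labels, epochs=20):
    """Tiny linear classifier over k-gram counts; a stand-in for
    SEQL's greedy sequence learner, for illustration only."""
    w = {}
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            score = sum(w.get(f, 0.0) * c for f, c in feats.items())
            pred = 1 if score >= 0 else -1
            if pred != y:  # mistake-driven update
                for f, c in feats.items():
                    w[f] = w.get(f, 0.0) + y * c
    return w

# Hypothetical symbolic words: class +1 contains "ab", class -1 contains "cd"
pos = ["aabb", "abab", "aabc"]
neg = ["ccdd", "cdcd", "ccda"]
X = [kgram_features(s) for s in pos + neg]
y = [1] * 3 + [-1] * 3

w = train_perceptron(X, y)
# The learned weights point at the discriminative subwords, which can
# then be highlighted on the original time series
print(sorted(w.items(), key=lambda kv: kv[1]))
```

Because each weight is attached to a concrete subword, the fitted model explains its own decisions, which is the design constraint the abstract emphasises.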