Machine learning is an established and frequently used technique in industry and academia, but a standard process model to improve the success and efficiency of machine learning applications is still missing. Project organizations and machine learning practitioners face manifold challenges and risks when developing machine learning applications and need guidance to meet business expectations. This paper therefore proposes a process model for the development of machine learning applications, covering six phases from defining the scope to maintaining the deployed machine learning application. Business and data understanding are executed simultaneously in the first phase, as both have considerable impact on the feasibility of the project. The next phases comprise data preparation, modeling, evaluation, and deployment. Special focus is placed on the last phase, because a model running in a changing real-time environment requires close monitoring and maintenance to reduce the risk of performance degradation over time. For each task of the process, this work proposes a quality assurance methodology suited to the challenges of machine learning development, which are identified in the form of risks. The methodology is drawn from practical experience and scientific literature and has proven to be general and stable. The process model expands on CRISP-DM, a data mining process model that enjoys strong industry support but fails to address machine-learning-specific tasks. The presented work proposes an industry- and application-neutral process model tailored to machine learning applications, with a focus on technical tasks for quality assurance.
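The emphasis on monitoring in the final phase can be made concrete with a small check. The following is a minimal sketch, not part of the proposed process model, assuming that ground-truth labels eventually arrive for deployed predictions: a hypothetical `PerformanceMonitor` compares accuracy over a rolling window of recent predictions with the accuracy measured during evaluation (`baseline_accuracy`) and flags degradation once the gap exceeds `tolerance`. The class name, its parameters, and the choice of accuracy as the metric are illustrative assumptions.

```python
from collections import deque
from typing import Optional


class PerformanceMonitor:
    """Hypothetical rolling-window accuracy monitor for a deployed classifier."""

    def __init__(self, baseline_accuracy: float, window: int = 100, tolerance: float = 0.05):
        self.baseline_accuracy = baseline_accuracy  # accuracy measured in the evaluation phase
        self.tolerance = tolerance                  # allowed drop before raising an alarm
        self.outcomes = deque(maxlen=window)        # keeps only the newest `window` outcomes

    def record(self, prediction, label) -> None:
        # Store True for a correct prediction, False otherwise.
        self.outcomes.append(prediction == label)

    def rolling_accuracy(self) -> Optional[float]:
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self) -> bool:
        # Stay silent until the window is full, to avoid noisy early alarms.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline_accuracy - self.tolerance


monitor = PerformanceMonitor(baseline_accuracy=0.90, window=10, tolerance=0.05)
for _ in range(10):
    monitor.record(1, 1)   # ten correct predictions
for _ in range(3):
    monitor.record(0, 1)   # three wrong ones push three correct ones out of the window
print(monitor.rolling_accuracy(), monitor.degraded())  # prints: 0.7 True
```

A fixed window trades reaction speed against noise: a small window reacts quickly but raises more false alarms. In practice, labels often arrive late or not at all, in which case input-distribution checks would have to stand in for accuracy.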
Integrating machine learning (ML) components into critical applications introduces novel challenges for software certification and verification. New safety standards and technical guidelines are under development to support the safety of ML-based systems, e.g., ISO 21448 SOTIF for the automotive domain and the Assurance of Machine Learning for use in Autonomous Systems (AMLAS) framework. SOTIF and AMLAS provide high-level guidance, but the details must be worked out for each specific case. We initiated a research project with the goal of demonstrating a complete safety case for an ML component in an open automotive system. This paper reports results from an industry-academia collaboration on the safety assurance of SMIRK, an ML-based pedestrian automatic emergency braking demonstrator running in an industry-grade simulator. We demonstrate an application of AMLAS to SMIRK for a minimalistic operational design domain, i.e., we share a complete safety case for its integrated ML-based component. Finally, we report lessons learned and provide both SMIRK and the safety case under an open-source license for the research community to reuse.
We propose a process model for the development of machine learning applications. It guides machine learning practitioners and project organizations from industry and academia with a checklist of tasks that spans the complete project life cycle, from the very first idea to the continuous maintenance of any machine learning application. For each task, we propose a quality assurance methodology that is drawn from practical experience and scientific literature and that has proven general and stable enough to be included in best practices. We expand on CRISP-DM, a data mining process model that enjoys strong industry support but fails to address machine-learning-specific tasks.
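One way to picture such a checklist is as a small data structure that pairs each task with the risk it targets and the quality assurance method that addresses it. The following is a minimal sketch under that assumption. The phase names follow the six phases described in the first abstract, while `Task`, `Phase`, `current_phase`, and the example tasks, risks, and QA methods are hypothetical illustrations, not the authors' checklist.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Six phases as described in the first abstract (the last one covers monitoring and maintenance).
PHASES = [
    "Business and data understanding",
    "Data preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
    "Monitoring and maintenance",
]


@dataclass
class Task:
    name: str
    risk: str        # risk the task is meant to mitigate (illustrative)
    qa_method: str   # quality assurance method addressing that risk (illustrative)
    done: bool = False


@dataclass
class Phase:
    name: str
    tasks: List[Task] = field(default_factory=list)

    def complete(self) -> bool:
        # A phase with no open tasks counts as complete.
        return all(task.done for task in self.tasks)


def new_process() -> List[Phase]:
    return [Phase(name) for name in PHASES]


def current_phase(process: List[Phase]) -> Optional[Phase]:
    # First phase that still has open tasks; None once everything is done.
    for phase in process:
        if not phase.complete():
            return phase
    return None


def open_tasks(process: List[Phase]) -> List[str]:
    return [f"{p.name}: {t.name} (QA: {t.qa_method})"
            for p in process for t in p.tasks if not t.done]


process = new_process()
process[0].tasks.append(Task("Define success criteria", "Unclear business goals", "Stakeholder review"))
process[1].tasks.append(Task("Validate input data", "Poor data quality", "Automated schema checks"))
process[0].tasks[0].done = True
print(current_phase(process).name)  # prints: Data preparation
```

A strictly linear walk through the phases is a simplification: later phases can send a project back to earlier ones, and the final phase is ongoing rather than something that is ever "done".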
<p>Estimating the probability of a wildfire occurring at a specific location on a given day is challenging because it depends not only to a high degree on weather conditions and soil moisture but also on the presence of an ignition source [1]. A commonly used index to assess wildfire risk is the Canadian Fire Weather Index [2], which, however, does not model the presence of an ignition source.</p>
<p>We develop a machine learning model that distinguishes between (1) the probability of a wildfire occurring given an ignition source and (2) the probability of an ignition source being present, and infers both. We first demonstrate the performance of our approach by estimating these probabilities on simulated data. With these simulations, we also assess the robustness of our model to machine-learning-related challenges that arise with wildfire data, such as extreme class imbalance and label uncertainty. We then show the performance of our model trained on satellite-derived global wildfire occurrences between 2001 and 2017. The FireTracks dataset, which includes a comprehensive record of wildfire occurrences [3], is used as ground truth. Input features include weather data (ERA5 [4]) and population densities (GPWv4 [5]). Finally, we compare wildfire risk ratings computed with the Canadian Fire Weather Index to the probabilities estimated by our model.</p>
<p><strong>References<br></strong>[1] K. Rao et al., SAR-enhanced mapping of live fuel moisture content, Remote Sensing of Environment, 2020.<br>[2] R. D. Field et al., Development of a Global Fire Weather Database, Natural Hazards and Earth System Sciences, 2015.<br>[3] D. Traxl, FireTracks Scientific Dataset, 2021, https://github.com/dominiktraxl/firetracks.<br>[4] H. Hersbach et al., ERA5 hourly data on single levels from 1979 to present, Copernicus Climate Change Service (C3S) Climate Data Store (CDS), 2018.<br>[5] Center for International Earth Science Information Network (CIESIN), Columbia University, Gridded Population of the World, Version 4 (GPWv4): Population Density, NASA Socioeconomic Data and Applications Center (SEDAC), 2016.</p>
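If a fire cannot start without an ignition source, the probability of a fire factorizes as P(fire) = P(fire | ignition) · P(ignition), yet the labels only record whether a fire occurred, so only the product is observed. The following minimal sketch shows how both factors can still be inferred jointly. It assumes two logistic models, one driven by a single weather feature `x` (standing in for inputs such as ERA5) and one by a single population feature `z` (standing in for GPWv4 densities), fitted by gradient ascent on the Bernoulli log-likelihood of their product. This is an illustrative assumption, not necessarily the authors' architecture. Like the first step described in the abstract, it runs on simulated data, and the functions `simulate`, `fit`, and `predict`, along with the true parameters, are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def predict(params, x, z):
    """Return (P(fire | ignition), P(ignition), P(fire)) for each sample."""
    w1, b1, w2, b2 = params
    p_fire_given_ign = sigmoid(w1 * x + b1)  # depends only on the weather feature x
    p_ign = sigmoid(w2 * z + b2)              # depends only on the population feature z
    return p_fire_given_ign, p_ign, p_fire_given_ign * p_ign


def log_likelihood(params, x, z, y):
    _, _, p = predict(params, x, z)
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def simulate(n, true_params, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)  # hypothetical standardized dryness index
    z = rng.normal(size=n)  # hypothetical standardized population density
    _, _, p = predict(true_params, x, z)
    y = (rng.random(n) < p).astype(float)  # only fire / no fire is observed
    return x, z, y


def fit(x, z, y, lr=0.5, steps=2000):
    params = np.zeros(4)  # [w1, b1, w2, b2]
    for _ in range(steps):
        p1, p2, p = predict(params, x, z)
        p = np.clip(p, 1e-9, 1 - 1e-9)
        # With p = p1 * p2, the log-likelihood gradient w.r.t. each factor's logit is
        # (y - p) * (1 - p_k) / (1 - p) for factor k.
        common = (y - p) / (1.0 - p)
        g1 = common * (1.0 - p1)
        g2 = common * (1.0 - p2)
        grad = np.array([np.mean(g1 * x), np.mean(g1), np.mean(g2 * z), np.mean(g2)])
        params = params + lr * grad  # gradient ascent on the mean log-likelihood
    return params


true_params = np.array([1.5, 0.5, 1.0, -1.0])
x, z, y = simulate(20_000, true_params)
fitted = fit(x, z, y)
print("true:", true_params, "fitted:", np.round(fitted, 2))
```

Giving the two factors disjoint inputs is what makes them separable here. If both factors saw the same features, many different splits of the same product would fit the labels almost equally well. With the extreme class imbalance of real wildfire data, the signal for each factor would likely be much weaker than in this toy setting.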