TemporAI is an open source Python software library for machine learning (ML) tasks involving data with a time component, focused on medicine and healthcare use cases. It supports data in time series, static, and event modalities and provides an interface for prediction, causal inference, and time-to-event analysis, as well as common preprocessing utilities and model interpretability methods. The library aims to facilitate innovation in the medical ML space by offering a standardized temporal setting toolkit for model development, prototyping and benchmarking, bridging the gaps between the ML research, healthcare professional, medical/pharmacological industry, and data science communities. TemporAI is available on GitHub¹ and we welcome community engagement through use, feedback, and code contributions.
Keywords Machine Learning • Time Series • Medicine

1 Time domain is crucial for ML in medicine

Data with a time component² are ubiquitous in modern healthcare and medicine: from patient electronic health records (EHRs) [1], to data streams from Internet-of-Things (IoT) devices and consumer wearables [2], to large public health datasets [3], to name just a few key growing areas. In fact, since patient information is typically associated with a particular time point, the vast majority of healthcare data is temporal and may be viewed as a time series. Furthermore, the availability of open access data in this field is also improving [4,5,6], attracting significant attention from the artificial intelligence (AI), machine learning (ML), and deep learning (DL) research communities, as well as the medical data science community [7,8,9]. As such, it is evident that the temporal setting is becoming the cornerstone for ML in healthcare and medicine, with significant potential for impact.

Numerous novel methods have been developed to tackle medically relevant tasks in the time domain, such as prediction [10,11], causal inference [12,13,14], time-to-event analysis [15,16,17], and clustering [18,19]³, as well as data imputation [20,21] and model interpretability [22,23] methods, among others. Yet a significant limitation currently exists in the lack of standardization of both data representation and model benchmarking [7,9]. TemporAI addresses these limitations as the first toolkit for development, prototyping and benchmarking of ML models on medically relevant tasks with time series, static, and event data modalities.

¹ https://github.com/vanderschaarlab/temporai
² Depending on the context, referred to alternatively as: temporal, longitudinal, or time series data.
³ In medical and other contexts, these tasks may also be referred to as, respectively: forecasting, (individualized) treatment effect estimation, survival analysis, phenotyping.
The descriptor "temporal" may be used to contrast with the static task setting.
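
To make the three data modalities named in this section concrete, the following is a minimal, standard-library-only sketch of how a single patient's static, time series, and event data could be held together. It is an illustration under stated assumptions and does not reproduce the TemporAI API: the `PatientRecord` class, its field names, the `history_until` helper, and all feature names and values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PatientRecord:
    """Hypothetical container for one patient; NOT part of the TemporAI API."""

    # Static modality: one value per feature, fixed over time.
    static: Dict[str, float]
    # Time series modality: (time, {feature: value}) observations,
    # which may be irregularly spaced.
    time_series: List[Tuple[float, Dict[str, float]]] = field(default_factory=list)
    # Event modality (assumed layout): feature -> (time, occurred).
    events: Dict[str, Tuple[float, bool]] = field(default_factory=dict)

    def history_until(self, t: float) -> List[Tuple[float, Dict[str, float]]]:
        """Return the time series observations recorded at or before time t."""
        return [(ts, obs) for ts, obs in self.time_series if ts <= t]


# Illustrative values only; these are not drawn from any real dataset.
record = PatientRecord(
    static={"age": 63.0},
    time_series=[
        (0.0, {"heart_rate": 82.0}),
        (1.5, {"heart_rate": 90.0}),
        (4.0, {"heart_rate": 76.0}),
    ],
    events={"readmission": (12.0, True)},
)

# Observations available to a model making a prediction at time 2.0.
observed = record.history_until(2.0)
```

Keeping the three modalities in separate fields mirrors the distinction drawn in the text: static features have no time index, time series features have one value per observation time, and event features pair a time with an occurrence flag, as is common in time-to-event analysis.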