Diogo Seca scite author profile

Machine learning contrasts with traditional software development in that the oracle is the data, and the data is not always a correct representation of the problem that machine learning tries to model. We present a survey of the oracle issues found in machine learning and state-of-the-art solutions for dealing with these issues. These include lines of research for differential testing, metamorphic testing, and test coverage. We also review some recent improvements to robustness during modeling that reduce the impact of oracle issues, as well as tools and frameworks for assisting in testing and discovering issues specific to the dataset.

TimeGym: Debugging for Time Series Modeling in Python

Seca

2021

Preprint

We introduce the TimeGym Forecasting Debugging Toolkit, a Python library for testing and debugging time series forecasting pipelines. TimeGym simplifies the testing of forecasting pipelines by providing generic tests that work out of the box. These tests are based on common modeling challenges of time series. Our library enables forecasters to apply a Test-Driven Development approach to forecast modeling, using specified oracles to generate artificial data with noise.
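The idea of testing a forecaster against artificial data generated from a known oracle can be sketched as follows. This is a minimal illustration using only the Python standard library; it does not use or reproduce the TimeGym API, and every name in it (`trend_oracle`, `make_series`, `fit_linear_forecaster`, `check_recovers_trend`) is hypothetical, as are the chosen slope, noise level, and tolerance.

```python
import random


def trend_oracle(t, slope=0.5, intercept=10.0):
    """Hypothetical oracle: the noise-free value of the series at step t."""
    return intercept + slope * t


def make_series(n, noise_std, seed=0):
    """Artificial data: oracle values plus Gaussian noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [trend_oracle(t) + rng.gauss(0.0, noise_std) for t in range(n)]


def fit_linear_forecaster(series):
    """Stand-in forecaster: ordinary least squares line fitted to the series."""
    n = len(series)
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return lambda t: intercept + slope * t


def check_recovers_trend(n=200, horizon=10, noise_std=1.0, tolerance=1.0):
    """Test: forecasts beyond the training range stay close to the oracle."""
    forecast = fit_linear_forecaster(make_series(n, noise_std))
    errors = [abs(forecast(t) - trend_oracle(t)) for t in range(n, n + horizon)]
    return max(errors) <= tolerance


print(check_recovers_trend())
```

Because the oracle is known, the test can compare forecasts against the true signal rather than against noisy observations, which is what makes a test-first workflow possible here. Other oracles (seasonality, level shifts) could be swapped in the same way.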

Estimating the Likelihood of Financial Behaviours Using Nearest Neighbors

et al. 2023

As many automated algorithms find their way into the IT systems of the banking sector, having a way to validate and interpret the results from these algorithms can lead to a substantial reduction in the risks associated with automation. Usually, validating these pricing mechanisms requires human resources to manually analyze and validate large quantities of data. There is a lack of effective methods that analyze the time series and understand if what is currently happening is plausible based on previous data, without information about the variables used to calculate the price of the asset. This paper describes an implementation of a process that allows us to validate many data points automatically. We explore the K-Nearest Neighbors algorithm to find coincident patterns in financial time series, allowing us to detect anomalies, outliers, and data points that do not follow normal behavior. This system allows quicker detection of defective calculations that would otherwise result in the incorrect pricing of financial assets. Furthermore, our method does not require knowledge about the variables used to calculate the time series being analyzed. Our proposal uses pattern matching and can validate more than 58% of instances, substantially improving human risk analysts’ efficiency. The proposal is completely transparent, allowing analysts to understand how the algorithm made its decision, increasing the trustworthiness of the method.
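As a rough illustration of nearest-neighbor pattern matching on a time series, the sketch below compares a recent window against all historical windows and treats it as plausible when its nearest neighbors are close. It is not the paper's implementation: the Euclidean distance, the window width, `k`, the threshold, and all function names are assumptions chosen for the example.

```python
import math


def sliding_windows(series, width):
    """All contiguous windows of the given width."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]


def euclidean(a, b):
    """Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_distance(history, query, k=3):
    """Mean distance from the query window to its k nearest historical windows."""
    distances = sorted(
        euclidean(window, query)
        for window in sliding_windows(history, len(query))
    )
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def is_plausible(history, query, k=3, threshold=1.0):
    """Validate a window automatically if similar patterns occurred before."""
    return knn_distance(history, query, k) <= threshold


history = [10, 11, 12, 11, 10, 11, 12, 11, 10, 11, 12, 11]
print(is_plausible(history, [10, 11, 12]))  # pattern with close precedents
print(is_plausible(history, [10, 30, 12]))  # spike with no close precedent
```

The neighbors that drove the decision can be shown to an analyst directly, which is one way such a method stays transparent. Windows that fail the check would be routed to manual review rather than rejected outright.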



Diogo Seca

Benchmark of Encoders of Nominal Features for Regression

A Review on Oracle Issues in Machine Learning

TimeGym: Debugging for Time Series Modeling in Python

Estimating the Likelihood of Financial Behaviours Using Nearest Neighbors
