While big data dominates research and public discourse, it is essential to acknowledge that small data remains prevalent. The same technological and societal forces that generate big datasets also produce a far greater number of small datasets. Contrary to the notion that more data is inherently superior, real-world constraints such as budget limitations and increased analytical complexity present critical challenges. Quality-versus-quantity trade-offs necessitate strategic decision-making, and small data often yields quicker, more accurate, and more cost-effective insights. Concentrating AI research, particularly in deep learning (DL), on big datasets exacerbates AI inequality: tech giants such as Meta, Amazon, Apple, Netflix, and Google (MAANG) can easily lead AI research thanks to their access to vast datasets, creating a barrier for small and mid-sized enterprises that lack similar access. This article addresses this imbalance by exploring DL techniques optimized for small datasets, offering a comprehensive review of historical and state-of-the-art DL models developed specifically for small datasets. The aim is to highlight the feasibility and benefits of these approaches, promoting a more inclusive and equitable AI landscape. Through a PRISMA-based literature search, more than 175 relevant articles are identified and analysed by attributes such as publisher, country, small-dataset technique used, dataset size, and performance. The article also examines current DL models, highlights open research problems, and offers recommendations for future investigations. Finally, it emphasizes the importance of developing DL models that make effective use of small datasets, particularly in domains where data acquisition is difficult and expensive.
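The review above surveys techniques rather than prescribing code, but one of the most widely used small-dataset strategies in this literature is transfer learning: reuse a backbone pretrained on a large dataset and train only a small head on the scarce target data. Below is a minimal, illustrative PyTorch/torchvision sketch of that idea; the 5-class head, batch shapes, and hyperparameters are placeholder assumptions, not details taken from the article.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone and freeze its weights.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False

# Replace the classification head for a hypothetical 5-class small dataset;
# only this head is trained, so a few hundred labelled images can suffice.
backbone.fc = nn.Linear(backbone.fc.in_features, 5)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One training step on a placeholder batch (random tensors stand in for data).
x = torch.rand(8, 3, 224, 224)
y = torch.randint(0, 5, (8,))
loss = criterion(backbone(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Freezing the backbone keeps the number of trainable parameters small relative to the dataset, which is exactly the trade-off small-data DL work tries to manage.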
Credit scoring models are critical for financial institutions to assess borrower risk and maintain profitability. Although machine learning models have improved credit scoring accuracy, imbalanced class distributions remain a major challenge. The widely used Synthetic Minority Oversampling Technique (SMOTE) struggles with high-dimensional, non-linear data and may introduce noise through class overlap. Generative Adversarial Networks (GANs) have emerged as an alternative, offering the ability to model complex data distributions. Conditional Wasserstein GANs (cWGANs) have shown promise in handling both numerical and categorical features in credit scoring datasets. However, research on extracting latent features from non-linear data and improving model explainability remains limited. To address these challenges, this paper introduces the Non-parametric Oversampling Technique for Explainable credit scoring (NOTE). NOTE offers a unified approach that integrates a Non-parametric Stacked Autoencoder (NSA) for capturing non-linear latent features, a cWGAN for oversampling the minority class, and a classification process designed to enhance explainability. The experimental results demonstrate that NOTE surpasses state-of-the-art oversampling techniques by improving classification accuracy and model stability, particularly on non-linear and imbalanced credit scoring datasets, while also enhancing the explainability of the results.
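The abstract does not reproduce NOTE's implementation, but the cWGAN oversampling idea it builds on can be sketched. The following is an illustrative PyTorch sketch only: the Generator/Critic layer sizes, the latent dimensionality (standing in for the NSA encoder's output), and the single critic update (gradient penalty omitted) are all assumptions made for exposition, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT, NOISE, N_CLASSES = 16, 32, 2  # assumed sizes, not from the paper

class Generator(nn.Module):
    """Conditional generator: (noise, class label) -> synthetic latent vector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE + N_CLASSES, 64), nn.ReLU(),
            nn.Linear(64, LATENT),
        )
    def forward(self, z, y):
        return self.net(torch.cat([z, F.one_hot(y, N_CLASSES).float()], dim=1))

class Critic(nn.Module):
    """Wasserstein critic, also conditioned on the class label."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT + N_CLASSES, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )
    def forward(self, h, y):
        return self.net(torch.cat([h, F.one_hot(y, N_CLASSES).float()], dim=1))

def oversample_minority(gen, n_samples, minority_label=1):
    """Draw synthetic minority-class latent samples from a trained generator."""
    z = torch.randn(n_samples, NOISE)
    y = torch.full((n_samples,), minority_label, dtype=torch.long)
    with torch.no_grad():
        return gen(z, y), y

gen, critic = Generator(), Critic()
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

# One critic update. In the paper's pipeline, `h` would be latent features
# produced by the stacked-autoencoder encoder; random data stands in here.
h = torch.randn(64, LATENT)
y = torch.randint(0, N_CLASSES, (64,))
fake = gen(torch.randn(64, NOISE), y).detach()
loss_c = critic(fake, y).mean() - critic(h, y).mean()  # Wasserstein critic loss
opt_c.zero_grad()
loss_c.backward()
opt_c.step()

synthetic_h, synthetic_y = oversample_minority(gen, n_samples=500)
print(synthetic_h.shape)  # torch.Size([500, 16])
```

Oversampling in a learned latent space, rather than in raw feature space as SMOTE does, is what lets this family of methods cope with high-dimensional, non-linear data.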
Fringe profilometry is a method that recovers the 3D information of an object by projecting a pattern of fringes onto it. The three-step technique acquires this 3D information from only three images, and many studies have sought to improve it. However, the technique suffers from an inherent problem: quasi-periodic noise that considerably degrades the final reconstructed 3D object. Many studies have tackled this problem in order to obtain a reconstruction close to the original object. The success of deep learning across many areas of research presents a great opportunity to reduce or eliminate the quasi-periodic noise that affects these images. Therefore, this work investigates a convolutional neural network model together with four fringe patterns of different frequencies projected in the three-step technique. The inferences produced by models trained at different frequencies are compared with the original images both qualitatively and quantitatively.
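The abstract does not specify the network architecture, so the sketch below is only a generic residual denoising CNN of the kind commonly applied to quasi-periodic noise removal; the FringeDenoiser name, depth, and channel counts are illustrative assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class FringeDenoiser(nn.Module):
    """Residual denoising CNN: the network estimates the quasi-periodic
    noise component, which is subtracted from the input fringe image."""
    def __init__(self, channels=32, depth=5):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x - self.body(x)  # residual learning: predict noise, not the clean image

model = FringeDenoiser()
noisy = torch.rand(1, 1, 256, 256)  # placeholder single-channel fringe image
clean_estimate = model(noisy)
print(clean_estimate.shape)         # torch.Size([1, 1, 256, 256])
```

Predicting the noise residual rather than the clean image is a common design choice for structured noise, since the residual is typically easier to learn than the full image content.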