2022
DOI: 10.3389/frai.2022.889981

Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics

Abstract: Understanding the learning dynamics and inductive bias of neural networks (NNs) is hindered by the opacity of the relationship between NN parameters and the function represented. Partially, this is due to symmetries inherent within the NN parameterization, allowing multiple different parameter settings to result in an identical output function, resulting in both an unclear relationship and redundant degrees of freedom. The NN parameterization is invariant under two symmetries: permutation of the neurons and a …
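To make the spline view concrete, the following minimal sketch (not taken from the paper; all weights and numbers are illustrative) evaluates a shallow univariate ReLU network as a piecewise-linear spline and checks the two parameter symmetries named in the abstract, neuron permutation and per-neuron positive rescaling, both of which leave the represented function unchanged.

```python
# Illustrative sketch, not the paper's code: a shallow univariate ReLU network
# f(x) = sum_i v_i * relu(w_i * x + b_i) + c is piecewise linear, with a
# potential "knot" wherever w_i * x + b_i = 0. Permuting neurons, or rescaling
# (w_i, b_i) -> (a_i * w_i, a_i * b_i) and v_i -> v_i / a_i with a_i > 0,
# leaves the output function unchanged.
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 5
w = rng.normal(size=n_hidden)   # input weights
b = rng.normal(size=n_hidden)   # biases
v = rng.normal(size=n_hidden)   # output weights
c = 0.1                         # output bias

def f(x, w, b, v, c):
    pre = np.outer(x, w) + b            # (n_points, n_hidden) pre-activations
    return np.maximum(pre, 0.0) @ v + c

x = np.linspace(-3, 3, 7)
y0 = f(x, w, b, v, c)

# Symmetry 1: permute the hidden neurons.
perm = rng.permutation(n_hidden)
y_perm = f(x, w[perm], b[perm], v[perm], c)

# Symmetry 2: positive per-neuron rescaling.
a = rng.uniform(0.5, 2.0, size=n_hidden)
y_scale = f(x, a * w, a * b, v / a, c)

print(np.allclose(y0, y_perm), np.allclose(y0, y_scale))  # True True

# Knot locations of the spline: x_i = -b_i / w_i (for w_i != 0).
print(np.sort(-b / w))
```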

Cited by 6 publications (5 citation statements). References 39 publications.
“…The idea behind implicit regularization is that the loss landscape of a network has many minima, and which minimum one converges to after training depends on many factors, including the choice of model architecture and parametrization [37], [38], the initialization scheme [39], and the optimization algorithm [40], [41], [42]. The implicit regularization of state-of-the-art models has been shown to play a critical role in the generalization of deep neural networks [43], [44].…”
Section: Implicit Regularization and Data Augmentation (mentioning)
Confidence: 99%
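As a concrete illustration of the point made in this excerpt, the sketch below (a hypothetical toy setup, not taken from any of the cited works) trains the same shallow ReLU network on a handful of 1-D points from two different random seeds. Both runs typically reach a small training error, yet the two learned functions usually disagree between and beyond the training points, so the minimum reached depends on the initialization.

```python
# Illustrative sketch: gradient descent on a tiny shallow ReLU network,
# started from two different initializations; hyperparameters and data are
# made up for illustration.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def predict(x, w, b, v):
    # shallow univariate ReLU net: f(x) = sum_j v_j * relu(w_j * x + b_j)
    return relu(np.outer(x, w) + b) @ v

def train(seed, x, y, n_hidden=16, lr=0.01, steps=20000):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n_hidden)
    b = rng.normal(size=n_hidden)
    v = rng.normal(scale=0.1, size=n_hidden)
    n = len(x)
    for _ in range(steps):
        pre = np.outer(x, w) + b               # (n, n_hidden)
        err = relu(pre) @ v - y                # residuals, (n,)
        d_pred = 2.0 * err / n                 # d(MSE)/d(prediction)
        g_v = relu(pre).T @ d_pred
        g_h = np.outer(d_pred, v) * (pre > 0)  # backprop through the ReLU
        g_w = g_h.T @ x
        g_b = g_h.sum(axis=0)
        w -= lr * g_w; b -= lr * g_b; v -= lr * g_v
    return w, b, v

# toy data, made up for illustration
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([0.5, -1.0, 0.0, 1.5, -0.5])

for seed in (0, 1):
    w, b, v = train(seed, x, y)
    mse = np.mean((predict(x, w, b, v) - y) ** 2)
    # both runs typically fit the five points closely, but their predictions
    # usually differ away from the training data
    f_half = predict(np.array([0.5]), w, b, v)[0]
    print(f"seed={seed}  train MSE={mse:.2e}  f(0.5)={f_half:.3f}")
```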
“…• Subsection III-C analyzes how different data augmentations pose constraints on the learned weights in the Fourier domain. This can be seen as an aspect of the so-called network implicit bias (see [37], [38], [39], [40], [41], [42], [43], [44]). In Section IV we test our theoretical results in a simple task of classification on MNIST.…”
Section: Introduction and Previous Work (mentioning)
Confidence: 99%
“…Neural models are able to represent complex, non-linear functions with reasonable computational costs. More recently, kernelized versions of NNs have been developed that restrict the massive expressive power of NNs while still capturing non-linear relationships, and also control the smoothness of the resulting predictive models [14].…”
Section: Description of Machine Learning (mentioning)
Confidence: 99%
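The "kernelized versions of NNs" mentioned in this excerpt can be illustrated with kernel ridge regression under the degree-1 arc-cosine kernel (Cho and Saul, 2009), which corresponds to an infinitely wide one-hidden-layer ReLU network. This sketch is an assumed stand-in, not the method of the excerpt's reference 14; the toy data and ridge values are made up, and the ridge parameter is what controls the smoothness of the resulting predictor.

```python
# Illustrative sketch: kernel ridge regression with the degree-1 arc-cosine
# kernel, an infinite-width analogue of a one-hidden-layer ReLU network.
# The ridge parameter lam trades data fit against smoothness.
import numpy as np

def arccos_kernel(X, Y):
    # X: (n, d), Y: (m, d); degree-1 arc-cosine kernel of Cho & Saul (2009):
    # k(x, y) = (1/pi) * ||x|| * ||y|| * (sin(theta) + (pi - theta) * cos(theta))
    nx = np.linalg.norm(X, axis=1)
    ny = np.linalg.norm(Y, axis=1)
    cos = np.clip((X @ Y.T) / np.outer(nx, ny), -1.0, 1.0)
    theta = np.arccos(cos)
    return np.outer(nx, ny) / np.pi * (np.sin(theta) + (np.pi - theta) * cos)

def lift(x):
    # append a constant bias feature so 1-D inputs have nontrivial angles
    return np.stack([x, np.ones_like(x)], axis=1)

# toy 1-D regression data, made up for illustration
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.array([0.3, -0.8, 0.1, 1.2, -0.4])
x_test = np.linspace(-3, 3, 9)

K = arccos_kernel(lift(x_train), lift(x_train))
for lam in (1e-3, 1.0):  # small lam: flexible fit; large lam: smoother predictor
    alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)
    y_pred = arccos_kernel(lift(x_test), lift(x_train)) @ alpha
    print(f"lambda={lam}: predictions {np.round(y_pred, 2)}")
```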
“…In addressing the aforementioned question, we adopt a similar, but more general, approach that relies on the concept of “implicit bias.” Implicit bias in machine learning refers to the phenomenon where the training process of an overparameterized network, influenced by factors including the choice of model architecture and parametrization (Gunasekar et al., 2018; Yun et al., 2020), the initialization scheme (Sahs et al., 2020a), and the optimization algorithm (Williams et al., 2019; Sahs et al., 2020b; Woodworth et al., 2020), naturally favors certain solutions or patterns over others, even in the absence of explicit bias in the training data. The implicit bias of state-of-the-art models has been shown to play a critical role in the generalization of deep neural networks (Arora et al., 2019; Li et al., 2019).…”
Section: Introduction (mentioning)
Confidence: 99%