Abstract. Many works have posited the benefits of depth in deep networks. However, one of the problems encountered in training very deep networks is diminishing feature reuse; that is, features are 'diluted' as they are forward-propagated through the model. Hence, later network layers receive less informative signals about the input data, making training less effective. In this work, we address this problem by taking inspiration from an earlier work that employed residual learning to alleviate diminishing feature reuse. We propose a modification of residual learning for training very deep networks that realizes improved generalization performance; specifically, we allow stochastic shortcut connections of identity mappings from the input to the hidden layers. We perform extensive experiments on the USPS and MNIST datasets. On the USPS dataset, we achieve an error rate of 2.69% without employing any form of data augmentation (or manipulation). On the MNIST dataset, we reach an error rate of 0.52%, comparable to the state of the art. Notably, these results are achieved without employing any explicit regularization technique.
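To make the core idea concrete, below is a minimal PyTorch sketch of one way such stochastic input-to-hidden identity shortcuts could be realized; it is an illustration under stated assumptions, not the authors' implementation. The class name, the survival probability `survival_p`, the equal input/hidden dimensionality, and the test-time rescaling of the shortcut by its expected gate value (in the style of stochastic-depth training) are all assumptions, since the abstract does not specify these details.

```python
import torch
import torch.nn as nn

class StochasticInputShortcutMLP(nn.Module):
    """Sketch: a feed-forward network in which each hidden layer may
    receive an identity shortcut from the network input, gated by a
    Bernoulli variable during training. Hypothetical names; assumes the
    input and hidden dimensions are equal so the identity mapping applies."""

    def __init__(self, dim, n_layers, survival_p=0.5):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
             for _ in range(n_layers)]
        )
        self.survival_p = survival_p  # assumed probability that a shortcut is active

    def forward(self, x):
        h = x
        for layer in self.layers:
            h = layer(h)
            if self.training:
                # Stochastic gate: the identity shortcut from the input
                # is active with probability survival_p at each step.
                gate = (torch.rand(1, device=x.device) < self.survival_p).float()
                h = h + gate * x
            else:
                # At test time, scale the shortcut by its expected gate
                # value (an assumption borrowed from stochastic-depth-style
                # training; the paper may handle inference differently).
                h = h + self.survival_p * x
        return h
```

Usage would follow the standard PyTorch pattern, e.g. `model = StochasticInputShortcutMLP(dim=256, n_layers=10)` with `model.train()` enabling the stochastic gates and `model.eval()` switching to the expected-value shortcut.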