Uniform convergence may be unable to explain generalization in deep learning
2023-05-16
Value of this paper: understand the limitations of u.c.-based bounds / cast doubt on the power of u.c. bounds to fully explain generalization in DL.
highlight that explaining the training-set-size dependence of the generalization error is apparently just as non-trivial as explaining its parameter-count dependence.
show that there are scenarios where all uniform convergence bounds, however cleverly applied, become vacuous.
Development of generalisation bound
stage 1: conventional u.c. bound
$$\text{generalisation gap} \leq O\Big(\sqrt{\frac{\text{representational complexity of whole hypothesis class}}{\text{training set size}}}\Big)$$

Con: representational complexity $\propto$ depth $\times$ width – too vacuous for deep networks.
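As one concrete instance of the stage-1 picture (not stated in these notes, but standard): for ReLU networks, VC-dimension-based complexity scales roughly with the number of weights times the depth, so the numerator alone dwarfs any realistic training set:

$$\text{generalisation gap} \;\leq\; O\!\left(\sqrt{\frac{\mathrm{VCdim}(\mathcal{H})}{m}}\right), \qquad \mathrm{VCdim}(\mathcal{H}) = \tilde{\Theta}(W L),$$

where $m$ is the training set size, $W$ the parameter count and $L$ the depth; with $W \gg m$ in modern overparameterised networks the right-hand side exceeds 1, i.e. the bound is vacuous.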
stage 2: refined u.c. bound
– take into account the implicit bias of SGD (more broadly, make the bound algorithm-dependent)
$$\text{generalisation gap} \leq O\Big(\sqrt{\frac{\text{representational complexity of 'relevant' subset of hypothesis class}}{\text{training set size}}}\Big)$$

Cons:
1. still too large, or grows with parameter count;
2. the bounds that are small hold only on modified networks (after compression, explicit regularisation, or randomisation);
3. (this paper) the generalisation bound can increase with training set size;
4. (this paper) in certain DL settings, any refined u.c. bound would be vacuous in the following sense:
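To make "relevant subset" precise, here is a hedged sketch of what an algorithm-dependent u.c. bound asserts (a simplified form of the paper's definition): with $h_S$ the network the algorithm outputs on training set $S$, and $\mathcal{H}_{\mathcal{A}}$ the set of such outputs,

$$\Pr_{S \sim \mathcal{D}^m}\left[\,\sup_{h \in \mathcal{H}_{\mathcal{A}}} \big|L_{\mathcal{D}}(h) - L_{S}(h)\big| \leq \epsilon_{\mathrm{unif}}(m,\delta)\right] \geq 1 - \delta.$$

The supremum ranges over all of $\mathcal{H}_{\mathcal{A}}$, not just the single hypothesis $h_S$ trained on the observed $S$, and this is exactly what the paper's construction exploits.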
$$\underset{\text{even though this is small }(\approx 0)}{\text{generalisation gap}} \;\leq\; \underset{\text{this will be vacuous }(\approx 1)}{\text{any refined u.c. bound}}$$
proof of 4
Propose a notion of the 'tightest uniform convergence' bound and prove that even this bound becomes vacuous.
Step 1: create a new dataset $S'$ by projecting the training data onto the opposite hypersphere and flipping the labels.
Step 2: even though the test error and the training error are both very small, the dataset $S'$ is completely misclassified, indicating that the decision boundary is complex enough to memorise skews in the locations of the training datapoints.
Step 3: mathematically show that the tightest u.c. bound is lower bounded by this inherent complexity and is therefore vacuous.
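Step 1 can be sketched in code. Below is a minimal NumPy illustration of the two-hypersphere setup and the projection-plus-label-flip construction; the dimension, radii, and sample sizes are illustrative assumptions, not the paper's exact experimental values.

```python
import numpy as np

def make_hypersphere_data(m, d, r_inner=1.0, r_outer=1.1, seed=0):
    """Sample m points uniformly from each of two concentric hyperspheres.
    Inner-sphere points get label -1, outer-sphere points +1.
    (Radii and dimension are illustrative choices.)"""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((2 * m, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)   # unit directions
    radii = np.concatenate([np.full(m, r_inner), np.full(m, r_outer)])
    X = g * radii[:, None]
    y = np.concatenate([-np.ones(m), np.ones(m)])
    return X, y

def project_to_opposite_sphere(X, y, r_inner=1.0, r_outer=1.1):
    """Step 1 of the proof: rescale each training point onto the *other*
    sphere and flip its label, producing the adversarial dataset S'."""
    r = np.linalg.norm(X, axis=1, keepdims=True)
    target = np.where(np.isclose(r, r_inner), r_outer, r_inner)
    X_adv = X / r * target   # same direction, opposite radius
    y_adv = -y               # flipped labels
    return X_adv, y_adv

X, y = make_hypersphere_data(m=100, d=500)
X_adv, y_adv = project_to_opposite_sphere(X, y)
```

Because $S'$ has the same marginal distribution as a fresh training set, yet the trained network misclassifies all of it while its test error is near zero, the supremum in any u.c. bound over the relevant hypothesis class is forced up to $\approx 1$.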
Reflections on generalisation bounds
A generalisation bound should at least show satisfactory dependence on both parameter count and training set size.
References
Nagarajan, Vaishnavh, and J. Zico Kolter. “Uniform convergence may be unable to explain generalization in deep learning.” Advances in Neural Information Processing Systems. 2019. http://www.cs.cmu.edu/~vaishnan/talks/neurips19_uc_slides.pdf