Uniform convergence may be unable to explain generalization in deep learning
2023-05-16
Value of this paper: understand the limitations of u.c.-based bounds / cast doubt on the power of u.c. bounds to fully explain generalization in DL.
highlight that explaining the training-set-size dependence of the generalization error is apparently just as non-trivial as explaining its parameter-count dependence.
show that there are scenarios where all uniform convergence bounds, however cleverly applied, become vacuous.
Development of generalisation bound
stage 1: conventional u.c. bound
$$\text{generalisation gap} \leq O\Big(\sqrt{\frac{\text{representational complexity of whole hypothesis class}}{\text{training set size}}}\Big)$$

Con: representational complexity $\propto$ depth $\times$ width – too vacuous for deep networks.
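As one concrete instance of the stage-1 picture (not stated in these notes, but standard): for ReLU networks, VC-dimension-based complexity scales roughly with the number of weights times the depth, so the numerator alone dwarfs any realistic training set:

$$\text{generalisation gap} \;\leq\; O\!\left(\sqrt{\frac{\mathrm{VCdim}(\mathcal{H})}{m}}\right), \qquad \mathrm{VCdim}(\mathcal{H}) = \tilde{\Theta}(W L),$$

where $m$ is the training set size, $W$ the parameter count and $L$ the depth; with $W \gg m$ in modern overparameterised networks the right-hand side exceeds 1, i.e. the bound is vacuous.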
stage 2: refined u.c. bound
– take into account the implicit bias of SGD (more broadly, make the bound algorithm-dependent)
$$\text{generalisation gap} \leq O\Big(\sqrt{\frac{\text{representational complexity of 'relevant' subset of hypothesis class}}{\text{training set size}}}\Big)$$

Cons:
1. still too large, or grows with parameter count;
2. the bounds that are small hold only on modified networks (after compression, explicit regularisation, or randomisation);
3. (this paper) the generalisation bound can increase with training set size;
4. (this paper) in certain DL settings, any refined u.c. bound would be vacuous in the following sense:
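To make "relevant subset" precise, here is a hedged sketch of what an algorithm-dependent u.c. bound asserts (a simplified form of the paper's definition): with $h_S$ the network the algorithm outputs on training set $S$, and $\mathcal{H}_{\mathcal{A}}$ the set of such outputs,

$$\Pr_{S \sim \mathcal{D}^m}\left[\,\sup_{h \in \mathcal{H}_{\mathcal{A}}} \big|L_{\mathcal{D}}(h) - L_{S}(h)\big| \leq \epsilon_{\mathrm{unif}}(m,\delta)\right] \geq 1 - \delta.$$

The supremum ranges over all of $\mathcal{H}_{\mathcal{A}}$, not just the single hypothesis $h_S$ trained on the observed $S$, and this is exactly what the paper's construction exploits.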
$$\underset{\text{even though this is small }(\approx 0)}{\text{generalisation gap}} \;\leq\; \underset{\text{this will be vacuous }(\approx 1)}{\text{any refined u.c. bound}}$$
proof of 4
Propose a notion of the 'tightest uniform convergence' bound and prove that even this bound becomes vacuous.
Step 1: create a new dataset $S'$ by projecting the training data onto the opposite hypersphere and flipping the labels.
Step 2: even though the test error and the training error are both very small, the dataset $S'$ is completely misclassified, indicating that the decision boundary is complex enough to memorise skews in the locations of the training datapoints.
Step 3: mathematically show that the tightest u.c. bound is lower bounded by this inherent complexity and is therefore vacuous.
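Step 1 can be sketched in code. Below is a minimal NumPy illustration of the two-hypersphere setup and the projection-plus-label-flip construction; the dimension, radii, and sample sizes are illustrative assumptions, not the paper's exact experimental values.

```python
import numpy as np

def make_hypersphere_data(m, d, r_inner=1.0, r_outer=1.1, seed=0):
    """Sample m points uniformly from each of two concentric hyperspheres.
    Inner-sphere points get label -1, outer-sphere points +1.
    (Radii and dimension are illustrative choices.)"""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((2 * m, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)   # unit directions
    radii = np.concatenate([np.full(m, r_inner), np.full(m, r_outer)])
    X = g * radii[:, None]
    y = np.concatenate([-np.ones(m), np.ones(m)])
    return X, y

def project_to_opposite_sphere(X, y, r_inner=1.0, r_outer=1.1):
    """Step 1 of the proof: rescale each training point onto the *other*
    sphere and flip its label, producing the adversarial dataset S'."""
    r = np.linalg.norm(X, axis=1, keepdims=True)
    target = np.where(np.isclose(r, r_inner), r_outer, r_inner)
    X_adv = X / r * target   # same direction, opposite radius
    y_adv = -y               # flipped labels
    return X_adv, y_adv

X, y = make_hypersphere_data(m=100, d=500)
X_adv, y_adv = project_to_opposite_sphere(X, y)
```

Because $S'$ has the same marginal distribution as a fresh training set, yet the trained network misclassifies all of it while its test error is near zero, the supremum in any u.c. bound over the relevant hypothesis class is forced up to $\approx 1$.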
Reflections on generalisation bounds
A generalisation bound should at least show satisfactory dependence on both parameter count and training set size.
References
Nagarajan, Vaishnavh, and J. Zico Kolter. “Uniform convergence may be unable to explain generalization in deep learning.” Advances in Neural Information Processing Systems. 2019. http://www.cs.cmu.edu/~vaishnan/talks/neurips19_uc_slides.pdf