In a recent talk, Yann LeCun touched upon the phenomenon of double descent, a concept I was not previously aware of.
The double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then gets worse, and then improves again as model size, data size, or training time increases. While this behavior appears to be fairly universal, we don’t yet fully understand why it happens. It runs counter to the traditional bias-variance tradeoff, which holds that increasing model complexity beyond a certain point only leads to worse performance. In modern neural networks, however, we see that after a dip in performance due to overfitting, larger models can recover and perform even better.
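For context, the classical picture rests on the textbook bias-variance decomposition of expected squared error for a regression target y = f(x) + ε (a standard identity, not something from the talk or the paper):

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] \;=\; \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} \;+\; \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}} \;+\; \sigma^2
$$

Under this view, extra capacity only trades bias for variance, so test error should trace a single U-shaped curve; double descent is the observation that, past the point where the model fits the training data, the curve comes back down.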
Yann likened this to historical innovations such as steam engines and airplanes, which were effective long before their underlying principles were understood. This analogy beautifully illustrates how neural networks can perform remarkably well even though we haven't fully deciphered their underlying mechanics.
In the OpenAI paper, the authors show that the peak occurs in a “critical regime,” where models are barely able to fit the training set. As the number of parameters in a neural network increases, the test error first decreases, then increases, and, just as the model becomes able to fit the training set, undergoes a second descent.
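To see the shape of this curve in a small, self-contained setting, here is a minimal sketch (my own illustration, not code from the paper), assuming NumPy, a toy 1-D regression task, random ReLU features, and a minimum-norm least-squares fit via the pseudoinverse. Sweeping the feature count past the number of training points (here 40) typically reproduces the pattern: train error reaches roughly zero around 40 features, test error peaks near that interpolation threshold, and then falls again as the model grows further.

```python
# Minimal sketch of model-wise double descent with random ReLU features.
# Assumptions: NumPy only; a toy sin() regression task; minimum-norm
# least squares via the pseudoinverse (interpolates once features >= samples).
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, noise=0.3):
    x = rng.uniform(-1, 1, size=(n, 1))
    y = np.sin(3 * x[:, 0]) + noise * rng.standard_normal(n)
    return x, y

def random_relu_features(x, W, b):
    # Fixed random projection followed by ReLU.
    return np.maximum(x @ W + b, 0.0)

n_train = 40
x_tr, y_tr = make_data(n_train)
x_te, y_te = make_data(1000, noise=0.0)

for n_feat in [5, 10, 20, 40, 80, 200, 1000]:
    W = rng.standard_normal((1, n_feat))
    b = rng.standard_normal(n_feat)
    Phi_tr = random_relu_features(x_tr, W, b)
    Phi_te = random_relu_features(x_te, W, b)
    # Minimum-norm least-squares coefficients.
    coef = np.linalg.pinv(Phi_tr) @ y_tr
    train_mse = np.mean((Phi_tr @ coef - y_tr) ** 2)
    test_mse = np.mean((Phi_te @ coef - y_te) ** 2)
    print(f"features={n_feat:5d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```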
Neither the classical statisticians’ conventional wisdom that overly large models are worse, nor the modern ML paradigm that bigger models are always better, holds up. The authors find that double descent also occurs over training epochs. Surprisingly, these phenomena can lead to a regime where more data hurts: training a deep network on a larger training set can actually perform worse.
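The epoch-wise effect is seen by logging test error throughout training rather than only at the end. Below is a minimal sketch of that bookkeeping (again my own, assuming PyTorch and a synthetic binary-classification task with 20% label noise standing in for the noisy-label image setups in the paper); whether a second descent actually appears depends on model size, noise level, and how long training runs.

```python
# Minimal sketch: log train/test error every epoch so the epoch-wise
# curve can be inspected. Assumptions: PyTorch; synthetic linearly
# separable data with 20% of training labels flipped.
import torch
from torch import nn

torch.manual_seed(0)

n_train, dim = 500, 20
w_true = torch.randn(dim)

def make_split(n, noise=0.0):
    x = torch.randn(n, dim)
    y = (x @ w_true > 0).long()
    flip = torch.rand(n) < noise       # flip a fraction of the labels
    y[flip] = 1 - y[flip]
    return x, y

x_tr, y_tr = make_split(n_train, noise=0.2)
x_te, y_te = make_split(2000)

model = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

def error(x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) != y).float().mean().item()

for epoch in range(1, 501):          # full-batch gradient descent
    opt.zero_grad()
    loss = loss_fn(model(x_tr), y_tr)
    loss.backward()
    opt.step()
    if epoch % 50 == 0:
        print(f"epoch {epoch:4d}  train err {error(x_tr, y_tr):.3f}  "
              f"test err {error(x_te, y_te):.3f}")
```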
Paper: https://arxiv.org/abs/1912.02292
OpenAI blog: https://openai.com/index/deep-double-descent/