Double Descent in Neural Networks: A New Challenge to Classical Machine Learning Theory


In a recent talk, Yann LeCun touched upon the phenomenon of double descent, a concept I was not previously aware of.

The double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time. While this behavior appears to be fairly universal, we don’t yet fully understand why it happens. It goes against the traditional bias-variance tradeoff, which held that increasing model complexity beyond a certain point only leads to worse performance. In modern neural networks, however, we see that after a dip in performance due to overfitting, larger models can recover and perform even better.

Yann likened this to historical innovations such as steam engines and airplanes, which were effective long before their underlying principles were understood. This analogy beautifully illustrates how neural networks can perform remarkably well even though we haven't fully deciphered their underlying mechanics.

In their paper, OpenAI show that the peak occurs in a “critical regime,” where the model is just barely able to fit the training set. As we increase the number of parameters in a neural network, the test error first decreases, then increases, and, just as the model becomes able to fit the training set, undergoes a second descent.
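This model-wise double descent can be reproduced in miniature with linear regression on random features, where the interpolation threshold is easy to see. The sketch below (my own illustrative setup, not an experiment from the paper) fits a minimum-norm least-squares model on random ReLU features of a noisy 1-D regression task: once the number of features `p` exceeds the number of training points, the model interpolates the training set, and the test error typically peaks near that threshold before descending again.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy 1-D regression task (hypothetical toy data).
n_train, n_test, noise = 30, 300, 0.1
x_train = rng.uniform(-1.0, 1.0, n_train)
y_train = np.sin(2 * np.pi * x_train) + noise * rng.standard_normal(n_train)
x_test = np.linspace(-1.0, 1.0, n_test)
y_test = np.sin(2 * np.pi * x_test)

def relu_features(x, w, b):
    """Random ReLU features: phi_j(x) = max(0, w_j * x + b_j)."""
    return np.maximum(0.0, np.outer(x, w) + b)

def fit_and_score(p):
    """Fit min-norm least squares on p random features; return (train, test) MSE."""
    w = rng.standard_normal(p)
    b = rng.standard_normal(p)
    phi_train = relu_features(x_train, w, b)
    phi_test = relu_features(x_test, w, b)
    # np.linalg.lstsq returns the minimum-norm solution when p > n_train.
    coef, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
    train_mse = np.mean((phi_train @ coef - y_train) ** 2)
    test_mse = np.mean((phi_test @ coef - y_test) ** 2)
    return train_mse, test_mse

# Sweep the model size across the interpolation threshold (p = n_train = 30).
widths = [2, 5, 10, 20, 30, 40, 80, 200, 1000]
results = {p: fit_and_score(p) for p in widths}
for p, (tr, te) in results.items():
    print(f"p={p:5d}  train MSE={tr:.2e}  test MSE={te:.2e}")
```

The exact curve depends on the random seed, but the qualitative shape mirrors the paper’s description: the train error hits (essentially) zero at the threshold, while the test error spikes there and then falls as `p` grows further.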

Neither the classical statisticians’ conventional wisdom that overly large models are worse, nor the modern ML paradigm that bigger models are always better, holds up. The paper also finds that double descent occurs over training epochs. Surprisingly, these phenomena can lead to a regime where more data hurts: training a deep network on a larger train set can actually perform worse.

Paper: https://arxiv.org/abs/1912.02292
OpenAI blog: https://openai.com/index/deep-double-descent/

