Deep Learning A-Z™: Recurrent Neural Networks (RNN) - The Vanishing Gradient Problem
[Figure: a two-input feedforward network. Inputs x1 and x2 feed a hidden layer of three units through input-to-hidden weights W11,1 ... W12,3; the hidden units feed the output ŷ through hidden-to-output weights W21,1, W22,1, W23,1; the cost C compares the prediction ŷ with the actual value y.]

C = ½(ŷ - y)²
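This cost function is what backpropagation minimizes: every weight is nudged in proportion to the gradient of C. Below is a minimal sketch, not from the slides, of one gradient-descent step for the pictured two-input, three-hidden-unit network; the sigmoid hidden layer, linear output, and names like W1, W2, lr are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.3, 0.7])          # inputs x1, x2 (made-up values)
y = 1.0                           # actual value
W1 = rng.standard_normal((3, 2))  # input-to-hidden weights
W2 = rng.standard_normal((1, 3))  # hidden-to-output weights
lr = 0.1                          # learning rate (illustrative choice)

# Forward pass
h = sigmoid(W1 @ x)               # hidden activations, shape (3,)
y_hat = (W2 @ h).item()           # prediction ŷ
C = 0.5 * (y_hat - y) ** 2        # the cost from the slide

# Backward pass (chain rule through C)
dC_dyhat = y_hat - y                              # dC/dŷ = ŷ - y
dW2 = dC_dyhat * h[None, :]                       # gradient for output weights
dh = dC_dyhat * W2.ravel()                        # error reaching each hidden unit
dW1 = (dh * h * (1.0 - h))[:, None] * x[None, :]  # sigmoid' = h(1 - h)

# Gradient-descent update
W1 -= lr * dW1
W2 -= lr * dW2
print(f"cost before update: {C:.4f}")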
[Figure: an RNN unrolled through time. Inputs xt-3, xt-2, xt-1, xt enter through Win, successive hidden states are linked through Wrec, and Wout produces the output yt, whose error is εt (with εt-3 ... εt+1 at the other time steps). Backpropagating εt to an earlier time step multiplies the gradient by Wrec once per step, so the recurrent weight is applied over and over: Wrec · Wrec · Wrec · ...]

Wrec ~ small → the repeated products shrink the gradient toward zero: Vanishing
Wrec ~ large → the repeated products grow the gradient without bound: Exploding

Formula Source: Razvan Pascanu et al. (2013)
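A minimal numpy sketch, not from the course, that mimics the diagram: an error signal backpropagated through many time steps is multiplied by the transpose of Wrec once per step, so a "small" Wrec drives the gradient toward zero while a "large" one blows it up. The sketch omits activation-function derivatives for clarity, and the hidden size and step count are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
hidden, steps = 8, 50

for scale, label in ((0.5, "small (vanishing)"), (1.5, "large (exploding)")):
    # Random recurrent weights scaled so the spectral radius is roughly `scale`
    Wrec = scale * rng.standard_normal((hidden, hidden)) / np.sqrt(hidden)
    grad = np.ones(hidden)            # error signal at the last time step
    for _ in range(steps):            # backpropagation through time
        grad = Wrec.T @ grad          # one multiplication by Wrec per step
    print(f"Wrec ~ {label}: gradient norm after {steps} steps = "
          f"{np.linalg.norm(grad):.2e}")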
Solutions:
1. Exploding Gradient
• Truncated Backpropagation
• Penalties
• Gradient Clipping (see the sketch after this list)
2. Vanishing Gradient
• Weight Initialization
• Echo State Networks
• Long Short-Term Memory Networks (LSTMs)
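As a concrete example of gradient clipping, here is a minimal sketch of the norm-based rescaling in the spirit of Pascanu et al. (2013); the threshold value is an illustrative choice, not prescribed by the slides.

import numpy as np

def clip_by_norm(grad, threshold=1.0):
    """Rescale grad so its L2 norm never exceeds threshold."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

exploding = np.array([30.0, -40.0])   # norm 50, far above the threshold
print(clip_by_norm(exploding))        # rescaled to [0.6, -0.8], norm 1.0

In Keras, the same effect is available through an optimizer's clipnorm or clipvalue arguments (e.g. SGD(clipnorm=1.0)).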
Additional Reading:

Untersuchungen zu dynamischen neuronalen Netzen ("Investigations on Dynamic Neural Networks")
By Sepp (Josef) Hochreiter (1991)
Link: http://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf

Learning Long-Term Dependencies with Gradient Descent is Difficult
By Yoshua Bengio et al. (1994)
Link: http://www-dsi.ing.unifi.it/~paolo/ps/tnn-94-gradient.pdf

On the difficulty of training recurrent neural networks
By Razvan Pascanu et al. (2013)
Link: http://www.jmlr.org/proceedings/papers/v28/pascanu13.pdf
