Meta Learning with
Memory-Augmented
Neural Networks
ICML 2016, citations: 16
Katy@DataLab
2017.03.28
Background
• Memory-Augmented Neural Network (MANN) refers
to the class of networks equipped with external
memory, as opposed to architectures that rely only on
internal memory (such as LSTMs)
Motivation
• Some problems of interest (e.g., motor control) require
rapid inference from small quantities of data.
• This kind of flexible adaptation is a celebrated aspect
of human learning.
Related Work
• Graves, Alex, Greg Wayne, and Ivo Danihelka.
"Neural Turing Machines." arXiv preprint
arXiv:1410.5401 (2014).
Related Work
• Lake, Brenden M., Ruslan Salakhutdinov, and Joshua B. Tenenbaum.
"Human-level concept learning through probabilistic program induction."
Science 350.6266 (2015): 1332-1338.
Main Idea
• Learn to do classification on unseen classes
• Learn the sample-class binding in memory
instead of in the weights
• Let the weights learn higher-level knowledge
Model
• y_t (the label) is presented in a temporally offset manner: the input at step t is the pair (x_t, y_{t-1})
• Labels are shuffled from episode to episode, which prevents the network from slowly learning sample-class bindings in its weights
• The network must hold data samples in memory until the appropriate labels are presented at the next time step, after which the sample-class information can be bound and stored for later use (a sketch of this episode setup follows)
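A minimal sketch of how such an episode could be assembled. This is illustrative only: the dict images_by_class (mapping each class to a list of at least 10 equally sized image arrays), the function name, and the "null" first label are assumptions, not the paper's code.

import random

import numpy as np


def make_episode(images_by_class, n_classes=5, n_instances=10):
    # Sample n_classes classes and give each a label that is re-shuffled
    # every episode, so class-label bindings cannot settle into the weights.
    classes = random.sample(list(images_by_class), n_classes)
    perm = np.random.permutation(n_classes)
    label_of = {c: int(perm[i]) for i, c in enumerate(classes)}

    samples = [(img, label_of[c])
               for c in classes
               for img in random.sample(images_by_class[c], n_instances)]
    random.shuffle(samples)

    xs = np.stack([img for img, _ in samples])   # x_t, the input images
    ys = np.array([lbl for _, lbl in samples])   # true label of x_t
    # Temporally offset targets: at step t the network receives (x_t, y_{t-1})
    # and must predict y_t before that label arrives at the next step.
    y_prev = np.roll(ys, 1)
    y_prev[0] = 0  # placeholder "null" label for the first step (assumption)
    return xs, y_prev, ys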
Model
Basically the same as the Neural Turing Machine (NTM)
• Read from memory using the same content-based
approach as the NTM
• Write to memory using Least Recently Used
Access (LRUA)
• Least used: is this location rarely used?
• Recently used: was it just read from or written to?
Model
Content-based addressing: the controller produces a key k_t, which is compared with each memory row by cosine similarity; the read weights are a softmax over these similarities, and the read vector is the corresponding weighted sum of memory rows (a sketch follows)
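A minimal numpy sketch of this content-based read. The function name and signature are illustrative, not the paper's code.

import numpy as np


def content_read(memory, key, eps=1e-8):
    # memory: (N, M) matrix of N slots; key: (M,) key k_t from the controller.
    # Cosine similarity K(k_t, M_t(i)) between the key and each memory row.
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)
    # Read weights w^r_t: softmax over the similarities.
    w_r = np.exp(sims - sims.max())
    w_r /= w_r.sum()
    # Read vector r_t: weighted sum of the memory rows.
    return w_r @ memory, w_r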
Least Recently Used
Access (LRUA)
• Usage weights w^u_t keep track of the locations most recently read or written to: w^u_t = gamma * w^u_{t-1} + w^r_t + w^w_t
• gamma is the decay parameter
• Least-used weights w^lu_t: w^lu_t(i) = 1 if w^u_t(i) <= m(w^u_t, n), and 0 otherwise
• m(v, n) denotes the nth smallest element of the vector v
• Here n is set to the number of reads from memory
• Write weights: w^w_t = sigma(alpha) * w^r_{t-1} + (1 - sigma(alpha)) * w^lu_{t-1}
• alpha is a learnable parameter (sigma is the sigmoid)
• Prior to writing to memory, the least-used memory location is set to zero (see the sketch below)
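A minimal numpy sketch of one LRUA write step under the rules above. The function names, the gamma default, and the split into a separate usage update are illustrative choices, not the paper's code.

import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lrua_write(memory, key, w_r_prev, w_u_prev, alpha, n_reads=1):
    # Least-used weights w^lu_{t-1}: 1 at the n_reads smallest usage entries, else 0.
    w_lu_prev = np.zeros_like(w_u_prev)
    w_lu_prev[np.argsort(w_u_prev)[:n_reads]] = 1.0

    # Write weights: w^w_t = sigma(alpha) * w^r_{t-1} + (1 - sigma(alpha)) * w^lu_{t-1}
    w_w = sigmoid(alpha) * w_r_prev + (1.0 - sigmoid(alpha)) * w_lu_prev

    # Prior to writing, the least-used location is zeroed; then the key is
    # added in proportion to the write weights: M_t = M_{t-1} + w^w_t * k_t.
    memory = memory.copy()
    memory[np.argmin(w_u_prev)] = 0.0
    memory += np.outer(w_w, key)
    return memory, w_w


def update_usage(w_u_prev, w_r, w_w, gamma=0.95):
    # Usage decay after reading and writing: w^u_t = gamma * w^u_{t-1} + w^r_t + w^w_t
    return gamma * w_u_prev + w_r + w_w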
Experiments
• Dataset: Omniglot
• 1623 classes with only
a few examples per
class (the transpose of
MNIST)
• 1200 training classes
• 423 test classes
Experiments
• Train for 100,000 episodes; each episode contains five randomly chosen classes with five randomly assigned labels, and 10 instances of each class
• Test on never-before-seen classes
Machine vs. Human
Class Representation
• A different approach for labeling classes was
employed so that the number of classes presented in
a given episode could be arbitrarily increased.
• Characters for each label were uniformly sampled
from the set {‘a’, ‘b’, ‘c’, ‘d’, ‘e’}, producing random
strings such as ‘ecdba’
Class Representation
• This combinatorial approach allows for 5^5 = 3125
possible labels, which is nearly twice the number of
classes in the dataset (a short sketch follows).
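A tiny sketch of this labeling scheme; the names are illustrative.

import random

ALPHABET = "abcde"

def random_label(length=5):
    # Each character is sampled uniformly and independently from the alphabet.
    return "".join(random.choices(ALPHABET, k=length))

assert len(ALPHABET) ** 5 == 3125   # number of possible label strings
print(random_label())               # e.g. 'ecdba'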
(Figures: test accuracy curves for the LSTM and the MANN, with 5 classes per episode and with 15 classes per episode)
Experiment with Different
Algorithms
Experiment with Different
Algorithms
• kNN (single nearest neighbour) has an unlimited
amount of memory and can automatically store
and retrieve all previously seen examples
• The MANN still outperforms kNN
• Using an LSTM as the controller works better than
using a feedforward NN
Experiment on Memory
Wiping
• A good strategy is to wipe the external memory
from episode to episode, since each episode
contains unique classes with unique labels.
(Figures: accuracy without wiping vs. with wiping the memory between episodes)
Experiment on Curriculum
Training
• Gradually increase the number of classes per episode
Experiment on Curriculum
Training
Conclusion
• Gradual, incremental learning encodes
background knowledge that spans tasks, while a
more flexible memory resource binds information
particular to newly encountered tasks (the external
memory is wiped between episodes in this
experiment)
• Demonstrates the ability of a memory-augmented
neural network to do meta-learning
• Introduces a new method to access external memory (LRUA)
Conclusion
• The controller and its slowly updated weights are like the
CPU / the neocortex, in charge of long-term knowledge
• The external memory is like the RAM / the hippocampus,
in charge of short-term memory and newly arriving
information
Conclusion
• As machine learning researchers, the lesson we
can glean from this is that it is acceptable for our
learning algorithms to suffer from forgetting, but
they may need complementary algorithms to
reduce the information loss.
Goodfellow, Ian J., et al. "An empirical investigation of catastrophic forgetting in gradient-based neural
networks." arXiv preprint arXiv:1312.6211 (2013).
Why do Memory-Augmented Neural
Networks work well in general?
1. Information must be stored in memory in a
representation that is both stable (so that it can be
reliably accessed when needed) and element-wise
addressable (so that relevant pieces of information
can be accessed selectively).
2. The number of parameters should not be tied to
the size of the memory (LSTMs do not fulfil this);
see the sketch below.
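A back-of-the-envelope sketch of point 2, with an arbitrary illustrative input size of 400: an LSTM's trainable parameter count grows with its internal memory (the hidden/cell size), while enlarging a MANN's external memory matrix adds no trainable parameters.

def lstm_params(input_size, hidden_size):
    # 4 gates, each with input weights, recurrent weights, and a bias.
    return 4 * (input_size * hidden_size + hidden_size ** 2 + hidden_size)

for h in (128, 256, 512):
    print(f"LSTM hidden size {h}: {lstm_params(400, h):,} parameters")

# An external memory of shape (N, M) is state, not weights: growing N or M
# leaves the controller's trainable parameter count unchanged.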
