Ok, switching gears a little. Apologies in advance for the long post.
I spent some time trying to learn more about where deep learning/multi-layer neural networks are these days, and found these talks by Geoff Hinton of the U of Toronto, given at Google. They are still fairly technical, but you can follow the ideas without having to understand the underlying theory/mathematics. I'll try to summarize them (within my own limited understanding) and give the times of the neat parts, since the videos are each an hour long.
The first video gives a foundation of what current neural nets are like and how they are built up. He also gives some great examples/demos.
Summary
In the past, neural nets were often hard-coded, or training data was labelled to help the net know what it was learning. This resulted in a mapping from data (say a picture of a car) to an abstraction of the data called "features" (say that the car has wheels and a body).
He also describes a technique (back propagation) where the error between what the neural net thinks an object is and what it really is gets fed back into the net to help fine-tune the results.
Unfortunately, this type of net was very limited - it didn't really learn on its own (it was told what each image was), and it couldn't recognize, say, cars that didn't look like the training cars. It was also very slow to train, and the back propagation ended up not helping much at all.
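To make the back-propagation idea a bit more concrete, here's a tiny sketch I put together (not from the talk; the sizes and learning rate are made up) of a single update step for a two-layer net in numpy:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 1))              # one input example (4 features)
    t = np.array([[1.0], [0.0]])             # what it really is (one-hot target)

    W1 = rng.normal(scale=0.1, size=(3, 4))  # input -> hidden weights
    W2 = rng.normal(scale=0.1, size=(2, 3))  # hidden -> output weights

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Forward pass: what the net currently thinks the input is.
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)

    # Backward pass: the error (y - t) is fed back through the layers.
    err_out = (y - t) * y * (1 - y)           # error at the output
    err_hid = (W2.T @ err_out) * h * (1 - h)  # error pushed back to the hidden layer

    # Fine-tune the weights a little in the direction that reduces the error.
    lr = 0.5
    W2 -= lr * (err_out @ h.T)
    W1 -= lr * (err_hid @ x.T)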
In the new way of doing things, the training data is unlabelled (the net doesn't know what it is looking at during training), so the features abstracted from the training data are unbiased. Those features then get fed as training data into a second layer of features, and so on, building up multiple layers of features, each improving on the previous one. Building up the layers is also vastly faster because each layer is trained on its own, as a kind of feedback loop with the training data, rather than requiring hundreds of iterations through the whole net. Finally, only at the very end are labels added to the "things" the neural net can distinguish (though he says later even this isn't necessary), and back propagation is applied as a last step: because the feature layers are already "smarter", the fine tuning from back propagation actually works much better.
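Here's my rough sketch of that layer-by-layer training. In the talks he uses restricted Boltzmann machines trained with "contrastive divergence"; the version below is a stripped-down one-step contrastive divergence in numpy (biases and other details left out, and the data/sizes made up), just to show how each layer is trained on its own and its features become the next layer's training data:

    import numpy as np

    rng = np.random.default_rng(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_rbm(data, n_hidden, epochs=10, lr=0.1):
        """One-step contrastive divergence for a binary RBM (biases omitted)."""
        n_visible = data.shape[1]
        W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
        for _ in range(epochs):
            v0 = data
            # Up: infer hidden features from the data.
            h0_prob = sigmoid(v0 @ W)
            h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
            # Down and up again: the "feedback loop" with the training data.
            v1_prob = sigmoid(h0 @ W.T)
            h1_prob = sigmoid(v1_prob @ W)
            # Nudge weights so the reconstruction looks more like the data.
            W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(data)
        return W

    # Stack layers greedily: each layer's features become the next layer's data.
    data = (rng.random((500, 64)) < 0.3).astype(float)   # fake binary "images"
    weights, inputs = [], data
    for size in [32, 16]:
        W = train_rbm(inputs, size)
        weights.append(W)
        inputs = sigmoid(inputs @ W)    # pass features up to the next layer
    # Labels and back-propagation fine-tuning would only come after this point.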
Demos
The demo of his neural net recognizing numerical digits (and also generating its own mental picture of digits) starts at 21:35.
The results of applying the neural net to a million documents to try to classify them into different topics start at 33:18. Here he compares two methods: the older way of classifying documents (PCA, which works from the word counts/keywords in each document) and his neural net. The documents are mapped onto a 2-D plane to provide a visual representation.
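For anyone curious, here's roughly what the PCA side of that comparison amounts to (my own toy example, nothing to do with his actual data): build a word-count vector per document, then project onto the top two principal components so every document becomes a point on a plane:

    import numpy as np

    docs = [
        "stocks fell as markets worried about rates",
        "the team won the match in extra time",
        "central bank raises interest rates again",
        "striker scores twice as team wins league",
    ]

    # Build a word-count matrix (documents x vocabulary).
    vocab = sorted({w for d in docs for w in d.split()})
    counts = np.array([[d.split().count(w) for w in vocab] for d in docs], float)

    # PCA: centre the data, then project onto the top 2 principal directions.
    centred = counts - counts.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    coords_2d = centred @ vt[:2].T     # one (x, y) point per document

    for doc, (x, y) in zip(docs, coords_2d):
        print(f"({x:+.2f}, {y:+.2f})  {doc}")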
Geoff's second video is an update to the first one, given 3 years later:
Summary
The big new thing here is the ability to combine multiple features together to help improve the neural net's capabilities. For instance, he gives the example of telling the neural net to draw a square at a particular location/orientation (15:00). The net could simply specify the exact vertex points that define a square, or it can break the problem down into two more general parts: 1) what a square is made of (same-length edges and corners) and 2) the relationships between those parts (how the edges line up with the corners). Both operations are easier to manage in the net, and together they give the same result.
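Here's a toy sketch of that idea in numpy (my own loose analogy, not his actual network): one way spells out the exact vertices, the other separates what a square looks like from where/how it sits, and both pin down the same square:

    import numpy as np

    # Way 1: spell out the exact vertex points of one particular square.
    explicit_vertices = np.array([[1.0, 1.0], [3.0, 1.0], [3.0, 3.0], [1.0, 3.0]])

    # Way 2: separate "what a square is" (a unit template of corners) from
    # "where/how it sits" (a pose: centre, rotation, scale).
    template = np.array([[-0.5, -0.5], [0.5, -0.5], [0.5, 0.5], [-0.5, 0.5]])

    def place(template, centre, angle, scale):
        c, s = np.cos(angle), np.sin(angle)
        rot = np.array([[c, -s], [s, c]])
        return template @ rot.T * scale + centre

    vertices = place(template, centre=np.array([2.0, 2.0]), angle=0.0, scale=2.0)
    print(np.allclose(vertices, explicit_vertices))   # True: same square either way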
Lots of fascinating stuff follows where this change allows the net to work with motion (i.e. to recognize different types of motion, then predict or model that motion on its own), and then to try to distinguish objects within images.
Demos
The motion demo is at 34:25, and shows how the neural net was trained on different types of walking patterns, and then can make a stick figure walk in those same patterns.
A really surprising result is at 43:45. To determine what the object in a picture is, the net is trained on a number of images and taught to focus on the mean and covariance of the pixels (which give the average colors of the pixels in the image, and how pixels vary together within the image, respectively). This results in a number of filters based on mean and covariance which can be applied to an image to help determine what its content is.
The cool part is at 44:10 where a topographic map of the neural net's covariance filters for an image is shown. Apparently that map models what is seen in a monkey's brain. Some good evidence that the neural net is close to modelling how a brain actually stores information to understand images.
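Just to make those mean/covariance statistics concrete, here's a little numpy sketch I made up (not his actual filters): cut an image into small patches, take the mean intensity of each patch, and compute the covariance between pixels across all the patches:

    import numpy as np

    rng = np.random.default_rng(2)
    image = rng.random((64, 64))      # stand-in for a grayscale image

    # Cut the image into 8x8 patches and flatten each into a 64-vector.
    patches = np.array([
        image[r:r + 8, c:c + 8].ravel()
        for r in range(0, 64, 8)
        for c in range(0, 64, 8)
    ])

    patch_means = patches.mean(axis=1)         # average intensity per patch
    pixel_cov = np.cov(patches, rowvar=False)  # 64x64 pixel-pixel covariance

    print(patch_means.shape, pixel_cov.shape)  # (64,) and (64, 64)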