Yann LeCun, Silver Professor of Computer Science and Neural Science
Courant Institute of Mathematical Sciences and Center for Neural Science, New York University
Intelligent perceptual tasks such as vision and audition require the construction of good internal representations. Machine learning has been very successful at producing classifiers, but the next big challenge for ML, computer vision, and computational neuroscience is to devise learning algorithms that can learn features and internal representations automatically.
Theoretical and empirical evidence suggests that the perceptual world is best represented by a multi-stage hierarchy in which features in successive stages are increasingly global, invariant, and abstract. An important challenge is to devise "deep learning" methods for multi-stage architectures that can automatically learn invariant feature hierarchies from labeled and unlabeled data.
A number of unsupervised methods for learning invariant features will be described, all based on sparse coding and sparse auto-encoders: convolutional sparse auto-encoders, invariance through group sparsity, invariance through lateral inhibition, and invariance through temporal constancy. These methods are used to pre-train convolutional networks (ConvNets). ConvNets are biologically inspired architectures consisting of multiple stages of filter banks, interspersed with non-linear operations, spatial pooling, and contrast normalization operations.
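As a rough illustration of what one such stage computes, the sketch below chains a filter bank, a point-wise non-linearity, spatial max pooling, and a simplified contrast normalization in NumPy. It is not the implementation used in the systems described here: the function names, the tanh non-linearity, the order of the operations, the random (untrained) filters, and the normalization across feature maps only are all assumptions chosen for brevity.

```python
import numpy as np

def filter_bank(image, filters):
    """Valid 2-D cross-correlation of one grayscale image with each filter.

    image: (H, W) array; filters: (N, k, k) array. Returns (N, H-k+1, W-k+1).
    """
    n, k, _ = filters.shape
    h, w = image.shape
    out = np.empty((n, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = image[i:i + k, j:j + k]
            # Dot product of every filter with the current patch.
            out[:, i, j] = np.tensordot(filters, patch, axes=([1, 2], [0, 1]))
    return out

def max_pool(maps, size):
    """Non-overlapping max pooling over size x size spatial blocks."""
    n, h, w = maps.shape
    h, w = h - h % size, w - w % size  # drop rows/columns that do not fit
    blocks = maps[:, :h, :w].reshape(n, h // size, size, w // size, size)
    return blocks.max(axis=(2, 4))

def contrast_normalize(maps, eps=1e-6):
    """Simplified normalization (assumption): at each spatial location,
    subtract the mean and divide by the standard deviation taken across
    feature maps. No spatial neighborhood is used."""
    centered = maps - maps.mean(axis=0, keepdims=True)
    scale = np.sqrt((centered ** 2).mean(axis=0, keepdims=True))
    return centered / np.maximum(scale, eps)

def convnet_stage(image, filters, pool_size=2):
    """One hypothetical stage: filter bank -> tanh -> pooling -> normalization."""
    maps = np.tanh(filter_bank(image, filters))
    return contrast_normalize(max_pool(maps, pool_size))

rng = np.random.default_rng(0)
image = rng.standard_normal((12, 12))
filters = rng.standard_normal((4, 3, 3))  # random stand-ins for learned filters
features = convnet_stage(image, filters)
print(features.shape)
```

In a multi-stage network the output of one stage would feed the filter bank of the next, which would then need to sum over input feature maps; the unsupervised methods listed above would supply the filters in place of the random ones used here.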
Several applications will be shown, including a pedestrian detector, a category-level object recognition system that can be trained on the fly, and a "scene parsing" system that can label every pixel in an image with the category of the object it belongs to. Specialized FPGA- and ASIC-based hardware architectures that run these systems in real time will also be described.