Deep Learning of Invariant Feature Hierarchies

Speaker: Yann LeCun
Organization: Courant Institute of Mathematical Sciences and Center for Neural Science, New York University
Location: Engineering Building II, Room 1230
Date: September 21, 2012, 12:50 PM


Intelligent perceptual tasks such as vision and audition require the construction of good internal representations. Machine learning has been very successful at producing classifiers, but the next big challenge for ML, computer vision, and computational neuroscience is to devise learning algorithms that can learn features and internal representations automatically.

Theoretical and empirical evidence suggest that the perceptual world is best represented by a multi-stage hierarchy in which features in successive stages are increasingly global, invariant, and abstract. An important question is how to devise "deep learning" methods for multi-stage architectures that can automatically learn invariant feature hierarchies from labeled and unlabeled data.

A number of unsupervised methods for learning invariant features will be described that are based on sparse coding and sparse auto-encoders: convolutional sparse auto-encoders, invariance through group sparsity, invariance through lateral inhibition, and invariance through temporal constancy. The methods are used to pre-train convolutional networks (ConvNets). ConvNets are biologically-inspired architectures consisting of multiple stages of filter banks, interspersed with non-linear operations, spatial pooling, and contrast normalization.
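The single ConvNet stage described above (filter bank, non-linearity, spatial pooling, contrast normalization) can be sketched in a few lines of NumPy. This is an illustrative assumption of one plausible configuration, not the speaker's specific architecture: the filter shapes, tanh non-linearity, average pooling, and the form of the normalization are all choices made here for clarity.

```python
import numpy as np

def convnet_stage(image, filters, pool=2, eps=1e-6):
    """One ConvNet stage: filter bank -> non-linearity ->
    spatial pooling -> contrast normalization.
    Hypothetical sketch; all design choices are assumptions."""
    n_filt, fh, fw = filters.shape
    H, W = image.shape
    oh, ow = H - fh + 1, W - fw + 1
    # Filter bank: valid 2-D cross-correlation of each filter with the image
    maps = np.empty((n_filt, oh, ow))
    for k in range(n_filt):
        for i in range(oh):
            for j in range(ow):
                maps[k, i, j] = np.sum(image[i:i + fh, j:j + fw] * filters[k])
    # Point-wise non-linearity
    maps = np.tanh(maps)
    # Spatial pooling: average over non-overlapping pool x pool windows
    ph, pw = oh // pool, ow // pool
    pooled = (maps[:, :ph * pool, :pw * pool]
              .reshape(n_filt, ph, pool, pw, pool)
              .mean(axis=(2, 4)))
    # Contrast normalization: divide each across-feature vector by its norm
    norm = np.sqrt((pooled ** 2).sum(axis=0, keepdims=True)) + eps
    return pooled / norm
```

A real system would stack several such stages and learn the filters (e.g. by the sparse-coding pre-training methods the abstract mentions) rather than using them fixed.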

Several applications will be shown, including a pedestrian detector, a category-level object recognition system that can be trained on the fly, and a "scene parsing" system that can label every pixel in an image with the category of the object it belongs to. Specialized FPGA and ASIC-based hardware architectures that run these systems in real time will also be described.


Yann LeCun is Silver Professor of Computer Science and Neural Science at the Courant Institute of Mathematical Sciences and the Center for Neural Science of New York University. He received the Electrical Engineer Diploma from Ecole Supérieure d'Ingénieurs en Electrotechnique et Electronique (ESIEE), Paris in 1983, and a PhD in Computer Science from Université Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories in Holmdel, NJ in 1988, and later became head of the Image Processing Research Department at AT&T Labs-Research in 1996. He joined NYU as a professor in 2003, after a brief period as Fellow at the NEC Research Institute in Princeton. His current interests include machine learning, computer perception and vision, mobile robotics, and computational neuroscience.  He has published over 150 technical papers and book chapters on these topics as well as on neural networks, handwriting recognition, image processing and compression, and VLSI design. His handwriting recognition technology is used by several banks around the world to read checks.  His image compression technology, called DjVu, is used by hundreds of web sites and publishers and millions of users to access scanned documents on the Web, and his image recognition methods are used in deployed systems by companies such as AT&T, Google, Microsoft, NEC, and others for document recognition, human-computer interaction, image indexing, and video analytics. He has been on the editorial board of IJCV, IEEE PAMI, IEEE T. Neural Networks, was program chair of CVPR'06, and is chair of the annual Learning Workshop. He is on the science advisory board of Institute for Pure and Applied Mathematics. He is the co-founder of MuseAmi, a music technology company and of CogBits, a company working on embedded computational intelligence.
