Deep Learning (Interview With Jürgen Schmidhuber)

Since age 15 or so, Prof. Jürgen Schmidhuber’s main goal has been to build a self-improving Artificial Intelligence (AI) smarter than himself, then retire. He has pioneered self-improving general problem solvers since 1987, and Deep Learning Neural Networks (NNs) since 1991. The recurrent NNs developed by his research groups at the Swiss AI Lab IDSIA (USI & SUPSI) and TU Munich were the first to win official international contests. They have revolutionized handwriting recognition, speech recognition, machine translation, and image captioning, and are now available to over a billion users through Google, Microsoft, IBM, Baidu, and many other companies. DeepMind is heavily influenced by his lab’s former students (including 2 of DeepMind’s first 4 members and their first PhDs in AI, one of them co-founder, one of them first employee). His team’s Deep Learners were the first to win object detection and image segmentation contests, and achieved the world’s first superhuman visual classification results, winning nine international competitions in machine learning and pattern recognition (more than any other team). They were also the first to learn control policies directly from high-dimensional sensory input using reinforcement learning. His research group also established the field of mathematically rigorous universal AI and optimal universal problem solvers. His formal theory of creativity, curiosity, and fun explains art, science, music, and humor. He also generalized algorithmic information theory and the many-worlds theory of physics, and introduced the concept of Low-Complexity Art, the information age’s extreme form of minimal art. Since 2009 he has been a member of the European Academy of Sciences and Arts. He has published 333 peer-reviewed papers and earned seven best paper/best video awards, the 2013 Helmholtz Award of the International Neural Network Society, and the 2016 IEEE Neural Networks Pioneer Award.
He is president of NNAISENSE, which aims at building the first practical general purpose AI.

Since depth implies computational power and efficiency, we have focused on very deep neural nets from the start of my neural net research in the 1980s. My first deep learning publication was from 1991; see “My First Deep Learning System of 1991 + Deep Learning Timeline 1960–2013.”

Back then others were still focusing on rather shallow nets with fewer than 10 subsequent computational stages (Ivakhnenko had 8 such stages already in the 1960s), while our methods already enabled over 1,000 such stages in 1993. So we may claim that we made neural nets really deep, especially recurrent networks, the deepest and most powerful nets of them all.
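A minimal sketch of why recurrent networks count as deep, assuming a single tanh unit with hypothetical weights `w_in` and `w_rec` (illustrative values, not taken from any of the systems mentioned): every time step applies the same nonlinearity once more, so a sequence of 1,000 inputs passes through 1,000 subsequent computational stages.

```python
import math


def run_rnn(inputs, w_in=0.5, w_rec=0.9, h0=0.0):
    """Unroll a one-unit recurrent net over a sequence.

    Returns the final hidden state and the number of subsequent
    computational stages (one nonlinear update per time step).
    The weights are hypothetical illustration values.
    """
    h = h0
    stages = 0
    for x in inputs:
        # The same weights are reused at every step; depth grows with time.
        h = math.tanh(w_in * x + w_rec * h)
        stages += 1
    return h, stages


# A sequence of 1,000 time steps: one nonzero input followed by zeros.
h, depth = run_rnn([1.0] + [0.0] * 999)
print(depth)  # 1000
```

The depth here comes from sequence length alone, which is why a recurrent net can be far deeper than a feedforward net with a fixed number of layers.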

Sure! For example, my first very deep learner based on unsupervised pre-training (1991–1993) solved previously unsolvable deep learning tasks with over 1,000 computational stages. Massive improvements then allowed our team to win numerous competitions. For example, our first very deep supervised learner (Long Short-Term Memory or LSTM, 1995–2009 and beyond) was the first recurrent NN to win international contests, and the first NN to win connected handwriting contests (2009). Our GPU-based max-pooling CNN was the first to outperform humans in a computer vision contest (traffic sign recognition, 2011), and dramatically improved the old MNIST error rate. It was also the first deep NN to win a Chinese handwriting contest (2011), the first deep NN to win an image segmentation contest (2012), the first deep NN to win an object detection contest (2012), and the first to win medical imaging contests (2012, 2013).

4. In which applications do you see deep learning used most at present?

Variants of our Long Short-Term Memory (LSTM) are used for speech processing, machine translation, stock market prediction, Turing test-related chat bots, document analysis, automatic email answering, and many other sequence learning problems. CNNs and/or multi-dimensional LSTM are used for image and video analysis.
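For readers who have not seen an LSTM, here is a minimal sketch of one time step of a single-unit cell. It assumes the common textbook variant with input, forget, and output gates; the parameter names (`wi`, `ui`, `bi`, and so on) and their values are hypothetical, and the variants used in the applications above may differ.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell (scalar input and state).

    p is a dict of hypothetical scalar parameters: for each of the
    input gate (i), forget gate (f), output gate (o), and candidate (g),
    an input weight w*, a recurrent weight u*, and a bias b*.
    """
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # cell state: keep part of the old, add part of the new
    h = o * math.tanh(c)     # hidden state exposed to the rest of the network
    return h, c


# Hypothetical parameters: all weights 0.5, all biases 0.0.
params = {name: 0.5 for name in
          ("wi", "ui", "wf", "uf", "wo", "uo", "wg", "ug")}
params.update({"bi": 0.0, "bf": 0.0, "bo": 0.0, "bg": 0.0})

# Process a short sequence, carrying the state from step to step.
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state is carried forward almost unchanged, which is what lets the cell remember information across many time steps.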

5. Are there any prerequisites or skills one has to master before starting to learn deep learning? Can you give us some examples of those?

Basic math skills and basic programming skills.

6. What would be the best way to start learning deep learning?

Read our surveys and papers (smile).

7. What programming languages do you recommend for working with deep learning?

Python, CUDA, C, and many others.

8. What deep learning frameworks do you recommend?

There are many. We like our own, e.g., Brainstorm.

9. Are there any other resources (e.g., websites, courses, tutorials) you recommend for learning deep learning?

Our surveys and papers, in particular “Deep Learning in Neural Networks: An Overview” and “Deep Learning.”

10. Are there events, conferences, workshops, etc. that you recommend for deep learning?

NIPS (Neural Information Processing Systems), ICML (International Conference on Machine Learning), DALI (Data, Learning and Inference), IJCNN (International Joint Conference on Neural Networks), CVPR (Computer Vision and Pattern Recognition), and many others.

11. Are there any deep learning companies you are interested in?

We are mostly interested in our own company NNAISENSE, which aims at building the first practical general purpose AI. We are also following with interest the development of DeepMind, a company that was sold to Google for over 600M USD, and which made AlphaGo, the program that beat the best human Go player. DeepMind is heavily influenced by our former students: two of DeepMind’s first four members and their first PhDs in Artificial Intelligence and Machine Learning came from my lab, one of them co-founder, one of them first employee. (The other two co-founders were not from my lab and had different backgrounds.) Other ex-PhD students of mine joined DeepMind later, including the creator of DeepMind’s “Neural Turing Machine,” and a co-author of our paper on Atari-Go in 2010.

12. Where do you see deep learning in the coming 10 years?

Let me cannibalize my answer from a recent AMA (Ask Me Anything) on Reddit. In about 10 years the first quarter of this century will end, and we will celebrate the centennial of the first transistor, patented by Julius Lilienfeld in 1925. By 2025, even (minor extensions of) existing machine learning and neural network algorithms will achieve many important superhuman feats.

I expect huge (by today’s standards) recurrent neural networks on dedicated hardware to simultaneously perceive and analyse an immense number of multimodal data streams (speech, texts, video, many other modalities) from many sources, learning to correlate all those inputs and use the extracted information to achieve a myriad of commercial and non-commercial goals. Those RNNs will continually and quickly learn new skills on top of those they already know. This should have innumerable applications, and will change society in innumerable ways.

I guess we are witnessing the ignition phase of the field’s explosion. But how to predict turbulent details of an explosion from within? What will be the cumulative effect of all those mutually interacting changes on our civilization, which will depend on machine learning in so many ways? In 2012, I tried to illustrate how hard it is to answer such questions: A single human predicting the future of humankind is like a single neuron predicting what its brain will do.

One thing seems clear though: in the not too distant future, supersmart AIs will start to colonize the solar system, and within a few million years the entire galaxy. The universe wants to make its next step towards more and more unfathomable complexity.
