Deep Learning (Interview With Dong Yu)
Dr. Dong Yu is a principal researcher at Microsoft Research. His research focuses on speech recognition and applications of machine learning techniques. He has published two monographs and over 150 papers in these areas and is the inventor or co-inventor of nearly 60 granted or pending patents. His recent work on the context-dependent deep neural network hidden Markov model (CD-DNN-HMM), which was recognized by the IEEE SPS 2013 Best Paper Award, caused a paradigm shift in large vocabulary speech recognition.
Dr. Dong Yu is currently serving as a member of the IEEE Speech and Language Processing Technical Committee (2013–present). He has served as an associate editor of IEEE Transactions on Audio, Speech, and Language Processing (2011–2015), an associate editor of IEEE Signal Processing Magazine (2008–2011), and the lead guest editor of the IEEE Transactions on Audio, Speech, and Language Processing special issue on Deep Learning for Speech and Language Processing (2010–2011).
1. When did you start working with deep learning?
2. What applications do you use deep learning with?
Speech Recognition, Speech Separation, Keyword Spotting, Emotion Classification, Speech Understanding, Ads Click Prediction, Ads Relevance Estimation.
3. Did you notice any improvements when using deep learning compared with other methods you used in the past for the same problems?
Yes, we have observed significant accuracy improvements in speech-related tasks. For example, we were the first to show that deep learning can help large vocabulary speech recognition. In 2010, we reported that it could cut the word error rate by one-third on conversational speech recognition tasks. Additional accuracy improvements have been achieved in recent years using more advanced deep learning techniques.
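The "one-third" figure is a relative reduction in word error rate (WER). As a rough illustration only (this is not code from the interview or from Microsoft), WER is commonly computed as (S + D + I) / N from a word-level edit distance, where S, D, and I are the substitutions, deletions, and insertions needed to turn the reference transcript into the recognizer's output, and N is the number of reference words. The function names, the example sentences, and the 30% and 20% error rates below are hypothetical and are not the results reported in the interview.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline errors removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the"): 2 errors / 6 words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical systems: going from 30% to 20% WER removes one-third of the errors
    print(relative_reduction(0.30, 0.20))
```

Note that a one-third relative reduction is not the same as an absolute drop of 33 percentage points: it means one out of every three word errors made by the baseline system is eliminated.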
4. What other applications can deep learning be used with?
People have demonstrated its effectiveness on image classification, machine translation, image captioning, computational arts, and self-driving cars.
5. In which applications do you see deep learning being used most currently?
Speech Recognition, Image Classification, and Machine Translation.
6. Are there any prerequisites or skills one has to master before starting to learn deep learning? Can you give us some examples of those?
If you want to understand how deep learning works underneath, you will need to have knowledge of calculus, linear algebra, statistics, machine learning, and optimization theory and techniques.
7. What would be the best way to start learning deep learning?
If you already have the prerequisite skills, you can start by taking some online deep learning courses. Of course, without getting your hands dirty, you may not learn much. For this reason, it is a good idea to adopt an open-source toolkit such as Microsoft’s CNTK, Google’s TensorFlow, Facebook’s Torch, Berkeley’s Caffe, the University of Montreal’s Theano, or mxnet, and apply the techniques to some toy problems.
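As an illustration of the kind of toy problem meant here, the sketch below trains a single sigmoid neuron (logistic regression), the basic unit that deep networks stack into many layers, on the logical AND function using full-batch gradient descent. It is deliberately written in plain Python rather than with any of the toolkits named above, so it makes no assumptions about their APIs. The data set, the learning rate, the epoch count, and the function names are arbitrary, hypothetical choices.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(data, lr=1.0, epochs=5000):
    """Full-batch gradient descent on the binary cross-entropy loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x1, x2), y in data:
            p = sigmoid(w[0] * x1 + w[1] * x2 + b)
            err = p - y  # dLoss/dz for a sigmoid output with cross-entropy loss
            grad_w[0] += err * x1
            grad_w[1] += err * x2
            grad_b += err
        # Average the gradients over the batch and take one descent step
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x) -> int:
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0


# Toy data set: the logical AND of two binary inputs
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    w, b = train_neuron(AND_DATA)
    for x, y in AND_DATA:
        print(x, "target", y, "predicted", predict(w, b, x))
```

A single neuron can only learn linearly separable functions such as AND. Trying the same code on XOR, which it cannot solve, is a natural next step toward understanding why hidden layers, and the toolkits that train them efficiently, are needed.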
8. What programming languages do you recommend for working with deep learning?
Depending on your skills and background, you may choose Python, R, Lua, or C++. Historically, many deep learning packages and applications were built on Python.
9. What deep learning frameworks do you recommend?
CNTK, TensorFlow, Torch, Theano, and mxnet are general-purpose deep learning toolkits, while Caffe is more restricted, focusing mainly on image tasks.
10. What deep learning books do you recommend?
For general deep learning concepts and techniques, I would recommend Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, to be published by MIT Press.
For deep learning’s applications, especially in speech recognition, I would recommend Automatic Speech Recognition: A Deep Learning Approach, by Dong Yu and Li Deng, published by Springer.
11. Are there any other resources (e.g., websites, courses, tutorials) you recommend for learning deep learning?
Deep learning is a fast-developing field. The best way to keep track of its progress is to read newly published papers or to attend conferences such as NIPS and ICML. Alternatively, you can follow the summer schools and tutorials given at various conferences. The website deeplearning.net seems to be a good source of deep learning related resources.
12. Are there any events, conferences, workshops, etc., that you recommend for deep learning?
NIPS (Neural Information Processing Systems) and ICML (International Conference on Machine Learning) are two major conferences on machine learning (and deep learning). For specific application areas, you may want to attend related conferences. For example, the majority of speech recognition papers are published at ICASSP (International Conference on Acoustics, Speech and Signal Processing) and Interspeech, while image-related papers are mainly published at CVPR (Conference on Computer Vision and Pattern Recognition).
13. Are there any deep learning companies you are interested in?
Companies such as Microsoft, Alphabet (Google), Facebook, and Baidu all have strong deep learning teams.
14. Where do you see deep learning in the coming 10 years?
Deep learning is just one component in the big AI picture. The basic concept of deep learning will still be valid and useful in many applications, even 10 years from now. However, deep learning alone is not sufficient to achieve our goal of artificial general intelligence. We will see more and more intelligent systems that are built using deep learning together with other techniques such as reinforcement learning, transfer learning, tree search, knowledge graphs, and Bayesian learning.