How Deep is Deep Learning?
Deep Learning is widely regarded as a significant recent step towards Artificial General Intelligence because of its ability to learn highly complex tasks, and it has achieved spectacular results in many domains. But there is always a price to pay, and here the price is interpretability and simplicity. Moreover, deep learning requires a huge amount of resources, although that is less of a concern in today’s era. With such promises, deep learning has come to be treated as a panacea. Is it really one? How deep is Deep Learning? We explore these questions and discuss some interesting findings and insights from applying deep learning in the education domain (Knowledge Tracing, to be precise).
Deep Learning was introduced in the education domain very recently to trace the learner’s knowledge, an approach referred to as Deep Knowledge Tracing (DKT). Knowledge Tracing models the process of knowledge acquisition by tracking the learner’s progress from interaction to interaction, and generates a profile of the learner’s strengths and weaknesses. The DKT model uses a large hidden layer of LSTM units to model the dynamic and temporal nature of the learner’s progression. funtoot is a personalised digital tutor with more than 1 lakh (100,000) active students across India. We apply Deep Knowledge Tracing to a dataset of more than 5 million datapoints generated by 6th Grade students on funtoot (in the education domain, a dataset of this size is a rare find). We compare it with two standard classic knowledge tracing models, Bayesian Knowledge Tracing (BKT) and Performance Factors Analysis (PFA). DKT outperforms BKT by a wide margin, but DKT and PFA perform equally well. We analyse the gain that DKT achieves over BKT and identify probable reasons (limitations) why BKT lags behind. Classic BKT, when tweaked and enhanced to overcome those limitations, performs as well as DKT. Although DKT is only comparable to the classic, simpler algorithms in performance, it has the advantage of being able to discover relationships and interdependencies (pre-requisites) among skills.
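To make "tracking knowledge from interaction to interaction" concrete, here is a minimal sketch of the textbook BKT update for a single skill. It is not funtoot's implementation or the enhanced BKT variant from the paper; the function names and the parameter values in the usage lines are hypothetical and chosen only for illustration.

```python
def predict_correct(p_known, p_guess, p_slip):
    """Probability that the next response on this skill is correct."""
    return p_known * (1 - p_slip) + (1 - p_known) * p_guess


def bkt_update(p_known, correct, p_learn, p_guess, p_slip):
    """One step of standard BKT for one skill.

    p_known : current probability that the learner knows the skill
    correct : True if the observed response was correct
    p_learn : probability of learning the skill at this opportunity
    p_guess : probability of answering correctly without knowing the skill
    p_slip  : probability of answering incorrectly despite knowing the skill
    """
    if correct:
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    # The learner may also acquire the skill during this interaction.
    return posterior + (1 - posterior) * p_learn


# Hypothetical parameters and response sequence, for illustration only.
p_known = 0.3
for response in [False, True, True]:
    p_known = bkt_update(p_known, response, p_learn=0.1, p_guess=0.2, p_slip=0.1)
```

Each skill carries just four numbers here (the initial `p_known`, `p_learn`, `p_guess`, `p_slip`), which is why a classic BKT model stays small even with many skills.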
Owing to its large hidden layer of LSTM units, DKT has parameters on the order of a few hundred thousand, while classical models like PFA and BKT have only a few hundred. Given this gap in parameter count and model complexity, DKT does not have an edge over the much simpler models. One way to interpret this result is to appreciate the depth of the underlying problem domain. It seems that the domain of knowledge tracing is shallow and that powerful deep learning models are unnecessary. We will discuss this further in the talk.
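The gap in parameter counts can be illustrated with some back-of-the-envelope arithmetic. The sketch below assumes a single-layer LSTM with one bias vector per gate, a one-hot input over (skill, correct/incorrect) pairs, four parameters per skill for BKT and three per skill for PFA. The skill count and hidden size are hypothetical, not the figures from the funtoot dataset or the paper.

```python
def lstm_param_count(input_size, hidden_size):
    # Four gates, each with input weights, recurrent weights and one bias vector.
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def dkt_param_count(num_skills, hidden_size):
    # Assumed input encoding: one-hot over (skill, correct/incorrect) pairs.
    input_size = 2 * num_skills
    # Output layer maps the hidden state to one prediction per skill.
    output_layer = hidden_size * num_skills + num_skills
    return lstm_param_count(input_size, hidden_size) + output_layer


def bkt_param_count(num_skills):
    # Per skill: initial knowledge, learn, guess, slip.
    return 4 * num_skills


def pfa_param_count(num_skills):
    # Per skill: difficulty, success weight, failure weight.
    return 3 * num_skills


# Hypothetical sizes, for illustration only.
NUM_SKILLS = 50
HIDDEN_SIZE = 200

print("DKT:", dkt_param_count(NUM_SKILLS, HIDDEN_SIZE))
print("BKT:", bkt_param_count(NUM_SKILLS))
print("PFA:", pfa_param_count(NUM_SKILLS))
```

Some frameworks store two bias vectors per gate, which changes the LSTM total slightly but not the order of magnitude.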
This talk is based on our work published as “Few hundred parameters outperform few hundred thousand?” at the Educational Data Mining Conference, 2017 (EDM2017).
- Define and Explain Knowledge Tracing
- Explain the domain, skills (knowledge components), funtoot platform and the dataset
- Discuss and Explain Knowledge Tracing Models
- Deep Knowledge Tracing (model and architecture)
- DKT Applications: Discovering relationships and interdependencies (pre-requisites)
- Bayesian Knowledge Tracing
- Performance Factors Analysis
- Analysis, comparison and study of these models
- Discovering Limitations and Enhancements of BKT
- Discussion on the depth of deep learning and the knowledge tracing domain
Amar Lalwani, Data Scientist @ funtoot, is responsible for research and development of funtoot’s Brain. funtoot is a personalised digital tutor in the K-12 space for Math and Science. funtoot is actively used by more than 1 lakh students and more than 130 schools across India.
Amar Lalwani is also pursuing a Ph.D. at IIIT-Bangalore in the area of Machine Learning and Artificial Intelligence.
- A talk at RMIT @ IIIT-Bangalore on “Genetic Grammars as a formalism for the Evaluation of TSP Heuristics”. RMIT (Ramanujan, Math & IT) is a series held annually at IIITB. https://www.youtube.com/watch?v=YJ1l6DLLK34
- A talk on Artificial Intelligence and Genetic Algorithms @ Sri Kumaran Children’s Home, Bangalore. https://www.youtube.com/watch?v=fWUYxwv36Jk
- Slide deck on Knowledge Inference. https://speakerdeck.com/amar073/knowledge-inference
- Slide deck of the Paper Presentation at EDM-2017 (Educational Data Mining-2017). https://speakerdeck.com/amar073/edm-2017-few-hundred-parameters-outperform-few-hundred-thousand-thousand
- The two slide decks above have much in common with the proposed talk.
- Our paper at EDM (Educational Data Mining) Conference, 2017. http://educationaldatamining.org/EDM2017/proc_files/papers/paper_50.pdf
- This work is inspired by “How deep is knowledge tracing?” https://arxiv.org/abs/1604.02416