Although visualisation is widely used in data science to understand the shape of the data (data-vis), it is rarely applied to the models built on that data, which are largely evaluated through numerical summaries. Model visualisation (model-vis) can help us understand the shape of the model, the impact of parameters and different input data on the model, the fit of the model, and where that fit can be improved.
Data science is a process of abstraction. To explain or predict a real phenomenon, the process starts with framing the problem and acquiring and refining the data, and then moves between three layers of abstraction: transformations (data abstraction), visualisations (visual abstraction) and modelling (symbolic abstraction). These three layers work together to build a closer representation of the real phenomenon.
Data visualisation (data-vis) helps us understand the portrait and shape of the data. The science of data-vis for exploratory data analysis is well developed, both for static graphics (scatter plot matrices, glyph-based approaches, geometric transforms like parallel coordinates) and for interactive graphics (layering, brushing and linking, projections and tours). See my talk at Fifth Elephant 2015 on Visualising Multi Dimensional Data - https://www.youtube.com/watch?v=X8rNDvPNg30. However, the power of visualisation is rarely used to better understand the models we develop. Model evaluation is still largely restricted to numerical summaries. Extending visualisation to model building can be a powerful way to improve our understanding of the model.
Model visualisation (model-vis) can help us understand the shape of the model and compare it to the shape of the data. It lets us see the fit of the model and understand where that fit can be improved. It also helps us better understand the parameters in the model: how the model changes when the parameters change, and how the parameters change when the input data changes.
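As a rough illustration of comparing the shape of the model to the shape of the data, here is a minimal sketch in plain Python. The toy data and the helper names `fit_line` and `model_in_data_space` are made up for this example and are not from the talk. It fits a straight line by least squares, evaluates that line on a grid across the observed range (the points one would overlay on a scatter plot), and uses the residuals to find where the fit is weakest.

```python
# Toy data, invented for illustration: roughly linear, with a bend at the end.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8, 13.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def model_in_data_space(xs, intercept, slope, steps=20):
    """Evaluate the fitted line on an even grid spanning the observed x range."""
    lo, hi = min(xs), max(xs)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return [(x, intercept + slope * x) for x in grid]


intercept, slope = fit_line(xs, ys)

# Points to draw on top of the raw (x, y) scatter: the model in data space.
curve = model_in_data_space(xs, intercept, slope)

# Residuals show where the model and the data disagree.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
worst = max(range(len(xs)), key=lambda i: abs(residuals[i]))
print(f"largest residual at x = {xs[worst]}")
```

Plotting `curve` over the raw points, and `residuals` against `xs`, would show the same information visually; any plotting library could be used for that step.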
The science and tools for model-vis are still underdeveloped. This talk looks at practical examples of model-vis in regression (linear, lasso), classification (logistic, trees, LDA) and clustering (hierarchical) problems that can help us better understand the model. This includes exploring model-vis approaches that:
- Visualise the model in data space as opposed to data in model space
- Visualise the entire space of models
- Visualise the same model with varying tuning parameters
- Visualise the same model with different input datasets
- Visualise the process of model fitting as opposed to the final result
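Two of the approaches above, varying the tuning parameter and varying the input dataset, can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the method used in the talk: the data is invented, `lasso_slope` is a hypothetical helper, and the model is a one-predictor lasso without an intercept, for which the solution is a soft-thresholded least-squares slope. Leave-one-out subsets stand in for "different input datasets".

```python
# Toy centred data, invented for illustration.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-4.1, -1.8, 0.2, 2.1, 3.6]


def lasso_slope(xs, ys, lam):
    """Slope b minimising 0.5 * sum((y - b*x)**2) + lam * abs(b).

    With a single predictor and no intercept this is the least-squares
    numerator soft-thresholded by lam, divided by sum(x**2).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


# Same model, varying tuning parameter: the coefficient path.
lambdas = [0.0, 2.0, 5.0, 10.0, 20.0]
path = [(lam, lasso_slope(xs, ys, lam)) for lam in lambdas]

# Same model, different input datasets: drop one observation at a time.
loo_slopes = [
    lasso_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], 0.0)
    for i in range(len(xs))
]

for lam, b in path:
    print(f"lambda = {lam:5.1f}  slope = {b:.3f}")
print("leave-one-out slopes:", [round(b, 3) for b in loo_slopes])
```

Plotting the slope against `lambdas` gives a coefficient path, and the spread of `loo_slopes` gives a sense of how sensitive the fitted parameter is to individual observations.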
Integrating these model-vis approaches into model evaluation will strengthen a data scientist's understanding of the model and lead to better model building. Model-vis can then complement data-vis, both for fitting better models and for communicating the insights from the data science process.
After this talk, the audience will better understand how the power of visualisation extends beyond data-vis to model-vis, and will be able to use it to build better models as data scientists.
A basic understanding of the data science process: Frame the problem, Acquire the data, Refine the data, Explore the data, Model the solution and Communicate the insight.
Amit Kapoor is interested in learning and teaching the craft of telling visual stories with data. He uses storytelling and data visualisation as tools for improving communication, persuasion and leadership. He conducts workshops and training sessions on data visualisation and data science for corporates, non-profits, colleges and individuals at narrativeVIZ Consulting. He also teaches storytelling with data as invited faculty at management schools (e.g. IIM Bangalore, IIM Ahmedabad) and design schools (e.g. NID Bangalore).
His background is in strategy consulting, using data-driven stories to drive change across organizations and businesses. He has more than 14 years of management consulting experience, first with AT Kearney in India, then with Booz & Company in Europe and more recently with startups in Bangalore. He holds a B.Tech in Mechanical Engineering from IIT Delhi and a PGDM (MBA) from IIM Ahmedabad. You can find out more about him at amitkaps.com and tweet him at @amitkaps.
- Visualising Multi Dimensional Data - https://www.youtube.com/watch?v=X8rNDvPNg30
- Learning Djembe Visually - https://www.youtube.com/watch?v=hA4sF02Ib0Q