Architectural Design for Interactive Visualization
Submitted by Amit Kapoor (@amitkaps) on Thursday, 28 June 2018
Visualisation for data science requires an interactive visualisation setup that works at scale. In this talk, we will explore the key architectural design considerations for such a system and use examples to illustrate the four key trade-offs in this design space: rendering for data scale, computation for interaction speed, adapting to data complexity and responding to data velocity. The key takeaways for the audience are:
- Ideas on architectural design for building interactive visualisation applications for large data
- Approaches to balancing the trade-offs inherent in the design: rendering, computation, adaptiveness and responsiveness
- Insights and lessons from how others have addressed these trade-offs
Visualisation is an integral part of the data science process: EDA to understand the shape of the data, model visualisation to unbox the model algorithm, and dashboard visualisation to communicate insights. This task of visualisation is increasingly shifting from a static, narrative setup to an interactive, reactive one, which presents a new set of challenges for those designing interactive visualisation applications.
The talk covers the four major areas that shape the architectural design of interactive visualisation at scale, and illustrates the design trade-offs in each area with exemplars and case studies.
Rendering for Data Scale: Envisioning how a visualisation should be displayed is not hard when the data is small. But how do you render an interactive visualisation when the data runs to millions or billions of points?
- Bin-Summarize-Smooth e.g. Datashader, BigVis
- WebGL based Rendering e.g. Deck.gl
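To make the bin-summarize-smooth idea concrete, here is a minimal pure-Python sketch. The function names `bin_counts` and `smooth` are hypothetical and not taken from Datashader or BigVis; the sketch assumes 2-D points, a fixed rectangular viewport and counts as the summary statistic. The point it illustrates is that what gets drawn is a fixed-size grid, so drawing cost depends on the grid size rather than on the number of input points (binning itself is still one pass over the data).

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]

def bin_counts(points: Sequence[Tuple[float, float]],
               x_range: Tuple[float, float],
               y_range: Tuple[float, float],
               nx: int, ny: int) -> Grid:
    """Bin + summarize: count the points that fall into each cell of an ny-by-nx grid."""
    (x0, x1), (y0, y1) = x_range, y_range
    if x1 <= x0 or y1 <= y0 or nx < 1 or ny < 1:
        raise ValueError("ranges must be non-empty and the grid at least 1x1")
    grid = [[0.0] * nx for _ in range(ny)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # points outside the viewport are not drawn
        # Map each coordinate to a bin index; the upper edge is clamped into the last bin.
        i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        grid[j][i] += 1
    return grid

def smooth(grid: Grid) -> Grid:
    """Smooth: replace each cell by the mean of its 3x3 neighbourhood (fewer cells at edges)."""
    ny, nx = len(grid), len(grid[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            neighbours = [grid[b][a]
                          for b in range(max(0, j - 1), min(ny, j + 2))
                          for a in range(max(0, i - 1), min(nx, i + 2))]
            out[j][i] = sum(neighbours) / len(neighbours)
    return out

if __name__ == "__main__":
    import random
    random.seed(0)
    # 200k synthetic points collapse into a 50x50 grid: 2,500 cells to draw, not 200,000 marks.
    pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200_000)]
    image = smooth(bin_counts(pts, (-4.0, 4.0), (-4.0, 4.0), 50, 50))
    print(len(image), len(image[0]))
```

The smoothed grid would then be colour-mapped to pixels. A production system would vectorise or compile these loops rather than run them in plain Python.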
Computation for Interaction Speed: A reactive visualisation lets the user interact, drill down, and brush and link multiple views to gain insight. But how do you reduce query latency at the interaction layer so that the user can interact with the visualisation without noticeable lag?
- Aggregation & In-Memory Cubes e.g. Hashedcubes, imMens, Nanocubes
- Approximate Query Processing / Sampling e.g. VerdictDB
- GPU based Databases e.g. MapD
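One way to read "aggregation & in-memory cubes" is to precompute answers so that each brush or drill-down becomes a lookup rather than a scan. The sketch below is a deliberately naive full data cube, assuming a few low-cardinality categorical dimensions; `build_cube`, `query` and `ALL` are hypothetical names, and this is not how the cube systems listed above organise their data structures.

```python
from collections import Counter
from itertools import product
from typing import Iterable, Optional, Sequence, Tuple

ALL = None  # marker meaning "rolled up over this dimension"

Key = Tuple[Optional[str], ...]

def build_cube(rows: Iterable[Sequence[str]], n_dims: int) -> Counter:
    """Precompute a count for every combination of dimension values, including roll-ups."""
    cube: Counter = Counter()
    for row in rows:
        # Each row contributes to 2**n_dims cells: every dimension is either its value or ALL.
        for keep in product((True, False), repeat=n_dims):
            key: Key = tuple(v if k else ALL for v, k in zip(row, keep))
            cube[key] += 1
    return cube

def query(cube: Counter, *filters: Optional[str]) -> int:
    """Answer a count query with one dictionary lookup instead of scanning the rows."""
    return cube.get(tuple(filters), 0)

if __name__ == "__main__":
    events = [("IN", "mobile"), ("IN", "desktop"), ("US", "mobile"), ("IN", "mobile")]
    cube = build_cube(events, n_dims=2)
    print(query(cube, "IN", ALL))      # brush on country = IN
    print(query(cube, ALL, "mobile"))  # brush on device = mobile
```

The catch is memory: every row touches 2^d cells, so a full cube quickly becomes impractical as the number of dimensions or their cardinality grows. That cost is one reason the list above also includes sampling-based and GPU-based approaches.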
Adaptive to Data Complexity: Choosing a good visualisation design for a single dataset is possible after a few rounds of experimentation and iteration. But how do you ensure that the visualisation adapts to the variety, volume and edge cases of real data?
- Visualisation Responsive to Space & Data
- Handling High Cardinality e.g. Facets Dive
- Dimensionality Reduction e.g. Embedding Projector
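A common tactic for high-cardinality categories, shown here as an assumption rather than as what the tools listed above do, is to keep only the top-k categories and fold the long tail into a single "Other" bucket, choosing k from the space available so the chart stays legible. Both `top_k_with_other` and the pixel rule in `bars_that_fit` are hypothetical illustrations.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple

def bars_that_fit(width_px: int, min_bar_px: int = 12) -> int:
    """Hypothetical responsive rule: how many bars fit if each needs min_bar_px pixels."""
    return max(1, width_px // min_bar_px)

def top_k_with_other(values: Iterable[Hashable], k: int,
                     other_label: str = "Other") -> List[Tuple[Hashable, int]]:
    """Keep the k most frequent categories and fold the long tail into one bucket."""
    if k < 1:
        raise ValueError("k must be at least 1")
    counts = Counter(values)
    head = counts.most_common(k)
    tail = sum(counts.values()) - sum(c for _, c in head)
    if tail:
        head.append((other_label, tail))
    return head

if __name__ == "__main__":
    cities = ["Delhi"] * 5 + ["Mumbai"] * 3 + ["Pune", "Goa", "Agra", "Kochi"]
    # Reserve one bar for "Other"; a 48px-wide panel fits 4 bars at 12px each.
    k = max(1, bars_that_fit(48) - 1)
    print(top_k_with_other(cities, k))
```

Recomputing k whenever the panel is resized is what makes the chart respond to space as well as to the data.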
Responsive to Data Velocity: Designing for periodic, query-based visualisation refreshes is one thing, but streaming data adds a whole new level of challenge to interactive visualisation. So how do you trade off real-time against near real-time data, and how does that choice affect how the visualisation is refreshed?
- Optimizing for near real-time visual refreshes
- Handling event- and time-based streams
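For the velocity trade-off, one simple pattern is to aggregate incoming events into tumbling windows keyed by event time, and to redraw on a throttle (at most once per refresh interval) rather than on every event, accepting near real-time latency in exchange for a steadier view. The `WindowedCounter` class below is a hypothetical sketch under those assumptions; it ignores eviction of old windows and late-arriving events, which a real stream processor would have to handle.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

class WindowedCounter:
    """Counts events in tumbling event-time windows and hands out a snapshot
    for redrawing at most once every refresh_s seconds."""

    def __init__(self, window_s: float, refresh_s: float) -> None:
        if window_s <= 0 or refresh_s < 0:
            raise ValueError("window_s must be positive and refresh_s non-negative")
        self.window_s = window_s
        self.refresh_s = refresh_s
        self.counts: Dict[float, int] = defaultdict(int)
        self._last_refresh = float("-inf")

    def add(self, event_time: float) -> None:
        # Bucket by the event's own timestamp (event time), not by when it arrived.
        window_start = (event_time // self.window_s) * self.window_s
        self.counts[window_start] += 1

    def maybe_refresh(self, now: float) -> Optional[List[Tuple[float, int]]]:
        """Return (window_start, count) pairs if the throttle allows a redraw, else None."""
        if now - self._last_refresh < self.refresh_s:
            return None  # too soon: keep buffering and skip this redraw
        self._last_refresh = now
        return sorted(self.counts.items())

if __name__ == "__main__":
    view = WindowedCounter(window_s=10.0, refresh_s=1.0)
    for t in [0.5, 3.2, 9.9, 10.1, 24.0]:
        view.add(t)
    print(view.maybe_refresh(now=30.0))  # first call redraws
    print(view.maybe_refresh(now=30.4))  # within 1 s of the last redraw: skipped
```

Shrinking `refresh_s` moves the view closer to real time at the cost of more redraws; growing `window_s` smooths the picture at the cost of coarser time resolution.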
Amit Kapoor teaches the craft of telling visual stories with data. He conducts workshops and trainings on Data Science in Python and R, as well as on Data Visualisation topics. His background is in strategy consulting, having worked with AT Kearney in India, then with Booz & Company in Europe, and more recently with startups in Bangalore. He did his B.Tech in Mechanical Engineering at IIT Delhi and his PGDM (MBA) at IIM Ahmedabad. You can find out more about him at http://amitkaps.com/ and tweet to him at @amitkaps.