PyCon Pune 2017

A conference on the Python programming language


Sarah Masud


What makes a diamond costly? - An Introduction to ggplots in Python

Submitted Nov 28, 2016

Visualizing a dataset (especially when we don't have domain knowledge) serves three purposes:
1. Understand how the attributes interact with each other.
2. Determine which attributes are more important for predictive models.
3. Help create new hypotheses and/or verify old ones.
This talk aims to help the audience understand the importance of data visualization, and the role of ggplots as a tool, through the visual story of price versus intrinsic diamond properties.
A sample of what will be presented, along with the demo, is attached in the slides.
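The first two purposes above can also be checked numerically before any plot is drawn. The following is a minimal sketch, not material from the talk: the records in `SAMPLE` are made-up toy values (not real diamond data), and `pearson` and `rank_attributes` are hypothetical helper names chosen for illustration.

```python
import math

# Hypothetical toy records for illustration only (NOT real diamond data).
SAMPLE = [
    {"carat": 0.3, "depth": 61.5, "price": 400},
    {"carat": 0.5, "depth": 62.8, "price": 1500},
    {"carat": 0.7, "depth": 60.9, "price": 2500},
    {"carat": 1.0, "depth": 62.0, "price": 5000},
    {"carat": 1.5, "depth": 61.2, "price": 10000},
    {"carat": 2.0, "depth": 62.4, "price": 17000},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_attributes(rows, target):
    """Rank every non-target attribute by |correlation| with the target."""
    ys = [row[target] for row in rows]
    scores = {
        name: pearson([row[name] for row in rows], ys)
        for name in rows[0]
        if name != target
    }
    # Strongest linear relationship first.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


ranked = rank_attributes(SAMPLE, "price")
for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```

A single correlation number hides non-linear patterns and outliers, which is one reason to plot the attributes as well rather than rely on the ranking alone.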


Aim: To understand why pre-modeling visualization is a good idea.
1. What are the various visualization tools in Python?
2. What are ggplots and why use them?
3. Use case: use the functionality of ggplots to visualize the price evaluation of a diamond.
Key takeaway: what ggplots are, and how they are useful in understanding new domains.


Laptops with Python 2.7
Python libraries: matplotlib, pandas, numpy, ggplots

Speaker bio

As an associate software engineer at Red Hat, I am currently working on an analytics project. My work under Dr. Tanvir Ahamd on Framework to Extract Context Vectors from Unstructured Data using Big Data Analytics was presented at the Ninth International Conference on Contemporary Computing (Aug 2016). I am currently involved with the IEEE WIE Stand project as a mentor for the Data Science and Machine Learning track, and I volunteer with Women Who Code, Lean In and Grace Hopper India. I am ever enthusiastic about Data Science, Women in Tech, and Open Source.




