Anthill Inside 2019

On infrastructure for AI and ML: from managing training data to data storage, cloud strategy and costs of developing ML models

Introduction to Probabilistic Programming - PyMC3 and Edward

Submitted by Hariharan C (@harc) on Saturday, 13 April 2019



Section: Tutorials
Technical level: Intermediate
Session type: Tutorial

Abstract

Probabilistic programs differ from deterministic ones by allowing language primitives to be stochastic. In other words, instead of being restricted to deterministic assignments such as:

rent = 25000

one can specify a probability distribution from which the rent of a given house was drawn:

rent ~ Normal(mu=25000, sigma=1000)
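A minimal sketch of this difference, using only Python's standard library rather than PyMC3 or Edward. Here random.gauss stands in for a stochastic primitive; the variable names, the seed and the sample count are illustrative assumptions and not part of the talk material.

```python
import math
import random

# Deterministic assignment: rent is one fixed number.
rent_fixed = 25000

# Stochastic version: rent ~ Normal(mu=25000, sigma=1000).
# The seed and the number of draws are arbitrary, chosen for reproducibility.
rng = random.Random(42)
rent_samples = [rng.gauss(25000, 1000) for _ in range(10000)]

# Summarise the draws: they scatter around mu with spread close to sigma.
n = len(rent_samples)
sample_mean = sum(rent_samples) / n
sample_sd = math.sqrt(sum((r - sample_mean) ** 2 for r in rent_samples) / (n - 1))

print("fixed rent: %d" % rent_fixed)
print("sample mean: %.0f, sample sd: %.0f" % (sample_mean, sample_sd))
```

The fixed value carries no notion of uncertainty, while the samples describe a whole range of plausible rents.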

The expressiveness of probabilistic programming frameworks, both theoretical and practical, lets us go further and replace the parameters of Machine Learning algorithms with distributions. How do we do that?

Given growing concerns about trust in blackbox AI, and cases where data is scarce, why would probabilistic programming help?

PyMC, Edward, TensorFlow Probability: where do I start?

I’m so used to blackbox ML. How do I wear a Bayesian hat?

This talk tries to answer these questions.

Outline

On the technical side, the talk will help you get started with coding in PyMC3 and Edward and understand their strengths and weaknesses. It starts from Bayesian inference and moves on to applying the same concepts to ML, to give an overall idea of how and where probabilistic programming helps. Code and graphs can be shown in a Jupyter Notebook.
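As a taste of the step from Bayesian inference to ML, here is a rough sketch of replacing a single regression parameter with a distribution. It assumes a one-parameter model y = w * x plus Normal noise of known spread, and a Normal(0, prior_sd) prior on w, so the posterior has a closed form. The function name posterior_slope and all the numbers are hypothetical. PyMC3 and Edward handle far more general models by sampling or variational inference; this is plain Python only.

```python
import math
import random

def posterior_slope(xs, ys, noise_sd, prior_sd):
    """Posterior over w for y = w * x + Normal(0, noise_sd) noise,
    with prior w ~ Normal(0, prior_sd). Returns (mean, sd)."""
    # Precisions (1 / variance) add up: prior precision plus data precision.
    precision = 1.0 / prior_sd ** 2 + sum(x * x for x in xs) / noise_sd ** 2
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_sd ** 2) / precision
    return mean, math.sqrt(1.0 / precision)

# Synthetic data (an assumption for illustration): true slope 2.0, noise sd 0.5.
rng = random.Random(0)
true_w = 2.0
xs = [i / 10.0 for i in range(1, 21)]
ys = [true_w * x + rng.gauss(0, 0.5) for x in xs]

# Instead of one point estimate for w we get a mean and an uncertainty.
w_mean, w_sd = posterior_slope(xs, ys, noise_sd=0.5, prior_sd=10.0)
# With only three data points the posterior is much wider.
few_mean, few_sd = posterior_slope(xs[:3], ys[:3], noise_sd=0.5, prior_sd=10.0)

print("all data:   w = %.2f +/- %.2f" % (w_mean, w_sd))
print("three rows: w = %.2f +/- %.2f" % (few_mean, few_sd))
```

The wider posterior in the small-data case is the kind of honest uncertainty that a single fitted weight cannot express.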

Requirements

A basic understanding of widely used probability distributions such as Normal, Poisson and Binomial; a basic understanding of Machine Learning and neural networks; and Python.

It will be easier to follow if you have Jupyter, PyMC3 and Edward installed, in addition to the usual suspects such as numpy, pandas and seaborn.

You might want to install a TensorFlow version below 1.7 for Edward compatibility.
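A small, hypothetical helper for checking a version string against that constraint. The function name is_edward_compatible is made up for illustration, and the 1.7 threshold is simply the one stated above, not something verified against Edward's own requirements.

```python
def is_edward_compatible(tf_version):
    """Return True if a TensorFlow version string is below 1.7."""
    # Compare only the major and minor components, e.g. "1.6.0" -> (1, 6).
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return (major, minor) < (1, 7)

print(is_edward_compatible("1.6.0"))   # True
print(is_edward_compatible("1.13.1"))  # False
```

With TensorFlow installed, the same check could be applied to the string in tensorflow.__version__.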

The following are the pip packages I have installed for this session:

absl-py==0.7.1
astor==0.7.1
backports-abc==0.5
backports.functools-lru-cache==1.5
backports.shutil-get-terminal-size==1.0.0
backports.weakref==1.0.post1
bleach==1.5.0
cycler==0.10.0
decorator==4.4.0
edward==1.3.5
enum34==1.1.6
funcsigs==1.0.2
futures==3.2.0
gast==0.2.2
grpcio==1.20.1
h5py==2.9.0
html5lib==0.9999999
ipykernel==4.10.0
ipython==5.8.0
ipython-genutils==0.2.0
joblib==0.12.5
jupyter-client==5.2.4
jupyter-core==4.4.0
Keras-Applications==1.0.7
Keras-Preprocessing==1.0.9
kiwisolver==1.1.0
Markdown==3.1
matplotlib==2.2.4
mock==3.0.5
numpy==1.16.3
pandas==0.24.2
pathlib2==2.3.3
patsy==0.5.1
pexpect==4.7.0
pickleshare==0.7.5
prompt-toolkit==1.0.16
protobuf==3.7.1
ptyprocess==0.6.0
Pygments==2.4.0
pymc3==3.6
pyparsing==2.4.0
python-dateutil==2.8.0
pytz==2019.1
pyzmq==18.0.1
scandir==1.10.0
scipy==1.2.1
seaborn==0.9.0
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.12.0
subprocess32==3.5.3
tensorboard==1.6.0
tensorflow==1.6.0
tensorflow-estimator==1.13.0
termcolor==1.1.0
Theano==1.0.4
tornado==5.1.1
tqdm==4.32.1
traitlets==4.3.2
wcwidth==0.1.7
Werkzeug==0.15.4

Speaker bio

I’m Hariharan. I’m usually curious and love learning new things. I graduated from BITS – Pilani and have been in the industry for roughly 7 years since, working predominantly in the field of Machine Learning. I love watching football and cricket. I used to love playing them, not anymore 😊. I like quizzing, despite not being good at it.

Links

Slides

https://www.slideshare.net/hariharanchandrasekaran9/into-to-probproghari-2

Preview video

https://www.youtube.com/watch?v=5sdttP9tFgk

Comments

  • Zainab Bawa (@zainabbawa) Reviewer a month ago

    This appears to be a proposal for a tutorial, and is a fit for Anthill Inside. Moving your proposal to Anthill Inside since this is where we are covering Bayesian Networks and concepts.

    For this proposal to be considered as a tutorial, you have to submit the following details, by editing this proposal:

    1. What is the software that participants should install on their machines to test some of the concepts that you will discuss?

    In the slides, the following details have to be added:

    1. What is probabilistic programming and why is this approach better than other options available?
    2. Who can use or consider this approach?
    3. What are the advantages and disadvantages of this approach?
    4. Who can actually start using probabilistic programming? Can those with models in production use this approach? What legacy issues will participants face if they are to get started with this at a late stage in their modelling?
    5. Real-life scenarios and applicability of probabilistic programming.
    6. Comparisons with other approaches.
    7. Pros and cons, including skill-sets needed in teams to use this approach.

    We need revised slides and a preview video (a two-minute elevator pitch on what this tutorial is about and why participants should attend) on or before 21 May to close the decision.

  • Hariharan C (@harc) Proposer a month ago

    Sure. Will work on this.

  • Hariharan C (@harc) Proposer 27 days ago

    Updated slides, video etc.

  • Neha (@nehadave78) 27 days ago

    Nice work. Looking forward to listening to you at the conference.

  • David Brine (@david56) 16 days ago
