Getting started with machine learning: tools, algorithms and concepts
Submitted by Harshad Saykhedkar (@harshss) on Thursday, 12 October 2017
Section: Workshop Technical level: Beginner
This workshop will serve as a starting point for beginners in machine learning. I will give a high-level overview of the field of machine learning and an introduction to the Python data ecosystem used in it. I strongly believe that the best way to learn machine learning is by building a few algorithms from scratch, so we will build a supervised ML application from scratch in Python. Since ML is a vast field, I will also spend some time on study guidelines and how to approach the field.
The audience can expect to take away the following from the workshop:
- An understanding of the big picture of machine learning
- Implementation practice that helps in knowing ‘what is happening under the hood’ of most machine learning libraries
- A whirlwind tour of the Python data ecosystem APIs (NumPy, pandas and scikit-learn)
- Practical pointers on how to structure your study of machine learning
There are many ways to approach the field of machine learning. We can start with the tools and the APIs and then gradually work towards the underlying maths. Alternatively, we can start with the maths and learn the APIs and tools later. The objective of the workshop is to cover each aspect in some detail. The outline will be as follows:
- Introduction to the Python data ecosystem: a few hands-on exercises on NumPy and pandas to serve as a warm-up (~ 30 minutes)
- Introduction to machine learning: mostly plain English content, covering big picture (~ 30 minutes)
- Building a regression/classifier from scratch in Python (~ 45 - 50 minutes)
- Solving a more involved problem by using scikit-learn APIs directly (~ 30 minutes)
- Next steps, how to study and which resources can be used (~ 20 minutes)
- Summarizing what we learnt, questions and answers (~ 10 minutes)
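To give a flavour of what "building a regression from scratch" in the outline above can look like, here is a minimal sketch in plain Python. It is not the actual workshop material: the function name `fit_line`, the learning rate, the number of epochs and the toy data are all assumptions chosen for illustration. It fits a straight line by batch gradient descent on the mean squared error.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; hyperparameters are assumptions.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error of the current line on every training point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print("slope:", round(w, 3), "intercept:", round(b, 3))
```

On this toy data the learned slope and intercept should end up close to 2 and 1. Libraries such as scikit-learn wrap the same idea (define an error, then minimise it) behind a fit/predict style API.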
Overall, I expect the workshop to take 3 hours ± 15 minutes. Note that this is a beginner workshop; if you are already a practicing data scientist, most of the material will be too basic for you.
- Laptop (operating system of your choice), charged battery + charger.
- Python installed on the laptop + IDE of your choice/terminal.
- No hard choice between Python 2 and Python 3.
- The following libraries MUST be installed.
- It won’t be possible to provide installation support during the workshop, so all requirements should be installed beforehand. Without the installations, you won’t get anything out of the workshop.
Maths Knowledge Requirements
It would help if you brush up on the following topics from high school. These are not mandatory, though; we will cover enough detail during the workshop.
- Basics of derivatives and concept of maxima-minima.
- Basics of matrix and vector manipulation from linear algebra.
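As a quick refresher on why derivatives and minima matter here, the sketch below finds the minimum of a simple curve by repeatedly stepping against its slope. This is an illustrative assumption, not workshop material: the function `f`, the helper names `derivative` and `minimise`, and the step size are all made up for the example.

```python
def f(x):
    # A simple bowl-shaped curve whose minimum is at x = 3.
    return (x - 3.0) ** 2 + 2.0


def derivative(func, x, h=1e-6):
    # Central-difference approximation of the slope of func at x.
    return (func(x + h) - func(x - h)) / (2.0 * h)


def minimise(func, x0, learning_rate=0.1, steps=200):
    # Walk downhill: move x a little in the direction opposite to the slope.
    x = x0
    for _ in range(steps):
        x -= learning_rate * derivative(func, x)
    return x


x_min = minimise(f, x0=0.0)
print("approximate minimum at x =", round(x_min, 4))
```

At the minimum the slope is zero, so the updates shrink as `x` approaches 3. Training many machine learning models follows the same pattern, with an error function in place of `f`.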
Programming Knowledge Requirements
- You should know basic programming:
  - Reading and writing files
  - Flow control (if-else)
  - Looping constructs like for and while loops
  - Variable assignments
- In other words, you should have written at least a few hundred lines of code in any mainstream programming language.
- The implementation choice for this workshop will be Python.
I work as head of data science at onlinesales.ai, an advertising technology startup based out of Pune. I have 7+ years of experience in data science and started in the field before it was a buzzword :-P. I have built multiple products, handled consulting assignments and delivered solutions using machine learning, R and Python. I hold a Master’s degree in Operations Research from Indian Institute of Technology, Mumbai.
Though I have done similar workshops multiple times before (a few links are given above), I try my best to do better in each iteration :-)