Swing and a Miss: Deploying machine learning models for IoT enabled devices using Python
Submitted by sanjiv soni (@sanjivsoni) on Tuesday, 4 September 2018
Technical level: Beginner
The primary purpose of this talk is to describe how we use Python and scikit-learn (sklearn) to model and analyse time series sensor data. In particular, I will walk through how we use Python to process data from an IoT-enabled sensor attached to a cricket bat, build machine learning models on the data, and use open source tools to deploy our models on the sensor device as a smart IoT application.
The session begins with some basic concepts that the rest of the talk builds on. We then look at how we form features by rapidly creating and validating hypotheses in Jupyter notebooks; this part mostly shares approaches, with code in Python. Next, we discuss how open source Python tools can be used to build and deploy machine learning models on a standalone IoT motion-sensing device. We conclude by leaving the audience with some food for thought and pointing out an interesting and important area in data science and the smart IoT world.
With the steep increase in the number of smart things connected to the internet, the amount of data generated by such devices is growing rapidly. However, much of that data is not useful, so filtering it out is an important task. How do we keep the important part and remove the noise from sensor data streams to generate actionable insights?
To demonstrate the problem, we place a sensor device on a cricket bat. The IoT device is a miniaturised, wireless MEMS inertial measurement unit (IMU). The IMU senses bat acceleration and angular velocity on three axes and uses low-power Bluetooth to transmit this data to a mobile phone.
First, we gather event-based data rather than storing the entire stream. This raises the question: how do we define an event? What distinguishes an event from the surrounding ‘non-event’ context? Watching a cricket batter stand and prepare to swing, the human brain continuously filters what it sees and can tell a swing apart from the pre- and post-swing activity. We need to automate that same process. Some data instances can be tagged while others cannot; the tagged instances help in training and evaluating machine learning models later.
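One simple way to make "event" concrete is to treat it as a run of samples in which a motion signal stays above a threshold for long enough. The sketch below illustrates that idea on synthetic data. The function name `detect_events`, the threshold and the minimum run length are assumptions made for illustration, not the definition used in the talk.

```python
import numpy as np

def detect_events(signal, threshold, min_samples):
    """Return (start, end) index pairs where signal stays above threshold.

    `end` is exclusive. Runs shorter than min_samples are dropped as noise.
    """
    above = np.asarray(signal) > threshold
    # Pad with False so runs touching either edge still get a start and an end.
    padded = np.concatenate(([False], above, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)   # False -> True transitions
    ends = np.flatnonzero(changes == -1)    # True -> False transitions
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_samples]

# Synthetic angular-velocity magnitude: quiet stance, a one-sample glitch, a swing.
gyro_magnitude = np.array([0.1, 0.2, 5.0, 0.1, 0.2, 4.0, 6.0, 7.5, 5.5, 0.3, 0.1])
events = detect_events(gyro_magnitude, threshold=3.0, min_samples=3)
print(events)
```

Only the extracted windows would then need to be stored and tagged, rather than the whole stream.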
Secondly, once we have extracted time series data for these instances, we can start analysing the event-based data sets to understand the language of sensor data. For this, we use JupyterLab to work with the data interactively. How does accelerometer data depict real-world physical motion? This step helps us find the relation between real-world actions and the sensor data. The extraction process is, however, prone to noise.
The data comes in CSV files, and Python is a natural choice for reading and analysing it. Pandas offers data frames that come in handy for rapidly forming and validating hypotheses interactively in Jupyter notebooks. No analysis is complete without visualisation, and that is where Matplotlib helps us understand the data better. We quickly test machine learning models using scikit-learn (sklearn), which has most of the standard algorithms already implemented. This talk will describe some of the analysis (along with Python code) to show the steps we have taken, from forming a hypothesis to implementing a solution at the device level. All of this demonstrates how Python and its rich set of libraries help in building solutions for product features.
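The workflow described above (read CSV, derive features per event, try a standard model) might look roughly like the following sketch. The column names, labels and values are hypothetical stand-ins, and the inline string takes the place of a real CSV file.

```python
import io

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical CSV layout: one row per sample, grouped by event_id.
# A real sensor export will have different columns and far more rows.
csv_text = """event_id,ax,ay,az,label
1,0.1,0.0,1.0,no_swing
1,0.2,0.1,1.1,no_swing
2,3.0,4.0,1.0,swing
2,6.0,8.0,2.0,swing
3,0.0,0.1,0.9,no_swing
3,0.1,0.2,1.0,no_swing
4,4.0,3.0,1.5,swing
4,8.0,6.0,2.5,swing
"""
df = pd.read_csv(io.StringIO(csv_text))

# Acceleration magnitude combines the three axes into one signal.
df["acc_mag"] = (df[["ax", "ay", "az"]] ** 2).sum(axis=1) ** 0.5

# One feature row per event: peak and mean magnitude, plus the event's tag.
features = df.groupby("event_id").agg(
    peak=("acc_mag", "max"),
    mean=("acc_mag", "mean"),
    label=("label", "first"),
)

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(features[["peak", "mean"]], features["label"])
predictions = model.predict(features[["peak", "mean"]])
print(features.assign(predicted=predictions))
```

For brevity the model is checked against the same rows it was trained on; a real evaluation would hold out tagged events that the model has not seen.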
Thirdly, we need to automate the task of classifying a particular instance from the stream. For this, we can either train a machine learning model or write a rule-based algorithm that classifies the events into buckets.
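As an illustration of the rule-based option, a classifier can be as small as a few threshold checks on per-event features. The feature names, bucket names and threshold values below are placeholders invented for this sketch; in practice they would come from exploring the tagged data.

```python
def classify_event(peak_acc, duration_s):
    """Assign an event to a bucket using hand-written threshold rules.

    Thresholds are illustrative placeholders, not measured values.
    """
    if peak_acc < 2.0:
        return "no_swing"       # too gentle to be a stroke
    if duration_s < 0.1:
        return "impact_noise"   # sharp, very short spike
    return "swing"

for peak, duration in [(1.0, 0.5), (6.0, 0.02), (8.0, 0.4)]:
    print(peak, duration, classify_event(peak, duration))
```

Rules like these are easy to inspect and cheap to run, whereas a trained model can capture patterns that are hard to write down by hand.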
Every step has its own set of challenges. The application we are working on involves motion sensors attached to the back of a cricket bat, and there are network constraints in the field. If a sportsperson wants real-time analytics from the device, the segregation of events needs to happen offline. We have to deploy the models on the miniature sensor devices because players sometimes do not even carry their mobile phones to the playing area. Our objective is therefore to enable the devices to run machine learning algorithms independently.
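The abstract does not name the deployment tools, so the following is only one possible route, sketched under that assumption: a small decision tree trained with scikit-learn can be rendered as nested if/else statements, which are simple enough to port to device firmware. The helper `tree_to_rules`, the feature names and the training values are all hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier

def tree_to_rules(model, feature_names, indent="    "):
    """Render a fitted decision tree as nested C-like if/else lines."""
    tree = model.tree_
    lines = []

    def walk(node, depth):
        pad = indent * depth
        # Leaf nodes have both children set to the same sentinel value (-1).
        if tree.children_left[node] == tree.children_right[node]:
            label = str(model.classes_[tree.value[node][0].argmax()])
            lines.append(f'{pad}return "{label}";')
            return
        name = feature_names[tree.feature[node]]
        lines.append(f"{pad}if ({name} <= {tree.threshold[node]:.3f}) {{")
        walk(tree.children_left[node], depth + 1)
        lines.append(f"{pad}}} else {{")
        walk(tree.children_right[node], depth + 1)
        lines.append(f"{pad}}}")

    walk(0, 0)
    return "\n".join(lines)

# Toy training data: [peak, mean] magnitude per event (made-up values).
X = [[1.1, 1.0], [1.2, 1.1], [10.2, 7.6], [10.3, 7.7]]
y = ["no_swing", "no_swing", "swing", "swing"]
model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = tree_to_rules(model, ["peak", "mean"])
print(rules)
```

The printed text is meant to be reviewed and adapted by hand before it goes anywhere near a device; it is not a complete code generator.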
Sanjiv Soni is a Data Scientist at Montane Ventures, an early-stage venture capital fund based in India. He is also an international fellow at the University of San Francisco for the Deep Learning Programme. Sanjiv has experience with software and product ecosystems. He is interested in building software solutions to problems currently solved by humans.