Run Predictive Machine Learning Algorithms on Hadoop Without Even Knowing MapReduce
Submitted by GaganDeep Juneja (@gagandeepjuneja) on Tuesday, 3 June 2014
In this talk I will introduce some concepts that help data scientists run their predictive algorithms on Hadoop with the help of PMML (Predictive Model Markup Language) and Cascading.
Data scientists are very familiar with tools like R and SAS, but writing or converting algorithms into MapReduce is difficult for them. Libraries such as Mahout provide MapReduce implementations of many algorithms, but you still cannot run your algorithm directly on a Hadoop cluster. First you need to create a data model based on your data and decide the values of some tuning parameters. Changing these parameters multiple times in a Hadoop job and rerunning it again and again is painful for a data scientist, although a Java developer might find it fun. Data scientists can do data modeling and model training in SAS or R very easily and efficiently.
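To make the idea concrete, here is a minimal sketch of what it means to train a model in one tool and apply it elsewhere through PMML. It is not the Cascading or Hadoop tooling itself. The tiny PMML document, the field names `x1` and `x2`, and the helper functions are all hypothetical and chosen for illustration; a real export from R or SAS would carry more metadata, and on a cluster the scoring step would run inside the job rather than in a local loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment describing a linear regression:
# y = 1.5 + 2.0 * x1 - 0.5 * x2
PMML_DOC = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# Namespace used by the PMML 4.1 schema.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_linear_model(pmml_text):
    """Read the intercept and coefficients from a PMML regression table."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find(".//pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Apply the linear model to one record (a dict of field name -> value)."""
    return intercept + sum(
        coefficient * record[name] for name, coefficient in coefficients.items()
    )


if __name__ == "__main__":
    intercept, coefficients = load_linear_model(PMML_DOC)
    # Stand-in for the records a cluster job would stream through the model.
    records = [{"x1": 3.0, "x2": 4.0}, {"x1": 0.0, "x2": 1.0}]
    for record in records:
        print(record, score(record, intercept, coefficients))
```

The point of the sketch is the separation of concerns: the model parameters live in the PMML file, so retuning the model means exporting a new file rather than rewriting and redeploying job code.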
I am working on the development of a GUI that helps users create and trigger Cascading jobs, and I will try to run the talk's demonstration with this tool. It works like any other web-based or desktop client application. Even users who are not familiar with Hadoop components such as MapReduce and HDFS can easily run their algorithms on massive amounts of data.
Gagan Deep Juneja is a Big Data Hacker at GoDataDriven. He has close to 6 years of experience in the software industry and has worked on several projects using Java/J2EE and Hadoop as the primary technologies. He has an inclination toward open source technologies and likes to explore new frameworks. He is a committer and PPMC member of Apache Blur (incubating). He has been a speaker at various meetup groups. He is an active blogger, and in his free time he loves exploring new technologies and keeping himself updated with the latest trends.