The Fifth Elephant 2013

An Event on Big Data and Cloud Computing

Harshad Saykhedkar


Finding order in the chaos: machine learning for web text analytics using R

Submitted Jun 3, 2013

Participants will gain an understanding of the following, using R:

  • A short, plain-English introduction to the ideas that underlie text mining.

  • How to import large volumes of unstructured text data and apply basic cleanup procedures.

  • How to apply more advanced natural language processing methods to the data.

  • How to convert unstructured text into data structures suitable for machine learning and visualization.

  • How to apply unsupervised learning methods (clustering, Latent Dirichlet Allocation) to identify topics in web documents.

  • How to apply supervised learning methods to the data for classification.
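The workshop covers these steps in R. The short sketch below is a language-neutral illustration written in Python with only the standard library. It shows the basic cleanup and conversion steps: it lowercases text, keeps alphabetic tokens, drops stopwords, and builds a document-term matrix of raw counts. Clustering, LDA, and classifiers typically consume this kind of structure. The tiny stopword list and the function names `clean_tokens` and `document_term_matrix` are hypothetical choices made for this illustration. They are not taken from the workshop material.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real pipelines use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def clean_tokens(text):
    """Basic cleanup: lowercase, keep alphabetic runs only, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def document_term_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in docs[i]."""
    token_lists = [clean_tokens(d) for d in docs]
    # Sorted vocabulary so column order is deterministic.
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for tokens in token_lists:
        row = [0] * len(vocab)
        for term, count in Counter(tokens).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix


if __name__ == "__main__":
    docs = [
        "R is great for text mining.",
        "Text mining of web documents, and topics in the web!",
    ]
    vocab, dtm = document_term_matrix(docs)
    print(vocab)
    for row in dtm:
        print(row)
```

Each row corresponds to a document and each column to a term. For the two sample documents, the vocabulary is `['documents', 'great', 'mining', 'r', 'text', 'topics', 'web']`. The rows are `[0, 1, 1, 1, 1, 0, 0]` and `[1, 0, 1, 0, 1, 1, 2]`. Real corpora usually call for a sparse representation and often for weighting schemes such as tf-idf, which this sketch leaves out.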


Do you get a ‘cart before the horse’ feeling when you hear buzzwords like social data mining and sentiment analysis? Fundamental text mining methods are the real ‘workhorses’ behind these buzzwords. This workshop aims to build an understanding of those fundamentals in a ‘learning by doing’ fashion.

The Internet, that information beast, consists largely of unstructured text. The R environment provides an excellent set of tools for dealing with such data. We will take up the realistic problem of finding topics in web documents and, along the way, touch upon a number of relevant machine learning methods in R.

We will also cover some interesting business problems that can be tackled with these methods.

Requirements

  • A laptop with a working R installation

  • An Internet connection (to download the relevant R packages)

Speaker bio

An avid R user, I apply machine learning methods to digital advertising at Sokrati Inc. I have prior experience applying these methods to problems in the telecom and banking sectors. I hold a master’s in Operations Research from IIT, Mumbai.


