The Fifth Elephant 2013

An Event on Big Data and Cloud Computing

Finding order in the chaos: machine learning for web text analytics using R

Submitted by Harshad Saykhedkar (@harshad-saykhedkar) on Monday, 3 June 2013

Section: Workshops
Technical level: Beginner



Participants will gain an understanding of the following, through R:

  • A short, plain-English introduction to the ideas that underlie text mining.

  • How to import large volumes of unstructured text and apply basic cleanup procedures.

  • How to apply more advanced natural language processing methods to the data.

  • How to convert unstructured text into data structures suitable for machine learning and visualization.

  • How to apply unsupervised learning methods (clustering, Latent Dirichlet Allocation) to identify topics in web documents.

  • How to apply supervised learning methods to the data for classification.


Do you get a feeling of ‘the cart before the horse’ when you hear buzzwords like social data mining and sentiment analysis? Fundamental text mining methods are the real ‘workhorses’ behind these buzzwords. This workshop aims to build an understanding of those fundamentals in a ‘learning by doing’ fashion.

The Internet, that information beast, consists largely of unstructured text. The R environment provides an excellent set of tools for dealing with it. We will take up a realistic problem, finding topics in web documents, and touch upon a number of relevant machine learning methods in R.

We will also cover some interesting business problems that can be tackled with these methods.

Requirements

  • A laptop with a working R installation

  • An Internet connection (to download the relevant R packages)

Speaker bio

An avid R user, I apply machine learning methods to digital advertising at Sokrati Inc. I have prior experience applying these methods to problems in the telecom and banking sectors. I hold a master's degree in Operations Research from IIT, Mumbai.


  • Harshad Saykhedkar (@harshad-saykhedkar), Proposer

    A request to all interested participants: could you please spare 5 minutes to fill in the following survey form? It will go a long way towards tuning the content and pace of the workshop.

    Click here to submit the form.
