Finding order in the chaos: machine learning for web text analytics using R
Submitted by Harshad Saykhedkar (@harshad-saykhedkar) on Monday, 3 June 2013
Participants will gain an understanding of the following, using R:
A short, plain-English introduction to the ideas that underlie text mining.
How to import large unstructured text data and apply basic cleanup procedures.
How to apply more advanced natural language processing methods to the data.
How to convert unstructured text into data structures suitable for machine learning and visualization.
How to apply unsupervised learning methods (clustering, Latent Dirichlet Allocation) to identify topics in web documents.
How to apply supervised learning methods to the data for classification.
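As a rough illustration of the cleanup, conversion, and clustering steps listed above, here is a minimal sketch in base R. It is not the workshop's own material: the four-document corpus, the tiny stop-word list, and the choice of hierarchical clustering on cosine distance are all assumptions made for the example, and a real analysis would more likely rely on dedicated text mining packages.

```r
# Toy corpus standing in for web documents (invented for illustration).
docs <- c(
  d1 = "Cheap flights to Goa! Book cheap flights & hotels today.",
  d2 = "Hotels in Goa: compare hotel prices, book flights.",
  d3 = "Cricket score: India wins the cricket match.",
  d4 = "Live cricket match updates and the final score."
)

# Basic cleanup: lower-case, replace runs of non-letters with one space, trim.
clean <- trimws(gsub("[^a-z]+", " ", tolower(docs)))

# Tokenize on spaces and drop a small, hand-picked stop-word list (an assumption).
stop_words <- c("to", "in", "the", "and", "a", "of")
tokens <- strsplit(clean, " ", fixed = TRUE)
tokens <- lapply(tokens, function(w) w[!(w %in% stop_words)])
names(tokens) <- names(docs)

# Document-term matrix: one row per document, one column per vocabulary term.
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(
  tokens,
  function(w) tabulate(match(w, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(dtm) <- vocab

# Unsupervised step: scale rows to unit length, compute cosine distance,
# then cut an average-linkage hierarchical clustering into two groups.
unit <- dtm / sqrt(rowSums(dtm^2))
cos_dist <- as.dist(1 - unit %*% t(unit))
clusters <- cutree(hclust(cos_dist, method = "average"), k = 2)

print(dtm)
print(clusters)
```

The same document-term matrix could serve as input to a topic model or, with labels attached to each row, to a classifier; those steps are left out here to keep the sketch short.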
Do buzzwords like social data mining or sentiment analysis leave you feeling that the cart has been put before the horse? Fundamental text mining methods are the real ‘workhorses’ behind these buzzwords. This workshop aims to build an understanding of those fundamentals in a ‘learning by doing’ fashion.
The Internet, the information beast, consists largely of unstructured text data. The R environment provides an excellent set of tools for dealing with it. We will take up a realistic problem, finding topics in web documents, and touch upon a number of relevant machine learning methods using R.
We will also cover some relevant and interesting business problems which can be tackled using these methods.
Laptop with a working R installation
Internet connection (to download relevant R packages)
An avid R user, I work on applying machine learning methods to digital advertising at Sokrati Inc. I have prior experience applying these methods to problems in the telecom and banking sectors. I hold a master's in Operations Research from IIT, Mumbai.