PyCon is the gathering for the community that uses and develops the open-source Python programming language. This is the first year of PyCon Pune, where the community will meet for two days of talks and two days of dev sprints working on upstream projects. The CFP ends on 30th November AoE.
Creating spiders and crawling websites using Scrapy
Extracting data from websites is necessary for many purposes, such as data mining, content optimisation, data analysis, and archival of historical data. Scrapy provides an application framework for extracting structured data by crawling websites. This talk will cover the basics of Scrapy, its installation, and live examples of writing spiders (classes that define how a website will be scraped) for beginners.
What is Scrapy (2 minutes)
Dependencies and requirements (5 minutes)
Installation (2 minutes)
Creating a new Scrapy Project (2 minutes)
Example of a spider that crawls a site and extracts data (5 minutes)
Command line method of exporting the scraped data (5 minutes)
Making the spider follow the links (5 minutes)
Using spider arguments (3 minutes)
Examples (3 minutes)
Python and Java trainer
Managing Trustee of Peoples Education Trust
Contributor to the Fedora Project
Pursuing PhD in Data Mining