PyCon Pune 2017

A conference on the Python programming language

PyCon is the gathering for the community that uses and develops the open-source Python programming language. This is the first year of PyCon Pune, where the community will meet for two days of talks and two days of dev sprints devoted to upstream projects. The CFP ends on 30th November AoE.

Hosted by

PyCon Pune 2017

Mohan Prakash

@mpduty

Creating spiders and crawling websites using Scrapy

Submitted Nov 29, 2016

Extracting data from websites is necessary for purposes such as data mining, content optimisation, data analysis, and archival of historical data. Scrapy provides an application framework for extracting structured data by crawling websites. This talk will cover the basics of Scrapy, its installation, and live examples of writing spiders (classes that define how a website will be scraped) for beginners.

Outline

What is Scrapy (2 minutes)
Dependencies and requirements (5 minutes)
Installation (2 minutes)
Creating a new Scrapy Project (2 minutes)
Example of a spider that crawls a site and extracts data (5 minutes)
Command line method of exporting the scraped data (5 minutes)
Making the spider follow the links (5 minutes)
Using spider arguments (3 minutes)
Examples (3 minutes)

Speaker bio

Python and Java trainer
Managing Trustee of Peoples Education Trust
Contributor to the Fedora Project
Pursuing PhD in Data Mining

