PyCon Pune 2017

A conference on the Python programming language

Mohan Prakash

@mpduty

Creating spiders and crawling websites using Scrapy

Submitted Nov 29, 2016

It is often necessary to extract data from websites for purposes such as data mining, content optimisation, data analysis, and archival of historical data. Scrapy provides an application framework for extracting structured data by crawling websites. This talk will cover the basics of Scrapy, its installation, and live examples of writing spiders (classes that define how a website will be scraped) for beginners.

Outline

What is Scrapy (2 minutes)
Dependencies and requirements (5 minutes)
Installation (2 minutes)
Creating a new Scrapy Project (2 minutes)
Example of a spider that crawls a site and extracts data (5 minutes)
Command line method of exporting the scraped data (5 minutes)
Making the spider follow the links (5 minutes)
Using spider arguments (3 minutes)
Examples (3 minutes)

Speaker bio

Python and Java trainer
Managing Trustee of Peoples Education Trust
Contributor to the Fedora Project
Pursuing PhD in Data Mining

