Pradhvan Bisht

@pradhvan

A lazy programmer's guide to web scraping

Submitted Aug 10, 2017

The amount of data available on the web keeps growing, and that data can be put to use in many ways. Web scraping helps you obtain it efficiently and saves a great deal of time by turning data collection into an automated task.

This talk focuses on getting started with web scraping in Python. Scraping in Python can be done in several ways; the aim of this talk is to give attendees the nitty-gritty details so that, by the end, they can judge for themselves which approach to take and which libraries and tools to use for the problems they intend to solve. The talk will cover useful scraping libraries and tools, along with neat tricks and techniques for scraping even hard-to-scrape sites effectively. Hard-to-scrape sites are those that build the DOM with JavaScript, require authentication, present captchas, depend on cookies, and so on.
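To make the core step concrete, here is a minimal sketch of pulling links out of a page using only the Python standard library. It is an illustration, not code from the talk: the `LinkExtractor` class, the `extract_links` helper, the sample HTML, and the `example.com` URLs are all hypothetical, and the talk itself may well use other parsers or libraries. In practice the HTML string would come from an HTTP response rather than a literal.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute URL and link text of every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []            # list of (absolute_url, text) tuples
        self._current_href = None  # set while we are inside an <a href> tag
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._current_href = urljoin(self.base_url, href)
                self._text_parts = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def extract_links(html, base_url):
    """Hypothetical helper: parse an HTML string and return its links."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Assumed sample page, standing in for a downloaded response body.
SAMPLE_HTML = """
<html><body>
  <a href="/talks/1">First talk</a>
  <a href="https://example.org/about">About</a>
  <a>No href here</a>
</body></html>
"""

links = extract_links(SAMPLE_HTML, "https://example.com/")
for url, text in links:
    print(url, "->", text)
```

A static parser like this only sees the HTML the server sends. Pages that build their DOM with JavaScript need a different approach, which is one reason the choice of tool depends on the site.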

The talk will also walk through some real-world code examples to give attendees a sense of what it takes to extract the data they need. It closes with a look at scraping ethics, so that no one ends up getting anyone into trouble.
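One common courtesy, offered here as an assumption about what scraping ethics might include rather than as the talk's own content, is to respect a site's robots.txt. The sketch below uses the standard library's `urllib.robotparser`; the robots.txt text, the `lazy-scraper-demo` user agent, and the `build_robot_rules` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; a real scraper would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_robot_rules(robots_txt):
    """Hypothetical helper: parse robots.txt text into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


USER_AGENT = "lazy-scraper-demo"  # illustrative user agent string
rules = build_robot_rules(ROBOTS_TXT)

# Ask whether this user agent may fetch each URL before requesting it.
allowed_public = rules.can_fetch(USER_AGENT, "https://example.com/talks/")
allowed_private = rules.can_fetch(USER_AGENT, "https://example.com/private/data")

# The requested pause between requests, in seconds, if the site declares one.
delay = rules.crawl_delay(USER_AGENT)

print(allowed_public, allowed_private, delay)
```

Checking `can_fetch` before each request and sleeping for the declared crawl delay keeps a scraper from loading a site more heavily than its operators asked for.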

Outline

This is the basic outline of the talk; it may change somewhat by the time the final talk is delivered.

  1. What is web scraping?
  2. Why should you scrape?
  3. Things that might come in handy
  4. How it’s done
  5. Comparing Parsers
  6. Preserving the data
  7. Code Examples
  8. What to use and when to use it
  9. Scraping Hacks
  10. Ethics of Scraping
  11. Q/A and General Discussion

Requirements

Prerequisites:
1. Basic HTML and CSS knowledge.
2. Knowledge of the HTTP methods GET and POST.
3. Familiarity with the Python language.

Speaker bio

Hello world!
I am a web developer and an open source enthusiast. I am also a Python lover and use it to automate everything I can.
As an open source developer, I am an active member of various local user groups that support and promote open source.
Recently I spoke at PyDelhi Conf 2017 about automation using Python; my talk was titled “A lazy programmer’s guide to automation”.

Slides

https://docs.google.com/presentation/d/1vH8iglKUqzzydG0NK_lW0TtghFxu6U29KHrOGlHNmEk/pub?start=false&loop=false&delayms=5000&slide=id.p


Hosted by

Submit Proposals for PyConf Hyderabad