A weekly Python podcast hosted by Christopher Bailey with interviews, coding tips, and conversation with guests from the Python community. The show covers a wide range of topics including Python programming best practices, career tips, and related software development topics. Join us every Friday morning to hear what's new in the world of Python programming and become a more effective Pythonista.

Web Scraping in Python: Tools, Techniques, and Legality

June 05, 2020 0:50:10 36.24 MB

Do you want to get started with web scraping using Python? Are you concerned about the potential legal implications? What are the tools required and what are some of the best practices? This week on the show we have Kimberly Fessel to discuss her excellent tutorial created for PyCon 2020 online titled “It’s Officially Legal so Let’s Scrape the Web.”

We discuss getting started with web scraping and cover tools and techniques. Kimberly gives advice on finding elements inside the HTML and on techniques for cleaning your data. She also notes a recent change to the legal landscape regarding scraping the web.

Kimberly is a Senior Data Scientist at Metis Data Science Bootcamp in New York City. She holds a Ph.D. in applied mathematics. We talk about her switch from academia to data science, and discuss her passion for data storytelling and visualizations.

Course Spotlight: Defining Main Functions in Python

This course will get you up to speed with defining a starting point for the execution of a program and help you understand what goes into the main() function. Prepare for a deep dive as you go through the sections. It’s a worthy investment of your time to understand this vital entry point for your Python scripts and applications!

Topics:

  • 00:00:00 – Introduction
  • 00:01:31 – Kimberly’s background and Metis Data Science Bootcamp
  • 00:02:19 – NLP and work in advertising
  • 00:03:27 – Changes in the legality of web scraping
  • 00:06:12 – What are good projects for web scraping?
  • 00:06:56 – Tools to start web scraping
  • 00:07:51 – How to find the elements you want?
  • 00:09:00 – How much HTML should you know?
  • 00:10:49 – Inspecting elements in the browser
  • 00:14:30 – What are good sites to practice on?
  • 00:16:20 – Pausing between requests
  • 00:19:02 – Saving as you go
  • 00:20:54 – Real Python Video Course Spotlight
  • 00:21:55 – Navigating the DOM
  • 00:23:10 – Data cleaning and formatting
  • 00:28:26 – Dynamic sites and Selenium
  • 00:32:16 – Scrapy
  • 00:33:55 – PyOhio 2020
  • 00:35:40 – Transition out of academia
  • 00:38:40 – What are you excited about in the world of Python?
  • 00:41:05 – What do you want to learn next in Python?
  • 00:48:00 – What is a less known Python tip or trick?
  • 00:49:17 – Thanks and Goodbye
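
Two of the habits in the topic list, pausing between requests and saving as you go, can be sketched roughly as follows. This is a minimal illustration and not code from the episode: the function name `scrape_politely`, the `fetch` callable, the one-second default delay, and the CSV layout are all assumptions made for the example.

```python
import csv
import time
from typing import Callable, Iterable


def scrape_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    out_path: str,
    delay_seconds: float = 1.0,
) -> int:
    """Fetch each URL, pausing between requests and saving each row as it arrives.

    `fetch` is a hypothetical callable that takes a URL and returns the page text.
    Returns the number of rows written.
    """
    saved = 0
    # Append mode, so rows from an earlier, interrupted run are kept.
    with open(out_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for index, url in enumerate(urls):
            if index > 0:
                # Pause between requests (not before the first one).
                time.sleep(delay_seconds)
            try:
                body = fetch(url)
            except Exception as error:
                # Skip a failing page so one error does not end the whole run.
                print(f"skipping {url}: {error}")
                continue
            # Illustrative row layout: the URL and the length of the page text.
            writer.writerow([url, len(body)])
            handle.flush()  # write the row out right away instead of at the end
            saved += 1
    return saved
```

In practice `fetch` might wrap an HTTP client of your choice. Passing it in as a parameter keeps the pacing and saving logic separate from the download step, so the function can be tried out with a stand-in that never touches the network.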
