Day 15: Python – Notes on Web Scraping

A summary from the enticing post “How to Web Scrape with Python in 4 Minutes”:

LIBRARIES:

import requests
import urllib.request
import time
from bs4 import BeautifulSoup

Accessing the target URL:

url = 'http://web.mta.info/developers/turnstile.html'
response = requests.get(url)
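
A slightly more defensive way to make the same request might look like the sketch below. The helper name `fetch_html` and the 10-second timeout are my own assumptions, not something from the original post; the idea is just to fail loudly on a bad HTTP status instead of parsing an error page.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body as text (hypothetical helper, not from the post)."""
    # The timeout (an assumed value) keeps the call from hanging indefinitely.
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError if the server answered with a 4xx/5xx status.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(url)` with the URL above would then return the HTML that gets handed to BeautifulSoup.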

Parse the response into a BeautifulSoup object, a nested data structure (see the BeautifulSoup documentation).

soup = BeautifulSoup(response.text, "html.parser")
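
To get a feel for what the parsed object offers, here is a small offline sketch. The HTML snippet is made up for illustration (it is not the real MTA page), so the example runs without a network connection.

```python
from bs4 import BeautifulSoup

# Invented sample markup standing in for response.text.
html = """
<html><head><title>Turnstile Data</title></head>
<body><p class="intro">Weekly files</p></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Tags can be reached as attributes or searched for by name and class.
page_title = soup.title.string
intro_text = soup.find("p", class_="intro").get_text()
print(page_title, intro_text)
```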

Search for links:

soup.findAll('a')
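
`findAll` is the older spelling; current BeautifulSoup documentation uses `find_all`. As a rough sketch of how the search could be narrowed to anchors that actually carry a link, using invented sample markup rather than the real page:

```python
from bs4 import BeautifulSoup

# Invented sample markup; the file names are placeholders.
html = """
<a name="top"></a>
<a href="data/file_one.txt">File one</a>
<a href="data/file_two.txt">File two</a>
"""

soup = BeautifulSoup(html, "html.parser")

all_anchors = soup.find_all("a")             # every <a> tag, with or without href
linked = soup.find_all("a", href=True)       # only <a> tags that have an href
hrefs = [a["href"] for a in linked]
print(hrefs)
```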

Extract one link (the index 36 depends on this page's layout and will need adjusting if the page changes):

one_a_tag = soup.findAll('a')[36]
link = one_a_tag['href']
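
The `urllib.request` and `time` imports from the top come into play once a link is in hand. A minimal sketch of downloading the linked file, assuming the `href` may be relative to the page URL: the helper names `absolute_url` and `download` and the one-second pause are my own choices for illustration.

```python
import time
import urllib.request
from urllib.parse import urljoin

PAGE_URL = "http://web.mta.info/developers/turnstile.html"


def absolute_url(href, base=PAGE_URL):
    """Resolve a possibly relative href against the page it was found on."""
    return urljoin(base, href)


def download(href, filename, pause_seconds=1):
    """Save the linked file locally, then pause before any further request."""
    urllib.request.urlretrieve(absolute_url(href), filename)
    # A short pause (assumed value) avoids hammering the server in a loop.
    time.sleep(pause_seconds)
```

With the variables above, `download(link, "turnstile.txt")` would fetch the file that `link` points to; the output file name is a placeholder.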

Another approach is described in “A Beginner’s Guide to learn web scraping with python!”. It uses Selenium (a web testing library for automating browser activities), BeautifulSoup (for parsing HTML and XML documents), and Pandas (for data manipulation and analysis, to extract the data and store it in the desired format).
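
The Pandas step of that approach could look roughly like this. The records below are invented placeholders standing in for whatever the scraper collected, and the column names are my own; this is only a sketch of the "store it in the desired format" part, not the guide's actual code.

```python
import pandas as pd

# Invented sample records standing in for scraped results.
records = [
    {"text": "File one", "href": "data/file_one.txt"},
    {"text": "File two", "href": "data/file_two.txt"},
]

df = pd.DataFrame(records)

# With no path given, to_csv returns the CSV as a string;
# pass a file name instead to write it to disk.
csv_text = df.to_csv(index=False)
print(csv_text)
```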

More approaches:

Happy Scraping!

 

 
