This article is part of a sequence:
« Previously
Collect the lists of White House press briefings
Let's batch-download a list of White House press briefing URLs using Python and Requests.
Next »
Extracting absolute URLs from White House press briefings listings
Before we can download each press briefing, we need to extract its URL from the downloaded index pages.
Table of contents
- Converting HTML text into a data object
- Importing the BeautifulSoup constructor function
- The "soup" object
- Extracting text from soup
- Finding a tag with find()
- Extracting attributes from a tag with attrs
- Finding multiple elements with find_all
- Finding nested elements
- Real world example.com
- Extracting individual press briefings URLs from the White House press briefings list
- Examining the source HTML behind each press release tag
- Processing the press briefings page as soup
- All together