This tutorial is about how you can code an email crawler in Python to scrape email addresses from the web.
Coding such a crawler can be useful for anyone who wants to understand how crawling and scraping work. It can also be a gem for email marketers looking for a way to automate their data extraction tasks.
Before jumping into the coding part, let's take a look at the basic functionality of this scraper. The crawler has to perform the following tasks to scrape emails:
- Open start page
- Look for emails, add to db if found
- Look for new links, add to crawling queue if found
- Keep crawling until all pages are crawled
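The four tasks above amount to a breadth-first loop over a queue of URLs. What follows is a minimal sketch under some assumptions. `fetch` is a hypothetical caller-supplied function that returns a page's HTML; in a real crawler it would make an HTTP request. A Python set stands in for the email "db". The email regex is a simplified pattern, not a full address validator. The standard library's `html.parser` replaces BeautifulSoup here only so the sketch runs without third-party packages. `max_pages` is an added safety cap, because "all pages" on the open web is effectively unbounded.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit

# Simplified email pattern (an assumption; not RFC 5322-complete).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_emails(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; returns the set of emails found.

    fetch(url) -> str is a hypothetical function supplied by the caller.
    """
    queue = deque([start_url])  # crawling queue
    seen = {start_url}          # every URL ever queued, so none is crawled twice
    emails = set()              # stands in for the "db"
    crawled = 0

    while queue and crawled < max_pages:  # keep crawling until the queue is empty
        url = queue.popleft()             # open the next page
        crawled += 1
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load

        # Look for emails and add them to the "db".
        emails.update(EMAIL_RE.findall(html))

        # Look for new links and add unseen http(s) ones to the queue.
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link, _fragment = urldefrag(urljoin(url, href))  # absolute URL, no #fragment
            if urlsplit(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                queue.append(link)

    return emails


if __name__ == "__main__":
    # Offline usage example with two fake pages (hypothetical data).
    pages = {
        "http://example.com/": '<a href="/about">About</a> contact: info@example.com',
        "http://example.com/about": '<a href="mailto:bob@example.com">Bob</a> <a href="/">Home</a>',
    }
    print(sorted(crawl_emails("http://example.com/", pages.__getitem__)))
    # prints ['bob@example.com', 'info@example.com']
```

The `seen` set records every URL at the moment it is queued, so a page linked from many places is fetched only once and link cycles cannot loop forever. Stripping the `#fragment` keeps `page#top` and `page` from counting as two different pages, and the scheme check drops `mailto:` and similar links while the regex still picks up the address inside them.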
Let's start by importing some libraries.
Step 1: Import Libraries
from urllib.parse import urlsplit  # splits a URL into scheme, host, path, etc.
from bs4 import BeautifulSoup  # parses HTML so links can be extracted