ddayto21/Lead-Scraper

This repository contains a web crawler that searches webpages for email addresses, along with a web-scraping script that collects leads from webpages, filters the discovered links against a set of criteria, and adds the new links to a crawl queue. The raw HTML, or specific extracted information, is then handed off to a separate processing pipeline.

Repository Overview

This repository was built to help business owners save time by collecting thousands of business leads from Yellow Pages, a website that lists over 27 million businesses in the United States.

We use requests, a Python HTTP library, to collect large amounts of unstructured HTML from Yellow Pages. We then use BeautifulSoup to parse the relevant information out of that HTML, and finally we use Pandas to build dataframes and save the leads to .csv files that can be used for marketing campaigns.

Install the Requests Library

$ pip install requests

Import the Requests Library

import requests

Send HTTP Request to Server

response = requests.get(url)
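
As a quick sketch, a full request might look like the following. The search URL parameters and the User-Agent header here are illustrative assumptions, not values taken from this repository:

import requests

# Hypothetical search for restaurants in New York; the query parameters are an assumption
url = "https://www.yellowpages.com/search?search_terms=restaurants&geo_location_terms=New+York%2C+NY"

# Identify the client with a User-Agent header; many sites reject requests without one
headers = {"User-Agent": "Mozilla/5.0 (compatible; lead-scraper)"}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # Fail fast if the server returns an error status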

Extract Relevant Data from Response

We use BeautifulSoup, a Python library that makes it easy to parse data in HTML files.

Install the Beautiful Soup Library

$ pip install beautifulsoup4

Import the Beautiful Soup Library

from bs4 import BeautifulSoup
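
A minimal sketch of the parsing step follows, reusing the response object from the request above. The CSS class names used to locate business names and phone numbers are assumptions about the Yellow Pages markup, not selectors confirmed by this repository:

from bs4 import BeautifulSoup
import pandas as pd

# Parse the raw HTML returned by the requests.get call above
soup = BeautifulSoup(response.text, "html.parser")

# "result", "business-name", and "phones" are assumed class names for illustration;
# the real page structure may differ
leads = []
for listing in soup.find_all("div", class_="result"):
    name = listing.find(class_="business-name")
    phone = listing.find(class_="phones")
    leads.append({
        "name": name.get_text(strip=True) if name else "",
        "phone": phone.get_text(strip=True) if phone else "",
    })

# Save the collected leads to a .csv file for marketing campaigns
pd.DataFrame(leads).to_csv("leads.csv", index=False)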
