JustinThiede/crawler

Crawler

A crawler that collects all anchor (<a>) tags of a page. It only crawls the target's domain: if you crawl example.com and it links to whatever.com, whatever.com will not be crawled.
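The same-domain rule can be illustrated with a minimal sketch. The helper `isSameHost` is hypothetical and is not part of the Crawler class. It assumes an exact, case-insensitive host comparison, so `www.example.com` and `example.com` would count as different hosts here; the crawler itself may decide this differently.

```php
<?php
// Hypothetical helper, not taken from the Crawler source: returns true when
// $link points at the same host as $target.
function isSameHost(string $target, string $link): bool
{
    $targetHost = parse_url($target, PHP_URL_HOST);
    $linkHost = parse_url($link, PHP_URL_HOST);

    // Links without a host (for example "/about") are treated as staying on
    // the target's site. Scheme filtering is a separate concern.
    if ($linkHost === null) {
        return true;
    }

    // parse_url() returns false for seriously malformed URLs.
    if (!is_string($targetHost) || !is_string($linkHost)) {
        return false;
    }

    return strcasecmp($targetHost, $linkHost) === 0;
}

// Usage: keep only the links that stay on example.com.
$links = ['https://example.com/page', 'https://whatever.com/', '/about'];
$internal = array_values(array_filter(
    $links,
    fn (string $link): bool => isSameHost('https://example.com', $link)
));
```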

Instantiation

$crawl = new Crawler();

Set target to crawl

$crawl->crawl('https://example.com');

Get crawled links

Returns all crawled links of that domain as a one-dimensional array.

$crawl->getCrawledLinks();

Defaults

Schemes allowed

  • http
  • https

Extensions allowed

  • html
  • htm
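How these defaults might be applied to a single URL can be sketched as follows. The function `passesDefaults` is hypothetical and not taken from the Crawler source. It also assumes that URLs without any file extension (such as `/about`) count as pages, which the README does not state.

```php
<?php
// Hypothetical check, not the Crawler's own code: tests a URL against the
// default schemes and file extensions listed above.
function passesDefaults(
    string $url,
    array $schemes = ['http', 'https'],
    array $extensions = ['html', 'htm']
): bool {
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || !in_array(strtolower($scheme), $schemes, true)) {
        return false;
    }

    $path = parse_url($url, PHP_URL_PATH);
    $extension = is_string($path)
        ? strtolower(pathinfo($path, PATHINFO_EXTENSION))
        : '';

    // Assumption: a path without an extension is treated as a page.
    return $extension === '' || in_array($extension, $extensions, true);
}
```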

Options

You can set, remove, and get the allowed schemes and file extensions.

Setting allowed file extensions

$crawl->allowed('set', 'allowedFiles', '.pdf', '.png');

Removing allowed file extensions

$crawl->allowed('remove', 'allowedFiles', '.pdf', '.png');

Removing allowed schemes

$crawl->allowed('remove', 'allowedSchemes', 'http');

Getting allowed file extensions

$crawl->allowed('get', 'allowedFiles');

Getting allowed schemes

$crawl->allowed('get', 'allowedSchemes');
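For readers curious how a single method can accept an action, a list name, and any number of values, here is a rough stand-in. `AllowListSketch` is hypothetical and only mirrors the call shape shown above; it is not the Crawler class. Two behaviours are assumptions: that 'set' adds to the existing list rather than replacing it, and that a leading dot ('.pdf') is equivalent to no dot ('pdf').

```php
<?php
// Hypothetical stand-in that mirrors the call shape of allowed();
// its internals are assumptions, not the Crawler implementation.
final class AllowListSketch
{
    /** @var array<string, string[]> */
    private array $lists = [
        'allowedSchemes' => ['http', 'https'],
        'allowedFiles' => ['html', 'htm'],
    ];

    /** @return string[] the list after the action has been applied */
    public function allowed(string $action, string $list, string ...$values): array
    {
        if (!isset($this->lists[$list])) {
            throw new InvalidArgumentException("Unknown list: $list");
        }

        // Assumption: normalise '.PDF' and 'pdf' to the same entry.
        $values = array_map(
            fn (string $value): string => strtolower(ltrim($value, '.')),
            $values
        );

        switch ($action) {
            case 'set':
                // Assumption: 'set' appends without creating duplicates.
                $merged = array_merge($this->lists[$list], $values);
                $this->lists[$list] = array_values(array_unique($merged));
                break;
            case 'remove':
                $this->lists[$list] = array_values(array_diff($this->lists[$list], $values));
                break;
            case 'get':
                break;
            default:
                throw new InvalidArgumentException("Unknown action: $action");
        }

        return $this->lists[$list];
    }
}

// Usage, following the calls in the Options section.
$sketch = new AllowListSketch();
$afterSet = $sketch->allowed('set', 'allowedFiles', '.pdf', '.png');
$afterRemove = $sketch->allowed('remove', 'allowedFiles', '.pdf', '.png');
$schemes = $sketch->allowed('remove', 'allowedSchemes', 'http');
```

The variadic `...$values` parameter is what lets one call pass several extensions at once.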
