Crawler

A crawler that collects all <a> tags (links) of a page. It only crawls the target's domain: if you crawl example.com and it links to whatever.com, whatever.com will not be crawled.
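
The README does not show how the domain restriction is implemented. As a minimal sketch, assuming the page HTML has already been fetched, the following uses PHP's DOMDocument and parse_url() to collect the href of every <a> tag and keep only the links on the target host. The function name sameDomainLinks is hypothetical and not part of the Crawler class.

```php
<?php
// Hypothetical helper (not part of the Crawler class): returns the href of
// every <a> tag in $html that points to $targetHost. Relative links such as
// '/contact' are kept; links to other hosts and host-less links with a
// scheme (e.g. 'mailto:...') are dropped.
function sameDomainLinks(string $html, string $targetHost): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href === '') {
            continue;
        }

        $host = parse_url($href, PHP_URL_HOST);
        if ($host === false) {
            continue; // seriously malformed URL
        }
        if ($host === null) {
            // No host: keep relative paths, skip scheme-only links.
            if (parse_url($href, PHP_URL_SCHEME) === null) {
                $links[] = $href;
            }
            continue;
        }
        if (strcasecmp($host, $targetHost) === 0) {
            $links[] = $href;
        }
    }
    return $links;
}

$html = '<a href="https://example.com/about">About</a>'
      . '<a href="/contact">Contact</a>'
      . '<a href="https://whatever.com/">Elsewhere</a>';

foreach (sameDomainLinks($html, 'example.com') as $link) {
    echo $link, PHP_EOL; // prints https://example.com/about, then /contact
}
```

This sketch compares hosts exactly (ignoring case), so www.example.com would count as a different domain; the README does not say how the real class treats subdomains.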

Instantiation

$crawl = new Crawler();

Set the target to crawl

$crawl->crawl('https://example.com');

Get crawled links

Returns all crawled links of that domain as a one-dimensional array.

$crawl->getCrawledLinks();
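
To illustrate how a crawl can end up as a single flat array, here is a minimal breadth-first sketch. It walks a hypothetical in-memory $site map (page URL to its outgoing links) that stands in for real HTTP requests; crawlGraph is an illustrative name, not the class's own method.

```php
<?php
// Hypothetical sketch: breadth-first traversal of an in-memory map of
// page URL => outgoing links, standing in for real HTTP fetches. Every
// reachable URL is recorded once, giving a flat, duplicate-free list.
function crawlGraph(array $site, string $start): array
{
    $queue = [$start];
    $seen = [$start => true]; // URL-keyed set for fast duplicate checks

    while ($queue !== []) {
        $page = array_shift($queue);
        // Pages missing from the map are treated as having no links.
        foreach ($site[$page] ?? [] as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }

    return array_keys($seen); // insertion order = breadth-first order
}

$site = [
    'https://example.com/'  => ['https://example.com/a', 'https://example.com/b'],
    'https://example.com/a' => ['https://example.com/', 'https://example.com/c'],
    'https://example.com/b' => ['https://example.com/a'],
];

print_r(crawlGraph($site, 'https://example.com/'));
```

With this sample map, crawlGraph() returns https://example.com/ followed by /a, /b and /c on the same host. Whether the real getCrawledLinks() includes the start URL is not stated in the README.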

Defaults

Allowed schemes

  • http
  • https

Allowed extensions

  • html
  • htm
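
A minimal sketch of how these defaults might be applied to a candidate URL, using parse_url() and pathinfo(). isAllowedUrl is a hypothetical helper. It assumes that URLs with no file extension (such as https://example.com/about) pass, and it accepts extensions written with or without a leading dot, since the defaults above omit the dot while the option examples further down include it.

```php
<?php
// Hypothetical helper, not the Crawler class's own code: decides whether a
// URL passes scheme and extension filters like the defaults above.
// Assumption: paths without any file extension (e.g. '/about') are allowed.
function isAllowedUrl(string $url, array $schemes, array $extensions): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || !in_array(strtolower($scheme), $schemes, true)) {
        return false;
    }

    $path = parse_url($url, PHP_URL_PATH) ?? '';
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if ($ext === '') {
        return true; // no extension, see assumption above
    }

    // Accept extensions written with or without a leading dot ('.pdf' or 'pdf').
    $normalized = array_map(fn (string $e): string => strtolower(ltrim($e, '.')), $extensions);
    return in_array($ext, $normalized, true);
}

$schemes = ['http', 'https'];
$extensions = ['html', 'htm'];

var_dump(isAllowedUrl('https://example.com/index.html', $schemes, $extensions)); // bool(true)
var_dump(isAllowedUrl('https://example.com/report.pdf', $schemes, $extensions)); // bool(false)
var_dump(isAllowedUrl('ftp://example.com/index.html', $schemes, $extensions));   // bool(false)
```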

Options

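You can set, remove and get the allowed schemes and file extensions with allowed(). The first argument is the action ('set', 'remove' or 'get'), the second is the option name ('allowedFiles' or 'allowedSchemes'), and any further arguments are the values to set or remove.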

Setting allowed file extensions

$crawl->allowed('set', 'allowedFiles', '.pdf', '.png');

Removing allowed file extensions

$crawl->allowed('remove', 'allowedFiles', '.pdf', '.png');

Removing allowed schemes

$crawl->allowed('remove', 'allowedSchemes', 'http');

Getting allowed file extensions

$crawl->allowed('get', 'allowedFiles');

Getting allowed schemes

$crawl->allowed('get', 'allowedSchemes');
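
The README documents only the call shape of allowed(). The following hypothetical CrawlerOptions class (not the repository's Crawler class) sketches one way such a method could dispatch on its first two arguments. Assumptions: 'set' appends to the existing list rather than replacing it, file extensions are stored without a leading dot, and unknown actions or options throw InvalidArgumentException.

```php
<?php
// Hypothetical sketch of an allowed($action, $option, ...$values) method
// matching the call shape of the examples above. Assumptions: 'set' appends
// rather than replaces, file extensions are stored without a leading dot,
// and unknown actions or options throw InvalidArgumentException.
class CrawlerOptions
{
    private array $allowedSchemes = ['http', 'https'];
    private array $allowedFiles = ['html', 'htm'];

    public function allowed(string $action, string $option, string ...$values): ?array
    {
        if ($option !== 'allowedSchemes' && $option !== 'allowedFiles') {
            throw new InvalidArgumentException("Unknown option: $option");
        }
        if ($option === 'allowedFiles') {
            // Accept '.pdf' and 'pdf' alike.
            $values = array_map(fn (string $v): string => ltrim($v, '.'), $values);
        }

        switch ($action) {
            case 'set':
                // Append the new values, dropping duplicates.
                $this->$option = array_values(array_unique(array_merge($this->$option, $values)));
                return null;
            case 'remove':
                // Keep only entries that are not being removed.
                $this->$option = array_values(array_diff($this->$option, $values));
                return null;
            case 'get':
                return $this->$option;
            default:
                throw new InvalidArgumentException("Unknown action: $action");
        }
    }
}

$options = new CrawlerOptions();
$options->allowed('set', 'allowedFiles', '.pdf', '.png');
$options->allowed('remove', 'allowedSchemes', 'http');
print_r($options->allowed('get', 'allowedFiles'));
print_r($options->allowed('get', 'allowedSchemes'));
```

After these calls, allowedFiles holds html, htm, pdf and png, and allowedSchemes holds only https. Keeping both lists behind one method mirrors the README's call shape; separate setters and getters per option would be an equally valid design.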