Crawler

A crawler that collects all a tags (links) on a page. It only crawls the target's domain: if you crawl example.com and it links to whatever.com, whatever.com will not be crawled.
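The same-domain rule can be illustrated with a small standalone check. This is only a sketch of the idea, not the Crawler class's actual code: the function name isSameDomain is hypothetical, and it assumes links are absolute URLs whose hosts are compared case-insensitively.

```php
<?php
// Hypothetical helper, not part of the Crawler class: reports whether a link
// points at the same host as the crawl target.
function isSameDomain(string $targetUrl, string $linkUrl): bool
{
    $targetHost = parse_url($targetUrl, PHP_URL_HOST);
    $linkHost = parse_url($linkUrl, PHP_URL_HOST);

    // parse_url() returns null when the host is missing and false when the
    // URL is malformed. This sketch only compares absolute URLs.
    if (!is_string($targetHost) || !is_string($linkHost)) {
        return false;
    }

    // Host names are case-insensitive.
    return strcasecmp($targetHost, $linkHost) === 0;
}
```

With this check, a link to https://example.com/about would be followed when crawling https://example.com, while a link to https://whatever.com would be skipped.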

Instantiation

$crawl = new Crawler();

Set target to crawl

$crawl->crawl('https://example.com');

Get crawled links

Returns all crawled links of that domain as a one-dimensional array.

$crawl->getCrawledLinks();

Defaults

Schemes allowed

http
https

Extensions allowed

html
htm
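To show how the default schemes and extensions above might be applied to a single link, here is a minimal sketch. The function isAllowedLink is hypothetical and the Crawler class may filter differently. In particular, treating paths without an extension (such as /about) as allowed is an assumption.

```php
<?php
// Hypothetical filter mirroring the defaults listed above; the real Crawler
// class may implement this differently.
function isAllowedLink(
    string $url,
    array $schemes = ['http', 'https'],
    array $extensions = ['html', 'htm']
): bool {
    $parts = parse_url($url);

    // Reject malformed URLs and URLs without a scheme.
    if ($parts === false || !isset($parts['scheme'])) {
        return false;
    }

    if (!in_array(strtolower($parts['scheme']), $schemes, true)) {
        return false;
    }

    // pathinfo() yields an empty string when the path has no extension.
    $extension = strtolower(pathinfo($parts['path'] ?? '', PATHINFO_EXTENSION));

    // Assumption: extensionless paths are treated as pages and allowed.
    return $extension === '' || in_array($extension, $extensions, true);
}
```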

Options

You can set, remove, and get the allowed schemes and file extensions.

Setting allowed file extensions

$crawl->allowed('set', 'allowedFiles', '.pdf', '.png');

Removing allowed file extensions

$crawl->allowed('remove', 'allowedFiles', '.pdf', '.png');

Removing allowed schemes

$crawl->allowed('remove', 'allowedSchemes', 'http');

Getting allowed file extensions

$crawl->allowed('get', 'allowedFiles');

Getting allowed schemes

$crawl->allowed('get', 'allowedSchemes');
