dcrawl – Web Crawler For Unique Domains


dcrawl is a simple, but smart, multithreaded web crawler for randomly gathering huge lists of unique domain names.



How does dcrawl work?

dcrawl takes one site URL as input and detects all a href= links in the site’s body. Each link found is added to a queue. Each queued link is then crawled in the same way, branching out to further URLs found in the links on each site’s body.
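To make the queueing idea concrete, here is a minimal sketch in Go of extracting a href= links from a page body and feeding them into a queue. It is not dcrawl's actual code: the function name extractLinks, the regular expression, and the example.com URLs are assumptions made for illustration, and a regular expression is only a rough stand-in for a real HTML parser.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// hrefRe captures the quoted href value of <a> tags.
// Assumption: a regex is good enough for a sketch; it will miss unquoted values.
var hrefRe = regexp.MustCompile(`(?i)<a\s[^>]*?href\s*=\s*["']([^"']+)["']`)

// extractLinks (hypothetical helper) returns the absolute http(s) URLs found
// in a href= attributes of body, resolved against baseURL.
func extractLinks(baseURL, body string) []string {
	base, err := url.Parse(baseURL)
	if err != nil {
		return nil
	}
	var links []string
	for _, m := range hrefRe.FindAllStringSubmatch(body, -1) {
		ref, err := url.Parse(m[1])
		if err != nil {
			continue // skip malformed links
		}
		abs := base.ResolveReference(ref)
		if abs.Scheme == "http" || abs.Scheme == "https" {
			links = append(links, abs.String())
		}
	}
	return links
}

func main() {
	body := `<p><a href="/about">About</a>
<a class="x" href='http://other.example.org/'>Other</a>
<a href="mailto:me@example.com">Mail</a></p>`

	// Every link found goes into the queue; a real crawler would fetch each
	// dequeued URL and append the links found there in turn.
	queue := extractLinks("http://example.com/index.html", body)
	for len(queue) > 0 {
		next := queue[0]
		queue = queue[1:]
		fmt.Println("would crawl:", next)
	}
}
```

In a real crawler the loop body would fetch the dequeued URL and append its links to the queue, which is what produces the branching described above.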

dcrawl Web Crawler Features

  • Branches out only to a predefined number of links found per hostname.
  • Limits the number of different hostnames allowed per domain (avoids subdomain crawling hell, e.g. blogspot.com).
  • Can be restarted with the same list of domains; the last saved domains are added to the URL queue.
  • Crawls only sites that return a text/html Content-Type in the HEAD response.
  • Retrieves at most 1MB of each site’s body.
  • Does not save inaccessible domains.
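The limits in the list above can be sketched in Go roughly as follows. This is an illustration under stated assumptions, not dcrawl's implementation: the limiter type, naiveDomain, fetchHTML and the cap values are all hypothetical names and numbers, and naiveDomain simply takes the last two labels of a hostname, which is wrong for suffixes such as co.uk.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"net/http"
	"net/url"
	"strings"
)

const maxBodyBytes = 1 << 20 // read at most 1MB of any body

// limiter (hypothetical) tracks how many links were queued per hostname
// and which hostnames were seen per domain.
type limiter struct {
	maxLinksPerHost   int
	maxHostsPerDomain int
	linksPerHost      map[string]int
	hostsPerDomain    map[string]map[string]bool
}

func newLimiter(maxLinks, maxHosts int) *limiter {
	return &limiter{
		maxLinksPerHost:   maxLinks,
		maxHostsPerDomain: maxHosts,
		linksPerHost:      map[string]int{},
		hostsPerDomain:    map[string]map[string]bool{},
	}
}

// naiveDomain returns the last two labels of host (an assumed simplification).
func naiveDomain(host string) string {
	labels := strings.Split(host, ".")
	if len(labels) <= 2 {
		return host
	}
	return strings.Join(labels[len(labels)-2:], ".")
}

// allow reports whether rawURL may be queued and records it if so.
func (l *limiter) allow(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil || u.Hostname() == "" {
		return false
	}
	host := strings.ToLower(u.Hostname())
	domain := naiveDomain(host)
	hosts := l.hostsPerDomain[domain]
	if hosts == nil {
		hosts = map[string]bool{}
		l.hostsPerDomain[domain] = hosts
	}
	if !hosts[host] {
		if len(hosts) >= l.maxHostsPerDomain {
			return false // too many hostnames under this domain
		}
		hosts[host] = true
	}
	if l.linksPerHost[host] >= l.maxLinksPerHost {
		return false // too many links already queued for this hostname
	}
	l.linksPerHost[host]++
	return true
}

// fetchHTML issues a HEAD request first and downloads the body only when the
// Content-Type is text/html, reading no more than maxBodyBytes.
func fetchHTML(client *http.Client, rawURL string) ([]byte, error) {
	head, err := client.Head(rawURL)
	if err != nil {
		return nil, err // inaccessible: the caller would not save this domain
	}
	head.Body.Close()
	contentType := head.Header.Get("Content-Type")
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil || mediaType != "text/html" {
		return nil, fmt.Errorf("skipping %s: content type %q", rawURL, contentType)
	}
	resp, err := client.Get(rawURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(io.LimitReader(resp.Body, maxBodyBytes))
}

func main() {
	l := newLimiter(2, 1) // assumed caps: 2 links per hostname, 1 hostname per domain
	for _, u := range []string{
		"http://a.blogspot.com/1",
		"http://a.blogspot.com/2",
		"http://a.blogspot.com/3",
		"http://b.blogspot.com/1",
	} {
		fmt.Println(u, "allowed:", l.allow(u))
	}
}
```

The main function exercises only the limiter so that it runs without network access; fetchHTML would be called on each URL the limiter allows, for example with http.DefaultClient or a client that has a timeout set.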

dcrawl Usage


Example:

Other tools do similar things, or could be scripted together recursively to perform a similar task, but none are this focused. Examples include:

Host-Extract – Enumerate All IP/Host Patterns In A Web Page
Recon-ng – Web Reconnaissance Framework

You can download the dcrawl Go web crawler here:

dcrawl-master.zip

Or read more here.


