CeWL is a Custom Word List generator which spiders a given site to create a word list of all words it finds on that site. It can also grab email addresses and usernames found in the HTML and in some document types including Office and PDF.
It is useful for targeted penetration tests that involve brute-force password cracking.
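To make the idea concrete, here is a minimal sketch of the word-harvesting step for a single page. It is not CeWL's own code (CeWL is a separate tool and also spiders links and parses documents); the class `TextExtractor` and the function `words_from_html` are hypothetical names chosen for illustration, and the default minimum length of 3 simply mirrors CeWL's documented default.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a script or style element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def words_from_html(html, min_word_length=3):
    """Return a Counter of alphabetic words at least min_word_length long."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Assumption: a "word" is a run of ASCII letters; case is preserved.
    words = re.findall(r"[A-Za-z]+", text)
    return Counter(w for w in words if len(w) >= min_word_length)
```

Calling `words_from_html(page_source)` on a fetched page gives per-word counts, and the keys of the result are the candidate wordlist entries for that page.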
We first wrote about CeWL way back in 2009 not long after it first came out – it’s been updated plenty since then and is now at version 5.1.
There are also a bunch of other similar tools out there (some older, some newer):
– Crunch – Password Cracking Wordlist Generator
– The Associative Word List Generator (AWLG) – Create Related Wordlists
– Wyd – Automated Password Profiling Tool
– CUPP – Common User Passwords Profiler – Automated Password Profiling Tool
– RSMangler – Keyword Based Wordlist Generator For Bruteforcing
If you combine the wordlists from the above tools with the commonly found standard password cracking wordlists, you should have a pretty comprehensive, targeted set of lists for bruteforcing with something like John the Ripper, thc-hydra or hashcat.
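Combining lists mostly comes down to concatenating them and dropping duplicates. The sketch below shows one way to do that. The function name `merge_wordlists` and the file names in the usage note are hypothetical, and it assumes each input file holds one candidate word per line.

```python
def merge_wordlists(paths, min_length=1):
    """Merge wordlist files, dropping blank lines and duplicates.

    Words are kept in first-seen order, so entries from earlier files
    (for example the targeted list) stay ahead of generic ones.
    """
    seen = set()
    merged = []
    for path in paths:
        # errors="ignore": large public wordlists often contain non-UTF-8 bytes.
        with open(path, encoding="utf-8", errors="ignore") as handle:
            for line in handle:
                word = line.rstrip("\r\n")
                if len(word) >= min_length and word not in seen:
                    seen.add(word)
                    merged.append(word)
    return merged
```

For example, `merge_wordlists(["cewl_target.txt", "common_passwords.txt"])` would return the targeted words first, followed by any generic entries not already present. The result can then be written out one word per line for whichever cracking tool you use.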
Usage
cewl [OPTION] ... URL

--help, -h                  Show help
--depth x, -d x             The depth to spider to, default 2
--min_word_length, -m       The minimum word length, this strips out all words under the specified length, default 3
--offsite, -o               By default, the spider will only visit the site specified. With this option it will also visit external sites
--write, -w file            Write the output to the file rather than to stdout
--ua, -u user-agent         Change the user agent
-v                          Verbose, show debug and extra output
--no-words, -n              Don't output the wordlist
--meta, -a file             Include meta data, optional output file
--email, -e file            Include email addresses, optional output file
--meta_file file            Filename for metadata output
--email_file file           Filename for email output
--meta-temp-dir directory   The directory used by exiftool when parsing files, the default is /tmp
--count, -c                 Show the count for each of the words found
--auth_type                 Digest or basic
--auth_user                 Authentication username
--auth_pass                 Authentication password
--proxy_host                Proxy host
--proxy_port                Proxy port, default 8080
--proxy_username            Username for proxy, if required
--proxy_password            Password for proxy, if required
--verbose, -v               Verbose
URL                         The site to spider.
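If you drive CeWL from a script, it can help to assemble the command line in one place. The sketch below only builds an argument list from flags listed in the usage above (`--depth`, `--min_word_length`, `--write`, `--count`). The function name `build_cewl_command`, its parameters, and the example URL are assumptions made for illustration, and it assumes the `cewl` executable is on your PATH.

```python
import shlex


def build_cewl_command(url, depth=2, min_word_length=3, outfile=None,
                       with_count=False):
    """Build an argv list for cewl; defaults mirror the documented ones."""
    cmd = ["cewl",
           "--depth", str(depth),
           "--min_word_length", str(min_word_length)]
    if outfile:
        # Write the wordlist to a file rather than stdout.
        cmd += ["--write", outfile]
    if with_count:
        # Show the count for each word found.
        cmd.append("--count")
    # The site to spider goes last.
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    argv = build_cewl_command("https://example.com", depth=1,
                              min_word_length=5, outfile="words.txt")
    # shlex.join (Python 3.8+) gives a copy-pasteable shell command.
    print(shlex.join(argv))
```

Passing the returned list to `subprocess.run(argv, check=True)` would run the spider. Keeping the command as a list rather than a single string avoids shell quoting problems with unusual URLs or file names.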
You can download CeWL v5.1 here:
Or read more here.