I just published RobotsDisallowed, a GitHub project that finds the most common Disallowed entries in the robots.txt files of the world’s top 100,000 websites.
I’ve broken it down into Top-n lists that pull out the top 10, 1000, 10000, etc. directories listed, in case you’re pressed for time on your assessment.
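To make the idea concrete, here is a minimal sketch of how a Top-n list like this could be built. It is not the project's actual code: the function names `disallowed_paths` and `top_n` are made up for illustration, it assumes the robots.txt bodies have already been downloaded as strings, and it assumes each path is counted once per site.

```python
from collections import Counter


def disallowed_paths(robots_txt):
    """Return the distinct Disallow paths found in one robots.txt body."""
    paths = set()
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow value blocks nothing, so skip it
                paths.add(value)
    return paths


def top_n(robots_files, n):
    """Return the n most common Disallow paths across many robots.txt bodies.

    Assumption: a path counts once per file, however often that file repeats it.
    """
    counts = Counter()
    for body in robots_files:
        counts.update(disallowed_paths(body))
    return [path for path, _ in counts.most_common(n)]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from real sites.
    sample = [
        "User-agent: *\nDisallow: /admin/\nDisallow: /tmp/ # scratch\n",
        "User-agent: *\nDisallow: /admin/\nDisallow:\n",
        "disallow: /admin/\nDisallow: /tmp/\nDisallow: /cgi-bin/\n",
    ]
    for path in top_n(sample, 2):
        print(path)
```

Counting once per site keeps a single robots.txt that repeats the same entry from skewing the ranking. Whether the real lists are built that way is an assumption on my part.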
But I just added the best list of them all: the InterestingDirectories.txt list. This is a list of the directories from the Top 100K Disallowed entries that have the following words in them:
The other lists are great to have, but if you’re looking to find the highest-value hits in the shortest amount of time, this is probably the list to use.
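A keyword filter of this kind can be sketched in a few lines. This is only an illustration, not the project's implementation: the function name `interesting` is hypothetical, and the keywords in the usage example are placeholders I picked, not the project's actual word list.

```python
def interesting(paths, keywords):
    """Keep the paths that contain any keyword, ignoring case.

    Input order is preserved, so a list sorted by frequency stays sorted.
    """
    lowered = [k.lower() for k in keywords]
    return [p for p in paths if any(k in p.lower() for k in lowered)]


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    paths = ["/admin/", "/images/", "/Backup/old/", "/search"]
    keywords = ["admin", "backup"]  # hypothetical, NOT the project's real list
    for path in interesting(paths, keywords):
        print(path)
```

Plain substring matching is the simplest choice here. It will also match longer words that happen to contain a keyword, which is usually acceptable when the goal is a short list to review by hand.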
- The RobotsDisallowed project is located here.
- The purpose of this project is to help legitimate web testers find vulnerabilities before the bad guys do. Protip: the bad guys are already doing this.
- Improvement ideas welcome! I’ll put you in the credits. Looking for more good strings to improve the InterestingDirectories list, among other things.