TL;DR
I created a Twitter-bot which monitors multiple paste sites for different types of content (account/database dumps, network device configuration files, etc.). You can find it on Twitter and on Github.
Introduction
Paste-sites such as Pastebin, Pastie, Slexy, and many others offer users (often anonymously) the ability to upload raw text of their choice. This is helpful in many scenarios, such as sending a crash report to someone or pasting temporary code. However, some people are not careful with what they upload (leaving passwords and other sensitive data in the text), and attackers have started using these sites to share post-compromise data, including user account data, database dumps, URLs of compromised sites, and more.
Since so many users upload text to these sites, it’s often difficult to find these interesting files manually. While techniques such as Google Alerts can be applied, the results are often a day or two old, and the pastes have sometimes already been deleted. This prompted me to create a tool which monitors these sites in “real-time” (less than a minute of delay for the slowest sites) for specific expressions, and then automatically ranks, aggregates, and posts the results to Twitter for further analysis. I call this tool DumpMon.
There are a couple of similar tools available which do essentially the same thing as dumpmon – with just a few key differences:
- @PastebinLeaks – with its last tweet on December 16, 2011, PastebinLeaks no longer appears to provide pastebin monitoring. However, I really like how it integrated quite a few different expressions, such as one for HTTP passwords, Cisco and Juniper configuration files, etc. Unfortunately, as far as I can tell PastebinLeaks is closed-source.
- @PastebinDorks – This bot (intentionally closed-source, still in “alpha”) is still active and posts a few tweets per day. It appears to be primarily concerned with account credential dumps. I think the idea of assigning a numerical rank to a tweet could help determine the usefulness of a paste, but it obscures what data was actually found.
Here are the key differences that set dumpmon apart:
- Open source. I’m always open to contributions via Github. I’m still writing the documentation, which should be up soon.
- Monitors more than just Pastebin (full site listing in Appendix)
- Supports multiple file types (e.g., Cisco configuration files and honeypot logs)
- For large account dumps, simply gives you the raw information (Emails: x, Hashes: y) directly in the tweet
Features I would like to add:
- Automatically run found hashes through large wordlists and post the results
- Allow users to tweet a regular expression they want monitored to the bot. The bot will then tweet them the paste once it finds a match
- Search for interesting details from other sources of information (such as popular forums, etc.) instead of just paste sites
- Allow caching of “most interesting” results to prevent deletion
- Create daily/monthly reports that show how much data was detected, to aid password research
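The “Emails: x, Hashes: y” summary mentioned above could be produced along the following lines. This is only a minimal sketch: the two regular expressions and the `summarize_paste` and `format_tweet` helpers are hypothetical names made up for illustration, and it assumes that only MD5-length hex strings count as hashes.

```python
import re

# Hypothetical patterns for illustration; the expressions dumpmon uses may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 32 hex characters, the length of an MD5 digest; other hash types are ignored here.
HASH_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")


def summarize_paste(text):
    """Count the unique emails and MD5-like hashes found in a paste."""
    emails = {match.lower() for match in EMAIL_RE.findall(text)}
    hashes = {match.lower() for match in HASH_RE.findall(text)}
    return {"emails": len(emails), "hashes": len(hashes)}


def format_tweet(url, summary):
    """Build a short status line containing the paste URL and the raw counts."""
    return "{} Emails: {} Hashes: {}".format(
        url, summary["emails"], summary["hashes"]
    )
```

Counting unique values rather than raw matches keeps a paste that repeats the same credentials from looking bigger than it is.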
It’s commonly the case that the most time-expensive part of web scraping is actually fetching the content. While I could speed up this process by going fully event-driven with a framework such as Gevent, Twisted, or others, I wanted to do my best to respect the sites hosting the content. Also, I didn’t want the tool to get temporarily blocked… for a third time (my bad, Pastebin). With this being the case, my bot fetches only new pastes, using polite time constraints.
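A minimal sketch of what polite polling for new pastes could look like is shown here. It is not dumpmon’s actual code: the `PoliteSite` class, the `monitor` function, and the injected `fetch_recent_ids` callable are all hypothetical, and the sketch assumes each site can return its recent paste IDs newest-first.

```python
import time


class PoliteSite:
    """Track one paste site: what was seen last and when it may be polled again."""

    def __init__(self, name, fetch_recent_ids, min_delay):
        self.name = name
        self.fetch_recent_ids = fetch_recent_ids  # callable returning newest-first IDs
        self.min_delay = min_delay                # seconds to wait between requests
        self.last_seen = None                     # newest paste ID from the previous poll
        self.next_allowed = 0.0                   # earliest time of the next request

    def poll(self, now):
        """Return paste IDs not seen before, or [] if it is too soon to ask again."""
        if now < self.next_allowed:
            return []
        self.next_allowed = now + self.min_delay
        new_ids = []
        for paste_id in self.fetch_recent_ids():
            if paste_id == self.last_seen:
                break  # everything from here on was handled during the last poll
            new_ids.append(paste_id)
        if new_ids:
            self.last_seen = new_ids[0]
        return new_ids


def monitor(sites, handle, rounds, clock=time.monotonic, sleep=time.sleep):
    """Poll every site in turn for a fixed number of rounds, one second apart."""
    for _ in range(rounds):
        for site in sites:
            for paste_id in site.poll(clock()):
                handle(site.name, paste_id)
        sleep(1)
```

Giving each site its own `min_delay` lets slower or stricter sites be polled less often, and stopping at the last seen ID avoids requesting pastes that were already processed.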
Appendix
Currently, dumpmon supports the following paste-types:
- Account/Database dumps
- Google API Keys
- Cisco Configuration Files (Juniper to be added soon)
- Honeypot Log Dumps
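To illustrate how a paste might be sorted into one of these types, here is a rough sketch. The rules are assumptions made for the example, not the expressions dumpmon ships with, and the `min_emails` threshold of 20 is an arbitrary illustrative value.

```python
import re

# Hypothetical detection rules, checked in order; the first match wins.
RULES = [
    ("cisco_config", re.compile(r"enable\s+(?:secret|password)", re.I)),
    ("google_api_key", re.compile(r"\bAIza[0-9A-Za-z_-]{35}")),
    ("honeypot_log", re.compile(r"<dionaea\.capture>", re.I)),
]
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def classify(text, min_emails=20):
    """Return a paste-type label, or None if nothing looks interesting."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    # Treat a paste with many distinct email addresses as an account dump.
    if len(set(EMAIL_RE.findall(text))) >= min_emails:
        return "account_dump"
    return None
```

Supporting a new paste type in a design like this would mostly mean adding another entry to `RULES`.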
Dumpmon also supports the following paste-sites:
If you can think of any other paste sites you want added, let me know!


