Pitch: GitHub Takedown Tracker

Posted Jul 31, 2024 Updated Oct 26, 2024

By Chris Henk

You can find a lot of mistakes in the noise by looking for retractions

At various points in my career, I’ve worked on things that are tented but end up being publicly disclosed to some degree due to miscommunication. The severity varies, but can be substantial. For instance, you may have an in-progress project that you intend to release as open source, but the repo is switched to public well before it’s ready. This poses numerous risks:

Sloppy, doesn’t look good for your brand
Discloses what you’re working on, which can:
- Eliminate first mover advantage
- Draw attention to a problem that isn’t well known before the solution is ready
Shows the draft state, which isn’t tailored for an external audience
May not have been audited yet

With most unintentional disclosures, people seem to get away relatively unscathed. If you’re using proper tools, you don’t distribute the resource itself; you distribute access to the resource. This means access can be revoked, and unless someone has gone out of their way to make a copy, you’ve probably managed to scrub it from the internet.

This is where git repos pose an interesting conundrum. You can revoke access to the git host, but by their nature git repos tend to propagate across many computers. It’s normal and expected to pull down local copies of everything, and for people working in the repo it happens automatically. Say you accidentally check in secrets. It’s not as simple as deleting them or cutting off access. You need to rewrite the history, and every copy needs to have its history scrubbed.

While you should always rotate leaked secrets / credentials, if ever there’s a time to do so it’s after they’ve been accidentally checked in to git!

Even if the repo doesn’t have credentials, accidentally making it public means the same challenges for scrubbing exist. Worse, you can rotate a secret but you can’t rotate that design spec. On the bright side, hopefully no one is looking and cloning what you publish.

But don’t be so sure. Bots are everywhere, as are researchers, and even people who are just curious. In fact, let’s consider how little work it would take to catch, clone, and notice these accidental repo publishes.

Finding Leaks at Scale

As an attacker or researcher, the challenge is that the world is full of noise. So much is published every day; how do I know what’s worth inspecting? There are numerous strategies:

Recently created
Recently removed
Short life span
Removed without going through normal state changes (e.g. period of staleness, archive mode first, etc.)
Flagged for key terms (scan for words, credentials, etc.)
Filtering to only more interesting targets (big companies, small companies, key individuals)
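The strategies above can be sketched as a small flagging function. This is only an illustration: the `RepoObservation` record, its field names, the time windows, and the word list are all assumptions made up for the example. The one concrete pattern, `AKIA` followed by 16 uppercase letters or digits, is the usual shape of an AWS access key ID.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class RepoObservation:
    """Hypothetical record of what a tracker has seen for one repo."""
    name: str
    first_seen: datetime
    last_seen: datetime
    removed: bool = False        # no longer listed at the latest check
    was_archived: bool = False   # passed through archive mode at some point
    text_sample: str = ""        # e.g. README or file contents to scan


# Shape of an AWS access key ID; other credential formats would need their own patterns.
CREDENTIAL_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Arbitrary example terms, not a recommended list.
KEY_TERMS = re.compile(r"confidential|internal only|do not distribute", re.IGNORECASE)


def flags(obs: RepoObservation, now: datetime,
          recent: timedelta = timedelta(days=7),
          short_life: timedelta = timedelta(days=2)) -> List[str]:
    """Return the names of the heuristics that fire for this observation."""
    out = []
    if now - obs.first_seen <= recent:
        out.append("recently_created")
    if obs.removed and now - obs.last_seen <= recent:
        out.append("recently_removed")
    if obs.removed and obs.last_seen - obs.first_seen <= short_life:
        out.append("short_life_span")
    if obs.removed and not obs.was_archived:
        out.append("removed_without_archive")
    if CREDENTIAL_PATTERN.search(obs.text_sample) or KEY_TERMS.search(obs.text_sample):
        out.append("key_terms")
    return out
```

A repo that trips several flags at once (created this week, gone two days later, never archived) is the kind of signal worth a human look. Filtering by target is left out here, since it is just a choice of which organizations to watch.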

Back in my internship at Citibank, ransomware was on the rise. Most big companies have lists of banned IPs and domains to protect against websites that distribute malware or act as command and control. More sophisticated attackers use nonsense, rotating domains for this purpose. They encode the list in a strange manner as a sort of weak encryption (because using good encryption is actually easy to detect and flag as interesting) to hide the list in the malware. Some use a sort of time-based hash to generate domain names. Domain registration is cheap, so it’s well worth the expense to rotate them every week or so. As a security engineer, one of my projects was to create DNS mirrors that would note the daily deltas. If a domain was considered “new”, it was flagged as extremely suspicious. Other heuristics could upgrade that flag to a default block as well. Additionally, we tracked whether Citibank was connecting to new domains. At a big company, it’s pretty weird to connect to something no one else has connected to before, or at least not recently.

While the exact nature of the system has certainly changed since my fellow intern and I stood it up, the point remains. Novelty is noteworthy, and a potential sign of intrusion.
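To make the "daily delta" idea concrete, here is a minimal sketch of tracking when each domain was first observed and treating anything inside a recent window as new. It is not a description of the system built at Citibank; the `NoveltyTracker` class, its method names, and the 7-day window are assumptions chosen for illustration.

```python
from datetime import date
from typing import Dict, Iterable, Set


def normalize(domain: str) -> str:
    """Lowercase and drop the trailing dot so equivalent spellings compare equal."""
    return domain.strip().rstrip(".").lower()


class NoveltyTracker:
    """Remembers the first day each domain was observed."""

    def __init__(self) -> None:
        self.first_seen: Dict[str, date] = {}

    def observe(self, domains: Iterable[str], day: date) -> Set[str]:
        """Record one day's snapshot and return the delta: domains never seen before."""
        delta = set()
        for raw in domains:
            domain = normalize(raw)
            if domain not in self.first_seen:
                self.first_seen[domain] = day
                delta.add(domain)
        return delta

    def is_new(self, domain: str, today: date, window_days: int = 7) -> bool:
        """A domain is 'new' if it is unknown or was first seen within the window."""
        seen = self.first_seen.get(normalize(domain))
        return seen is None or (today - seen).days < window_days
```

The same shape works for repos: swap domain names for repository names and the daily DNS snapshot for a periodic listing of an organization's public repos.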

GitHub Takedown Tracker

A bot to detect accidental repo publishes and subsequent retractions would be an interesting project for people looking for something to do, or for students building out their portfolio. It’s a very approachable problem to solve. With something simple like a $5/mo DigitalOcean Droplet, you could have a bot that:

Periodically checks the GitHub organizations of major companies for their list of repos
Downloads new repos
Periodically takes snapshots, say weekly. This could scale relative to repo age and frequency of updates. Older snapshots can be removed, or git history pruned if doing clones, to stay lean
Flags interesting changes in a weekly report:
1. Repos removed
2. Large scale changes
3. New repos
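A rough sketch of the listing and reporting steps might look like the following. It uses GitHub's REST endpoint for listing an organization's repositories (`GET /orgs/{org}/repos`) without authentication, which is heavily rate limited. The function names, the snapshot shape, and the use of the reported repo `size` as a stand-in for "large scale changes" are all assumptions for illustration. Cloning and snapshot storage are left out.

```python
import json
import urllib.request
from typing import Dict, List

API = "https://api.github.com"


def list_public_repos(org: str) -> Dict[str, dict]:
    """Fetch an org's public repos, page by page, keyed by full name."""
    repos: Dict[str, dict] = {}
    page = 1
    while True:
        url = f"{API}/orgs/{org}/repos?type=public&per_page=100&page={page}"
        req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:  # an empty page means we have read everything
            break
        for repo in batch:
            repos[repo["full_name"]] = {"pushed_at": repo["pushed_at"], "size": repo["size"]}
        page += 1
    return repos


def diff_snapshots(old: Dict[str, dict], new: Dict[str, dict],
                   size_jump: float = 0.5) -> Dict[str, List[str]]:
    """Compare two listings; size_jump is an arbitrary threshold (50% change)."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    large = []
    for name in sorted(set(old) & set(new)):
        before, after = old[name]["size"], new[name]["size"]
        if abs(after - before) > size_jump * max(before, 1):
            large.append(name)
    return {"removed": removed, "large_changes": large, "new": added}


def weekly_report(diff: Dict[str, List[str]]) -> str:
    """Render the three report sections as plain text."""
    sections = (("Repos removed", "removed"),
                ("Large scale changes", "large_changes"),
                ("New repos", "new"))
    lines = []
    for title, key in sections:
        lines.append(f"{title}: {len(diff[key])}")
        lines.extend(f"  {name}" for name in diff[key])
    return "\n".join(lines)
```

In use, the bot would save the result of `list_public_repos` each run and feed last week's and this week's listings to `diff_snapshots`. Note that a repo vanishing from the public listing does not prove a takedown: it may have been renamed, transferred, or made private for ordinary reasons, so "removed" entries are leads to check, not conclusions.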

At a glance, you’ll be able to notice whenever a big company tries to undo their mistake! You can learn a lot and maybe even make a bit of a name for yourself. Just remember, if you do catch anyone with their pants down, reach out to them first to make a responsible disclosure and give them a chance to remedy the issue before you share the story of your finding.

Defending

The biggest defense is to not leak info. But here are some tips for minimizing the impact when it happens:

Share access, not content
Keep access / audit logs
Balance the risk of keeping the information up vs the risk of calling attention to it by removing it
Ensure removals actually remove; lots of products have automatic history / change tracking now
Disclose issues to your security team immediately
