Could Your Git History Be Secretly Hoarding Gigabytes of Deleted Data?
Run du -sh .git in that one project you’ve been working on for three years. If you see a number that makes you blink twice—like 2GB for a codebase that’s mostly React components and CSS—you’ve run into Git’s biggest personality trait: it never forgets. Even if you deleted that massive backup_database.sql or the hero-video.mp4 six months ago, Git is still lugging those bytes around in its hidden object database.
Git is an append-only ledger. When you "delete" a file in a new commit, Git doesn't actually erase the file from its storage. It simply records a new state where that file no longer exists in the current snapshot. The old version of the file remains safely tucked away in the .git/objects folder so you can check out an older commit at any time. This is a feature until you accidentally commit a 500MB log file and realize your git clone times have tripled.
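If you want to see this ledger behavior for yourself, here is a throwaway demo (the file name and size are arbitrary):

```bash
mkdir demo && cd demo && git init
dd if=/dev/urandom of=big-file.bin bs=1048576 count=50   # 50MB of incompressible noise
git add big-file.bin && git commit -m "Add big file"
du -sh .git   # roughly 50MB: the blob now lives in .git/objects

git rm big-file.bin && git commit -m "Delete big file"
du -sh .git   # still roughly 50MB: the old blob hasn't gone anywhere
```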
Hunting the heavy hitters
Before you start nuking things, you need to know what’s actually eating the space. The problem is that the .git folder isn't organized by "file size" in a way that's human-readable. It’s a collection of compressed blobs.
To find the biggest offenders in your history, you can use this somewhat dense but effective plumbing command:
```bash
git rev-list --objects --all \
| grep "$(git verify-pack -v .git/objects/pack/*.idx \
| sort -k 3 -n \
| tail -10 \
| awk '{print $1}')"
```

This pipeline identifies the ten largest objects in your pack files and maps their hashes back to the file names. (It only inspects packed objects, so if the repository has never been garbage-collected, run git gc once first to pack everything.) If you see a node_modules folder or a .DS_Store that somehow snuck in there in 2021, you've found your target.
Rewriting the timeline
Once you've identified a file that shouldn't be there, you have to perform "history surgery." You can't just delete it in a new commit; you have to go back in time and act as if that file was never committed in the first place.
For a long time, git filter-branch was the standard tool for this, but it's notoriously slow and easy to mess up. Nowadays, the community has moved toward git filter-repo. It’s faster and significantly more intuitive.
If you haven't used it, you might need to install it (e.g., brew install git-filter-repo). Here is how you'd scrub a specific massive file from every single commit in your history:
```bash
git filter-repo --path path/to/giant-file.zip --invert-paths
```

The --invert-paths flag tells Git: "Keep everything *except* this file." (Note that filter-repo refuses to run in anything but a fresh clone unless you pass --force; that guardrail exists for good reason.)
A warning before you run it: this command is destructive. It rewrites every commit hash from the first affected commit onward. If your team has based their work on the old hashes, their local branches will be stranded on a history that no longer exists upstream, and everyone will have a very bad Monday. Only do this if you can coordinate a "stop-work" moment with your team.
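filter-repo can also target files by size rather than by path, which is handy when the bloat is a whole class of assets instead of one known file. A sketch (the 10M threshold is just an example):

```bash
# First, generate a report of what's heavy (written to .git/filter-repo/analysis/).
git filter-repo --analyze

# Then drop every blob larger than 10MB from all of history.
git filter-repo --strip-blobs-bigger-than 10M
```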
Making the space "real"
After running a filter, you might notice that your .git folder size hasn't changed yet. This is because Git is conservative. It keeps the "orphaned" data around just in case you made a mistake and need to recover it via the reflog.
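You can watch this happen with a read-only check (no changes made):

```bash
# -v prints object statistics; -H makes the sizes human-readable.
# "size-pack" is the bulk of what du -sh .git reports; it stays large
# until the reflog expires and garbage collection actually runs.
git count-objects -vH
```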
To actually reclaim that disk space immediately, you need to force a garbage collection and expire the reflog:
```bash
# Get rid of the safety net
git reflog expire --expire=now --all

# Repack the objects and delete the unreachable ones
git gc --prune=now --aggressive
```

Now run du -sh .git again. You should see a significant drop.
The "Force Push" fallout
Because you’ve rewritten history, your local repository and the remote (GitHub/GitLab) are now completely different realities. To get your cleaned-up repo back online, you have to force push:
```bash
git push origin --force --all
git push origin --force --tags
```

I've seen developers do this and then watch in horror as their teammates accidentally push the old "heavy" history back up five minutes later. The fix is for every contributor to do a fresh clone of the cleaned repository. Don't try to "pull" the changes into an old local copy; it's a recipe for merge conflicts from hell.
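For the avoidance of doubt, here is what each teammate should run, assuming the remote URL hasn't changed (the repo names are placeholders):

```bash
# Keep the old copy around until the new history is confirmed good.
cd ..
mv my-project my-project.pre-rewrite

# A fresh clone contains only the cleaned history.
git clone git@github.com:example-org/my-project.git
```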
Preventing future bloat
The best way to keep a repository lean is to never let the junk in. I’m a big fan of using a global .gitignore for things like .DS_Store or local IDE configs, but for project-specific binary assets, Git LFS (Large File Storage) is the right tool.
With LFS, Git stores a tiny text pointer in your repository, while the actual 100MB asset lives on a separate server. Your history stays fast, your .git folder stays tiny, and you don't have to play digital surgeon every six months.
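Setting both of these up takes a couple of minutes (a sketch; the ignore path and tracked pattern are just examples):

```bash
# Point Git at a global ignore file for OS and editor junk.
git config --global core.excludesFile ~/.gitignore_global
echo ".DS_Store" >> ~/.gitignore_global

# Route large binary assets through LFS instead of the object database.
git lfs install
git lfs track "*.mp4"
git add .gitattributes
git commit -m "Track video assets with Git LFS"
```

Note that LFS tracking only affects files added from this point on; anything already buried in history still needs the filter-repo treatment above (git lfs migrate can automate that conversion).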