Cleaning Up Git Repositories With The BFG Repo-Cleaner

Intro

Do you have Git repositories that are exploding in size and eating up far too much bandwidth for a simple clone?
Have you ever wondered whether you can filter or clean up a Git repository without resorting to the complex git-filter-branch command?

The Problem

While developing one of our major projects over the last year, we found ourselves struggling with the size of the repository. This happened because some huge binary files were committed during the initial phase and then replaced by newer binaries throughout the project.
In the beginning, working with the Git repository was no problem at all, but it became more and more difficult during the last phases of the project, especially after we moved the repository from an on-demand provider to our own company-wide Git server. The main problem was that pulling or cloning the repository from an external network caused the connection to be closed, while the same operations worked seamlessly from the company's local network. (Later we figured out that proxy and server issues were throttling the traffic, but a small repository is always a good thing! :-))

The Solution: BFG Repo-Cleaner

A really cool tool by Roberto Tyley (who happens to be developing all kinds of Java-based tools):
“Removes large or troublesome blobs like git-filter-branch does, but faster – and written in Scala”

From the docs:

The BFG is a simpler, faster alternative to git-filter-branch for cleansing bad data out of your Git repository history:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

The git-filter-branch command is enormously powerful and can do things that the BFG can’t – but the BFG is much better for the tasks above, because it’s:

  • Faster : 10 – 720x faster
  • Simpler : The BFG isn’t particularly clever, but is focused on making the above tasks easy
  • Beautiful : If you need to, you can use the beautiful Scala language to customize the BFG, which is better than Bash scripting, at least most of the time.

With easy terminal commands like the following ones you can start the cleanup process:

Delete all files named ‘id_rsa’ or ‘id_dsa’ :
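A minimal sketch of this invocation, built on the --delete-files option from the BFG docs. The helper name delete_key_files, the BFG_JAR variable and the my-repo.git argument are placeholders for illustration and not part of the BFG. The sketch assumes the jar has been downloaded and that the argument is a bare mirror clone.

```shell
# Hypothetical helper; the jar path and repository name are placeholders.
# Usage: delete_key_files my-repo.git
delete_key_files() {
  # The glob is quoted so the shell passes the braces to the BFG unexpanded.
  java -jar "${BFG_JAR:-bfg.jar}" --delete-files 'id_{dsa,rsa}' "$1"
}
```

Without the quotes, bash would expand the braces into two separate arguments before the BFG ever sees them.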

Remove all blobs bigger than 1 megabyte :
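One way to write this, using the --strip-blobs-bigger-than option from the BFG docs. The helper name strip_big_blobs and the BFG_JAR variable are assumptions made for this sketch. The size takes a suffix such as K or M and defaults here to the 1 megabyte mentioned above.

```shell
# Hypothetical helper; the jar path and repository name are placeholders.
# Usage: strip_big_blobs my-repo.git        (threshold defaults to 1M)
#        strip_big_blobs my-repo.git 50M
strip_big_blobs() {
  java -jar "${BFG_JAR:-bfg.jar}" --strip-blobs-bigger-than "${2:-1M}" "$1"
}
```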

Replace all passwords listed in a file (prefix lines ‘regex:’ or ‘glob:’ if required) with ***REMOVED*** wherever they occur in your repository :
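A rough sketch of this step, using the --replace-text option from the BFG docs. The helper name replace_passwords, the BFG_JAR variable and the file name passwords.txt are placeholders, and the sample entries in the comment are made up.

```shell
# Hypothetical helper; the jar path and file names are placeholders.
# passwords.txt holds one expression per line, for example (made-up values):
#   hunter2                  a literal string (the default)
#   regex:api_key=\w+        a regular expression
# Usage: replace_passwords my-repo.git passwords.txt
replace_passwords() {
  java -jar "${BFG_JAR:-bfg.jar}" --replace-text "$2" "$1"
}
```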

More details and the most recent docs can be found at http://rtyley.github.io/bfg-repo-cleaner/

Our Results

Using BFG Repo-Cleaner and doing some dry runs on a cleanly mirrored repository, we were able to reduce the size of the repository from between 380 and 450 megabytes down to 35 megabytes.
I would highly recommend it if you fear your repositories are growing too large.
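A trial run on a mirror can be sketched roughly as follows, based on the steps in the BFG docs: clone a mirror, run the BFG on it, then expire the reflog and garbage-collect so the stripped objects are actually dropped. The helper name trial_clean, the BFG_JAR variable and the 1M default are assumptions for illustration, and these are not necessarily the exact commands used on our project.

```shell
# Hypothetical helper: trial-run a cleanup on a throwaway mirror clone.
# $1 = repository URL, $2 = size threshold (defaults to 1M).
# Nothing is pushed, so the original repository stays untouched.
trial_clean() {
  url=$1
  limit=${2:-1M}
  mirror=$(basename "$url" .git).git

  git clone --mirror "$url" "$mirror" || return 1
  du -sh "$mirror"    # size before the cleanup

  java -jar "${BFG_JAR:-bfg.jar}" --strip-blobs-bigger-than "$limit" "$mirror" || return 1

  # The BFG rewrites history but leaves the old objects on disk;
  # expiring the reflog and running gc is what frees the space.
  (cd "$mirror" \
    && git reflog expire --expire=now --all \
    && git gc --prune=now --aggressive) || return 1
  du -sh "$mirror"    # size after the cleanup
}
```

For example, trial_clean git://example.com/some-big-repo.git 1M leaves a cleaned some-big-repo.git in the current directory. Running git push from inside that directory would publish the rewritten history, so only do that once the result looks right.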

Showcase

Here is a video that compares git-filter-branch and the BFG: git-filter-branch is running on a quad-core Mac clocked at 3.4 GHz, while the BFG is running on a Raspberry Pi.

Video: http://www.youtube.com/watch?v=Ir4IHzPhJuI


Publisher: Atos Consulting CH
