Wednesday 13 September 2017

Inserting a new commit tree into Git history

Git does an excellent job of making it easy to move change sets around in the history via git rebase. Most of the time, that is exactly what you need when altering history.

However, sometimes you want to insert a complete tree at a particular point in the history, and you already have exactly that tree in some other commit. Since Git actually records an entire tree for every commit underneath (deltas are used only to compress storage), this isn't particularly hard to do.
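To make the "entire trees" point concrete, here's a minimal sketch in a throwaway repository (the file name and commit messages are made up). It shows that the raw commit object names a tree, and that two commits with identical files share the same tree hash even though their parents differ.

```shell
#!/usr/bin/env bash
# Sketch: a commit records a complete tree, so two commits with identical
# files share one tree object even though their parents differ.
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo "v2" > file.txt
git commit -q -am "second"

# Restore the original content in a third commit with a different parent.
echo "v1" > file.txt
git commit -q -am "third"
third=$(git rev-parse HEAD)

# The first line of a raw commit object names its tree.
git cat-file commit "$third" | sed -n 1p

first_tree=$(git rev-parse "${first}^{tree}")
third_tree=$(git rev-parse "${third}^{tree}")
echo "first commit's tree: $first_tree"
echo "third commit's tree: $third_tree"
```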

I needed to do this with a branch we use to store upstream releases that we receive as tarballs. The history looks like this:

* b9a5c3889cb Import from linux-4.1-1.12.tar.bz2
* 0e9df32da3f Import from linux-4.1-1.9.tar.bz2
* 714ffa13302 Import from linux-4.1-1.6.tar.bz2
* 5d33c6e6665 Import from linux-4.1-1.3.tar.bz2
* 0434412e517 Import from linux-4.1-1.2.tar.bz2

Now suppose that before we've merged b9a5c3889cb we receive a linux-4.1-1.7.tar.bz2 tarball and want to insert it into the history. So, let's check out 714ffa13302 (the 1.6 import) and import the tarball as usual.

git checkout 714ffa13302
rm -rf *
tar xjf /tmp/linux-4.1-1.7.tar.bz2
git add -A .
git commit -m "Import from linux-4.1-1.7.tar.bz2"
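One thing to watch with `rm -rf *` is that it skips dotfiles, so a tracked .gitignore or .mailmap that a new release dropped would survive the import. The sketch below wraps the import steps in a hypothetical `import_tarball` helper that uses `git rm` and `git clean` instead. It assumes it is run from the top level of the repository, that the tarball lives outside the repository, and that the tarball's files are not inside a leading directory; the demo uses an uncompressed tarball and made-up file names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: replace the work tree with a tarball's contents and
# commit the result. Run it from the top level of the repository. Assumes the
# tarball's files sit at its top level (no leading directory to strip).
import_tarball() {
    local tarball=$1 message=$2
    # Unlike `rm -rf *`, this also removes tracked dotfiles (.gitignore,
    # .mailmap, ...) that a new release may have dropped; .git is untouched.
    git rm -rq --ignore-unmatch .
    # Remove any untracked or ignored leftovers as well.
    git clean -fdxq
    # GNU tar (and bsdtar) detect the compression when extracting with xf.
    tar xf "$tarball"
    git add -A .
    git commit -q -m "$message"
}

# Demo in a throwaway repository with made-up release files.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/repo" "$work/src"
cd "$work/repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# The previous import contains a dotfile that the next release drops.
echo "old" > README
echo "*.o" > .gitignore
git add -A .
git commit -q -m "Import from release-1.6.tar"

echo "new" > "$work/src/README"
tar cf "$work/release-1.7.tar" -C "$work/src" README

import_tarball "$work/release-1.7.tar" "Import from release-1.7.tar"
git ls-files
```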

So, now we have:

* 31124b678e5 Import from linux-4.1-1.7.tar.bz2
* 714ffa13302 Import from linux-4.1-1.6.tar.bz2
* 5d33c6e6665 Import from linux-4.1-1.3.tar.bz2
* 0434412e517 Import from linux-4.1-1.2.tar.bz2

If we try to rebase b9a5c3889cb on top of 31124b678e5, we'll just get a huge number of conflicts, because rebase replays each commit as a diff against its original parent (the 1.6 import) rather than reusing its tree.

Instead we need to tell Git to construct new commits that reuse the original trees but have the correct parents:

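new_parent=31124b678e5
# A..B excludes A itself, so start the range at the old parent of the first
# commit to move (714ffa13302, the 1.6 import); this picks up both
# 0e9df32da3f and b9a5c3889cb.
transplanted_commits="714ffa13302..b9a5c3889cb"

git rev-list --reverse ${transplanted_commits} | while read -r commit; do
    echo "Dealing with ${commit}"
    # Reuse the commit's message (everything after the first blank line of the
    # raw object) and its tree, but put the new commit on top of new_parent.
    # Note: commit-tree records you, now, as author unless GIT_AUTHOR_* is set.
    new_parent=$(git cat-file commit "${commit}" |
        awk 'emit { print } /^$/ { emit = 1 }' |
        git commit-tree -p "${new_parent}" "${commit}^{tree}" -F -)
    echo "new_parent ${new_parent}"
done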

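Each git commit-tree call prints the hash of the commit it creates on stdout, and that hash becomes the parent of the next one. Because the loop sits on the right-hand side of a pipe it runs in a subshell, so new_parent isn't updated in your own shell; just read the hash from the last "new_parent" line printed and run something like: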

git checkout -b fixed-branch <last-new_parent>

and the history on fixed-branch will have been rewritten to contain the extra new commit, while the original branch is left untouched.
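To see the commit-tree primitive on its own, here's a minimal sketch in a throwaway repository (the `version` file and commit messages are made up for illustration). It re-creates a "1.9" commit on top of a newly inserted "1.7" commit by handing `git commit-tree` the old commit's tree, then uses `git diff --quiet` to confirm the content is unchanged even though the parent and hash are new. The same `git diff --quiet old-tip new-tip` check is a cheap way to sanity-check a rewritten branch.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Two consecutive "imports" (made-up content).
echo "1.6" > version
git add version
git commit -q -m "Import 1.6"
v16=$(git rev-parse HEAD)
echo "1.9" > version
git commit -q -am "Import 1.9"
v19=$(git rev-parse HEAD)

# Insert 1.7 on top of 1.6 on a detached HEAD.
git checkout -q "$v16"
echo "1.7" > version
git commit -q -am "Import 1.7"
v17=$(git rev-parse HEAD)

# Re-create 1.9 on top of 1.7 by reusing its tree verbatim; commit-tree
# prints the new commit's hash.
new19=$(git commit-tree -p "$v17" -m "Import 1.9" "${v19}^{tree}")

# Same snapshot as the old 1.9 commit, but a different parent and hash.
git diff --quiet "$v19" "$new19" && echo "content identical"
git log --oneline "$new19"
```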

The awk one-liner strips the header lines (tree, parent, author and committer) from the raw commit object, keeping only the message after the first blank line so that it can be reused for the replacement commit.
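Here's a small sketch of what that filter sees and produces, using a throwaway repository and a made-up two-paragraph message. Rule order matters in the awk program: because the print rule comes before the rule that sets `emit`, the blank separator line itself is skipped; with the rules the other way round, the separator would be printed too. For comparison, `git log -1 --format=%B` prints the same raw message.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty \
    -m "Import from linux-4.1-1.9.tar.bz2" -m "A second paragraph."

# Raw object: header lines (tree, author, committer), a blank line, message.
git cat-file commit HEAD

# Print every line after the first blank line; the blank separator itself
# is skipped because the print rule runs before emit is set.
msg=$(git cat-file commit HEAD | awk 'emit { print } /^$/ { emit = 1 }')
printf '%s\n' "$msg"
```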

Of course, better shell scripting could turn this into a more generic tool, but this was good enough for me.
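Along those lines, here's a sketch of what such a tool might look like; `transplant` is a hypothetical name, not an existing Git command. It differs from the loop above in two ways: it copies each commit's author name, email and date into the `GIT_AUTHOR_NAME`, `GIT_AUTHOR_EMAIL` and `GIT_AUTHOR_DATE` environment variables that `git commit-tree` reads (otherwise the new commits are authored by you, now), and it feeds the loop through bash process substitution so the final hash is still available afterwards. It assumes a linear range; extra parents of merge commits would be dropped. The demo rebuilds the post's history with a one-line `version` file standing in for each tarball.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: re-create every commit in RANGE (oldest first) on top
# of NEW_PARENT, reusing each commit's tree, message, author and author date,
# then print the hash of the new tip. Assumes a linear range: any extra
# parents of merge commits would be dropped.
transplant() {
    local parent=$1 range=$2 commit
    # Process substitution keeps the loop in the current shell, so $parent
    # still holds the last new hash when the loop ends.
    while read -r commit; do
        parent=$(git cat-file commit "$commit" |
            awk 'emit { print } /^$/ { emit = 1 }' |
            GIT_AUTHOR_NAME=$(git log -1 --format=%an "$commit") \
            GIT_AUTHOR_EMAIL=$(git log -1 --format=%ae "$commit") \
            GIT_AUTHOR_DATE=$(git log -1 --date=raw --format=%ad "$commit") \
            git commit-tree -p "$parent" "${commit}^{tree}" -F -)
    done < <(git rev-list --reverse "$range")
    printf '%s\n' "$parent"
}

# Demo: rebuild the history from the post in a throwaway repository.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Me"
git config user.email "me@example.com"

for v in 1.2 1.3 1.6 1.9 1.12; do
    echo "$v" > version
    git add version
    git commit -q --author="Upstream <upstream@example.com>" \
        -m "Import from linux-4.1-$v.tar.bz2"
done
old_tip=$(git rev-parse HEAD)
v16=$(git rev-parse HEAD~2)

# Insert the 1.7 import after 1.6, as in the post.
git checkout -q "$v16"
echo "1.7" > version
git commit -q -am "Import from linux-4.1-1.7.tar.bz2"

new_tip=$(transplant "$(git rev-parse HEAD)" "$v16..$old_tip")
git branch fixed-branch "$new_tip"
git log --oneline fixed-branch
```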

I found the git commit-tree trick in the answers to this StackOverflow question.