Random bits of useless information: version control

Wednesday, 13 September 2017

Inserting a new commit tree into Git history

Git does an excellent job of making it easy to move change sets around in the history via git rebase. Most of the time this is exactly what you need to do when altering history.

However, sometimes you want to insert a complete tree at a particular point in the history. You already have exactly that tree somewhere else in a different commit. Since Git actually works in terms of entire trees underneath (compressing them using deltas for storage only) this isn't particularly hard to do.
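To make the "entire trees" point concrete, here is a minimal sketch in a throwaway repository. The file name, commit messages and identity are placeholders I made up for the demonstration. It shows that a commit names one complete tree, and that git commit-tree can build a second commit from that same tree.

```shell
#!/bin/sh
set -e

# Scratch repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"

# The raw commit object names exactly one tree: the complete snapshot.
git cat-file -p HEAD
tree=$(git rev-parse 'HEAD^{tree}')

# A second commit can be built directly from that tree with a parent of
# our choosing; no diffs are applied at any point.
copy=$(echo "same tree, new commit" | git commit-tree "$tree" -p HEAD)
copy_tree=$(git rev-parse "${copy}^{tree}")

echo "original tree: $tree"
echo "copy tree:     $copy_tree"
```

Both commits report the same tree hash even though they are different commits.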

I needed to do this with a branch we used for storing upstream releases that we receive as tarballs. The history looks like this:

* b9a5c3889cb Import from linux-4.1-1.12.tar.bz2
* 0e9df32da3f Import from linux-4.1-1.9.tar.bz2
* 714ffa13302 Import from linux-4.1-1.6.tar.bz2
* 5d33c6e6665 Import from linux-4.1-1.3.tar.bz2
* 0434412e517 Import from linux-4.1-1.2.tar.bz2

Now suppose that before we've merged b9a5c3889cb we receive a linux-4.1-1.7.tar.bz2 tarball and want to insert it into the history. So, let's check out 714ffa13302 and import the tarball as usual.

git checkout 714ffa13302
rm -rf *
tar xjf /tmp/linux-4.1-1.7.tar.bz2
git add -A .
git commit -m "Import from linux-4.1-1.7.tar.bz2"

So, now we have:

* 31124b678e5 Import from linux-4.1-1.7.tar.bz2
* 714ffa13302 Import from linux-4.1-1.6.tar.bz2
* 5d33c6e6665 Import from linux-4.1-1.3.tar.bz2
* 0434412e517 Import from linux-4.1-1.2.tar.bz2

If we try to rebase b9a5c3889cb on top of 31124b678e5 then we'll just get a huge number of conflicts, because Git tries to apply the differences between commits rather than reuse the trees.

Instead we need to tell Git to construct a new commit with the correct parent:

new_parent=31124b678e5
# Everything after 714ffa13302, i.e. 0e9df32da3f and b9a5c3889cb
transplanted_commits="714ffa13302..b9a5c3889cb"

git rev-list --reverse "${transplanted_commits}" | while read -r commit; do
    echo "Dealing with ${commit}"
    # Reuse the old message and tree, but attach the commit to the new parent
    new_parent=$(git cat-file commit "${commit}" |
        awk '/^$/ {emit=1} { if (emit) print }' |
        git commit-tree -p "${new_parent}" "${commit}^{tree}" -F -)
    echo "new_parent ${new_parent}"
done

git commit-tree prints the new commit hash on stdout, so after running this you can just read out the last new_parent and run something like:

git checkout -b fixed-branch <last-new_parent>

and the history on fixed-branch will have been rewritten to contain the extra new commit.

The awk magic is required to strip the header lines (tree, parent, author and committer) that git cat-file prints before the commit message, so that only the message itself is reused for the replacement commit.
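A small sketch of what that filter does, run against a hand-written sample rather than a real repository. The hashes, identity and message text are made up for illustration; only the layout (header lines, a blank line, then the message) mirrors what git cat-file commit prints.

```shell
#!/bin/sh
# Sample raw commit text; hashes and identity are invented for the demo.
raw='tree 1111111111111111111111111111111111111111
parent 2222222222222222222222222222222222222222
author A U Thor <author@example.com> 1505300000 +0100
committer A U Thor <author@example.com> 1505300000 +0100

Import from example.tar.bz2

Longer description.'

# Same awk expression as in the loop: print from the first blank line on.
message=$(printf '%s\n' "$raw" | awk '/^$/ {emit=1} { if (emit) print }')
printf '%s\n' "$message"
```

Note that the filter also emits the separating blank line itself, so the text piped to git commit-tree begins with an empty line.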

Of course, better shell scripting could turn this into a more generic tool, but this was good enough for me.
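For what it's worth, here is one way such a tool might look. This is only a sketch: the function name transplant is hypothetical, the header stripping uses sed instead of the awk above, and the scratch repository with its VERSION file stands in for the real tarball imports. As with the loop above, the new commits get the current identity and date rather than the original ones.

```shell
#!/bin/sh
set -e

# transplant NEW_PARENT RANGE   (hypothetical helper, not a Git command)
# Recreate every commit in RANGE, oldest first, on top of NEW_PARENT,
# reusing each commit's tree unchanged. Prints the final new commit hash.
transplant() {
    parent=$1
    for commit in $(git rev-list --reverse "$2"); do
        # Drop the header (everything up to the first blank line) and
        # feed the remaining message to the replacement commit.
        parent=$(git cat-file commit "$commit" |
            sed '1,/^$/d' |
            git commit-tree -p "$parent" "${commit}^{tree}" -F -)
    done
    echo "$parent"
}

# Scratch repository standing in for the branch of imported releases.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

for release in 1.2 1.6 1.9 1.12; do
    echo "$release" > VERSION
    git add VERSION
    git commit -q -m "Import $release"
done
old_tip=$(git rev-parse HEAD)
base=$(git rev-parse HEAD~2)        # the "Import 1.6" commit

# Import the late release on top of 1.6 ...
git checkout -q "$base"
echo 1.7 > VERSION
git commit -q -a -m "Import 1.7"
inserted=$(git rev-parse HEAD)

# ... then re-parent everything that originally followed 1.6.
new_tip=$(transplant "$inserted" "$base..$old_tip")

git log --format=%s "$new_tip"
```

Using a for loop rather than a pipeline into while keeps the variable in the current shell, so the final hash can be printed by the function instead of being read off the log output.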

I found the git commit-tree trick in the answers to this StackOverflow question.

Monday, 18 February 2008

Why I Like Perforce

After lots of articles explaining why I hate Perforce, I thought it only fair to write a few explaining some of the things I like about it. I'm sure that other version control systems do a better job than Perforce with some of these things, but in my opinion Perforce does them better than CVS and current stable versions of Subversion at least.

Merge tracking

Perforce keeps track of what previous changes have been merged (or integrated) into a working tree and commits this information along with the files when the files are submitted. This means that it is often trivial to merge changes in from a branch, do a quick build to check that everything is fine and then check them in.

Change lists

Although it doesn't excuse the submit command not taking multiple filename arguments, I think that I mostly like the idea of being able to group my changed files into change lists. The change list can be created and the diff checked over before finally checking it in. With this tactic there is a certain risk of failing to check in important files if they happen to be sitting in a different long-lived change list. It would be ideal if change lists could somehow work at less than file granularity, but I'm not sure how this could be implemented without offering a list of patch hunks.

Perforce Proxy

P4P is essential when working remotely with a large depot. It intelligently caches file revisions that pass through it so that future requests for those files can be served from the cache, greatly increasing performance and reducing network traffic. It's not perfect: if you submit a file through the proxy it doesn't appear to cache the contents immediately, thus forcing a further download of the file. Nevertheless, if you have many clients or multiple users in the same location then a proxy is worth the tiny amount of effort it takes to set it up.

Update 2008/11/17: Subversion 1.5 now supports (to some degree) all of these features.

Monday, 11 February 2008

Why I Hate Perforce: 4. It's difficult to defer existing work

This is part of a series of articles explaining why I hate Perforce. Please see "Why I Hate Perforce: The Background" first.

The real world being the way it is, work is often started or even mostly completed and then something more important comes along, which means that the work must be deferred, possibly indefinitely. It is important, if only for programmer self-esteem, to archive that work safely before continuing. This needs to be done with the minimum of effort and risk because it generally happens only when something urgent needs to be done.

There are a number of ways of doing this.

1. If the work had been done on a task branch then any pending changes can just be checked in and the branch kept around but not merged for as long as necessary.

Unfortunately task branches have overhead and aren't always used. Creating a task branch retrospectively would seem like a sensible tactic but is hard work with Perforce because there's no equivalent to cvs up -r or svn switch to switch to a branch whilst preserving changes in the working copy. I've tried just updating the client spec to point to the new branch hoping that it would offer to merge the changes but Perforce just complains that it cannot clobber the files since they are opened for editing.

2. Keep the working copy (or client in Perforce terminology) around forever. The downside to this is that a large amount of disk space could be taken up and any finger macros may need to be re-learnt. It is also hard for someone else to continue the work because the Perforce client will be owned by the original author.

3. Archive the entire working copy as a unit (e.g. using tar(1) or zip(1)) then revert the files in the working copy so that work can continue. This doesn't work well with Perforce because the working copy state is stored on the server. In order to do anything meaningful with the archive you'd need to revert your working copy back to the revision that was current at the time the archive was created. If this isn't done there's a risk of confusion as to where changes were made. Other systems that keep sufficient state in the working copy (such as CVS and Subversion) don't suffer from this problem. In fact the working copy can be moved to a different location (or even a different machine) and work can continue there.

4. Produce a patch based on the current state of the depot that can be applied later. This would be a perfectly good solution if it weren't such a pain to generate sensible patch files with Perforce. Having tried hard to make p4 diff generate something acceptable to patch(1) I ended up writing a Ruby script to do it. This script is available from my Perforce Scripts page.
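This is not the Ruby script mentioned in option 4, just a rough sketch of the kind of conversion involved. It assumes that p4 diff -du introduces each file with a header of the form "==== //depot/path#rev - /local/path ====" followed by ordinary unified hunks. The function name p4_to_patch and the sample paths are hypothetical. It does not cope with paths containing spaces, nor with added or deleted files.

```shell
#!/bin/sh
# p4_to_patch (hypothetical helper): rewrite Perforce-style file headers
# on stdin into the "---"/"+++" pairs that patch(1) expects.
p4_to_patch() {
    awk '
    /^==== \/\/.*#[0-9]+ - .* ====$/ {
        depot = $2
        sub(/#[0-9]+$/, "", depot)   # drop the revision suffix
        sub(/^\/\//, "", depot)      # drop the leading //
        print "--- " depot
        print "+++ " depot
        next
    }
    { print }                        # hunks pass through untouched
    '
}

# Hand-written sample in the assumed layout; with a real client this
# would come from something like: p4 diff -du | p4_to_patch
sample='==== //depot/project/main.c#3 - /home/user/ws/project/main.c ====
@@ -1,2 +1,2 @@
-int x = 1;
+int x = 2;
 int y;'

converted=$(printf '%s\n' "$sample" | p4_to_patch)
printf '%s\n' "$converted"
```

The resulting paths are relative to the depot root, so the strip level passed to patch(1) would need to match wherever the patch is later applied.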

When I had to do this recently I ended up taking option 4. It did seem to work but it was far more effort than I expected. Next time it should be easier because I've already got the script!

Now read about Why I Like Perforce.

Edit 2010/12/01: Since this article was written Perforce 2009.2 has introduced shelving. This is certainly useful but doesn't solve many of the problems raised here. In particular, changes can only be unshelved back to the same location in the depot (albeit perhaps on a different client spec or by a different user). This means that moving the changes to a different branch is just as painful as before, as is creating a branch retrospectively for shelved changes.
