Feb 26, 2009

Following on my last entry, where I built a better pre-commit hook to ensure every commit in my Git history passed make check with flying colors, I realized something today about my good friend, rebase.

git rebase is a brilliant tool for editing history you haven’t pushed yet. If I have a set of ten commits, and realize the 3rd commit has an oversight I’d like to smooth out, I can make a new commit to fix the problem and then merge it with that old commit, resulting in a nice, clean set of commits for pushing.
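As a rough sketch of that workflow: stage the fix, record it as a fixup commit aimed at the old commit, and let an interactive rebase fold the two together. The function name `fixup_into` is made up for illustration, and the sketch assumes a Git new enough to support `--fixup` and `--autosquash`, a target commit that is not the root commit, and a fix that does not conflict with later commits.

```shell
# fixup_into TARGET: fold the currently staged changes into commit TARGET.
# Hypothetical helper; assumes TARGET has a parent and the fix applies cleanly.
fixup_into() {
    target=$1
    # Record the staged fix as a "fixup! <subject>" commit aimed at TARGET.
    git commit -q --fixup="$target" &&
    # Replay history from TARGET's parent; --autosquash moves the fixup commit
    # next to TARGET and squashes it in. GIT_SEQUENCE_EDITOR=true accepts the
    # generated todo list without opening an editor.
    GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$target"~1
}
```

Typical use would be `git add -p` to stage the correction, then `fixup_into <hash-of-3rd-commit>`. Every commit from the target onward gets a new hash afterwards.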

However, using rebase at any time invalidates the work of my pre-commit hook. Why? Because any time I use rebase, I throw away whatever I’ve confirmed in the past about those commits. A rewritten commit is really a new commit, which means it hasn’t been tested, may never have existed as a working tree, and certainly isn’t the same as the previous commit, however similar its diff output may be.

What this goes to show is that immutability is a requirement of sane integration. Not only does code go into a commit, plus a date and a description, but also the work that went into verifying that commit. All of these details are bound up in the immutable state of that commit. If the state is changed, and the immutability guarantee is broken, all bets are off.

Thus the only way I could use rebase with confidence would be to run the same pre-commit hook again on every changed commit during the rebase operation, which is something Git doesn’t do. It assumes that if you rewrite a series of commits, the final HEAD will have the same contents as the previous HEAD, which is true. But the rebased commits leading up to that HEAD, especially if their order was changed, now represent a fictitious history behind that new HEAD.
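One way to approximate that re-verification by hand is to walk the rewritten commits after the rebase and run the check against each one. This is only a sketch: `check_range` is a hypothetical helper, it assumes a clean working tree, and untracked build products left over from one commit may influence the next.

```shell
# check_range BASE CMD...: run CMD at every commit in BASE..HEAD, oldest first.
# Hypothetical helper; stops at the first commit where CMD fails.
check_range() {
    base=$1; shift
    # Remember where we started: the branch name, or the hash if detached.
    start=$(git symbolic-ref -q --short HEAD || git rev-parse HEAD)
    status=0
    for commit in $(git rev-list --reverse "$base"..HEAD); do
        git checkout -q "$commit" || { status=1; break; }
        if ! "$@"; then
            echo "check failed at $commit" >&2
            status=1
            break
        fi
    done
    # Return to the starting point whether or not the checks passed.
    git checkout -q "$start"
    return $status
}
```

For example, `check_range origin/master make check` would test every commit not yet pushed, assuming `origin/master` is the upstream. Later Git releases also accept `git rebase --exec "make check"`, which runs the command after each commit is replayed and stops the rebase on failure.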

It makes me think more and more about the virtues of merging.

  3 Responses to “The saga of rebase versus merge”

  1. This is a good point as well (and I felt smart when I realised I was thinking about the original pre-commit problem, before you mentioned it!).

    I get the feeling though, that this is the conflict of git being used as a practical tool, and the way git “can” be used to do everything the right way — or at least pretend it was.

    It makes my head hurt… It sure sounds a noble goal to keep every commit being “perfect”, but there are obvious limits to that. For example, to go extreme, it *should* mean every commit which changes behaviour (even when you can’t see it does!) should change the tests as well, otherwise your commit might be untested. This is compared to the current situation, when you want all commits to be tested against some, possibly trivial testing suite. And to do that right obviously hinders outside development.

    Is it as dirty as I think to acknowledge, that testing boundary is different from commits? Or is it that git should have a way to force us to do it perfectly? :)

  2. One thing that you could do is to separate in at least two integration branches: master and devel. You could guarantee that merge commits into master pass muster in the ways you describe, while those in devel needn’t individually (although it makes git-blame and git-bisect much more useful if they do as well).

  3. My current thinking is to use merge on the topic branch, and then do a rebase just after the final merge to master (before pushing the updated master).

    E.g., to merge to master: “git merge $BRANCH; git rebase ORIG_HEAD”

    [If there's tweaking that needs to be done for "perfection", use rebase -i of course]

    I dunno what downsides this approach entails, but at least it concentrates the rebasing to where it’s easier to focus on potential problems… Since it seems to take forever for my topic branches to reach master, that should hopefully mean a lot less work for me.

    The one annoyance I do know of is that it means I’ll probably have to re-resolve any merge conflicts (rerere won’t catch them because they happen in a different context).
