Git 201: Safely Using Rebase

This post is part of a Git 201 series on keeping your commit history clean.  The series assumes some prior knowledge of Git, so you may want to start here if you’re new to git.

✧✧✧

The rebase tool in git is extremely powerful, and therefore also rather dangerous. In fact, I’ve known engineers (usually those new to git) who won’t touch it at all. I hope this will convince you that you really can use it safely, and it’s actually very beneficial.

What is rebase?

To start, I’ll very briefly touch on what a rebase actually is. So, very briefly then, a rebase is a way to rewrite the history of a branch. Let’s assume you’re starting out with a standard working branch from master:

At this point, let’s say you want to update your branch to contain the new commits on master (i.e., D and F), but you don’t want to create a merge commit in the middle of your work. You could rebase instead:

git rebase master

This command will rewrite your current branch as though it had originally been created starting from F (the tip of master). In order to that, though, it will need to re-create each of the commits on your branch (C, E, and G). Remember that a commit is the difference applied to some prior commit. In our example, C is the changes applied to B. E contains changes applied to B, and G contains changes applied to E. Rebasing will be that we need to change things around so that C actually has F as a parent instead of B.

The problem is that git can’t just change C’s parent because there’s no guarantee that the changes represented by C will result in the same codebase when applied to F instead of B. It might be that you’d wind up with some completely different code if you did that. So, git needs to figure out what result C creates, and then figure out what changes to apply to F in order to create the same result. That will yield a completely new commit which we’ll call CC. Since E was based upon C, which has been replaced, git will need to create a new commit using the same process, which we’ll call EE. And, since E has been removed, that means we’ll need to replace G with GG. Once all of the commits have been created, git moves the branch pointer to the end of the newest commit:

While all this seems complicated, it’s all hidden inside of git, and you don’t really have to deal with any of it.  In the end, using rebase instead of merge just means changing a single command, and your commit history is simpler because it appears as though you created your branch from the right place all along.  If you’d like a much fuller tutorial with loads of depth, I’d recommend you head over here.

Rebasing across repos

If you’re working on a branch which only exists locally, then rebasing is pretty straight-forward to work with. It’s really when you’re working across multiple clones of a repo (e.g., your local clone, and the one up on GitHub) that things become a little more complicated.

Let’s say you’ve been working on a branch for a while, and somewhere along the way, you pushed the branch back to the origin (e.g., GitHub). Later on, though, you decide you want to rebase to pick up some changes from master. That leaves you in the state we see in this diagram:

If you were to pull right now, git would freak out just a bit. Your local version of the branch seems to have three new commits on it (CC, EE, and GG) while it’s missing three others (C, E, and G). Then, when git checks for merge conflicts, there’s all sorts of things which seem to conflict (C conflicts with CC, E conflicts with EE, etc.). It’s a complete mess.

So, the normal thing to do here is to force git to push your local version of the branch back to origin:

git push -f

This is telling git to disregard any weirdness between the local version of the branch and origin’s version of the branch, and just make origin look like your local copy. If you’re the only one making changes to the branch, this works just fine. The origin gets your new branch, and you can move right along. But… what if you aren’t the only one making changes?

Where rebasing goes wrong

Imagine if someone else noticed your branch, and decided to help you out by fixing a bug. They clone the repository, checkout your branch, add a commit, and push. Now the origin has all your changes as well as the bug fix from your friend. Except, in the meantime, you decided to rebase. That would mean you’re in a situation like this:

Now you’re stuck. If you pull in order to get commit H, you’re going to have all sorts of nasty conflicts. However, if you force push your branch back to origin (to avoid the conflicts), you’re going to lose commit H since you’re telling git to disregard the version of the branch on origin. And, if your friend neglected to tell you about the bug fix, you might do exactly that and never even realize.

Solution 1: Communicate

The best way to fix the problem is to avoid it in the first place. Communicate clearly with your teammates that this branch is a working branch, and that they shouldn’t push commits onto it. It’s a good idea for teams to adopt some clear conventions around this to make this kind of mistake hard to make (e.g., any branch stating with a username should only be changed by that user, branches with “shared”, “team” or some other prefix are expected to have multiple contributors).

If you can’t be sure you’re only one working on a branch, the next best thing is, before starting the rebase, talk with anyone who might be working with the branch. Say that you’re going to rebase it, and what they should expect. If anyone speaks up that they’re working on changes to that branch, then you know to hold off.

Once everyone has pushed up any outstanding changes, pull down the latest version of the branch, rebase, and then push everything back up as soon as possible. That looks like this:

git checkout mybranch
git pull
git rebase master
git push -f

Once you’ve finished, you’ll want to tell the other people working on the branch that they need to get the fresh version of the branch for themselves. That looks like:

git checkout master
git branch -D mybranch
git checkout mybranch

Solution 2: Restart the rebase

If you find yourself having just rebased and only then learn there are upstream changes you’re missing, the simplest way out of this difficulty is to simply ditch your rebase. Go back, and pull down the changes from the origin, and start over (after referring to solution 1). That would look something like this:

git checkout master
git branch -D mybranch
git checkout mybranch
git rebase master
git push -f

This will switch back to master (1), so that you can delete your local copy of the branch (2), and then grab the most recent version from the origin (3). Now, you can re-apply your rebase (4), and then push up the rebased branch before anyone else has a chance to mess things up again (5).

Solution 3: Start a new branch

If you find that you want to rebase right away, and don’t want to wait to coordinate with others who might be sharing your branch, a good plan is to isolate yourself from the potentially shared branch first, and then do your rebase.

git checkout mybranch
git checkout -b mybranch-2

At this point, you’ve got a brand new branch which only exists on your local machine, so no one else could possibly have done anything to it. That means you can go ahead and rebase all you like. When you push the branch back up to origin (e.g., GitHub), it will be the first time that particular branch has been pushed.

Of course, if someone else has added a commit to the old branch, it will still be stuck over there, and not on your new branch. If you want to to get their commit on your new branch, use git’s cherry pick feature:

git cherry-pick <hash>

This will create a new commit on your branch which will have the exact same effect on your branch as it did on the old one. Once you’ve rescued any errant commits, you can delete the old branch and continue from the new one.

✧✧✧

I’m hope this makes rebasing less scary, and helps you get a sense of when you’d use it and when not. And, of course, should things go wrong, I hope this gives you a good sense of how to recover.

Two last bits of advice… First, before rebasing, create a new branch from the head of the branch you’re going to rebase. That way, should things go completely wrong, you can just delete the rebased branch, and use your backup. And, finally, if you’re in the middle of a rebase which seems to be going a little nuts, you can always bail out by using:

git rebase --abort

So, feel free to experiment!

One thought on “Git 201: Safely Using Rebase

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s