Git is by far the world’s most popular source control system, but most people only use a tiny proportion of its true power.
This may be because of its steep learning curve, or because people still approach it with the mindset they had when using earlier centralized source control systems. While you can get by with GUI tools and a reasonably simple workflow, as a codebase grows some problematic trends begin to appear in the history of the repository. These trends can seriously inhibit any kind of code archaeology, and can make finding where bugs were introduced practically impossible.
In the typical workflow, developers either work from their local master or take a named topic branch from it. They make their code changes, create commits periodically, and merge their changes back into the remote master branch. If other developers have added changes to the remote master, those are merged either into the local master branch or into the developer's topic branch as required.
In centralized source control systems there was no alternative to merge, because these systems had no concept equivalent to a local git “commit”. A merge meant taking all of the changes you had made and blending them with the latest changes on the central server. Diligent developers would always pull the changes from the central repository first and validate that the code still built and worked before merging their changes back into the remote repository, but the result was still a single change set containing everything you had changed. This meant that change sets were often very large, and contained refactoring as well as feature development. Large change sets make code reviews daunting and, as a result, less likely to find issues.
Enter git commits. The beauty of a git commit is that it can be as small or as large as you like. This means that you can separate refactoring from file renames, formatting and actual new feature implementation. Most disciplined developers create commits frequently while developing, which allows them to undo a change at will if they end up going down a dead end. For this alone the humble git commit deserves to be celebrated, and the mantra “commit early, commit often” should be a guiding principle. The problem, though, is that software development is rarely linear. Because it is a creative process and we tend to gain better insights as we go, we may be halfway through developing a feature when we decide that a refactor is in order.
When you merge from one branch to another (and the merge cannot simply be fast-forwarded), git creates a “merge commit”, which is a commit with two separate parents.
The merge commit contains the modifications from both branches and any changes that were required to correct merge conflicts and integrate the two branches. This seems (and can be) quite useful. Both branches have their original commits preserved, so you can see how the code evolved. The real problem arises when developers inevitably start to abuse merge. When developers repeatedly merge changes from remote branches back into their own local branches just to stay up to date, the branch structure very soon begins to resemble the subway diagram of a large and very poorly planned city.
This can also happen when long-lived feature branches are used within a large development team, and teams decide they need to share code across these feature branches. This makes reasoning about the evolution of the codebase practically impossible. Worse still, it makes it extremely difficult to see exactly what is being released from master.
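To make the two-parent structure of a merge commit concrete, here is a minimal Python sketch that models commits as objects with links to their parents. It is a toy model, not how git stores data: the `Commit` class, the `history` helper and the commit messages are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(eq=False)  # compare and hash commits by identity, not by content
class Commit:
    """Toy stand-in for a git commit: a message plus links to its parents."""
    message: str
    parents: tuple[Commit, ...] = ()

    @property
    def is_merge(self) -> bool:
        # A merge commit is simply a commit with more than one parent.
        return len(self.parents) > 1


def history(tip: Commit) -> list[Commit]:
    """Return every commit reachable from tip, each listed once."""
    seen, ordered, stack = set(), [], [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            ordered.append(commit)
            stack.extend(commit.parents)
    return ordered


# Two lines of work diverge from the same starting point...
root = Commit("initial commit")
a = Commit("master: fix typo", (root,))
b = Commit("feature: add login", (root,))
# ...and are tied back together by a commit that has both as parents.
merge = Commit("Merge branch 'feature'", (a, b))

merges = [c.message for c in history(merge) if c.is_merge]
```

Every "catch-up" merge adds another two-parent node like `merge` to the graph, which is how the subway map grows.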
With some discipline, teams can avoid this merge hell and make their git history a valuable resource in the process of development, while at the same time making code reviews easier, and more likely to find issues earlier in the development cycle. The following four rules, if followed, will set a team on the path to git history nirvana.
- The history of a long-lived remote branch should be clean, which means linear. The only merge commits should come from Pull Requests that have already been rebased against the head of the remote branch.
- Each commit should be concerned with one thing and one thing only. Implement only one new idea at a time, and never mix:
  - Code Formatting
  - File Moving/Renaming
  - Bug Fixes
  - Feature Development
- Each commit should build. Not just compile, but pass any unit/integration/automation tests.
- Code sharing should be done only via a minimal number of long-lived remote branches. Ideally one remote branch (master or development, or whatever you choose to call your sacred remote branch).
To ensure these rules are followed, developers should adopt the following practices:
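The first rule can be expressed as a check on the commit graph. The sketch below is a toy model in Python, not a git tool: `Commit`, `is_ancestor` and `follows_rule_one` are hypothetical names, and it assumes git's convention that a merge commit's first parent is the previous head of the branch being merged into. A merge passes only if the branch it brings in was already sitting on top of that previous head, which is what rebasing before the Pull Request achieves.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(eq=False)  # compare and hash commits by identity
class Commit:
    message: str
    parents: tuple[Commit, ...] = ()


def is_ancestor(ancestor: Commit, descendant: Commit) -> bool:
    """True if ancestor can be reached from descendant via parent links."""
    seen, stack = set(), [descendant]
    while stack:
        commit = stack.pop()
        if commit is ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(commit.parents)
    return False


def follows_rule_one(tip: Commit) -> bool:
    """Walk the first-parent chain of a long-lived branch and check each merge."""
    commit = tip
    while commit is not None:
        if len(commit.parents) > 2:
            return False  # merges of several branches at once are not allowed here
        if len(commit.parents) == 2:
            previous_head, pr_tip = commit.parents
            # A rebased PR branch contains the previous head in its own history.
            if not is_ancestor(previous_head, pr_tip):
                return False
        commit = commit.parents[0] if commit.parents else None
    return True


root = Commit("initial commit")
head = Commit("refactor: extract parser", (root,))

# PR branch rebased onto the current head before merging.
rebased_pr = Commit("feature: add export", (head,))
clean_merge = Commit("Merge PR: add export", (head, rebased_pr))

# PR branch still based on an old commit when it was merged.
stale_pr = Commit("feature: add import", (root,))
messy_merge = Commit("Merge PR: add import", (head, stale_pr))
```

Here `follows_rule_one(clean_merge)` holds and `follows_rule_one(messy_merge)` does not, because `stale_pr` never picked up `head`.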
- Branch early, branch often, commit early, commit often
- Rebase local feature branches against the remote master before creating a Pull Request. (Never merge)
- Tidy up your commit structure using interactive rebase before submitting pull requests.
To achieve this, developers need to dive below the surface of git, and really get to know some of the more advanced features like:
- Interactive Rebase
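As a rough illustration of what a rebase does to the shape of history, the following Python sketch replays a feature branch on top of a newer master in a toy model. It only tracks messages and parent links, so it ignores everything that makes a real rebase interesting (re-applying diffs, resolving conflicts, interactive editing); `Commit`, `ancestors` and `rebase` are hypothetical names, not git APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(eq=False)  # compare and hash commits by identity
class Commit:
    message: str
    parent: Commit | None = None


def ancestors(tip):
    """List commits from tip back to the root, newest first."""
    chain = []
    while tip is not None:
        chain.append(tip)
        tip = tip.parent
    return chain


def rebase(branch_tip, new_base):
    """Replay the commits unique to branch_tip on top of new_base."""
    base_history = set(ancestors(new_base))
    own = [c for c in ancestors(branch_tip) if c not in base_history]
    tip = new_base
    for commit in reversed(own):  # oldest first
        # Each replayed commit is a new object with a new parent;
        # the original commits are left untouched.
        tip = Commit(commit.message, tip)
    return tip


root = Commit("initial commit")
m1 = Commit("master: upgrade dependency", root)
f1 = Commit("feature: add model", root)
f2 = Commit("feature: add view", f1)

rebased = rebase(f2, m1)
log = [c.message for c in ancestors(rebased)]
```

After the rebase, `log` reads as a single straight line: the two feature commits sit on top of the master commit, with no merge commit needed to tie the branches together.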
In future posts I will describe the techniques mentioned above, and hopefully demonstrate that this is not only possible, but incredibly useful, and in the long term will save you time, energy and sanity.