Better Documentation Through Commit Messages
As programmers, we see lots of code.
Sometimes we have questions about that code.
Sometimes we can get answers by bugging the developers who wrote that code. That often doesn’t work, though. Maybe those developers have moved on. Maybe they’ve forgotten. Maybe they just have other things to do. (After all, making a handful of individuals responsible for every question that might come up doesn’t scale very well.)
That’s where software documentation comes in.
There have been lots of approaches to documenting software over the years. Waterfall methodologies asked for extensive written requirements and architectural specifications. Newer approaches use web services like SharePoint or Google Docs, or wikis, or documentation files that live in the repo alongside the source code. Issue trackers like Jira can aggregate information about bugs, feature requests, and code changes that are made to address them. Tools like Doxygen, DocFX, and JSDoc let you maintain documentation as source code comments.
All of these approaches have shortcomings, though. Waterfall-style requirements and specs are expensive to write and expensive to keep up to date. Everybody hates SharePoint. Wikis and in-repo documents work better, but it can be easy to forget to update them as code is updated; even figuring out whether a document or wiki page is still current can be a challenge. Issue trackers are great at what they do, but using one to track down information about a particular piece of code may be hard. Source code comments work well, but they can be an awkward fit for higher-level information or historical information that isn’t directly tied to a chunk of code, and trying to write too much in comments can make the code itself harder to read.
Wouldn’t it be great if there were a documentation tool that’s directly tied to the source code (so that you can always access information that’s relevant to the code you’re looking at), that doesn’t clutter up the code itself, that’s carefully and automatically timestamped (so that you know how current its information is), and that’s guaranteed to be updated whenever the code is?
Oh, wait. There is.
Your version control system’s commit messages meet all of these criteria: you’re already writing them, the VCS carefully and automatically tracks them, and they’re automatically associated with the code, so you can easily access them whenever you need to. However, for this to benefit you, you have to write good commit messages.
I’ve found that the best way to write a good commit message is to imagine what questions others would have, then write my commit messages to answer those questions. I’ll even imagine specific scenarios to guide me:
First, suppose I’m asking for a code review from a developer who has some experience but is relatively new to the project. What questions would this developer come up with? What do they need to know to understand the commit?
Second, imagine a future maintainer needs to change the code I’m writing. Maybe there’s a new feature that impacts that piece of code, or maybe (gasp!) I introduced a bug. What should that maintainer know about the code’s context - why it’s written the way it’s written - so that they’ll know the ramifications of changing it?
The goal is to ensure that each change to the codebase is accompanied by a description of the context of the change, both to communicate to other developers on your team and to leave documentation for future maintainers. Then this information is readily available to whoever needs it and can be accessed through tools like git log or git blame.
Here are some more specific guidelines. A lot of this discussion focuses on Git, because that’s the most popular version control system, but the principles apply everywhere.
Follow a common format
The Git convention is to use a one-line summary (up to 50 characters or so, and written as an imperative phrase), then a blank line, then one or more paragraphs of details (wrapped at 72 characters or so). This is, of course, only a convention. But there’s value to being consistent; for example, tools like the command-line git shortlog or GitHub’s “Commits” view (and even some non-Git tools like TortoiseSVN’s log viewer) take each commit’s first line and use it as a summary.
Here’s a sample of a commit message that follows these conventions, courtesy of Chris Beam:
Summarize changes in around 50 characters or less

More detailed explanatory text, if necessary. Wrap it to about 72
characters or so. In some contexts, the first line is treated as the
subject of the commit and the rest of the text as the body. The
blank line separating the summary from the body is critical (unless
you omit the body entirely); various tools like `log`, `shortlog`
and `rebase` can get confused if you run the two together.

Explain the problem that this commit is solving. Focus on why you
are making this change as opposed to how (the code explains that).
Are there side effects or other unintuitive consequences of this
change? Here's the place to explain them.

Further paragraphs come after blank lines.

 - Bullet points are okay, too

 - Typically a hyphen or asterisk is used for the bullet, preceded
   by a single space, with blank lines in between, but conventions
   vary here

If you use an issue tracker, put references to them at the bottom,
like this:

Resolves: #123
See also: #456, #789
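The layout conventions above are simple enough to check mechanically; a check like this could, for instance, be run from a Git commit-msg hook. The following is a minimal Python sketch, assuming the 50- and 72-character limits are treated as soft targets. The function name check_commit_message and both example messages are hypothetical, and the sketch makes no attempt to check that the summary is written in the imperative mood.

```python
"""Check a commit message against the common Git layout conventions.

Minimal sketch: check_commit_message is a hypothetical helper, not part of
Git. The 50/72 limits are soft conventions, not rules Git enforces, and the
imperative-mood guideline for the summary is not checked at all.
"""

SUMMARY_LIMIT = 50  # suggested maximum length of the first (summary) line
BODY_WIDTH = 72     # suggested wrap width for body lines


def check_commit_message(message: str) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    lines = message.rstrip("\n").split("\n")
    problems = []

    summary = lines[0]
    if not summary.strip():
        problems.append("summary line is empty")
    elif len(summary) > SUMMARY_LIMIT:
        problems.append(
            f"summary is {len(summary)} characters (aim for {SUMMARY_LIMIT} or fewer)"
        )

    # The body is optional, but if present it must follow a blank line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("line 2 should be blank to separate the summary from the body")

    # Width is checked from line 3 onward (1-based numbering).
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > BODY_WIDTH:
            problems.append(
                f"line {number} is {len(line)} characters (wrap at about {BODY_WIDTH})"
            )

    return problems


good = (
    "Fix crash when config file is missing\n"
    "\n"
    "Fall back to defaults instead of raising, because new installs\n"
    "have no config file yet.\n"
)
bad = "Fixed a bunch of stuff in the config loader and also some logging\nMore details here."

print(check_commit_message(good))  # → []
print(check_commit_message(bad))   # reports the long summary and the missing blank line
```

Returning a list of problems rather than failing on the first one lets a hook show the author everything to fix in a single pass.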
You can use Markdown for formatting. Some tools render it, and it looks nice enough as plain text too.
With your team’s agreement, other formatted information can go in your commit messages, too. For example, Arlo’s commit notation adds 1-3 character notations to each commit to express the type of commit and the relative risk that it introduces.
Use good spelling and grammar
You’re writing for humans, so make it painless for humans to read. Enable your IDE’s or text editor’s spell checker, and pay attention to it. Tools like Grammarly can also help. Besides making your communication more readable, good spelling helps with navigating the codebase and history. It’s frustrating to be unable to find something because the text you’re searching for is misspelled.
Explain the why, not the what
Discussions on how to write good code comments emphasize that good comments should explain why the code does what it does, not what the code does. A comment like this one, for example, adds nothing:
// Add one to x
x = x + 1;
The reason is that you can tell what the code does by looking at the code itself, so explaining that via comments is just repeating yourself. There’s value to summarizing what the code does (e.g., at the top of a function, to save other developers from having to read the entire function just to know what it does), and there’s a lot of value to explaining the reasoning behind the code, because that may not be at all obvious from reading the code itself.
Similarly, when writing a commit message, the commit message doesn’t need to explain all the details of what the commit does, because you can find that by looking at the diff. What’s really valuable is a summary (so that you don’t have to read the whole diff if you don’t want to) and an explanation of the reason why - the context behind the commit, and what the changes in the commit are intended to accomplish.
Include relevant information
Having a commit message that consists only of a Jira issue number isn’t ideal. It’s better if the commit messages summarize their changes and their rationale themselves, so that the code and its history are accessible in a single place, instead of making you access an external tool. Maybe you’ll change issue trackers, or maybe the issue tracker will be down, or maybe an outside developer needs to make code changes but doesn’t have Jira access. (And I’ve seen all of these happen.)
However, a project’s issue tracker has lots of important information: who requested a feature, how a bug can be reproduced, when and how it was scheduled to be fixed, and so on. Duplicating all of that in a commit message would be tedious and noisy, so it’s a good idea to include links or issue numbers for relevant issues when writing your messages.
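When issue references sit in a predictable place at the bottom of the message, scripts can pick them up, for example to link commits back to issues. (Git itself ships git interpret-trailers for general trailer handling.) The following is a minimal Python sketch, assuming the "Resolves:" and "See also:" trailers from the sample message earlier and GitHub-style #<number> references. The function name issue_references and the commit message are made up for illustration.

```python
"""Extract issue references from commit message trailers.

Minimal sketch: issue_references is a hypothetical helper. It assumes the
"Resolves:" / "See also:" trailer names from the sample message and
GitHub-style "#<number>" issue references; adjust both patterns to match
your team's conventions (for example, Jira keys like PROJ-123).
"""
import re

# Trailer lines at the start of a line, e.g. "See also: #456, #789".
TRAILER = re.compile(r"^(Resolves|See also):[ \t]*(.+)$", re.MULTILINE)
ISSUE = re.compile(r"#(\d+)")


def issue_references(message: str) -> dict[str, list[int]]:
    """Map each trailer name to the issue numbers listed after it."""
    refs: dict[str, list[int]] = {}
    for name, values in TRAILER.findall(message):
        refs.setdefault(name, []).extend(int(n) for n in ISSUE.findall(values))
    return refs


# A made-up commit message for illustration.
message = """Retry uploads when the storage API times out

The storage service occasionally drops connections under load; one
retry keeps these transient failures from reaching users.

Resolves: #123
See also: #456, #789
"""

print(issue_references(message))  # → {'Resolves': [123], 'See also': [456, 789]}
```

The message body still carries the "why" on its own. The trailers add links for readers who want the full history from the issue tracker.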
Practice empathy
All of the guidelines we’ve discussed - writing messages so that they’ll answer questions for code reviewers and future maintainers, using a consistent and readable format, explaining why changes are being made, and making sure to include relevant information - have a common theme: they’re all about communication. Commit messages are communication to other developers on the team, both present and future. And this is true even if you’re a solo developer: “other developers” in that case means your future self when you come back to the code six months later or the poor guy who’s stuck maintaining your code after you win the lottery and retire to Tahiti. And the way for commit messages to be good communication is to practice empathy. This sounds like touchy-feely liberal arts stuff that has little place in the realm of computer science, but it’s really just saying that you need to think about how others (teammates or future developers) would think and feel and communicate accordingly. That’s why empathy is one of our core values here at Corgibytes. (We even gave a keynote about it.)
Many others have written about how to write good commit messages. I’ve found Chris Beam’s and Tim Pope’s articles particularly helpful.