Documentation at scale: The principles

I've been thinking about documentation a lot recently. Namely, how to maintain internal documentation in a large-scale project that involves multiple teams and whose lifetime is measured in decades rather than years. The challenges in such an environment are very different from what you experience when writing the documentation for your personal hobby project. In the latter case it's just a question of writing the document. In the former case, though, there's a whole bunch of different problems: People join the project and leave it all the time. Sometimes they die and take all their knowledge with them to the grave. Documents contradict each other. They are forgotten and not updated for years. When trying to use them you often find out that they are more of an obstacle to understanding than a help. Documentation is a read-once affair. It's only read by newbies who don't have sufficient knowledge to improve it. Senior people, who can, at least in theory, fix it, never use it. And the list goes on and on.

So, here's a list of principles that could alleviate the problem:

  1. Acknowledge that brute force doesn't work.
  2. Make documentation a first class citizen.
  3. Make documentation executable.
  4. Track the intent.
  5. Measure it.

Acknowledge that brute force doesn't work

This is an observation from the wild: Asking developers to write and maintain documentation isn't enough. It never worked and it never will. Unless the whole system of dealing with documentation is changed, from both the technical and the organisational perspective, things won't get any better.

Accept that. Stop sweeping the problem under the carpet. Do something about it.

Make documentation a first class citizen

Documentation should have exactly the same status as code and should be treated in exactly the same way. If it's not, you are sending a message to the developers that you don't care. And if you don't care, why should they?

Make documentation executable

This point is meant to solve the 'read-once' syndrome. The goal is to make developers revisit the documentation on a regular basis.

In the devops world, it's easy. Just merge real-time statistics and diagnostics of the system with the documentation, be it dashboards, playbooks or whatever. It's 2015 after all and documentation doesn't have to be a static web page.

In the pure software development world it's a bit more tricky but not impossible. Just take into account that a large codebase is a living and breathing system of its own. Consider the Travis CI widget that has lately started to appear in READMEs on GitHub. Does it make you revisit the README more often? Heck, yes!

Still, much more can be done. The documentation of a component can show recent changes, the rate of test failures over time and the contact info of the person responsible for it. In some cases, documentation can even be scraped to drive the automation of the development process.
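To make the idea a bit more tangible, here is a minimal sketch in Python of a status block that could be generated and placed on top of a component's documentation. Everything in it is an assumption made for illustration: the function name render_status, the shape of its inputs and the layout of the output. Where the data comes from (source control, the CI system, a team directory) is deliberately left open.

```python
def render_status(component, owner, recent_changes, test_runs):
    """Build a plain-text status block for a component's documentation.

    All names and the output layout are hypothetical.
    `recent_changes` is a list of one-line change summaries, newest first.
    `test_runs` is a list of booleans, True meaning the run passed.
    """
    failures = sum(1 for passed in test_runs if not passed)
    # Avoid division by zero when no test runs were recorded.
    rate = failures / len(test_runs) if test_runs else 0.0
    lines = [
        f"Component: {component}",
        f"Contact: {owner}",
        f"Test failure rate: {rate:.0%} ({failures} of {len(test_runs)} runs)",
        "Recent changes:",
    ]
    # Show at most the five newest changes to keep the block short.
    lines += [f"  - {summary}" for summary in recent_changes[:5]]
    return "\n".join(lines)
```

Regenerating such a block on every build would give even senior developers a reason to open the page now and then.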

Track the intent

The biggest problem with long-lived software systems is that nobody knows any more why they were built the way they were. Nobody knows about the use cases any more. The original developers have, one by one, left the company or retired and the new ones have no idea. The system cannot be changed because nobody knows what it is supposed to do in the first place.

The sad thing is that most of that was actually documented in the past but the documentation was lost. Most big companies have the practice of writing design documents where use cases are explained and a solution is proposed. These, however, are often treated as throw-away documents and a couple of years later it is almost impossible to get one's hands on them. Even worse, if you are looking for an explanation for a particular phenomenon in the code you have no idea which design document describes it. The only way is to get as many design documents as you can and skim them for anything that may relate.

And it's really not a hard problem to solve. Just add the design document to the feature patchset, and then keep it in the codebase forever, an immutable witness of past intent. Additionally, if the design document shares the source control repository with the code it's trivial to cross-reference the two.
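Cross-referencing could look something like the following Python sketch. It assumes a purely hypothetical convention where source files point to their design document with a comment of the form "Design: <path>"; the marker, the function name and the paths are made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical convention: a comment such as "Design: design/retry.md"
# names the design document that explains the surrounding code.
DESIGN_REF = re.compile(r"Design:\s*(\S+)")

def index_design_refs(sources):
    """Map each referenced design document to the source files citing it.

    `sources` maps a file path to the text of that file.
    """
    index = defaultdict(list)
    for path in sorted(sources):
        for ref in DESIGN_REF.findall(sources[path]):
            # List each source file only once per design document.
            if path not in index[ref]:
                index[ref].append(path)
    return dict(index)
```

With an index like this, going from a puzzling piece of code to the document that explains it, or from a design document to the code that implements it, is a simple lookup rather than an archaeological expedition.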

Measure it

Last but not least, understand how the documentation is used. If it's a web page it's easy to track the number of views. If it lives in source control, measuring the number of edits isn't hard either.

If a page isn't used, it's a strong signal that there's a problem. Maybe it's so outdated that nobody cares to look at it any more? Maybe the code it refers to isn't used any more? Maybe the documentation doesn't have a dynamic, "executable" aspect as discussed above so that only newcomers read the docs?

If a component shows a lot of change code-wise but no change in documentation, something is probably wrong. It's time to trigger an alarm.
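Such an alarm can be sketched in a few lines of Python. The sketch below rests on assumptions that are mine, not a standard: a component is the top-level directory of a path, documentation is any file with one of the listed suffixes, and the threshold of five code changes is arbitrary.

```python
from collections import Counter

# Assumption: files with these suffixes count as documentation.
DOC_SUFFIXES = (".md", ".rst", ".txt")

def stale_doc_components(commits, min_code_changes=5):
    """Return components with many code changes but no documentation changes.

    `commits` is an iterable of lists of changed file paths, one list per
    commit. A component is assumed to be the top-level directory of a path.
    """
    code_changes = Counter()
    doc_changes = Counter()
    for paths in commits:
        for path in paths:
            component, sep, _rest = path.partition("/")
            if not sep:
                continue  # skip files at the repository root
            if path.endswith(DOC_SUFFIXES):
                doc_changes[component] += 1
            else:
                code_changes[component] += 1
    return sorted(
        name for name, count in code_changes.items()
        if count >= min_code_changes and doc_changes[name] == 0
    )
```

The per-commit lists of paths can be collected from the history, for instance by parsing the output of git log --name-only, and the function can then run as a periodic job that reports the components it returns.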

Does one department have significantly different documentation usage than another one? What's going on?

August 9th, 2015
