In this post I discuss similarities between coding and writing prose, and then derive a coding style guideline.
A colleague recently sent me a code snippet and asked whether there was any way to improve it. The snippet pretty much consisted of a single screen-long, screen-wide LINQ statement. Long chains of calls are the bread and butter of LINQ, but maybe splitting them into smaller snippets would help clarify the intent of the code? People, including the code's author, will read and re-read the code, and it is in their shared best interest to be able to understand it quickly and without ambiguity. Help your code reviewers understand and verify the code. Help future maintainers navigate the code and find the statement they need to fix or update.
Traditionally, low-level code is accompanied by a varying amount of free-text comments, symbolic names, and mnemonics that “translate” the implementation at hand into the high-level, human-readable intent of the code, so that it can be understood more quickly. LINQ (among other technologies) brings higher-level programming concepts and abstractions, making your code closer to free text than ever before and reducing the need for extra comments. Why comment when you can write a programming language statement that is equally clear to a human and to a robot? The question of whether robots will eventually be able to understand your PM’s spec, or even your dev design document, without your help remains open, so for now the best way to make our code work as designed is to make the code read like the spec and let other people verify it. Thus our high-level code becomes a sort of spec, and the process of coding becomes a sort of technical prose writing. The good news is that humankind has quite a lot of experience in this area, having worked on prose writing for the last couple of thousand years. There were ups and downs along the way, certainly, likewhentheworddelimiterswentoutoffavorforawhile, but all in all some good guidelines were established over time to make our written communication more efficient.
Before the advent of the codex (book), Latin and Greek script was written on scrolls. Reading continuous script on a scroll was more akin to reading a musical score than reading text. The reader would typically already have memorized the text through an instructor, had memorized where the breaks were, and the reader almost always read aloud, usually to an audience in a kind of reading performance, using the text as a cue sheet. Organizing the text to make it more rapidly ingested (through punctuation) was not needed and eventually the current system of rapid silent reading for information replaced the older slower performance declaimed aloud for dramatic effect.
One of the most influential style guides for English is “The Elements of Style” by Strunk and White. I’ve already lost count of the occasions when that book has helped me improve my e-mails and documents. And here is Strunk and White on the paragraph:
13. Make the paragraph the unit of composition.
The paragraph is a convenient unit; it serves all forms of literary work. As long as it holds together, a paragraph may be of any length — a single, short sentence or a passage of great duration.
See, the paragraph is a convenient unit for literary work, but if our code is a spec then it can serve us too! Why not use it as a unit of code composition? A block of lines separated from the rest of the code by empty lines can be treated as a paragraph in our coding analogy.
If the subject on which you are writing is of slight extent, or if you intend to treat it briefly, there may be no need to divide it into topics. Thus, a brief description, a brief book review, a brief account of a single incident, a narrative merely outlining an action, the setting forth of a single idea — any one of these is best written in a single paragraph. After the paragraph has been written, examine it to see whether division will improve it.
Ordinarily, however, a subject requires division into topics, each of which should be dealt with in a paragraph. The object of treating each topic in a paragraph by itself is, of course, to aid the reader. The beginning of each paragraph is a signal that a new step in the development of the subject has been reached.
Ok, so this says if our method is simple or brief, it can consist of a single paragraph of code. But we should always re-read the code and see if we can aid the readers by splitting the statements into several paragraphs, each of them focused on a single topic.
As a rule, single sentences should not be written or printed as paragraphs. An exception may be made of sentences of transition, indicating the relation between the parts of an exposition or argument.
As a rule, begin each paragraph either with a sentence that suggests the topic or with a sentence that helps the transition. If a paragraph forms part of a larger composition, its relation to what precedes, or its function as a part of the whole, may need to be expressed. This can sometimes be done by a mere word or phrase (again, therefore, for the same reason) in the first sentence. Sometimes, however, it is expedient to get into the topic slowly, by way of a sentence or two of introduction or transition.
In our analogy, we should avoid code paragraphs consisting of a single statement, and each paragraph should begin with a statement, a type, or an identifier that suggests the topic of the whole paragraph. Now we can see how this helps the readers: just by scanning through the paragraphs, one can comprehend the main steps in the method’s implementation and then, knowing the higher-level structure, dig deeper into the details. This helps reviewers understand and verify the overall design. It also helps maintainers find the paragraph they need to fix or improve.
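To make the analogy concrete, here is a minimal sketch of a method written as three code paragraphs. It is in Python rather than C#, and the function `top_words` and its inputs are made up purely for illustration. Each paragraph is set off by an empty line and opens with a short comment that sets its topic, so the main steps can be read by scanning the first line of each paragraph.

```python
from collections import Counter


def top_words(text, limit=3):
    """Return the `limit` most frequent words in `text` as (word, count) pairs."""
    # Normalize the input into a list of bare words.
    lowered = text.lower()
    stripped = [word.strip(".,;:!?") for word in lowered.split()]
    words = [word for word in stripped if word]

    # Count the words and rank them, most frequent first (ties by alphabet).
    counts = Counter(words)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

    # Report only the first few entries of the ranking.
    top = ranked[:limit]
    return top
```

None of the paragraphs is a single statement, and a reader who only needs to change the ranking rule can jump straight to the second one.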
In general, remember that paragraphing calls for a good eye as well as a logical mind. Enormous blocks of print look formidable to readers, who are often reluctant to tackle them. Therefore, breaking long paragraphs in two, even if it is not necessary to do so for sense, meaning, or logical development, is often a visual help. But remember, too, that firing off many short paragraphs in quick succession can be distracting. Paragraph breaks used only for show read like the writing of commerce or of display advertising. Moderation and a sense of order should be the main considerations in paragraphing.
Ah, another point in favor of paragraphs: they are a visual aid for the readers, catching the eye and helping navigation. Too few paragraphs, and the readers are scared. Too many paragraphs, and the readers are distracted.
To summarize: paragraphs are a well-known practice for improving literary prose, and this practice can be applied to your high-level code. Split large blocks of lines into smaller topical chunks. Just as a text is structured with whitespace, punctuation, paragraphs and chapters, structure the code you write into identifiers, statements, groups of statements, methods, classes, components, etc. Split long LINQ chains into smaller chains, each labeled with an identifier that explains the result of that particular chain.
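As a rough illustration of that last point, the sketch below contrasts one long chained expression with the same logic split into named intermediate results. It uses Python comprehensions as a stand-in for a LINQ chain; the `Order` type and both function names are hypothetical, and this is not my colleague's actual snippet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    customer: str
    total: float
    shipped: bool


def top_customers_chained(orders, limit):
    # Everything in one expression: correct, but the reader has to unpack it.
    return [name for name, _ in sorted(
        ((c, sum(o.total for o in orders if o.shipped and o.customer == c))
         for c in {o.customer for o in orders if o.shipped}),
        key=lambda pair: (-pair[1], pair[0]))[:limit]]


def top_customers_split(orders, limit):
    # Total up what each customer spent on shipped orders.
    shipped_orders = [o for o in orders if o.shipped]
    spend_by_customer = {}
    for order in shipped_orders:
        previous = spend_by_customer.get(order.customer, 0.0)
        spend_by_customer[order.customer] = previous + order.total

    # Rank customers by spend (ties broken by name) and keep the first few.
    ranked_customers = sorted(
        spend_by_customer, key=lambda name: (-spend_by_customer[name], name))
    return ranked_customers[:limit]
```

Both functions return the same list. The second one names each intermediate result (`shipped_orders`, `spend_by_customer`, `ranked_customers`), so a reviewer can check one step at a time.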
Check out this short ParallelGrep sample code and see how cleanly the paragraphs of code are split: you can quickly scan through the method to understand the main steps. Textual comments introduce the paragraphs of code, but they don’t echo the code; they merely set the topic of each code paragraph.
P.S. Interestingly enough, there is also value in going the other way: from your code writing to your prose writing. The Pragmatic Programmer book points out that “English is Just a Programming Language” and therefore advises: “Write documents as you would write code: honor the DRY principle, use metadata, MVC, automatic generation, and so on”.