Clean Code
This file contains my notes (usually direct quotes) from Clean Code by Robert C. Martin.
Meaningful Names
Use Intention-Revealing Names
The name of a variable, function, or class should answer all the big questions. It should tell you why it exists, what it does, and how it is used. If a name requires a comment, then the name does not reveal its intent. Good names give you the intention.
Avoid Disinformation
Avoid words whose entrenched meanings vary from our intended meaning. For example, do not refer to a grouping of accounts as an accountList unless it's actually a List.
Beware of using names that vary in small ways.
Make Meaningful Distinctions
It is not sufficient to add number series or noise words, even though the compiler is satisfied. Number-series naming (a1, a2, ... aN) is the opposite of intentional naming. Such names are not disinformative - they are noninformative; they provide no clue to the author's intention. Consider:
public static void copyChars(char a1[], char a2[]) { for (int i = 0; i < a1.length; i++) { a2[i] = a1[i]; } }
This function reads much better when source and destination are used for the argument names.
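As a rough sketch of that renaming (the wrapper class CharCopier is a hypothetical name added only so the snippet compiles on its own), the same copy loop might read:

```java
// CharCopier is a hypothetical holder class; only copyChars comes from the notes.
final class CharCopier {
    private CharCopier() {}

    /** Copies each element of source into the same index of destination. */
    static void copyChars(char[] source, char[] destination) {
        for (int i = 0; i < source.length; i++) {
            destination[i] = source[i];
        }
    }
}
```

With these names, a call such as copyChars(input, buffer) shows the direction of the copy without opening the function body.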
Noise words are redundant. The word variable should never appear in a variable name. The name table should never appear in a table name. Imagine finding one class named Customer and another named CustomerObject. What should you understand as the distinction? Distinguish names in such a way that the reader knows what the differences offer.
Use Pronounceable Names
If you can't pronounce it, you can't discuss it without sounding like an idiot.
Use Searchable Names
The name e is a poor choice for any variable for which a programmer might need to search. In this regard, longer names trump shorter names, and any searchable name trumps a constant in code.
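A small sketch of the idea, assuming a hypothetical TaskSchedule class and made-up constant values: the named constants can be found with a text search, whereas a bare 5 or 4 could not.

```java
// Hypothetical example; the class name and constant values are assumptions.
final class TaskSchedule {
    static final int WORK_DAYS_PER_WEEK = 5;
    static final int REAL_DAYS_PER_IDEAL_DAY = 4;

    private TaskSchedule() {}

    /** Sums the whole real weeks needed for each task estimate, given in ideal days. */
    static int realTaskWeeks(int[] taskEstimatesInIdealDays) {
        int sum = 0;
        for (int taskEstimate : taskEstimatesInIdealDays) {
            int realTaskDays = taskEstimate * REAL_DAYS_PER_IDEAL_DAY;
            // Integer division drops any partial week.
            int wholeWeeks = realTaskDays / WORK_DAYS_PER_WEEK;
            sum += wholeWeeks;
        }
        return sum;
    }
}
```

Searching for WORK_DAYS_PER_WEEK finds every place the rule is used; searching for 5 would match far too much.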
Avoid Encodings
Encoding type or scope information into names simply adds an extra burden of deciphering.
Avoid Mental Mapping
Readers shouldn't have to mentally translate your names into other names they already know. This is a problem with single-letter variable names. In most contexts other than loop counters, a single-letter name is a poor choice. Clarity is king.
Class Names
Classes and objects should have noun or noun phrase names like Customer, WikiPage, Account, and AddressParser. Avoid words like Manager, Processor, Data, or Info in the name of a class. A class name should not be a verb.
Method Names
Methods should have verb or verb phrase names like postPayment, deletePage, or save. Accessors, mutators, and predicates should be named for their value and prefixed with get, set, and is.
Don't Be Cute
Choose clarity over entertainment value.
Pick One Word per Concept
Pick one word for one abstract concept and stick with it. For instance, it's confusing to have fetch, retrieve, and get as equivalent methods of different classes. A consistent lexicon is a great boon to the programmers who must use your code.
Don't Pun
Avoid using the same word for two purposes. Using the same term for two different ideas is essentially a pun.
Use Solution Domain Names
Remember that the people who read your code will be programmers. So go ahead and use computer science terms, algorithm names, and pattern names, math terms, and so forth. The name AccountVisitor means a great deal to a programmer who is familiar with the Visitor pattern.
Add Meaningful Context
There are a few names which are meaningful in and of themselves - most are not. Instead, you need to place names in context for your reader by enclosing them in well-named classes, functions, or namespaces.
Functions
Small!
The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that.
Blocks and Indenting
This implies that the blocks within if statements, else statements, while statements, and so on should be one line long. Probably that line should be a function call. Not only does this keep the enclosing function small, but it also adds documentary value because the function called within the block can have a nicely descriptive name.
This also implies that functions should not be large enough to hold nested structures. Therefore, the indent level of a function should not be greater than one or two. This, of course, makes the functions easier to read and understand.
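A minimal sketch of a one-line block, using a hypothetical OverdueReminders class (none of these names come from the book). The body of the if is a single call whose name documents what happens:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; all names here are invented for illustration.
final class OverdueReminders {
    private final List<String> remindedCustomers = new ArrayList<>();

    /** One level of indentation: the if block is a single descriptive call. */
    void remindIfOverdue(String customer, int daysLate) {
        if (isOverdue(daysLate))
            sendReminder(customer);
    }

    /** Returns a copy so callers cannot modify the internal list. */
    List<String> remindedCustomers() {
        return new ArrayList<>(remindedCustomers);
    }

    private boolean isOverdue(int daysLate) {
        return daysLate > 0;
    }

    // Stands in for real delivery; this sketch only records the customer.
    private void sendReminder(String customer) {
        remindedCustomers.add(customer);
    }
}
```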
Do One Thing
Functions should do one thing. They should do it well. They should do it only.
One Level of Abstraction per Function
In order to make sure our functions are doing "one thing," we need to make sure that the statements within our function are all at the same level of abstraction.
Switch Statements
My general rule for switch statements is that they can be tolerated if they appear only once, are used to create polymorphic objects, and are hidden behind an inheritance relationship so that the rest of the system can't see them.
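One way this rule might look in practice, as a sketch only: a single switch buried in a factory that returns polymorphic objects. PayType, PayPolicy, and the factory are hypothetical names, and an interface stands in here for the inheritance relationship.

```java
// Hypothetical example; every name below is an assumption made for illustration.
enum PayType { SALARIED, HOURLY }

interface PayPolicy {
    int payInCents(int hoursWorked);
}

final class SalariedPay implements PayPolicy {
    private final int salaryInCents;

    SalariedPay(int salaryInCents) {
        this.salaryInCents = salaryInCents;
    }

    @Override
    public int payInCents(int hoursWorked) {
        return salaryInCents; // fixed pay, independent of hours
    }
}

final class HourlyPay implements PayPolicy {
    private final int rateInCents;

    HourlyPay(int rateInCents) {
        this.rateInCents = rateInCents;
    }

    @Override
    public int payInCents(int hoursWorked) {
        return rateInCents * hoursWorked;
    }
}

final class PayPolicyFactory {
    private PayPolicyFactory() {}

    // The only switch on PayType; callers work with PayPolicy and never see it.
    static PayPolicy makePayPolicy(PayType type, int amountInCents) {
        switch (type) {
            case SALARIED:
                return new SalariedPay(amountInCents);
            case HOURLY:
                return new HourlyPay(amountInCents);
            default:
                throw new IllegalArgumentException("Unknown pay type: " + type);
        }
    }
}
```

Adding a new pay type then means one new class and one new case, rather than a new branch in every function that deals with pay.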
Use Descriptive Names
Half the battle to achieving that principle is choosing good names for small functions that do one thing. The smaller and more focused a function is, the easier it is to choose a descriptive name.
Don't be afraid to make a name long. A long descriptive name is better than a short enigmatic name.
Function Arguments
The ideal number of arguments for a function is zero (niladic). Next comes one (monadic), followed by two (dyadic). Three arguments (triadic) should be avoided where possible. More than three (polyadic) requires very special justification - and then shouldn't be used anyway.
Common Monadic Forms
There are two very common reasons to pass a single argument into a function. You may be asking a question about that argument, as in boolean fileExists("MyFile"). Or you may be operating on that argument, transforming it into something else and returning it. A somewhat less common, but still very useful form for a single argument function, is an event. In this form there is an input argument but no output argument. Use this form with care. It should be very clear to the reader that this is an event.
Flag Arguments
Flag arguments are ugly. Passing a boolean into a function is a truly terrible practice. It immediately complicates the signature of the method, loudly proclaiming that this function does more than one thing. It does one thing if the flag is true and another if the flag is false!
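A sketch of the usual remedy, assuming a hypothetical PageRenderer whose output format is invented for illustration: instead of one render(boolean isSuite) method, expose one function per behavior.

```java
// Hypothetical example; the class name and the output strings are assumptions.
final class PageRenderer {
    private PageRenderer() {}

    // Replaces render(page, true): the name says which behavior you get.
    static String renderForSuite(String page) {
        return "[suite] " + page;
    }

    // Replaces render(page, false).
    static String renderForSingleTest(String page) {
        return "[test] " + page;
    }
}
```

A call site such as renderForSuite(page) needs no lookup to learn what a bare true would have meant.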
Dyadic Functions
A function with two arguments is harder to understand than a monadic function. For example, writeField(name) is easier to understand than writeField(outputStream, name). Dyads aren't evil, and you will certainly have to write them. However, you should be aware that they come at a cost and should take advantage of what mechanisms may be available to you to convert them into monads. For example, you might make the writeField method a member of outputStream so that you can say outputStream.writeField(name). Or you might make the outputStream a member variable of the current class so that you don't have to pass it. Or you might extract a new class like FieldWriter that takes the outputStream in its constructor and has a write method.
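The last option could be sketched roughly as follows. The notes only name FieldWriter, its constructor argument, and write; wrapping the stream in a PrintStream and writing one field per line are assumptions of this sketch.

```java
import java.io.OutputStream;
import java.io.PrintStream;

// A sketch of the FieldWriter idea; the one-field-per-line format is assumed.
final class FieldWriter {
    private final PrintStream out;

    FieldWriter(OutputStream outputStream) {
        // PrintStream is used so that write does not declare a checked IOException.
        this.out = new PrintStream(outputStream, true);
    }

    /** Monadic: the stream was supplied once, in the constructor. */
    void write(String name) {
        out.print(name);
        out.print('\n');
    }
}
```

Callers construct the writer once and then call write(name) with a single argument each time.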
Triads
I suggest you think very carefully before creating a triad.
Argument Objects
When a function seems to need more than two or three arguments, it is likely that some of those arguments ought to be wrapped into a class of their own. Consider, for example, the difference between the two following declarations:
Circle makeCircle(double x, double y, double radius);
Circle makeCircle(Point center, double radius);
Reducing the number of arguments by creating objects out of them may seem like cheating, but it's not. When groups of variables are passed together, the way x and y are in the example above, they are likely part of a concept that deserves a name of its own.
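A minimal sketch of the second declaration, with assumed field layouts for Point and Circle and a hypothetical Geometry class to hold makeCircle:

```java
// Point and Circle are named in the declarations; their fields are assumptions.
final class Point {
    final double x;
    final double y;

    Point(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

final class Circle {
    final Point center;
    final double radius;

    Circle(Point center, double radius) {
        this.center = center;
        this.radius = radius;
    }
}

// Geometry is a hypothetical holder for the factory function.
final class Geometry {
    private Geometry() {}

    // Two arguments instead of three: x and y travel together as a Point.
    static Circle makeCircle(Point center, double radius) {
        return new Circle(center, radius);
    }
}
```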
Verbs and Keywords
In the case of a monad, the function and argument should form a very nice verb/noun pair. For example, write(name) is very evocative. Whatever this "name" thing is, it is being "written." An even better name might be writeField(name), which tells us that the "name" thing is a "field."
Have No Side Effects
Side effects are lies. Your function promises to do one thing, but it also does other hidden things.
If your function must change the state of something, have it change the state of its owning object.
Command Query Separation
Functions should either do something or answer something, but not both. Either your function should change the state of an object, or it should return some information about that object. Doing both often leads to confusion.
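As an illustration, a hypothetical Attributes class can keep the query and the command apart instead of offering a single set method that both changes state and reports success:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example; the class and method names are assumptions.
final class Attributes {
    private final Map<String, String> values = new HashMap<>();

    // Query: answers a question and changes nothing.
    boolean attributeExists(String name) {
        return values.containsKey(name);
    }

    // Query: returns the stored value, or null when the attribute is absent.
    String attributeValue(String name) {
        return values.get(name);
    }

    // Command: changes state and returns nothing.
    void setAttribute(String name, String value) {
        values.put(name, value);
    }
}
```

A caller then writes if (attributes.attributeExists("username")) followed by setAttribute on its own line, so the reader never has to guess whether a boolean result means "it existed" or "it was set".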
Prefer Exceptions to Returning Error Codes
Returning error codes from command functions is a subtle violation of command query separation. It promotes commands being used as expressions in the predicates of if statements.
Functions should do one thing. Error handling is one thing. Thus a function that handles errors should do nothing else. This implies that if the keyword try exists in a function, it should be the very first word in the function and that there should be nothing after the catch/finally blocks.
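A sketch of that shape, using a hypothetical in-memory PageStore (the storage, the exception type, and the log are all assumptions): delete does nothing but handle errors, and the real work lives in a separate function.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example; an in-memory set stands in for real page storage.
final class PageStore {
    private final Set<String> pages = new HashSet<>();
    private final List<String> errorLog = new ArrayList<>();

    void add(String page) {
        pages.add(page);
    }

    boolean contains(String page) {
        return pages.contains(page);
    }

    List<String> errorLog() {
        return new ArrayList<>(errorLog);
    }

    // Error handling is this function's one thing: try comes first, nothing follows catch.
    void delete(String page) {
        try {
            deletePage(page);
        } catch (IllegalStateException e) {
            logError(e);
        }
    }

    // Does the actual work and throws instead of returning an error code.
    private void deletePage(String page) {
        if (!pages.remove(page)) {
            throw new IllegalStateException("No such page: " + page);
        }
    }

    private void logError(Exception e) {
        errorLog.add(e.getMessage());
    }
}
```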
Don't Repeat Yourself
Duplication may be the root of all evil in software.
How Do You Write Functions Like This?
When I write functions, they come out long and complicated. I massage and refine that code, splitting out functions, changing names, eliminating duplication. In the end, I wind up with functions that follow the rules I've laid down in this chapter. I don't write them that way to start. I don't think anyone could. If you follow the rules herein, your functions will be short, well named, and nicely organized.
Comments
Nothing can be quite so helpful as a well-placed comment. Nothing can clutter up a module more than frivolous dogmatic comments. Comments are, at best, a necessary evil.
The proper use of comments is to compensate for our failure to express ourselves in code. Note that I used the word failure. I mean it. Comments are always failures. We must have them because we cannot always figure out how to express ourselves without them, but their use is not a cause for celebration.
The older a comment is, and the farther away it is from the code it describes, the more likely it is to just be plain wrong. The reason is simple. Programmers can't realistically maintain them.
Truth can only be found in one place: the code. Only the code can truly tell you what it does. It is the only source of truly accurate information. Therefore, though comments are sometimes necessary, we will expend significant energy to minimize them.
Explain Yourself in Code
Which would you rather see?
// Check to see if the employee is eligible for full benefits
if ((employee.flags & HOURLY_FLAG) && (employee.age > 65))
Or this?
if (employee.isEligibleForFullBenefits())
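For the second form to work, the condition has to move into the class. A minimal sketch, assuming a bit value for HOURLY_FLAG and a simple two-field Employee (both invented here); note that Java needs an explicit != 0 on the bit test:

```java
// A sketch only: the flag value, fields, and constructor are assumptions.
final class Employee {
    static final int HOURLY_FLAG = 0x1;
    private static final int FULL_BENEFITS_AGE = 65;

    private final int flags;
    private final int age;

    Employee(int flags, int age) {
        this.flags = flags;
        this.age = age;
    }

    // The method name carries what the comment used to say.
    boolean isEligibleForFullBenefits() {
        return isHourly() && age > FULL_BENEFITS_AGE;
    }

    private boolean isHourly() {
        return (flags & HOURLY_FLAG) != 0;
    }
}
```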
Good Comments
- Legal comments
- Informative comments
  - e.g. a description of a regular expression
- Explanation of intent
- Clarification
- Warning of consequences
- TODO comments
- Amplification
- Javadocs in Public APIs
Bad Comments
- Mumbling
  - Any comment that forces you to look in another module for the meaning of that comment has failed to communicate to you and is not worth the bits it contains.
- Redundant comments
- Misleading comments
- Mandated comments
- Journal comments (i.e. log of changes)
- Noise comments
- Don't use a comment when you can use a function or a variable!
- Position markers
- Closing brace comments
- Attributions and bylines
- Commented-out code
- HTML comments
- Nonlocal information
- Too much information
- Function headers
- Javadocs in nonpublic code
Formatting
You should choose a set of simple rules that govern the format of your code, and then you should consistently apply those rules. If you are working on a team, then the team should agree to a single set of formatting rules and all members should comply. It helps to have an automated tool that can apply those formatting rules for you.
Vertical Formatting
Small files are usually easier to understand than large files are.
Think of a well-written newspaper article. You read it vertically. At the top you expect a headline that will tell you what the story is about and allow you to decide whether it is something you want to read. The first paragraph gives you a synopsis of the whole story, hiding all the details while giving you the broad-brush concepts. As you continue downward, the details increase until you have all the dates, names, quotes, claims, and other minutiae.
We would like a source file to be like a newspaper article. The name should be simple but explanatory. The name, by itself, should be sufficient to tell us whether we are in the right module or not. The topmost parts of the source file should provide the high-level concepts and algorithms. Detail should increase as we move downward, until at the end we find the lowest level functions and details in the source file.
Vertical Openness/Density
Each group of lines represents a complete thought. Those thoughts should be separated from each other with blank lines. If openness separates concepts, then vertical density implies close association.
Vertical Distance
Concepts that are closely related should be kept vertically close to each other. Clearly this rule doesn't work for concepts that belong in separate files. But then closely related concepts should not be separated into different files unless you have a very good reason. We want to avoid forcing our readers to hop around through our source files and classes.
Variables should be declared as close to their usage as possible. Because our functions are very short, local variables should appear at the top of each function.
Instance variables, on the other hand, should be declared at the top of the class. This should not increase the vertical distance of these variables, because in a well-designed class they are used by many, if not all, of the methods of the class.
Dependent functions. If one function calls another, they should be vertically close, and the caller should be above the callee, if at all possible. This gives the program a natural flow. If the convention is followed reliably, readers will be able to trust that function definitions will follow shortly after their use.
Conceptual affinity. Certain bits of code want to be near other bits. For example, affinity might be caused because a group of functions perform a similar operation.
Vertical Ordering
A function that is called should be below a function that does the calling.
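A small sketch of this ordering, using a hypothetical Invoice class with an assumed tax rate: instance variables sit at the top, the highest-level function comes first, and each callee appears below its callers.

```java
// Hypothetical example; the class, its fields, and the tax rate are assumptions.
final class Invoice {
    private static final int TAX_PERCENT = 10;
    private final int[] lineAmountsInCents;

    Invoice(int... lineAmountsInCents) {
        this.lineAmountsInCents = lineAmountsInCents.clone();
    }

    // Headline: the high-level concept comes first.
    int totalInCents() {
        return subtotalInCents() + taxInCents();
    }

    // Called by totalInCents, so it sits below it.
    private int taxInCents() {
        return subtotalInCents() * TAX_PERCENT / 100;
    }

    // Called by both functions above, so it comes last.
    private int subtotalInCents() {
        int subtotal = 0;
        for (int amount : lineAmountsInCents) {
            subtotal += amount;
        }
        return subtotal;
    }
}
```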
Horizontal Formatting
We should keep our lines short. The old Hollerith limit of 80 is a bit arbitrary, and I'm not opposed to lines edging out to 100 or even 120. But beyond that is probably just careless.
Indentation is important. It allows programmers to quickly hop over scopes.
A team of developers should agree upon a single formatting style, and then every member of that team should use that style. We want the software to have a consistent style. The last thing we want to do is add more complexity to the source code by writing it in a jumble of different individual styles.
Objects and Data Structures
There is a reason that we keep our variables private. We don't want anyone else to depend on them. We want to keep the freedom to change their type or implementation on a whim or an impulse. Why, then, do so many programmers automatically add getters and setters to their objects, exposing their private variables as if they were public?
Hiding implementation is not just a matter of putting a layer of functions between the variables. Hiding implementation is about abstractions! A class does not simply push its variables out through getters and setters. Rather it exposes abstract interfaces that allow its users to manipulate the essence of the data, without having to know its implementation.
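One way to picture the difference is the sketch below, in which the interface exposes a percentage rather than the underlying variables. GasolineCar and its fields are assumptions made for illustration.

```java
// The abstraction: users learn how much fuel remains, not how it is stored.
interface Vehicle {
    double getPercentFuelRemaining();
}

// Hypothetical implementation; the fields are private and have no getters.
final class GasolineCar implements Vehicle {
    private final double tankCapacityInGallons;
    private final double gallonsOfGasoline;

    GasolineCar(double tankCapacityInGallons, double gallonsOfGasoline) {
        this.tankCapacityInGallons = tankCapacityInGallons;
        this.gallonsOfGasoline = gallonsOfGasoline;
    }

    @Override
    public double getPercentFuelRemaining() {
        return 100.0 * gallonsOfGasoline / tankCapacityInGallons;
    }
}
```

Because callers depend only on Vehicle, the implementation could switch to liters, or to a battery charge, without changing any code that reads the percentage.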