
Variable names should usually be descriptive

Chad Perrin says the long evolution of programming style leads us to one inescapable conclusion about variable naming conventions: what we should name our variables depends on context.

One programming practice that largely vanished along with early versions of the BASIC programming language was the use of single-character variable names. When a variable named a is meant to represent a concept like "accounts receivable," the difficulty of deriving that meaning from the name impedes quick and easy comprehension of what our code is supposed to accomplish for us. As a result, a common "best practices" rule of thumb developed, which could be expressed thus: use descriptive names for variables.

This concept is central to the more modern, more general rule that, to the extent reasonable, source code should be self-documenting. It is preferable to be able to read source code directly and find everything we need to understand it fully right there in the source, rather than having to read separate documentation and keep track of where that documentation matches up with the syntactic elements of the source itself. This is preferable not least because documentation and source code tend to drift out of sync. The same concern applies almost as much to code comments explaining how code works as it does to separate documentation.

With self-documenting code, then, code comments can be reserved for explaining why we wrote our source code the way we did. This way, we increase the information density of our source code files without making them harder to read. In fact, reading and understanding get much easier when the source code itself reads a bit like a story of how the code works, rather than like a pseudorandom stream of characters.
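As a small illustration of that division of labor, consider the following Ruby sketch. The method name, the constant, and the one-day grace period are hypothetical, invented only for this example: the names say what the code checks, which leaves the single comment free to say why it checks it that way.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Returns true when an invoice is past due. The names describe what
# is being checked; the comment inside records why it is done this way.
def invoice_overdue?(due_at, now = Time.now)
  # Hypothetical business rule: allow a one-day grace period so that
  # payments posted late on the due date are not flagged as overdue.
  grace_period = SECONDS_PER_DAY
  now > due_at + grace_period
end

puts invoice_overdue?(Time.at(0), Time.at(2 * SECONDS_PER_DAY)) # => true
```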

The rule of thumb that we should use descriptive, (presumably) verbose variable names is just that, though — a rule of thumb. It should not be taken as a divine commandment, never to be broken. There are cases where long descriptions of what a variable does are not appropriate or desirable when trying to make our code as clear, and thus maintainable, as possible.

To the extent that a variable is confined to a narrow context, it is often appropriate to give it a shorter, less descriptive name. In fact, doing so is often more than appropriate; it is advisable, because a longer name clutters the code and makes it take longer for the developer to read and grasp the meaning of the variable in that context.

This is particularly the case where a single, primary looping variable is used to iterate through data stored in some kind of data structure, where the meaning of the variable name is fairly explicitly defined in an obvious, encompassing location in the code. Take a Ruby iterator block as an example:

purchase_list.sort.each {|p| puts product_descriptions[p] }
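Because purchase_list and product_descriptions are not defined in that snippet, here is a self-contained sketch with hypothetical sample data (the product keys and descriptions are invented for illustration). It assumes the purchases are product keys and that product_descriptions is a hash keyed by them.

```ruby
# Hypothetical sample data: each purchase is a product key, and
# product_descriptions maps product keys to descriptions.
purchase_list = [:widget, :gadget, :doohickey]
product_descriptions = {
  :doohickey => 'A small, unnamed part',
  :gadget    => 'A handy tool',
  :widget    => 'A generic component'
}

# p is introduced at the head of the block as one element of the sorted
# purchase_list, and its only job is to serve as a key into the hash.
purchase_list.sort.each {|p| puts product_descriptions[p] }
```

Run as-is, this prints the three descriptions in alphabetical order of their keys.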

As we can see, the fact that we are iterating through a list of purchases is directly tied to the use of p as the looping variable. That variable is then used as an index or key to retrieve product descriptions from the product_descriptions collection. Loops in which a primary looping variable serves as the key for retrieving a value from a collection are the most obvious example of where a single-character variable name, often derided as a cardinal sin of programming, actually helps clarify code. Consider instead this needlessly verbose alternative:

purchase_list.sort.each do |purchase_list_item|
  puts product_descriptions[purchase_list_item]
end

In this example, by choosing to be more explicit in describing the source of the looping variable, we not only clutter up the code so that it takes slightly longer to read and understand, but we also create a mismatch between the name of the variable and how it is used within the loop. In addition, the line grows long enough that, for clarity, a simple single-line iterator should be broken up into multiple lines, using do ... end syntax instead of braces in accordance with Ruby coding style conventions for multiline blocks. Another needlessly verbose alternative tries to fix the mismatch between the variable's name and its use within the block:

purchase_list.sort.each do |product_key|
  puts product_descriptions[product_key]
end

In this case, we tie the name of the loop variable to its use within the loop, but in so doing we divorce it from its source. We also make the name essentially redundant with the name of the collection whose keys it supplies, conveying no information that is not stunningly obvious to the reader.

Slavish devotion to the common rule of thumb that variable names should be verbosely descriptive makes the code here harder, not easier, to read and understand quickly. One of the most important factors in play is that the beginning of the loop directly connects the loop variable's name to the source of its contents on each iteration. Another is the variable's use as a key or index into a collection, where context makes its syntactic significance clear.

That rule of thumb is certainly not without merit, though; it is a rule of thumb precisely because it works most of the time. For instance, when a variable's scope spans the entire program file, a single-letter name gives little or no guidance, at its various points of use, as to what it means in the algorithm the source code implements. More descriptive names are necessary in such circumstances because few cues sit close to the variable's points of use, a direct result of their distance within the source file from the place where the variable gets its value.

This is why one might create a hash like the following near the beginning of a program to capture command line arguments whose values must be used later in the program:

command = {
  :name      => ARGV.shift,
  :target    => ARGV.shift,
  :attribute => ARGV.shift
}

This way, a programmer maintaining that part of the program can see the sources and names of the command line arguments all in one place at the beginning, and code comments can be added to clarify why things are organized that way. This provides a sort of configuration rule set for the rest of the program, defining the relationship between program inputs and the values derived from them as they are used later.
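The command hash can be exercised as a short sketch. Wrapping it in a method named parse_command and checking for missing arguments are additions for illustration, not part of the original example; the check is there because Array#shift (and so ARGV.shift) simply returns nil when the user supplies too few arguments.

```ruby
# Builds the command hash from an argument list. Taking the list as a
# parameter, instead of reading ARGV directly, is an assumption made here
# so the parsing can be exercised without a real command line.
def parse_command(args)
  args = args.dup # shift is destructive; leave the caller's array intact

  command = {
    :name      => args.shift,
    :target    => args.shift,
    :attribute => args.shift
  }

  # Array#shift returns nil once the list is empty, so report missing
  # arguments here rather than letting a nil surface much later.
  missing = command.select {|key, value| value.nil? }.keys
  unless missing.empty?
    raise ArgumentError, "missing argument(s): #{missing.join(', ')}"
  end

  command
end

# In a real script this would be: command = parse_command(ARGV)
command = parse_command(%w[delete widget color])
puts command[:target]
```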

At the same time, a programmer reading other parts of the program gets an immediate cue, from the descriptive hash and key names, as to the meaning of any value that originated as a command line argument:

inventory_table = InventoryTracker.new(datafile)
if command[:name].downcase.eql? 'delete'
  inventory_table[
    command[:target]
  ].delete_entry(command[:attribute])
end

By contrast, the following could be disastrous for quick and easy comprehension of what our code is doing:

i = InventoryTracker.new(f)
if c[:n].downcase.eql? 'delete'
  i[ c[:t] ].delete_entry c[:a]
end

In this case, compressing things into fewer lines of code buys brevity only at the cost of every contextual cue about the meaning and purpose of the variables, producing something that looks more like line noise than useful notation. On top of that, consider how hard it is to find the relevant variable with a text search for something like f or, worse yet, c[:t], since the assignment may not contain that exact group of characters at all. Instead, the assignment may look more like this:

c = {
  :n => ARGV.shift,
  :t => ARGV.shift,
  :a => ARGV.shift
}

Another bad approach would be to simply use ARGV indices:

if ARGV[0].downcase.eql? 'delete'
  i[ ARGV[1] ].delete_entry ARGV[2]
end

Now we have to remember the order in which users are supposed to enter their command line arguments every time we use one of those arguments in our code. By taking the first approach described here, creating a command hash with descriptive names at the beginning of the file, we keep all the non-descriptive, positional references to the arguments together in the part of the program that first organizes its data, and use more descriptive terms in later code to give it a self-documenting quality.
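Neither InventoryTracker nor datafile is defined in the article, so the descriptive delete example cannot run as printed. The sketch below supplies a hypothetical in-memory InventoryTracker (its constructor, its [] method, and the behavior of delete_entry are all assumptions) and runs the descriptive version of that example against it, with the command hash written out literally as it might look after parsing.

```ruby
# Hypothetical stand-ins: the article never defines InventoryTracker, so
# this in-memory version (item names mapped to attribute hashes) is an
# assumption made only so the example runs.
class InventoryItem
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  # Removes a single attribute from this item.
  def delete_entry(attribute)
    @attributes.delete(attribute)
  end
end

class InventoryTracker
  # The article passes a datafile; this sketch takes a hash instead.
  def initialize(records)
    @items = records.transform_values {|attrs| InventoryItem.new(attrs.dup) }
  end

  # Looks up an item by name, raising KeyError if it does not exist.
  def [](item_name)
    @items.fetch(item_name)
  end
end

# A command hash as it might look after parsing "delete widget color".
command = { :name => 'delete', :target => 'widget', :attribute => 'color' }
inventory_table = InventoryTracker.new({ 'widget' => { 'color' => 'blue', 'size' => 'small' } })

if command[:name].downcase.eql? 'delete'
  inventory_table[
    command[:target]
  ].delete_entry(command[:attribute])
end
```

Even with stand-in classes, the descriptive names let the final conditional be read without looking back at how the arguments were collected.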

The term "cargo cult programming" was coined to describe cases where people follow what they believe to be "best practices" and copy code in full, or at least in form, without understanding the reasons for those practices or how the copied code works. Applied to variable naming, this habit leads to abuse of descriptive verbosity, turning a practice meant to clarify code into yet another way to obscure its meaning from the programmer who has to read it later.

Ultimately, the correct answer to questions like "How should I name my variables?" is "It depends." When using a rule of thumb, think critically about it to determine whether you are using the rule correctly.

About

Chad Perrin is an IT consultant, developer, and freelance professional writer. He holds both Microsoft and CompTIA certifications and is a graduate of two IT industry trade schools.
