
Never use dynamic variable names

How to dynamically name variables is a common subject of programming questions. Actually doing it is a great way to create security problems, though.

You have probably all seen it, at least if you pay any attention to online discussion of programming. Sooner or later, somebody crops up asking how, in a particular language, to dynamically give an arbitrary name to a variable.

What this usually means is that the person wants to take input from a user and use that input to name a variable. In my experience this crops up more in PHP circles than anywhere else, though it seems to come up everywhere eventually. It is usually possible, too, especially if your language of choice has eval. A rather ugly example in Ruby is:

    var_name  = gets.chomp   # get input from STDIN
    var_value = gets.chomp   # get more input from STDIN
    eval("#{var_name} = var_value")

(Note: All code examples in this article are in Ruby. I chose Ruby because I think it is easy to read, which makes it useful for examples; because I like the language; and because I know Ruby well enough to provide relevant examples without having to think about it too hard. The principles are easily translatable to many other languages, however. Note that the chomp method just gets rid of the newline character at the end of a line of input.)

If you run the program and enter foo at the first prompt and bar at the second prompt, you'll end up with a variable named foo that contains the value bar. It's pretty simple, really — but don't do it.

It is really not something you should ever do. Please, please don't do this. The best case scenario would generally be ugly code that is difficult to reason through and, as a result, difficult to maintain. I think most people who come to mailing lists and other online discussion venues asking this question don't realize how difficult it is going to be trying to work with variables whose names they do not know in advance, once they have dynamically named their variables.

The common use case is probably one where the programmer wants to be able to dredge up values based on input from the user. For instance, if a variable is named foo because that is what the user inputs as a value at some point, the program can get output by prompting the user so that the user inputs foo again. Right?

The proper way to do this sort of thing is usually to use scoping rules in the language, most likely in concert with some kind of looping construct, so that a value can be stored where needed. That way, the "special place" where the value is stored isn't a dynamically named variable; it is, instead, the current scope of the program, which vanishes when the program is done with that variable. In other cases, it might make more sense to just create a new sequential entry in some kind of database or programmatic data structure, and refer back to that based on stored values instead of on the names of the variables that store those values.
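One way to picture the scoping approach is a loop whose block parameters hold each value only for as long as it is needed. The following is a minimal sketch, not a prescribed method; StringIO stands in for STDIN so the code runs unattended, and the names input, report, label, and value are illustrative assumptions.

```ruby
require 'stringio'

# StringIO stands in for STDIN so the sketch runs without a terminal.
input  = StringIO.new("foo\nbar\nbaz\nqux\n")
report = []

# Read the lines two at a time: first a label, then its value.
input.each_line.map(&:chomp).each_slice(2) do |label, value|
  # label and value exist only inside this block.
  report << "#{label}: #{value}"
end

puts report
```

Once the block finishes, label and value are gone, so there is nothing left that would need a dynamically chosen name.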

Consider, for instance, an array of arrays. Why not just create an array with an arbitrary number of entries in it, adding to those entries as needed — and let each array element be a two-element array itself?

    ar = [
      [gets.chomp, gets.chomp],
      [gets.chomp, gets.chomp]
    ]

If the user inputs foo, bar, baz, and qux in response to the input prompts, the above gives you the equivalent of:

    ar = [
      ['foo', 'bar'],
      ['baz', 'qux']
    ]

You can iterate or recurse through the ar array, seeking out subarrays whose first element is 'foo', and getting the second element from that subarray, if you must:

    user_call = gets.chomp
    ar.each do |pair|
      puts pair[1] if pair[0] == user_call
    end

That will just print out the word bar if the user enters foo, and it is pretty simple. It is also much easier to secure properly than a bunch of eval expressions that take user input. If you actually use dynamically named variables, you will also have to dynamically determine how to call those variables, after all:

    user_call = gets.chomp
    puts eval("#{user_call}")

That may look simple, but it's an incredibly naive implementation. In order to even begin to pretend that code is secure, you will need to sanitize input, which means a lot more code. More on that in a moment.

Complex concepts that are used when simpler concepts will do just as well create maintenance problems. The more such problems you have, the more likely you are to introduce security issues into your software. This is not just a problem of making your life more difficult — it is also a problem of making your users' lives less secure. Don't do it. Do not try to dynamically name your variables. There may be times you can get away with it, but you are playing with fire if you try. There may be times it is even a better idea to do it than some alternative, but I can almost guarantee you will never run across such a circumstance. If you think you have one of those circumstances facing you, chances are much better that you simply need to rethink your solution to the problem at hand.

As I pointed out in The safest way to sanitize input: avoid having to do it at all, a great way to minimize the likelihood of introducing security vulnerabilities is to use code that others have written and, more importantly, tested far more thoroughly than you will test your own code before releasing it. In essence, using multidimensional arrays to approximate dynamically named variables is a case of using code (the array-handling code) someone else (the language implementation developer) has written, instead of writing your own dynamic variable implementation (using an eval expression) yourself.
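Ruby's built-in Array#assoc is one instance of such already-written code: it searches an array of arrays and returns the first subarray whose first element equals the argument. The sketch below assumes the input has already been read into ar; value_for is a hypothetical helper name, not something from the standard library.

```ruby
# The same data as the array-of-arrays example, with the input already read.
ar = [
  ['foo', 'bar'],
  ['baz', 'qux']
]

# Hypothetical helper: return the value paired with wanted, or nil if absent.
def value_for(pairs, wanted)
  pair = pairs.assoc(wanted)   # compares wanted with each subarray's first element
  pair && pair[1]
end

puts value_for(ar, 'foo')            # prints bar
puts value_for(ar, 'nope').inspect   # prints nil
```

The user's text is only ever compared against stored strings, so none of it is treated as code.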

Even simpler, though, is just using a one-dimensional array. Often enough, when someone thinks he needs dynamic variable naming, all he really needs is an array. Your array can be named whatever is appropriate for the array, and you can use numbers to keep track of the elements, since array elements are numbered. If need be, you can even maintain two arrays so there's some kind of correspondence maintained between two sets of values. For instance:

    arkey, arval = [], []    # both arrays must exist before elements are assigned
    arkey[0] = gets.chomp
    arval[0] = gets.chomp

That gives you two arrays, one that contains what you wanted to use as the variable name and another that contains what you wanted to assign to it as a value. You can just iterate through the first array, arkey, to find the element number that corresponds with the user input term you wanted to use as a variable name, then use that same element number to get the corresponding value:

    num = nil
    arkey.each_index do |n|
      num = n if arkey[n] == 'foo'
    end
    puts arval[num] unless num.nil?   # num stays nil when 'foo' was never stored

If the corresponding arval value is all you need, you can simplify it:

    arkey.each_index do |n|
      puts arval[n] if arkey[n] == 'foo'
    end
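The parallel-array lookup can also lean on Array#index, which returns the position of the first matching element or nil when there is none. This is a minimal sketch with the input already read into the arrays; parallel_lookup is a hypothetical helper name.

```ruby
arkey = ['foo', 'baz']
arval = ['bar', 'qux']

# Hypothetical helper: find wanted in keys and return the value stored at
# the same position in values, or nil when the key was never stored.
def parallel_lookup(keys, values, wanted)
  pos = keys.index(wanted)   # Array#index returns nil if nothing matches
  pos.nil? ? nil : values[pos]
end

puts parallel_lookup(arkey, arval, 'foo')   # prints bar
```

One difference from the loop version: if the same key was stored twice, Array#index stops at the first match, whereas a loop that keeps overwriting num ends up with the last one.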

If you want to get really fancy, you could use a hash (if your language supports it), which is certainly a better idea than using an eval expression. For instance:

    ha = Hash.new
    ha[gets.chomp] = gets.chomp

If you enter foo first and bar second, you'll get a hash element with the key name foo that contains the value bar:

puts ha['foo']

That will output bar. Easy-peasy, no dynamic variable naming needed. Calling up values based on user input is just as easy: the key in a hash lookup can be any expression, including a variable that holds whatever the user typed, so no eval expression is required. Where your language supports hashes, they are usually a neater fit for this job than a pair of parallel arrays.
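For what it is worth, a hash lookup accepts any expression as its key, so a stored value can be fetched with user input directly. A minimal sketch, with the input hard-coded so it runs unattended; call_up is a hypothetical helper name, and user_call would normally come from gets.chomp.

```ruby
ha = Hash.new
ha['foo'] = 'bar'   # stands in for ha[gets.chomp] = gets.chomp

# Hypothetical helper: look up whatever the user typed.
def call_up(hash, user_call)
  hash.fetch(user_call, nil)   # the key is plain data; nothing is evaluated as code
end

puts call_up(ha, 'foo')                    # prints bar
puts call_up(ha, "system('ls')").inspect   # prints nil; the string is never run
```

Hash#fetch with a default is used here only to make the missing-key case explicit; plain ha[user_call] behaves the same way for a hash created with Hash.new.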

The point here is that there are ways to get the results you probably wanted to get out of dynamically naming a variable without having to use eval. Using an eval expression is a great way to get yourself in trouble if you're not very, very careful — especially if there is any chance at all that there will be any user input in your eval expression anywhere! In much the same way that SQL injection vulnerabilities can let people run arbitrary SQL queries, eval expressions that include user input can potentially result in users running arbitrary code as well. Any time you are tempted to use an eval expression on user input, or otherwise execute arbitrary user input, you should rethink your approach.
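To make the comparison with SQL injection concrete, here is a small sketch of what can happen when a "variable name" goes into an eval expression. The naive_assign method and the $payload_ran flag are hypothetical names invented for this illustration, and the payload is deliberately harmless.

```ruby
# Naive "dynamic variable" assignment, in the style of the eval example.
def naive_assign(var_name, var_value)
  eval("#{var_name} = var_value")
end

# A well-behaved user types a plain name.
naive_assign('foo', 'bar')

# A hostile user types Ruby code instead of a name. This payload only sets
# a global flag, but it could just as easily have deleted files.
$payload_ran = false
naive_assign('$payload_ran = true; x', 'bar')

puts $payload_ran   # prints true: the user's code was executed
```

Everything before the final assignment in the hostile string runs with the full privileges of the program, which is exactly the kind of arbitrary code execution the paragraph above warns about.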

Earlier, I mentioned a previous article, The safest way to sanitize input: avoid having to do it at all. I first brought it up to touch on the importance of using others' code to solve your problems when it is reasonable to do so, because that code has probably been well tested and had its bugs (including potential security vulnerabilities) shaken out. That article applied this rule of thumb specifically to code for validating input. As you might have guessed from its title, though, the best approach is not just to use others' code to sanitize input, but to avoid having to sanitize it at all.

One could argue that properly sanitized user input in an eval expression is no direct threat to security, and that argument would be right. The problem is that you may never know for sure that all possible user input will be "properly sanitized". One reason to avoid using an eval expression for anything involving user input is simply that your best protection against unsanitized (or improperly sanitized) user input is to avoid having to sanitize that input at all. Any other approach to dynamically naming variables is likely to suffer problems similar to those of an explicit eval expression solution, and creating a situation where you need to sanitize input just creates more opportunity for security vulnerabilities to creep into your code.
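As an illustration of how easily "properly sanitized" can go wrong, consider a first attempt at checking that input looks like a variable name. This sketch assumes input that can contain a newline (for example, from a web form instead of gets.chomp); LOOSE_NAME, STRICT_NAME, and hostile are hypothetical names chosen for the example.

```ruby
# In Ruby, ^ and $ match at line boundaries, not at the start and end of
# the whole string; \A and \z anchor to the string itself.
LOOSE_NAME  = /^[a-z_][a-z0-9_]*$/
STRICT_NAME = /\A[a-z_][a-z0-9_]*\z/

hostile = "foo\nsystem('echo gotcha')"

puts LOOSE_NAME.match?(hostile)    # prints true: the first line alone satisfies the pattern
puts STRICT_NAME.match?(hostile)   # prints false
puts STRICT_NAME.match?('foo')     # prints true
```

Even the stricter pattern only checks the shape of the name. It would still accept reserved words or the names of variables the program already uses, so it is a starting point at best, not a guarantee.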

In this case, taking the easy way out is the same as taking the cautious approach. If you do not use dynamic variable names, you will not have to deal with the problem at all, so just don't use dynamic variable names. It really is that simple.

About

Chad Perrin is an IT consultant, developer, and freelance professional writer. He holds both Microsoft and CompTIA certifications and is a graduate of two IT industry trade schools.
