The UNIX shell game

Ed Gold explores several basic properties of UNIX's csh shell, and he explains how the shell can protect your system from errors.

Before I begin this week's topic, I want to answer the challenge question from my Daily Drill Down "UNIX in the house" (How can you remove a file named "-a"?). That question has its roots deep in personal experience. I have no idea what I did to create such a file, but UNIX can be funny that way. One of the easiest ways to create a file named "-a" is to go into vi and type something like ":w -a". This UNIX fluke doesn't really cause any grief, but it's definitely annoying, and it forces you to think about the fundamentals of how the UNIX shell works.

The problem with removing "-a" centers on the fact that UNIX commands use the dash character [-] to introduce command line options. Thus, a statement like "rm -a" won't work because the rm command will just tell you that "-a" is an invalid command option. So, if you're like me, you try something like "rm -i *" in an attempt to blast everything out of the directory. The "-i" tells the command to ask for confirmation before removing each file. Unfortunately, this doesn't work either: again rm tells you that "-a" is an invalid command option. This is UNIX at its most annoying. Yet the root cause of this confusion is actually what makes the UNIX shell such a powerful ally most of the time.

The UNIX shell performs wild card substitution prior to processing the command line. So the "rm -i *" command first expands the "*" into a list of file names, then it attempts to execute the command. Unfortunately, "-a" almost always comes first in the file list, so the "rm" command still sees "-a" as a command option instead of a file name. Given these clues, do you have a solution yet?
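
To see the expansion for yourself, you can put echo in front of the command so that the shell prints the command line rm would have received. The following is only a sketch: it is written for a Bourne-style shell (sh or bash) rather than csh, the scratch directory and the file names notes.txt and zebra are made up for illustration, and LC_ALL=C is set on the assumption that plain byte-order sorting is wanted.

```shell
# Sort file names in plain byte order, so "-" comes before letters and digits.
LC_ALL=C; export LC_ALL

# Work in a throwaway directory so no real files are touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# "./-a" names the file by path, so touch does not mistake it for an option.
touch ./-a notes.txt zebra

# echo prints the words the shell produced after expanding "*".
expanded=$(echo rm -i *)
echo "$expanded"

cd / && rm -rf "$scratch"
```

The file "-a" lands immediately after "-i" in the expanded line, which is why rm reads it as one more option.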

My first solution was pretty ugly. I realized that ",a" would precede "-a" in the file list, so I created a bogus file called ",a", then I typed "rm -i *", which got rid of both of them. Had I been a bit brighter at the time, I would have realized that I could have just made up a bogus file name to precede "-a" on the command line, and that also would have worked. Thus, if you typed "rm -i dud -a", you'd effectively tell the rm command to delete a file named "dud" and a file named "-a", and so the objective would be achieved (though only slightly more elegantly than my solution). Both tricks depend on rm no longer looking for options once it has seen the first file name, which is how BSD's rm behaves; GNU rm keeps scanning the whole command line and would still reject "-a". I posed this question to a real UNIX guru, and he looked bewildered and said, "Why wouldn't you just type 'rm -- -a'?" Obviously, I never knew that UNIX accepts "--" as a standard way to declare the end of the command options. If you want the complete information on this, type man 3 getopt (on BSD systems, anyway).
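
Here is a small sketch of the guru's answer that you can try safely in a scratch directory. The second removal, which names the file by path as "./-a", is not from the discussion above; it is a common alternative added here for comparison. The variable names are made up, and the commands are plain enough that they should behave the same whether typed into sh, bash, or csh, although the script as a whole is written for sh.

```shell
# Work in a throwaway directory so no real files are touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Create the awkward file, then remove it with "--" marking the end of options.
touch ./-a
rm -- -a
if [ -e ./-a ]; then first=present; else first=gone; fi

# Alternative: give a path, so the argument no longer starts with a dash.
touch ./-a
rm ./-a
if [ -e ./-a ]; then second=present; else second=gone; fi

echo "$first $second"

cd / && rm -rf "$scratch"
```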

Shell basics
So how does this tie into the topic for this week? If we try to figure out why the rm -i * command failed to remove the "-a" file, we develop a better understanding of the UNIX shell. The shell is UNIX's command interpreter. It takes commands from a file or other input device and processes them. The shell is your window to the operating system. It looks much like the basic DOS window, but clearly the UNIX shell is far more powerful.

There are several flavors of UNIX shells, such as sh, csh, bash, and tcsh, just to name a few of the most popular. I will focus mostly on csh because I’m most familiar with it. Most of the other shells have similar properties.

One of the more helpful shell properties is the history list. The shell retains the last n commands in a history list. If you type history to the csh interpreter, it will show you the history list with each line numbered. If you then type !12, csh will recall the command numbered 12 in the history list and run it as if you had typed it at the prompt. Similarly, if you type !v, csh will recall the most recent command that began with the letter "v". The [!!] is shorthand for recalling the previously executed command. The csh shell will even let you change part of the recalled line and/or add to the line before submitting it.

This is just the tip of the iceberg when it comes to the shell's ability to substitute data in place of special characters. It also allows for alias substitution, which lets the user define aliases for frequently used phrases. For example, on my system, I always alias the rm, cp, and mv commands to use the "-i" option, so that I never accidentally blow away a valuable file through an inadvertent typing error. In my .cshrc file, which csh reads and executes upon startup, I have the following aliases:
alias cp 'cp -i'
alias mv 'mv -i'
alias rm 'rm -i'
alias ls 'ls -CF'
alias ll 'ls -al'

We're all familiar with the [*] wildcard character, which matches any string of characters, so that a bare * expands to the entire directory listing (apart from files whose names begin with a dot). The [?] character acts as a single-character wildcard, while the [~] character refers to the current user's home directory. The shell's file name matching capabilities go far beyond what anyone can ever remember, but I can't remember the last time I needed anything beyond these.
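
A quick way to get a feel for these characters is to hand the patterns to echo and look at what comes back. This sketch assumes a Bourne-style shell (sh or bash); the same three patterns work in csh. The scratch directory and the .dat and .txt file names are invented for the example.

```shell
# Sort matches in plain byte order so the results are predictable.
LC_ALL=C; export LC_ALL

scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch a1.dat a2.dat a10.dat b1.txt

# "*" matches any run of characters, so this picks up every .dat file.
star=$(echo *.dat)

# "?" matches exactly one character, so a10.dat is left out.
quest=$(echo a?.dat)

echo "$star"
echo "$quest"

# "~" is replaced by the path of your home directory.
echo ~

cd / && rm -rf "$scratch"
```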

The shell also maintains variables, which makes it easier to write scripts and to pass data into programs. Commands such as set file="test.dat" will store the phrase "test.dat" in a variable that can later be retrieved by other commands such as echo $file. The shell will also allow you to take the output of one command and save it in a variable. For example, the command set dateandtime=`date` will take the output of the date command and put it into the variable dateandtime. When the shell encounters backquotes, it executes the command contained within them and substitutes the command's output for the quoted phrase.
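
For readers who spend more time in sh or bash than in csh, the same two ideas look like this. It is only a sketch: Bourne-style shells assign with name=value instead of set, and the variables stamp and backup are hypothetical names introduced to show a captured command output being put to use.

```shell
# Plain assignment (the csh form is: set file="test.dat").
file="test.dat"

# Backquotes run the date command and capture what it prints.
stamp=`date +%Y%m%d`

# Combine the two variables into a dated backup name.
backup="$file.$stamp"
echo "$backup"
```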

Single vs. double quotes
This is a good time to draw a distinction between the use of single quotes and double quotes. A great experiment to illustrate this is:
>set items="one two three"
>echo $items
one two three
>echo "$items"
one two three
>echo '$items'
$items

The single quotes keep the shell from expanding the variable, so echo prints the text literally. The double quotes, however, still allow variable substitution, history substitution, and command substitution; the one thing they suppress is file name substitution.
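
Because echo joins its arguments with spaces, it hides one difference between the unquoted and double-quoted forms: how many arguments the command actually receives. The sketch below, written for sh or bash as an assumption (csh would use set items="one two three"), uses printf to put brackets around each argument so the difference becomes visible.

```shell
items="one two three"

# Unquoted: the shell splits the value into three separate arguments.
unquoted=$(printf '[%s]' $items)

# Double quotes: the variable is expanded but stays a single argument.
double=$(printf '[%s]' "$items")

# Single quotes: nothing is expanded; the text is passed through as typed.
single=$(printf '[%s]' '$items')

echo "$unquoted"
echo "$double"
echo "$single"
```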

Background vs. foreground
The shell is capable of launching multiple tasks at the same time. A task can be sent into the background by following the command with an [&] character. Observe the following sequence of commands:
>sleep 90 &
[1] 500

In the above dialog, the command sleep 90 was submitted to the background. The shell gave it a job number of [1] and told you that its process ID is 500. We won't discuss processes right now, but basically we've launched this program to run in the background. If the program can run without input or output, it will stay content to run out of sight. We can bring that task out of the background by issuing the %1 command. The [%] character tells the shell that you want to bring a job out of the background and make it the foreground task. If you follow the % with a number, the command applies to that job number. If no job number is given, the shell selects the current job, which is normally the one most recently sent to the background. You can see which jobs are running by typing the jobs shell command.
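
Job numbers and the % shorthand belong to an interactive session, but the same idea carries over to scripts. The following sketch assumes sh or bash and uses two features of those shells, the $! variable and the wait command, to keep track of a background task; the variable names are made up.

```shell
# "&" sends the command to the background, and the script carries on at once.
sleep 1 &

# $! holds the process ID of the most recent background command.
bgpid=$!
echo "started background process $bgpid"

# wait blocks until that process finishes and returns its exit status.
wait "$bgpid"
status=$?
echo "background process finished with status $status"
```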

I prefer to run most long tasks in the background in case I crash my xserver or accidentally close a window. If a task was running in the foreground and I do something to kill the window in which I called the task, then the task dies with it. A background task will continue running unless it needs input from the shell that launched it.

Flow control statements
Perhaps the most wonderful thing about csh is its ability to build scripts from flow control statements. Let's say you created a directory of 3,000 files and gave all of them the wrong file extension. You could rename the files one at a time by hand, or you could use the shell's looping powers to help you. Let's say you had accidentally named all of the files with a ".txt" extension when really they should have been ".jpg" files.

Here’s an example of how you would fix such a problem:
>foreach FILE ( *.txt )
foreach? mv $FILE $FILE:r".jpg"
foreach? end

This example makes use of several shell capabilities. First, it uses the foreach construct to create a loop that iterates over each item within the parentheses, each time assigning the current item to the variable FILE. In each iteration of the loop, it then executes the mv command, which renames the file named by $FILE to the name generated by stripping the extension from $FILE and appending the string ".jpg". The ":r" modifier removes the extension from a variable's value, while ":t" removes the leading directory path. These are very valuable tools that I use time and time again. The foreach construct allows you to place as many commands in the loop as you wish. When you have the sequence completely described, the end command tells the shell that it has the complete definition for the foreach command, and it can begin processing.
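
If you work in sh or bash instead of csh, the same rename can be sketched with a for loop. This is a rough equivalent, not a translation the article depends on: the ${file%.txt} expansion plays the role of csh's ":r" modifier, and the scratch directory and sample file names (including one with a space in it) are invented so the loop can be tried without risk.

```shell
# Sort names in plain byte order so the final listing is predictable.
LC_ALL=C; export LC_ALL

scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch photo1.txt photo2.txt "beach day.txt"

for file in *.txt; do
    # If nothing matched, the pattern is left as-is; skip it.
    [ -e "$file" ] || continue
    # ${file%.txt} strips the trailing ".txt"; quotes keep spaces intact.
    mv -- "$file" "${file%.txt}.jpg"
done

renamed=$(echo *)
echo "$renamed"

cd / && rm -rf "$scratch"
```

Quoting "$file" matters here because a name containing a space would otherwise be split into two arguments to mv.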

The csh shell closely resembles the C language in its control flow statements. It supports just about everything you'd expect, from if, then, and else to while.

Protecting from errors
One of the least known features of the shell is its ability to protect the system from user errors. Let's say you accidentally wrote a program that started allocating memory and never stopped. This program would quickly absorb all of the system memory and starve the other processes of this precious resource. Fortunately, the shell acts as the first line of defense against such errors. If you type the command limit, the shell will show you all the current resource limit settings. On my FreeBSD machine, the settings are unlimited cputime, unlimited filesize, 524 MB of datasize, 64 MB of stacksize, unlimited coredumpsize, unlimited memory use, 1,064 file descriptors, unlimited memory locked, and 531 maximum processes. This basically means that no process started from my default csh can use more than 524 MB of data memory, spawn more than 531 processes, or hold more than 1,064 files open at once.

Most users never discover the limit feature of the shell. Your first experience with it is usually an unpleasant one. I learned of the limit feature the first time I wrote a program that used more memory than the shell limit allowed. When this happens, the shell kills your program and leaves you with the mystery of wondering what killed it and how you make it look the other way. Most UNIX variants also have a kernel parameter that restricts these limits, so just because you raised the shell resource limits doesn't mean the kernel will let you have what you request.
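
In sh and bash, the counterpart of csh's limit command is ulimit. The sketch below assumes one of those shells and picks the open-file limit and the value 64 purely as an illustration. It lowers the soft limit inside a subshell, so the change disappears as soon as the parentheses close and your own session is left alone.

```shell
# Show every resource limit currently in force (like csh's "limit").
ulimit -a

# Lower the soft limit on open files inside a subshell and read it back.
lowered=$( ulimit -S -n 64; ulimit -n )
echo "open-file limit inside the subshell: $lowered"

# Back in the parent shell, the original limit is untouched.
echo "open-file limit in this shell: $(ulimit -n)"
```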

All implementations of shells have an upper limit on the number of words that can be placed on a line and on how long a line can be. This is a good reason to limit the number of files in a directory. If there were 90,000 files in a directory, the shell would have a hard time expanding the wildcard character "*" into the file list.
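
One common way around the problem, sketched here for sh or bash as an assumption, is to let find collect the names instead of asking the shell to expand a wildcard. The getconf line reports the system-wide cap on the size of a command's arguments, which is separate from any limit inside the shell itself. The three file names are made up, and echo stands in for whatever command you really want to run.

```shell
# Bytes the system allows for a new command's arguments and environment.
getconf ARG_MAX

scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch f1 f2 f3

# find passes the names to the command in batches that fit the limit,
# so the shell never has to build one enormous command line.
count=$(find . -type f -exec echo {} + | wc -w)
echo "files handled: $count"

cd / && rm -rf "$scratch"
```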

Stay tuned… my next Daily Drill Down will reflect on my past experiences and relate some interesting lessons I’ve learned.
How do you create a file with all spaces in its name? More important, how would you rename this file to something that would be easier to deal with (assuming you don't know how many spaces were in the filename)? Send us your creative answers. If there's enough interest, Ed will dedicate an entire Daily Drill Down to shell scripting. Let him know you're interested.

Ed Gold grew up in Louisville, Kentucky, and he received his master’s degree in electrical engineering at the University of Louisville. Ed owns a small engineering consulting firm in Orlando, Florida, and he is working on the electro-optic subsystem for Lockheed Martin's Joint Strike Fighter proposal. Although his primary computing interests are in image processing and artificial intelligence, Ed is a dedicated FreeBSD/Linux enthusiast. He is currently working to improve the FreeBSD system install utility.
