Lesser-known Linux commands: join, paste, and sort

So many Linux commands, and so little time. In this Daily Feature, Jack Wallen, Jr. begins chipping away at the extensive list of uncommon commands by describing three commands you should know: join, paste, and sort.

If you've spent any time in Linuxland, you know that the command line can be an all-consuming playground. Between learning the command syntax and what each command does, even the most advanced Linux guru doesn't have time to learn them all.

In this Daily Feature, I will take you on a tour of three lesser-known Linux commands: paste, join, and sort.

First of many
This Daily Feature is the first in a series of articles that will highlight either one uncommon Linux command or a group of lesser-known commands to help you master the command line.

Description: Paste is not quite what it sounds like. Unlike the de facto “paste” of the infamous “cut and” crew (where the user copies a section of data from a document to a buffer to be pasted into another document), the Linux paste command merges the corresponding lines of two or more files. Typically, paste is used to create columns of data separated by a user-specified delimiter (the default being a tab).

Usage: The paste utility is a very basic tool for creating either rows or columns of data that are combined from two separate files.

Example: I have two files, one containing full names (first and last) and one containing social security numbers. Both files’ data are stored in a single column. My goal here is to take the data from file1 (first name, last name) and the data from file2 (social security number) and merge them into one file with two columns of data. For this example, I have the following contents in file1:
Jack Wallen
Jessica Wallen
Johnny Wallen
Jeri Wallen

The contents of file2 will look like:
123-45-6789
234-56-7890
345-67-8901
456-78-9012

Running the command paste file1 file2 > file3 and then viewing file3 shows the contents in Table A.

Table A
Jack Wallen 123-45-6789
Jessica Wallen 234-56-7890
Johnny Wallen 345-67-8901
Jeri Wallen 456-78-9012
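
To try this yourself, here is a minimal sketch that recreates both input files from the names and numbers in Table A and then runs paste. The scratch directory created with mktemp is an assumption added here so that no existing files are overwritten.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# file1: one full name per line.
printf '%s\n' 'Jack Wallen' 'Jessica Wallen' 'Johnny Wallen' 'Jeri Wallen' > file1

# file2: one number per line, in the same order as the names.
printf '%s\n' '123-45-6789' '234-56-7890' '345-67-8901' '456-78-9012' > file2

# Merge line 1 with line 1, line 2 with line 2, and so on.
paste file1 file2 > file3
cat file3
```

The two columns in file3 are separated by a tab character, even though it may look like a run of spaces on screen.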

This utility comes in very handy when you need to merge text files to import into a database or presentation.

The default delimiter is a single tab. Should the columns need to be separated by, say, a comma, the fix is only a switch away. The command paste -d ',' file1 file2 > file3 will format the content as:
Jack Wallen,123-45-6789
Jessica Wallen,234-56-7890
Johnny Wallen,345-67-8901
Jeri Wallen,456-78-9012
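
The delimiter switch is easy to check with a shortened pair of files. The sketch below also shows the -s (serial) switch, which is how paste builds rows instead of columns. The two-line input files are trimmed-down stand-ins for the example data. Note that paste inserts only the delimiter itself, so no space follows the comma.

```shell
cd "$(mktemp -d)"
printf '%s\n' 'Jack Wallen' 'Jessica Wallen' > file1
printf '%s\n' '123-45-6789' '234-56-7890' > file2

# -d ',' replaces the tab between columns with a comma.
paste -d ',' file1 file2
# Jack Wallen,123-45-6789
# Jessica Wallen,234-56-7890

# -s joins all the lines of each file into a single row per file.
paste -s -d ',' file1 file2
# Jack Wallen,Jessica Wallen
# 123-45-6789,234-56-7890
```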

Description: The join command is like the paste command, only a bit more intelligent. The join command takes two files and merges their lines wherever both files share a common field. In other words, for join to work properly, each row of data must carry a field that also appears in the other file. The common field functionality keeps a user from merging unrelated data.

Usage: The join utility is a more advanced tool than paste for combining lines from two separate files, matching them on a shared field.

Example: The files used in the paste example would not work for join because they do not share a common field. This problem can be overcome by using the nl tool, which numbers the lines of a file. Because nl only writes the numbered lines to stdout (normally the terminal), I will have to redirect that output to a file. To expedite this task, I will put both nl commands on one line:
nl file1 > fileA ; nl file2 > fileB

Now I will have two new files (fileA and fileB) that have the contents of file1 and file2 (respectively) only with line numbers added at the beginning of each line. With the line numbers acting as common fields, I can now employ the join command to combine the files together. So the command join fileA fileB > fileC will create a new file (fileC) that looks like Table B.

Table B
1 Jack Wallen 123-45-6789
2 Jessica Wallen 234-56-7890
3 Johnny Wallen 345-67-8901
4 Jeri Wallen 456-78-9012
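
Put together, the whole nl-then-join sequence can be sketched as follows. The scratch directory is an assumption added for safety. The file names match the ones used above.

```shell
cd "$(mktemp -d)"
printf '%s\n' 'Jack Wallen' 'Jessica Wallen' 'Johnny Wallen' 'Jeri Wallen' > file1
printf '%s\n' '123-45-6789' '234-56-7890' '345-67-8901' '456-78-9012' > file2

# nl prefixes each line with its line number; redirect to keep the result.
nl file1 > fileA
nl file2 > fileB

# join pairs up lines whose first field (here, the line number) is identical.
join fileA fileB > fileC
cat fileC
```

The padding that nl adds in front of each number does no harm, because join ignores leading blanks and writes its output fields separated by single spaces.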

Remember that the common fields must match exactly, and both files must be sorted on that field. Should one file have an errant entry such as:
1 Jack Wallen
2 Jessica Wallen
7 Jackie Wallen
3 Johnny Wallen
4 Jeri Wallen
5 Jimmy Wallen
6 Jenny Wallen
8 Jolene Wallen

then the joining would begin but would stop at the errant field. So the output would be:

1 Jack Wallen 123-45-6789
2 Jessica Wallen 234-56-7890

The easiest way to get around this problem is by using the sort command.

Description: The sort command, as its name implies, will sort data according to the user's needs.

Usage: The sort utility is the command to use when you need to have data in alphanumeric, dictionary, or reverse order.

Example: Let's say, for example, the data above needs to be joined but that the far left field is out of numerical order. (Remember the out-of-place 7?) To fix this particular issue, use the default sort command (redirecting the output to yet another file) like so:
sort file1 > file3

which will rearrange the content this way:
1 Jack Wallen
2 Jessica Wallen
3 Johnny Wallen
4 Jeri Wallen
5 Jimmy Wallen
6 Jenny Wallen
7 Jackie Wallen
8 Jolene Wallen

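Now that the above list is in proper order, it can be joined with the contents of the second file. Note that join expects the same character-by-character order that the default sort produces. That order matches numerical order here only because every line number is a single digit.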
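
The sort-then-join fix can be sketched like this. The file names names, names.sorted, and numbers are hypothetical, chosen only for this illustration, and the numbered lines are typed in directly rather than generated with nl.

```shell
cd "$(mktemp -d)"

# Numbered names with entry 7 out of place, as in the example above.
printf '%s\n' '1 Jack Wallen' '2 Jessica Wallen' '7 Jackie Wallen' \
  '3 Johnny Wallen' '4 Jeri Wallen' '5 Jimmy Wallen' '6 Jenny Wallen' \
  '8 Jolene Wallen' > names

# Numbered social security numbers, already in order.
printf '%s\n' '1 123-45-6789' '2 234-56-7890' '3 345-67-8901' \
  '4 456-78-9012' > numbers

# Put the names back in order, then join on the first field.
sort names > names.sorted
join names.sorted numbers
```

Only lines 1 through 4 appear in the result, because join by default prints just the lines that have a partner in both files.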

The sort command has a number of useful switches, including:
  • -d
    Considers only blanks and alphanumeric characters (dictionary order)
  • -f
    Ignores case
  • -i
    Ignores nonprinting characters
  • -M
    Sorts by month name (Jan before Feb, and so on)
  • -n
    Sorts numerically
  • -r
    Reverses the sort order
  • -k
    Sorts on a key that starts at a user-defined field and, optionally, ends at another
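
A few of these switches are easiest to understand by watching them work. The sketch below uses two small, made-up files (sizes and months are hypothetical names). Setting LC_ALL=C is an assumption added here so that the results do not depend on the system's language settings.

```shell
cd "$(mktemp -d)"
printf '%s\n' 10 9 100 1 > sizes
printf '%s\n' Mar Jan Feb > months

# The default sort compares character by character, so 10 and 100 land before 9.
LC_ALL=C sort sizes       # 1, 10, 100, 9

# -n compares the lines as numbers.
LC_ALL=C sort -n sizes    # 1, 9, 10, 100

# -r reverses whichever order is in effect.
LC_ALL=C sort -nr sizes   # 100, 10, 9, 1

# -M understands three-letter month abbreviations.
LC_ALL=C sort -M months   # Jan, Feb, Mar
```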

The only switch that warrants further explanation is -k. The -k switch takes a field number to use as the start of the sort key. Unless an end field is also given, the key runs from that field to the end of the line. Using the list above, if I were to run the command sort -k 2 file1, the output would be sorted starting at the second column (which, in this case, is first name) and would look like:
7 Jackie Wallen
1 Jack Wallen
6 Jenny Wallen
4 Jeri Wallen
2 Jessica Wallen
5 Jimmy Wallen
3 Johnny Wallen
8 Jolene Wallen

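This sort command took the list and sorted it alphabetically starting at the second column. Notice that the command places “Jackie Wallen” before “Jack Wallen.” This is less a glitch than a side effect of the key: -k 2 compares everything from the second field to the end of the line, and in many locales the comparison gives little or no weight to spaces, so the “i” in Jackie ends up being compared with the “W” in Wallen. Running sort -k 2,2 file1 limits the key to the first name alone, which places Jack before Jackie.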
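
The -k switch accepts an end field as well as a start field. As a minimal sketch, -k 2,2 makes the key start and stop at the second field, so only the first names are compared. The three-line names file is a hypothetical, shortened version of the list above, and LC_ALL=C is an assumption added here to get plain byte-by-byte comparison.

```shell
cd "$(mktemp -d)"
printf '%s\n' '7 Jackie Wallen' '6 Jenny Wallen' '1 Jack Wallen' > names

# The key is field 2 only, so "Jack" is compared with "Jackie" and "Jenny".
LC_ALL=C sort -k 2,2 names
# 1 Jack Wallen
# 7 Jackie Wallen
# 6 Jenny Wallen
```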

Linux has a long list of commands that are just waiting to be explored. The /usr/bin directory alone has 3,123 commands. The commands you don’t yet know could be veritable pots of gold, hiding time-savers, easy fixes, and invaluable tricks at the end of the Linux command-line rainbow. So, unless you can name all 3,123 /usr/bin commands, this series of articles on the lesser-known Linux commands could bring you a whole lot closer to being the Linux deity you already think you are.