Master the lesser-known cut and diff Linux commands

See how you can use the cut and diff commands in Linux to search and manipulate files, especially when working with scripts and batch files.

Anyone who uses Linux on a regular basis knows that there are some commands that you use more than others. As such, it’s easy to forget about some of the lesser-used commands. Two of these are cut and diff. Here’s what these commands do, along with their syntax and some practical uses for them. Keep in mind that Linux command names and options are case-sensitive, so both must be typed exactly as shown.

The cut command
The cut command is used to extract specific characters or fields from each line of a file. As I go through the cut command’s syntax, you’ll notice that many of the command options require you to enter a list. The list specifies which characters, bytes, or fields to extract. A list consists of values separated either by commas or by hyphens. A comma designates individual values, while a hyphen indicates a range of values. For example, 10,20 would indicate the values 10 and 20, while 10-20 would indicate every value from 10 through 20.
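
As a minimal sketch of the list syntax, the following uses the -c option on a hypothetical sample line (the lowercase alphabet, chosen only because its character positions are easy to check). The variable names are illustrative.

```shell
# Hypothetical sample line: position 1 is "a", position 26 is "z".
line='abcdefghijklmnopqrstuvwxyz'

# A comma selects individual positions: characters 10 and 20.
picked=$(printf '%s\n' "$line" | cut -c10,20)

# A hyphen selects a range: characters 10 through 20.
range=$(printf '%s\n' "$line" | cut -c10-20)

echo "$picked"   # jt
echo "$range"    # jklmnopqrst
```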

The syntax for the cut command is as follows:
cut options [files]

Here is a list of the various options:
  • -b list or --bytes list: This option specifies a list of byte positions. Only the bytes at those positions are output.
  • -c list or --characters list: This option outputs only the characters at the column positions identified in your list.
  • -d c or --delimiter c: Use this option with the -f option (see next item) to specify a field delimiter character. The default field delimiter is a tab. If you use a character that is special to the shell, such as a space, the character must be surrounded by quotation marks.
  • -f list or --fields list: This option outputs only the fields identified in the list.
  • -n: Used with -b, the -n option tells the cut command not to split multibyte characters.
  • -s or --only-delimited: This option must be used in conjunction with the -f option. It suppresses lines that contain no delimiter; without it, such lines are output unchanged.
  • --output-delimiter=string: This option allows you to specify a string as the output delimiter. By default, the output delimiter is identical to the input delimiter.
  • --help: This option causes the cut command to print a help message and then exit.
  • --version: This option causes the cut command to print its version information and then exit.
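
The field options can be sketched as follows. This is an illustration rather than a definitive recipe: the sample records and variable names are made up, and --output-delimiter is assumed to be available, as it is in GNU cut.

```shell
# Hypothetical sample data: two colon-delimited records and one line with no delimiter.
sample=$(mktemp)
printf 'alice:x:1000:Alice Smith\nbob:x:1001:Bob Jones\nno delimiter here\n' > "$sample"

# -d sets the input delimiter and -f picks fields 1 and 4.
# Lines without the delimiter pass through unchanged.
fields=$(cut -d: -f1,4 "$sample")

# -s suppresses the lines that contain no delimiter.
only_delimited=$(cut -d: -f1,4 -s "$sample")

# --output-delimiter replaces the delimiter between the selected fields.
relabelled=$(cut -d: -f1,4 -s --output-delimiter=' = ' "$sample")

printf '%s\n' "$fields" "---" "$only_delimited" "---" "$relabelled"
rm -f "$sample"
```

The first result keeps the undelimited line, the second drops it, and the third prints each record as "alice = Alice Smith".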

Now that you know the syntax for the cut command, let’s take a look at some examples of how you might use the command in real life. For example, using the cut command in conjunction with the paste command can allow you to perform file manipulations. Suppose that you wanted to cut the fifth character from each line of a file and paste it at the beginning of that line. You could do so with the following command, which writes the result to standard output and leaves the file itself unchanged:
cut -c5 file | paste - file
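
To make the effect concrete, here is a small sketch run against a hypothetical three-line file. The hyphen tells paste to read the fifth characters from standard input, and paste joins the two streams with a tab.

```shell
# Hypothetical input file with three short lines.
f=$(mktemp)
printf 'apple\nbanana\ncherry\n' > "$f"

# cut emits the fifth character of every line; paste puts it in front
# of the matching original line, separated by a tab.
result=$(cut -c5 "$f" | paste - "$f")

printf '%s\n' "$result"
# e	apple
# n	banana
# r	cherry
rm -f "$f"
```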

Another way that you could use the cut command is to extract the usernames and real names from the /etc/passwd file. You could do so with the following command:
cut -d: -f1,5 /etc/passwd
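
The sketch below runs the same field selection against a made-up excerpt in /etc/passwd format, so the result does not depend on the accounts on your system. The two accounts shown are hypothetical.

```shell
# Hypothetical excerpt in /etc/passwd format (seven colon-separated fields).
passwd_sample=$(mktemp)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'jdoe:x:1000:1000:Jane Doe:/home/jdoe:/bin/bash' > "$passwd_sample"

# Field 1 is the login name; field 5 is the comment field,
# which usually holds the user's real name.
names=$(cut -d: -f1,5 "$passwd_sample")

printf '%s\n' "$names"
# root:root
# jdoe:Jane Doe
rm -f "$passwd_sample"
```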

The diff command
Simply put, the diff command compares two files line by line and shows how they differ. As you’ll see, diff is a very powerful command with a lot of options.

The syntax for the diff command is as follows:
diff [options] [diroptions] file1 file2
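
Before getting into the options, here is a minimal sketch of a plain comparison with no options at all. The two files are hypothetical. The exit status is worth knowing when you call diff from a script: 0 means the files are identical, 1 means they differ, and 2 means something went wrong.

```shell
# Two hypothetical files that differ only on line 2.
old=$(mktemp); new=$(mktemp)
printf 'one\ntwo\nthree\n' > "$old"
printf 'one\n2\nthree\n' > "$new"

# Capture both the output and the exit status.
status=0
out=$(diff "$old" "$new") || status=$?

printf '%s\n' "$out"
# 2c2
# < two
# ---
# > 2
echo "exit status: $status"   # exit status: 1
rm -f "$old" "$new"
```

In the output, 2c2 means that line 2 of the first file was changed into line 2 of the second. Lines starting with < come from the first file, and lines starting with > come from the second.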

Here is a list of its various options:
  • -a, --text: This option tells the diff command to treat all files as text files and to compare them line by line, even if they appear to be binary. Without it, diff reports only whether binary files differ. Comparing binaries this way is sometimes known as diffing.
  • -b, --ignore-space-change: Sometimes a text file will have multiple spaces where there should be just one. This option tells the diff command to ignore changes in the amount of white space and to treat multiple spaces as a single space.
  • -B, --ignore-blank-lines: As you might have guessed, this option tells diff to ignore changes that only insert or delete blank lines.
  • -c: The lowercase -c option tells diff to use the context output format, printing three lines of context around each block of changed lines. This option can help you to spot changes more easily.
  • -C n, --context[=n]: If you use the uppercase -C option, you can tell diff how many lines of context you want to place around each block of changed lines. The default value is three.
  • -d, --minimal: The -d option tells the diff command to try harder to find a smaller set of changes. The output is more compact, but the comparison can take longer.
  • -D symbol, --ifdef=symbol: The -D switch is used for working with C files. It produces a single merged output containing both input files, with the differences wrapped in #ifdef and #ifndef conditionals that test symbol.
  • -e, --ed: The -e switch produces a script for the ed editor. Running that script against the first file turns it into the second file.
  • -F regexp, --show-function-line=regexp: This option works with context and unified diff output. For each block of changed lines, it shows the most recent preceding line that matches regexp.
  • -H, --speed-large-files: This option speeds up comparisons of large files by assuming that they contain many small, widely scattered changes. The resulting set of changes may not be the smallest possible.
  • --help: The --help option displays a brief help message.
  • --horizon-lines=n: This option is geared toward helping diff to achieve a more compact listing. It keeps a number (n) of lines of the common prefix and suffix when performing a comparison.
  • -i, --ignore-case: Normally, the diff command is case-sensitive. However, you can use this option to force diff to ignore each character’s case within the files.
  • -I regexp, --ignore-matching-lines=regexp: The -I regexp option tells diff to ignore changes whose lines all match the regular expression regexp.
  • -l, --paginate: This option tells diff to paginate the output by passing it through pr.
  • -L label, --label label, --label=label: The label option is used with context and unified diff output. In such cases, the label takes the place of the filename in the output header. The first label replaces the first filename, while the second label replaces the second filename.
  • --left-column: If you perform a diff with a two-column output by using the -y switch, then you can use this option to print only the left column of lines that are common to both files.
  • -n, --rcs: You can use this option to specify that the output should be in RCS diff format.
  • -N, --new-file: If you were to incorporate diff into an automated procedure, then there’s a chance that the procedure could someday be run against a file that doesn’t exist. This switch allows you to treat nonexistent files as if they exist but are empty.
  • -p, --show-c-function: If you run a diff against two C files, you can use this option to show which function each block of changed lines belongs to. By default, this option assumes that you’re using the -c switch, but you can run the option in unified mode as well.
  • -q, --brief: If you use this option, then diff will report only that the files differ, not how they differ.
  • -r, --recursive: You can use this option to perform a recursive comparison of subdirectories.
  • -s, --report-identical-files: You can use this option to report when files are identical to each other.
  • -S filename, --starting-file=filename: You can use this option when comparing directories. You specify a filename to start with, and any files that sort before it in the directory are ignored.
  • --suppress-common-lines: If you generate a two-column output by using the -y switch, you can use this option to prevent common lines from being displayed.
  • -t, --expand-tabs: This switch expands tabs to spaces within the output.
  • -T, --initial-tab: This option starts each output line with a tab instead of a space, so that tabs in the text line up properly.
  • -u: The unified diff format is produced with the -u switch. A unified diff displays both the old and new versions of a line together, with three lines of context surrounding them.
  • -U n, --unified[=n]: This option is similar to the -u option, except that you can use the n variable to control how many lines of context surround the changed text.
  • -v, --version: This option prints the version information for the diff command.
  • -w, --ignore-all-space: The -w option causes diff to ignore all white space when comparing lines.
  • -W n, --width=n: If you generate a two-column output by using the -y option, then you can use this option to set the total width of the output. The default value is 130 columns.
  • -x pattern, --exclude=pattern: This option is for an exclusion. When you compare directories, it excludes files whose names match the shell pattern, so that those files are not compared.
  • -X filename, --exclude-from=filename: This option is a variation of the exclude option. The file that you specify is not itself excluded; instead, it is a text file containing a list of exclusion patterns.
  • -y, --side-by-side: This option formats the output into two columns.
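
A few of these options can be sketched together as follows. The sample files and variable names are hypothetical, and the long --label form is used on the assumption that you are running GNU diff, which is what most Linux distributions ship.

```shell
# Two hypothetical files that differ only in case and spacing on line 1.
a=$(mktemp); b=$(mktemp)
printf 'Hello   World\nsecond line\n' > "$a"
printf 'hello World\nsecond line\n' > "$b"

# -q reports only whether the files differ (exit status 1 here).
brief_status=0
diff -q "$a" "$b" > /dev/null || brief_status=$?

# -i ignores case and -b ignores changes in the amount of white space,
# so with both options the files compare as equal (exit status 0).
relaxed_status=0
diff -i -b "$a" "$b" > /dev/null || relaxed_status=$?

# -s prints a message when the files are identical.
same_status=0
same=$(diff -s "$a" "$a") || same_status=$?

# -u produces unified output; --label replaces the temporary filenames.
unified=$(diff -u --label old --label new "$a" "$b") || true

echo "brief: $brief_status, relaxed: $relaxed_status, same: $same_status"
printf '%s\n' "$unified"
rm -f "$a" "$b"
```

In the unified output, lines starting with - come from the first file, lines starting with + come from the second, and lines starting with a space are unchanged context.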

The diff command can be told to ignore differences you don’t care about, such as case, white space, and blank lines, so that you can scan for the more meaningful differences within the files. It’s especially useful when you’re building scripts and batch files based on generic versions you’ve copied or downloaded from other places. You can quickly find out what changes you’ve made when you’re debugging the scripts.
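
As a rough sketch of that workflow, the following compares a generic script with a locally modified copy. The filenames and script contents are invented for illustration. Because diff returns a nonzero exit status when the files differ, it can drive an if statement directly.

```shell
# Hypothetical generic script and a locally modified copy of it.
workdir=$(mktemp -d)
printf '#!/bin/sh\necho "backup starting"\ntar -cf backup.tar /data\n' > "$workdir/backup.generic.sh"
printf '#!/bin/sh\necho "backup starting"\ntar -czf backup.tar.gz /srv/data\n' > "$workdir/backup.sh"

if diff -q "$workdir/backup.generic.sh" "$workdir/backup.sh" > /dev/null; then
    verdict="unchanged"
    changes=""
else
    verdict="modified"
    # -U 0 drops the context lines, leaving only the lines that changed.
    changes=$(diff -U 0 --label generic --label local \
        "$workdir/backup.generic.sh" "$workdir/backup.sh") || true
fi

echo "$verdict"
printf '%s\n' "$changes"
rm -rf "$workdir"
```

Here the script prints "modified", followed by a short unified diff showing that only the tar line was edited.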
