
Script Teez: Search and identify files

Perl is a handy tool to have around. With the help of Vincent Danen, and his monthly Script Teez articles, you can employ Perl within a script file to take a single command-line argument as a wildcard, search for a string, and format the output.


This month, I am going to show you yet another Perl script that makes use of the find command. This time, however, I am going to approach it from a different angle and introduce a number of Perl functions I have not shown you before.

Instead of looking for a specific filename or wildcard, I am going to write a reusable script that will not have to be edited once it is initially written. This script will take a single command-line argument as the wildcard or filename to search for. It will then search the hard drive for any matching files and report them, but it will also use the file command to tell you what type of file it is. Because the output of the file command can be long at times, I will also format the output so it wraps nicely and provides a clean output.

Let's take a look at the script. I called it idfile, but you can give it any name that you like. The full listing was published as a separate link alongside this article.

The first line, as always, points to our Perl interpreter. The next line prints out a message indicating that I am searching for all occurrences of $ARGV[0], which is the variable that represents the first argument on the command line. Knowing this, you can have more than one command-line argument by using $ARGV[1] for the second argument, $ARGV[2] for the third, and so on.
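As a minimal sketch of how the command-line arguments can be read, the following checks for a first argument before using it. The first_pattern helper and the usage message are illustrative assumptions and are not part of idfile.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first argument, or undef if none was given.
sub first_pattern {
    my @args = @_;
    return @args ? $args[0] : undef;
}

my $pattern = first_pattern(@ARGV);
if (defined $pattern) {
    print "Searching for all occurrences of $pattern\n";
} else {
    # Assumed usage text; the article does not say what idfile does here.
    print "usage: idfile <filename-or-wildcard>\n";
}
```

Quote the wildcard when you run the script (for example, idfile '*.pl') so the shell passes it through unexpanded.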

The next line takes the output of the find command and assigns it to the @search array. Here, you use the find command to search from the root directory for anything matching $ARGV[0], which is your command-line argument. You don't want any error messages to be assigned to your array, so discard them by redirecting them to /dev/null. The next line chops the @search array, removing the trailing newline character from each element.
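The capture step might look roughly like the sketch below. The find_matches helper, the quoting of the arguments, and the choice to search the current directory instead of / are my assumptions. I also use chomp here, which removes only a trailing newline, where chop removes the last character whatever it is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of idfile): search $dir for names matching $pattern.
sub find_matches {
    my ($dir, $pattern) = @_;
    # Single quotes stop the shell from expanding the wildcard before find sees it.
    # This naive quoting is an assumption; it breaks if an argument contains a quote.
    my @search = `find '$dir' -name '$pattern' -print 2>/dev/null`;
    chomp(@search);    # strip the trailing newline from every element
    return @search;
}

# Search the current directory rather than / so the sketch runs quickly.
my @search = find_matches('.', defined $ARGV[0] ? $ARGV[0] : '*.pl');
print scalar(@search), " match(es)\n";
```

Backticks in list context return one element per line of output, which is why a single assignment fills the whole array.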

Next, start an if() block. If the @search array is not empty, continue because the find command has found something to match your search request.
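The test works because an array used as a condition is true only when it holds at least one element. Here is a small sketch of that check; the summary helper and its return strings are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: report on a list of matches.
sub summary {
    my @search = @_;
    if (@search) {
        # In a condition the array yields its element count, so non-empty is true.
        return scalar(@search) . " match(es) found";
    }
    return "No Matching Files Found!";
}

print summary("/usr/local/bin/idfile"), "\n";
print summary(), "\n";
```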

Then start a foreach() loop. This goes through the @search array one element at a time and assigns the current element to the variable $file. During the foreach() loop, I will work with the $file variable because that is the full path and filename of one match that the find command returned.

Next, you assign the output of the file command to the variable $line. This will contain the description of the file that the find command found, whose path is stored in the $file variable.

On the next line, a new array called @inline is created by splitting the $line variable on the colon character. The file command returns output like this:
./idfile: perl script text executable

You want to remove the filename from the returned output, so the colon is used as the delimiter to split the string accordingly. However, the returned output might also be something different. For a gzip-compressed archive such as gronk-1.0.tar.gz, for example, the description includes a last-modified date and time.

Note that a description like this contains a few more colons, so you’ll need to be careful: your array may have more than just two elements depending upon the description of the file.

Because of this, you need to obtain the length of the array. To do this, use the code:
$num = @inline;

This assigns the number of elements in the @inline array to the variable $num. If the file processed was idfile, you would have two elements. If the file processed was gronk-1.0.tar.gz, you would have seven elements.
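To see the counting in action, here is a short sketch. The count_fields helper is hypothetical, and the second sample line is made up for illustration; real output from the file command will differ.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line of file output on colons and count the pieces.
sub count_fields {
    my ($line) = @_;
    my @inline = split(/:/, $line);
    my $num    = @inline;    # an array assigned to a scalar yields its element count
    return $num;
}

# prints 2
print count_fields("./idfile: perl script text executable\n"), "\n";

# Made-up sample with a timestamp; prints 5 because the time adds two colons.
print count_fields("./archive.gz: gzip compressed data, last modified: Sun Nov  5 13:00:00 2000\n"), "\n";
```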

Now, you want to process each element individually, and you do this using a for() loop. The for() loop is written in a special way so you do not attempt to process more elements in the array than needed.
for ($i=0; $num>$i; $i++) {

This tells the for() loop to do a few things. The first time the for() loop is run, it will assign the $i variable a value of 0. This is your counter. Next, you tell the loop to execute if $num is greater than $i. The for() loop will terminate if the value of $i is the same as $num, which means that you only process the exact number of elements that exist in the array since $num contains the number of elements in the array. Finally, you tell the for() loop to increment the value of $i every time the loop reaches the end.

The next thing you do is test to see if $i is equal to 0. If it is, this is the first pass through the loop, so the current element is $inline[0], which is the path and filename of the file you found. Because $i is equal to 0, you can write $inline[0] as $inline[$i] and get the same results.

Finally, if $i is not equal to 0, meaning it has gone through the loop at least once, you do another check. If $i is equal to 1, meaning the second time through the loop, you assign the value of $inline[1] (written as $inline[$i] since $i is equal to 1) to the variable $return. If this is not the second loop and $i is not equal to 1, concatenate the value of $inline[$i] to the existing variable $return separated with the ": " string. This preserves the description string that the file command returns.

Let's take a step-by-step look at this. The first time the loop is run, you assign the filename to the variable $file, like this:
$file = "./gronk-1.0.tar.gz:\n";

The second time the loop is run, you assign the first part of the description string to the variable $return, like this:
$return = "gzip compressed data, deflated, last modified";

The third time the loop is run, you add any remaining element of the @inline array to the existing $return variable, which would look like this:
$return = "gzip compressed data, deflated, last modified" . ": " . "Sun Nov 5 13";

and so on until the full description from the file is placed into the variable $return.
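The loop described above can be sketched as follows. The rebuild and split_once helpers are hypothetical names, and trimming the space after each colon is my assumption, since the article does not say how the script handles it. The second helper is an alternative that is not from the article: giving split a limit of 2 cuts only at the first colon, so no reassembly is needed.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the loop described in the article.
sub rebuild {
    my ($line) = @_;
    chomp($line);
    my @inline = split(/:/, $line);
    s/^\s+// for @inline;    # assumption: trim the space that follows each colon
    my $num = @inline;
    my ($file, $return) = ('', '');
    for (my $i = 0; $num > $i; $i++) {
        if ($i == 0) {
            $file = "$inline[$i]:\n";         # path and filename
        } elsif ($i == 1) {
            $return = $inline[$i];            # first part of the description
        } else {
            $return .= ": " . $inline[$i];    # put the colon back
        }
    }
    return ($file, $return);
}

# Alternative, not from the article: a limit of 2 splits only at the first colon.
sub split_once {
    my ($line) = @_;
    chomp($line);
    return split(/:\s*/, $line, 2);
}

my ($file, $return) = rebuild("./idfile: perl script text executable\n");
print $file, $return, "\n";
```

One consequence of rejoining with ": " is that a time such as 13:00:00 comes back as 13: 00: 00. The split_once variant leaves the description untouched.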

Once you have finished and there are no more elements left in the array, you print to the console the value of $file. Next, print the OUTPUT format using the code:
$~ = "OUTPUT";
write;


This writes to the screen the formatted output called OUTPUT. This is defined with the last four lines of the script, beginning with the code:
format OUTPUT =

Here, you define OUTPUT. The less than (<) character is used to determine how long the string to be printed should be. The tilde (~) characters prior to the string tell Perl that if there is any text left in the variable that cannot fit in the space determined by the number of < characters, to wrap it and continue printing until everything in the variable has been printed. The next line contains the variable to format, which in this case is $return. Finally, the sole period at the end of the format code tells Perl that you have finished defining the OUTPUT format.
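A wrapping format of this kind might look like the sketch below. The field width and sample text are my own choices. The caret (^) that starts the field is not mentioned above, but as far as I know it is what lets Perl consume the variable a piece at a time, so I assume the original picture line uses it.

```perl
use strict;
use warnings;

# Package variable so the format can see it; the text is a shortened sample.
our $return = "gzip compressed data, deflated, last modified";

# ~~ repeats the picture line until $return is used up; ^ starts a field that
# takes the next chunk of the variable; each < adds one more column.
format OUTPUT =
~~    ^<<<<<<<<<<<<<<<<<<<
$return
.

$~ = "OUTPUT";    # choose the format for the currently selected output handle
write;            # print $return through that format
```

Note that a ^ field empties the variable as it prints, so copy $return first if you need the text again afterward.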

Let's skip back a few lines in the code. You print No Matching Files Found! to the screen if the @search array is empty, which means that the find command did not find any matching files. The last line of working code, prior to the OUTPUT formatting, is the exit(0) command, which tells Perl to exit the program with an exit status of 0, indicating success.

And that's all there is to it! I hope you’ve learned a few things with this piece of code that I’ve built upon other scripts I’ve explained in the past few months. I’ve shown you how to pass command-line arguments to a program and use them by utilizing the $ARGV[0] variable and how to format what you print to the screen using the format command.

Perl is quite an amazing language, and you can do a lot of things with it. This is just the tip of the iceberg. In fact, the external calls made to the find and file programs could have been done completely with Perl by using some third-party Perl modules that you can find on CPAN. Using the external commands is a quick and dirty way of accomplishing the same thing without installing any extra modules because the file and find commands should be standard on any Linux distribution. Enjoy!

About Vincent Danen

Vincent Danen works on the Red Hat Security Response Team and lives in Canada. He has been writing about and developing on Linux for over 10 years and is a veteran Mac user.
