I often find myself anticipating future contract requirements and wanting to add to my skill set. But staying on top of emerging technologies reminds me of trying to keep three beach balls under water at once. As soon as I have a solid grasp of one new programming language or concept or hardware interface, two more pop up.
A client engagement surfaced a few months ago that called for me to work with Python. I figured the easiest way for me to get up to speed would be to apply my knowledge of a similar language and translate an existing script to Python. After some investigation, I found out that Python was somewhat similar to Perl, a language I know fairly well. You can read about the Perl version of this script here.
I’m not going to show how I did the conversion but rather walk through the Python version of this script and illustrate the key statements that are used. The script I chose imports a delimited flat file containing 3,000 inventory items, extracts the item description field (which is variable-length) and converts it into three 30-character fields, and rewrites the file. You can see the script in its entirety in Listing A.
In Python, loops and flow control statements have no explicit terminator such as Perl's closing brace. A block simply ends where the indentation returns to the previous level, which can get a little confusing. You'll notice that I've added comments to indicate where code blocks end. This helps me organize and read my code.
Here we go.
import string #use the string module
import re #use the regular expression module
These first two lines import the string and re (regular expression) modules. A module is a file of Python definitions (functions, classes, and constants), and importing it makes those definitions available under the module's name, as in re.compile. Python's import mechanism keeps code modular and makes it easy to extend.
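inputfile = r"c:\Work\CNET\Inventory.txt" #set "inputfile" to the name of the delimited input file
outputfile = r"c:\Work\CNET\inv.txt" #set "outputfile" to the name of the output file
This first fragment shows how to assign a string to a variable in Python (e.g., variable_name = “string”). The r prefix makes each one a raw string, so the backslashes in the Windows paths are kept literally instead of being read as escape sequences. A Python statement ends at the end of its line; no semicolon is needed. Comments begin with a pound sign (#) and run to the end of the line.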
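Because a backslash starts an escape sequence in an ordinary Python string literal, Windows paths are safest as raw strings or with doubled backslashes. A small illustration; the c:\temp path is a made-up example chosen because \t happens to be the tab escape:

```python
# A raw string (r prefix) keeps backslashes exactly as typed.
raw_path = r"c:\Work\CNET\Inventory.txt"
# Doubling each backslash produces the same string in an ordinary literal.
escaped_path = "c:\\Work\\CNET\\Inventory.txt"
print(raw_path == escaped_path)  # True

# Without the r prefix, "\t" silently becomes a single tab character.
tab_surprise = "c:\temp"
print(len(tab_surprise))  # 6: 'c', ':', TAB, 'e', 'm', 'p'
print(len(r"c:\temp"))    # 7: the backslash and 't' are kept
```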
f = open(inputfile) #open "inputfile" for reading
o = open(outputfile, "w") #open "outputfile" for writing
These two statements create the file objects (file handles) needed to read the input file and write the output file. A file object is simply what Python uses to access an external file. When the open function is called with only one argument, the file is opened for reading only, because "r" is the default mode; the “w” mode opens the file for writing, creating it or truncating an existing file.
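The original script calls open directly. A minimal sketch of the same modes using the with statement, a newer idiom (not used in the script) that closes each file automatically when its block ends. The demo.txt file name is hypothetical, and the file is created in a temporary directory so nothing real is touched:

```python
import os
import tempfile

# A throwaway directory keeps the example away from real files.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.txt")

# Mode "w" creates the file (or truncates an existing one) for writing.
with open(path, "w") as out:
    out.write("first line\n")
    out.write("second line\n")

# With no mode argument, open() defaults to "r": read-only text.
with open(path) as inp:
    contents = inp.read()

print(contents)
print(out.closed, inp.closed)  # True True: both closed by their with blocks
```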
while True: #process the input file
    offset = 0 #the first 30-character field offset
    line = f.readline() #assign the current line to "line"
    if not line: break #exit the while loop at the end of the file
The while statement executes the code it contains for as long as its condition is true. I used True, a condition that never becomes false, because I chose to exit the loop from inside with the if statement. Python uses only indentation to delimit blocks; this requires you to pay close attention to what you are doing but helps ensure readable code.
The f.readline() call uses the readline method of file objects. The method returns a string containing the current line of the file, including its trailing newline character, and advances to the next line. Once the end of the file has been reached, readline returns an empty string (''). The statement
if not line: break
exits the loop at that point because an empty string counts as false in Python. A blank line in the middle of the file comes back as '\n', which counts as true, so the loop doesn't stop early.
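A quick way to see the end-of-file behavior described above, using io.StringIO as an in-memory stand-in for a real file (it supports the same readline method). The three-line sample text is made up:

```python
import io

# In-memory "file": a line, a blank line, and another line.
f = io.StringIO("apple\n\nbanana\n")

lines_read = []
while True:
    line = f.readline()
    if not line:          # '' only at end of file
        break
    lines_read.append(line)

print(lines_read)          # ['apple\n', '\n', 'banana\n']
print(repr(f.readline()))  # '' again: still at end of file
```

The blank line arrives as '\n', which is truthy, so only the true end of file stops the loop. Iterating over the file object directly (for line in f:) is the more common modern idiom and yields the same lines.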
line = line.rstrip() #remove the newline character (and any trailing whitespace)
cols = line.split('\t') #split on tabs
The rstrip method removes any trailing whitespace characters from the string and returns the new string. The split method takes the separator as its argument and returns a list of the substrings between separators; called with no argument, it splits on runs of whitespace. Both are built-in methods of Python string objects, so they don't depend on the string module imported at the top of the script. This is also a good time to point out that Python variables are not declared with a type. A variable is just a name bound to an object; the object carries the type, and the same name can later be bound to an object of a different type.
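The string methods and dynamic typing described above, applied to a made-up tab-delimited record (the field values are illustrative only):

```python
record = "1001\tWidget\t4.95\r\n"

stripped = record.rstrip()           # drops the trailing '\r\n'
cols = stripped.split("\t")          # split on tabs
words = "  two   spaces  ".split()   # no argument: split on runs of whitespace

print(cols)    # ['1001', 'Widget', '4.95']
print(words)   # ['two', 'spaces']

# No type declarations: the type belongs to the object, not the name.
value = cols[0]
print(type(value).__name__)  # str
value = int(value)           # same name, now bound to an int
print(type(value).__name__)  # int
```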
splitme = cols[6] #set "splitme" to the data from the 7th column
splitup = list(splitme) #set "splitup" to a list of characters from the string "splitme"
The first line above shows how to access an element of a list, Python's array-like sequence type. A list is indexed from 0 to (n-1), where n is the number of elements, so the 7th column is cols[6]. The second line demonstrates the built-in list function, which, given a string, returns a list of its characters.
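Zero-based indexing and the list function on a hypothetical record whose seventh column holds an item description:

```python
# Made-up tab-delimited record with seven columns.
cols = "A1\tB2\tC3\tD4\tE5\tF6\tBlue widget, large".split("\t")

splitme = cols[6]        # seventh column: indexes run 0..len(cols)-1
splitup = list(splitme)  # one list element per character

print(splitme)              # Blue widget, large
print(splitup[:4])          # ['B', 'l', 'u', 'e']
print(cols[-1] == cols[6])  # True: negative indexes count from the end
```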
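p = re.compile(r'\s') #compile a regular expression object "p" that matches one whitespace character
In this statement, I have used the re module's compile function to create a regular expression object. Compiling isn't strictly required, since functions such as re.match also accept a pattern string, but a compiled object is convenient when the same pattern is used repeatedly. The function takes as its argument the regular expression to be used in pattern matching. Here, ‘\s’ matches any single whitespace character (a space, tab, newline, and so on); because split has already removed the tabs, in practice it finds spaces, and a plain ' ' would match only a space. Other expressions match other classes of characters, such as \w for any alphanumeric character (or underscore) and \d for any digit. The r prefix stops Python from treating the backslash as a string escape before re sees it.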
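A short comparison of the character classes mentioned above; the test strings are arbitrary:

```python
import re

p = re.compile(r"\s")           # any single whitespace character
space_only = re.compile(" ")    # a literal space and nothing else

print(bool(p.match(" ")))            # True
print(bool(p.match("\t")))           # True: \s also matches a tab
print(bool(space_only.match("\t")))  # False
print(re.findall(r"\d", "a1b22"))    # ['1', '2', '2']
print(re.findall(r"\w", "a-b_1"))    # ['a', 'b', '_', '1']
```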
if len(splitup) > 30: #if the item description contains more than 30 characters
This statement introduces the built-in len function, which returns the number of items in its argument as an integer: the elements of a list, the characters of a string, and so on. I think it's quite handy.
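len works on any sized container, not just lists. A few examples, including the same more-than-30-characters test the script applies (the description text is made up):

```python
print(len("Widget"))              # 6 characters
print(len(["a", "b", "c"]))       # 3 elements
print(len({"sku": 1, "qty": 2}))  # 2 keys

description = list("A description longer than thirty characters")
print(len(description))           # 43
print(len(description) > 30)      # True
```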
for i in range(11): #count from 0 to 10
I found two interesting things when creating for loops in Python. The first is that a for loop iterates over a sequence of elements rather than incrementing a counter. I could have written
for i in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:
but I chose the range function, which is the second interesting thing I discovered. The range function generates a sequence of integers automatically. (In Python 3 it returns a lazy range object rather than an actual list, but a for loop treats the two the same way.) Given one argument (n), it produces 0 through n-1; given two arguments (m, n), it produces m through n-1. An optional third argument sets the step, which can be negative to count down.
Being able to iterate over lists is rather useful because you can iterate over a list of any object type. You could, for example, use a list of strings or a list of regular expression objects.
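The range forms described above, plus iteration over a list of strings and a list of compiled regular expression objects. The field names and the "7 units" sample are illustrative only:

```python
import re

print(list(range(11)))          # 0 through 10
print(list(range(3, 7)))        # [3, 4, 5, 6]
# A negative step counts down: the same indexes that 30 - i visits
# as i runs through range(11).
print(list(range(30, 19, -1)))

# Iterating over a list of strings...
for field in ["sku", "description", "price"]:
    print(field.upper())

# ...or over a list of compiled regular expression objects.
patterns = [re.compile(r"\d"), re.compile(r"\s")]
matches = [bool(pat.match("7 units")) for pat in patterns]
print(matches)  # [True, False]
```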
m = p.match(splitup[30 - i]) #look for a space, scanning backward from index 30
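The match method of a compiled regular expression object takes a string (here, a single character) and checks whether the pattern matches at the beginning of it. It returns a Match object if it does and None if it doesn't. Because a Match object counts as true and None as false, the result can be tested directly in an if statement.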
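The return values of match, and how it differs from search, which scans the whole string instead of only its start. The test strings are arbitrary:

```python
import re

p = re.compile(r"\s")

m = p.match(" ")
print(m is not None)   # True: a Match object was returned
print(repr(m.group())) # ' ': the text that matched
print(p.match("x"))    # None: no match

# match() only tests the start of the string; search() looks anywhere.
print(p.match("ab cd"))           # None
print(p.search("ab cd").start())  # 2: position of the space
```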
As I mentioned earlier, Python blocks have no explicit terminator. In Listing B, you can see I've added comments to indicate where code blocks end.
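newguy = ''.join(splitup) + '\t' #turn the list back into a string and append a tab
The join method is the opposite of the split method. It is called on the separator string and takes the list of strings or characters to be joined as its argument; here the separator is the empty string, so the characters are simply run together. To concatenate two strings, the plus sign (+) is used. (The module-level string.join(splitup, '') function, which took the list first and the separator second, was removed in Python 3; the method form works in both Python 2 and Python 3.)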
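join and + in isolation, with made-up values. Note that the separator is the object join is called on:

```python
splitup = list("Blue widget")
newguy = "".join(splitup) + "\t"  # back to one string, then append a tab
print(repr(newguy))               # 'Blue widget\t'

print("-".join(["a", "b", "c"]))  # a-b-c

# join and split are inverses for the same separator.
fields = ["1001", "Widget", "4.95"]
print("\t".join(fields).split("\t") == fields)  # True
```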
The logic in Listing C carries over unchanged from the Perl version of the script.
When the preceding if condition is not met, the else clause shown in Listing D is executed instead.
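Listings C and D aren't reproduced here, so the following is only a hypothetical reconstruction of the overall idea, not the author's code: it breaks a description into three fields of at most 30 characters, using the same backward scan for whitespace over indexes 30 down to 20 and an if/else for long versus short remainders. The function name split_description, the hard cut at 30 when no space is found, and silently dropping any text beyond the third field are all assumptions:

```python
import re

SPACE = re.compile(r"\s")

def split_description(text, fields=3, width=30):
    """Hypothetical sketch: split text into `fields` pieces of <= `width` chars."""
    pieces = []
    rest = text
    for _ in range(fields):
        if len(rest) > width:
            cut, skip = width, 0              # assumed fallback: hard cut
            for i in range(11):               # indexes width, width-1, ..., width-10
                if SPACE.match(rest[width - i]):
                    cut, skip = width - i, 1  # break at the space and drop it
                    break
            pieces.append(rest[:cut])
            rest = rest[cut + skip:]
        else:
            pieces.append(rest)               # short enough: take it whole
            rest = ""
    return pieces                             # text past the last field is dropped

print(split_description("aaaa bbbb cccc dddd eeee ffff gggg hhhh iiii"))
# ['aaaa bbbb cccc dddd eeee ffff', 'gggg hhhh iiii', '']
```

Scanning backward from the 30-character limit keeps whole words together when a space falls near the boundary, at the cost of sometimes leaving a field up to 10 characters short.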
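In Listing E, the write method of the output file object is used. The only argument passed to it is the string to be written to the file. write doesn't add a line break of its own, so any newline has to be included in that string.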
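A minimal sketch of writing tab-delimited records, with io.StringIO standing in for the output file handle. The record values are made up; with a real file, calling close() (or using a with block) ensures buffered output is flushed to disk:

```python
import io

out = io.StringIO()  # in-memory stand-in for the output file object
fields = ["1001", "Blue widget", "large", ""]

out.write("\t".join(fields) + "\n")  # write() adds no newline itself
out.write("1002\tRed widget\t\t\n")

print(repr(out.getvalue()))
print(out.getvalue().count("\n"))    # 2 records, 2 line breaks
```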
Here's a recap of the elements I covered in this Python script:
- Objects and classes
- Variables (object and class types, scalars, and lists)
- Flow control (while and for loops, and if/else statements)
- Functions (e.g., len)
- Methods of objects (e.g., join and split)
- File I/O
Python is a fairly simple language to pick up; converting the Perl script took me about six hours. The python.org site has documentation, tutorials, and downloads. Its Windows installer includes IDLE, an editor with a built-in debugger, so writing, testing, and executing code is easy and pleasurable.