
Perl: My secret weapon


This week I have been extremely grateful that Perl is in my bag of tricks. One distinction you can make in the world of programming is between people who work on lengthy projects and people who work on short ones. Most of the time, I fall into the latter group; my projects frequently last less than a day.

Earlier this week, my boss asked for my advice on a project in Access; I gave him a suggestion or two. The next day, he came to me and asked me to do it. The SQL code he had written worked, but it was taking over four hours to process the 97,000-plus records. Access tends to be rather speedy, so I took a look at his query. While it appeared simple, it essentially performed a massive correlated subquery with a group aggregate function back to the original table; not exactly efficient.

Not being very familiar with Access at all (I avoid it like the plague, to be honest), I looked for another way to do things. I really cannot stand Access. It is too “basic” and “helpful” to accomplish any real work at all, and relational databases are far too difficult for the average user to understand. It is a crippled product, in my mind. Not from a technological standpoint per se (although it is technologically crippled too), but from a usability standpoint. You simply cannot accomplish tasks with it very well.

Me being me, I immediately dumped the Access tables to CSV format, loaded them into MySQL, and re-ran the query with the needed modification. It worked just as well as it had in Access, but it was only marginally faster. So I dumped the data back out of MySQL to a true CSV file (Microsoft products have a hard time making a good CSV file) and decided to get buck wild with Perl on it.

Thirty minutes later, I had a Perl script that did in literally five seconds what had taken Access over four hours, and what MySQL would probably have chewed on for three to four hours.

That’s the magic of Perl.
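The original query and script aren't shown, so the following is only a sketch of the general technique, assuming the job was to compare each row against an aggregate of its own group (here, the group maximum). A correlated subquery effectively recomputes that aggregate for every row; in Perl you can compute it once per group in a hash and then make a second pass, so the work grows roughly linearly with the row count. The row layout (`[group, value]`) and the sub name `flag_group_max` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input: each row is [group_key, value], as if read from the CSV dump.
# Returns the rows that hold the maximum value within their group.
sub flag_group_max {
    my @rows = @_;

    # Pass 1: compute the aggregate once per group instead of once per row.
    my %max;
    for my $row (@rows) {
        my ($group, $value) = @$row;
        $max{$group} = $value
            if !exists $max{$group} || $value > $max{$group};
    }

    # Pass 2: compare each row with its precomputed group aggregate.
    return grep { $_->[1] == $max{ $_->[0] } } @rows;
}

my @rows = ([east => 10], [east => 42], [west => 7], [west => 3]);
for my $hit (flag_group_max(@rows)) {
    print "$hit->[0]: $hit->[1]\n";    # prints "east: 42" then "west: 7"
}
```

Rows that tie for a group's maximum are all returned, which is usually what the SQL version would do as well.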

Perl is my secret weapon. It lets me do in a fraction of the time what other coders can spend all day on, for those small utility projects. It executes insanely fast for an interpreted language. It has an unbelievable number of nice libraries. And it has a great community.

I cannot count the number of times I have written something in Perl in a fraction of the time other tools would have needed, and had it run faster than the other languages would have run it. Whenever I get a dinky request for a one-off task, my first thought is always, “Can Perl do this for me?”

Indeed, the only real problem that I have with Perl is that its syntax for complex data structures like arrays of hashes of arrays can be a bit confounding. I am currently working on a short Perl script that I spent nearly a day wrestling with because of these kinds of syntax issues. But that aside, Perl is a fantastic tool for the quickie jobs.
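For readers who haven't run into it, here is a small example of the kind of nesting being described: an array of hashes whose values are arrays. The data is made up; the point is the reference syntax (`->[...]`, `->{...}`, and `@{ ... }` dereferencing) that tends to cause the wrestling.

```perl
use strict;
use warnings;

# An array of hashes of arrays: each element is a hash reference,
# and each hash value is an array reference.
my @servers = (
    { name => 'web1', ports => [80, 443] },
    { name => 'db1',  ports => [3306] },
);

# Index into the outer array, then the hash, then the inner array.
my $first_port = $servers[0]{ports}[0];    # same as $servers[0]->{ports}->[0]

# Append to an inner array through its reference.
push @{ $servers[1]{ports} }, 33060;

# Walk the whole structure, dereferencing each inner array with @{ ... }.
for my $server (@servers) {
    my $ports = join ', ', @{ $server->{ports} };
    print "$server->{name}: $ports\n";    # "web1: 80, 443" then "db1: 3306, 33060"
}
```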

How do you use Perl to simplify your life?

J.Ja

About

Justin James is the Lead Architect for Conigent.

55 comments
apotheon

I haven't had a look at .NET 2.0, but I wasn't impressed with 1.0 in comparison with Ruby. 1.0's object orientation is like a less painful Java, which makes it still significantly less elegant than Ruby's.

JIm.frazier

One question... you mention elegance. Elegance in what context?

apotheon

My thoughts on elegance start with a weblog entry I originally wrote for my TR blog, then later imported to my main weblog. It is titled (appropriately enough) Elegance: http://sob.apotheon.org/?p=113

apotheon

Whoops, duplicate post. I have no idea how TR managed to misplace this one.

JIm.frazier

In my other life I am a classical musician. Most every musician would characterize Mozart's music as the most elegant music ever written. If elegance is an idea rather than an application: Mozart's music is characterized by being made up of simple, beautiful lines. The music gains its complexity through the interaction of the lines forming a complex whole. Its complexity is also derived through its adherence to the musical form of his time (construction). The form is rigorously enforced in all cases except where the artist is allowed some level of expression, but never to excess. The form (i.e., programming language or method) can vary, but the excesses are always excluded. Mozart hated complexity for complexity's sake. However, even though sonata-allegro form is much more complex in nature than, say, theme and variations or rondo, it is still an accepted form... Nice idea. Jim

JIm.frazier

Never used Ruby... but I wrote Perl for a long time, including some modules for my own use. Take a look at .NET 2.0 and .NET 3.0. The SOA implementation through WCF is really cool; we are planning to move all of our code in that direction. It positions you to dump Microsoft someday (if you choose). The SOA class implementation exposes itself as a proxy (even for local applications) and uses SOAP messages for everything... An application really does not know if it is talking to a Windows or a Linux remote class handler. Good to talk to you. I am a fan of Perl for quick and dirty stuff. I will take a look at Ruby. Jim Frazier

apotheon

. . . then Ruby is quick and clean. I'm unlikely to try out .NET later than 1.0 any time soon. I've already dumped Microsoft as a platform, both personally and professionally, except for testing cross-platform stuff (mostly using IE to test web development). As such, I don't have a whole lot of opportunity or desire to use Windows-centric development environments. It might be worth reading some more of the technical literature on the subject, though, just so I know what I'm "missing" -- the better to be thankful, perhaps.

apotheon

Apache servers are by far the majority of the webservers out there. Whoever told you Microsoft ran 90% of the webserver market is full of crap. Doing web development, I am most definitely tapping the deepest well for development platform market share: it's the ASP.NET people who are limiting their market.

JIm.frazier

You might someday want to tap the other 90% of the marketplace... :) I do wonder sometimes what we would do if MS tunneled. Great chatting with you.

Raja B

I have seen that SQL code can be developed in ways that make it run slowly or extremely efficiently. If you were able to use some technique in Perl that made the logic run fast, the same technique can be applied to SQL programming to do the processing in a comparable amount of time. Post the SQL code and the Perl code you have...

JIm.frazier

It has been my experience that the speed of a SQL query is completely dependent upon the design and configuration of the SQL database and the SQL engine (i.e., SQL Server or Oracle). Perl is just handing off the query text to the SQL engine... Now, if you are saying that you use simple queries and reassemble the data inside the Perl script, that could be the case. However, I have found it a better practice to configure the database/database engine for efficiency rather than add the complexity in the application code. Jim Frazier

Tony Hopkinson

If the boost from processing the data off the server is greater than the time to load and save it in a live environment, then it's the best option, if performance is critical. I consider it an option of last resort, though, using it either to give user feedback during a lengthy operation or as a straight performance boost. A lot of the time you can design these necessities out of your schema; however, as with any optimisation strategy, you have to target the right thing, otherwise you end up slowing down or complicating a lot of other things for a minimal gain.

apotheon

Implementing application logic in SQL may very well result in significant loss in performance, as compared with doing so in Perl (or something even faster, like C, if you're willing to write C code for churning data). While there's something to be said for doing everything in one language (SQL) to cut down on programmer time spent on the task, for anything of more than a few lines of SQL, Perl more than makes up for that advantage by virtue of how quickly program logic can be developed by a skilled Perl hacker. It's a judgment call, really -- but I'll go for the Perl choice basically every time, not only for program performance and rapid development for nontrivial tasks, but also because SQL as a language is like a spork in the eye to me.

apotheon

Something that's tied to a particular framework isn't the new query language we need. The last thing we need is vendor lock-in on a query language.

Justin James

... but it may already (sort of) be here, in the form of LINQ. I need to check out LINQ more deeply, but people who have used it seem to love it. It isn't for everyone (it's a .NET thing), but the idea seems to be to bake O/R mappings directly in at the Framework level. That's assuming that I know what I am talking about, and I might not. J.Ja

apotheon

What we need is a new query language.

Justin James

There is also a fine line between application logic which belongs in the application, and business logic which belongs in the database in the form of stored procedures, materialized views, triggers, constraints, etc. I often view SQL as my Lex Luthor, but in reality, it is simply an ugly, inelegant language that happens to be the only available choice for certain tasks so I have to use it. J.Ja

apotheon

I was making reference to the fact that poor software architecture contributes to bloat by making it difficult to slim down your code during maintenance and to keep track of errant bits of the application during refactoring. Tight coupling, for instance, can force programmers to essentially write wrappers around application modules to meet deadlines, instead of changing the existing parts of the application. Bloat happens when parts of a program are too closely coupled (among other reasons) to facilitate maintenance.

Justin James

The C# language isn't bloated, but you need to fire up the whole .NET runtime to use it; that's the bloat. J.Ja

JIm.frazier

You can keep things separate by defining your business rules in their own components and then implement them through C# in the triggers/stored procedures. This way you can be sure that the same business rules are applied in the database as in the applications. Don't understand your bloat comment. It has been my experience that a well engineered/architected application would be the same regardless of the platform...I worked on Unix for a long time and you can engineer a bad piece of software on Unix just like you can on Windows and vice-versa.

apotheon

I quite like the idea of being able to use a slicker language like C# for stored procedures. What I don't like is the way that leads to data and application logic being tightly coupled. The end result will, I'm sure, be the sort of instability and bloat that's common to third party Windows applications in general that make use of Microsoft technologies. I'd be inclined, for the sake of maintainability, security, and modularity, as well as other reasons, to keep application algorithms and data fetch-and-carry strictly separated as much as is practical. There are exceptions to this, but they tend to make use of wholly different programming paradigms that aren't really feasible with common SQL/C#/Perl/C/Java/etc. code.

JIm.frazier

In Sql Server 2005 you can write your stored procedure and triggers in C# which means that you can implement business rules inside the sql structure......

apotheon

It occurred to me that the SQL code might just not be terribly well optimized. I believe that the Perl has a reasonably good chance to outperform the SQL anyway, but I'd still be interested in seeing how the actual code matches up. Of course, if it's proprietary code, sharing it might violate the law.

Justin James

Sorry for no responses on this thread; TR's system failed to notify me of posts! I cannot post the code, sadly. And there is no way that SQL can do what the Perl can do nearly as well; in SQL, it used a correlated subquery (no way around it, we were ranking within a group), whereas the Perl could whip right through it. SQL is sort of functional-programming-esque; in a nutshell, it applies the same logic to every row equally. As a result, anything requiring incrementation just is not going to happen cleanly within a single SQL command. J.Ja
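Since the actual code can't be shared, here is a generic sketch of the contrast being described, assuming "ranking within a group" means numbering rows 1, 2, 3... by descending value inside each group. After a single sort, a running counter that resets whenever the group changes does what the correlated subquery has to recompute row by row. The `[group, id, value]` layout and the name `rank_within_group` are hypothetical.

```perl
use strict;
use warnings;

# Rows are hypothetical [group, id, value] triples.
# Returns the rows with a rank appended: 1 = highest value in its group.
# Ties receive consecutive ranks in sort order (no shared ranks).
sub rank_within_group {
    my @sorted = sort { $a->[0] cmp $b->[0] or $b->[2] <=> $a->[2] } @_;

    my ($current_group, $rank) = ('', 0);
    my @ranked;
    for my $row (@sorted) {
        my ($group, $id, $value) = @$row;
        # Reset the counter at each group boundary, otherwise increment it.
        $rank = $group eq $current_group ? $rank + 1 : 1;
        $current_group = $group;
        push @ranked, [$group, $id, $value, $rank];
    }
    return @ranked;
}

for my $r (rank_within_group([b => 'x', 5], [a => 'p', 1], [a => 'q', 9], [b => 'y', 8])) {
    print join("\t", @$r), "\n";
}
```

The incrementing counter is exactly the "incrementation" that doesn't fit a set-based statement; here it costs one sort plus one linear pass.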

Justin James

Yes, when you need to jump through hoops like that, it usually means that you need to add a column or two of keys to your data, or that you are missing data entirely that should have been in there in the first place. J.Ja

Justin James

I agree, I just usually end up not using PHP. ;) J.Ja

Tony Hopkinson

that go about. Two-pass sorts, correlated subqueries, inline queries, functions, joined views, cursors, CTEs... They are all indicators that the database schema does not match your requirements, and a poor attempt to turn a set-based language into a procedural one. It's like using a telephone directory to find the address of a phone.

Jaqui

;) I know, you aren't impressed with PHP, but it's a great example of the concept: use SQL to run a query to get a result set, then use another language to work with the result set. You get the flexibility of SQL for the actual DB calls and the power of a language better suited to manipulating the results, so you don't take the performance hit of using SQL to manipulate the result set.

Justin James

Tony - I am sadly familiar with that! It is one of the perils of SQL programming that, as the data set grows, performance drops off exponentially rather than linearly as it does in most procedural languages. That is one reason why most developers should not be writing SQL: they have no idea how to optimize these things, what works well, what doesn't, individual database peculiarities, etc. It is also why I tend to use a full dataset instead of an extract during development. If it is too slow to be bearable in development, how will my users be able to use it? Gotta eat my own dog food before serving it to anyone else, and it is better to see the performance hit in development than to discover it on the day of deployment or in the final stages of QA/QC. J.Ja

Tony Hopkinson

If you can't identify the members on which you want to carry out an operation, it will struggle. You can use cursors, but they tend to be very inefficient; correlated subqueries can work, but use one on a large number of rows and you create a performance problem. You may be able to live with it. Generally, things go wrong when someone makes a bad assumption, like "it's fast for 100 records, so it will be acceptable at 100,000." Unfortunately, you tend to suffer almost exponential degradation.

hotfusionman

I once liked Perl a lot but find Ruby to be better. You can still essentially write Perl syntax if you want, but Ruby's built-in classes organize Perl's built-in functions (and many more) in a way that makes it trivial to know where to look up the method you need -- no more "in scalar context, function foo does this, in array context, it does that". If that was all there was to Ruby it'd be enough to warrant a look by Perl coders, but there's so much more good stuff (the language creator's goal is to make programming enjoyable) like mix-ins and metaprogramming, all readable and easily understandable. I'd take Perl over Python most any day, but Ruby over Perl any day at all.
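The context rule being complained about is easy to demonstrate: the same expression can return different things depending on whether it is evaluated in scalar or list context. A small illustration with an array and the built-in `reverse` (the variable names are just for the example):

```perl
use strict;
use warnings;

my @langs = qw(perl ruby python);

my $count   = @langs;    # scalar context: number of elements (3)
my ($first) = @langs;    # list context: first element ('perl')

# reverse in list context reverses the order of the list...
my @backwards = reverse @langs;              # ('python', 'ruby', 'perl')
# ...but in scalar context it concatenates its arguments and reverses the string.
my $mirrored = scalar reverse('abc', 'def');    # 'fedcba'

print "$count $first @backwards $mirrored\n";    # prints "3 perl python ruby perl fedcba"
```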

apotheon

. . . but I still prefer Perl for actually getting work done. Ruby has some minor hang-ups, mostly in the implementation (rather than the language design). For instance, the interpreter's slow, and threading is subtly broken. That'll get fixed, though. The main problem I have with the language is that while it's easier to pick up enough of the language to write a three-line script for some repetitive munging than Perl, it's much more difficult to pick up enough Ruby to write a fifty-line script to do more complex tasks. As such, I tend to go back to Perl where you don't have to know the language as much to get complex tasks done (and where I know more of the language anyway). While the Perl language is huge and sprawling with tentacles reaching into every crevice and cranny of your problem domain, you only ever need to know a limited subset of it to get work done. With Ruby, you need to know a greater percentage of the language and, in fact, a greater quantity of the language -- to say nothing of the fact that doing anything more complex than the supremely trivial requires you to know OOP techniques more thoroughly than Perl (which doesn't bother me, but might bother quite a few sysadmins who are used to Perl). . . . then again, for OOP, you've got to love pain to try to do anything significant in Perl. Use a more functional style with Perl, and you'll be better off. Fake OOP (if you have to) from time to time with closures. Avoid the official Perl object model like the plague: it sucks. Ruby's, on the other hand, is bar none the most elegant object model I've ever touched. It's also the easiest to use. What I'd really like to see for sysadmin scripts, though, is some variety of Lisp. Now, there's a family of languages for doing lots with very little code. For making it easier for most sysadmins to read, you could make use of the UCBLogo dialect. 
The only downside is the lack of infrastructure around it, because people don't already use it for these purposes much -- and we all know that with CPAN no other language compares to Perl for surrounding infrastructure to be used to Get Things Done.
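For readers who haven't seen the "fake OOP with closures" approach, here is a minimal sketch; the `make_account` name and its methods are made up for illustration. The constructor returns a hash of anonymous subs that all close over the same private lexical variable, so the state can only be reached through those subs and each call gets its own copy.

```perl
use strict;
use warnings;

# Hypothetical "object" built from closures: $balance is private to each call.
sub make_account {
    my ($balance) = @_;
    return {
        deposit  => sub { $balance += shift; return $balance },
        withdraw => sub {
            my ($amount) = @_;
            die "insufficient funds\n" if $amount > $balance;
            $balance -= $amount;
            return $balance;
        },
        balance  => sub { $balance },
    };
}

my $acct = make_account(100);
$acct->{deposit}->(50);
$acct->{withdraw}->(30);
print $acct->{balance}->(), "\n";    # prints 120
```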

DanLM

I spend a lot of time doing server scripts. I prefer sh when I do server scripts because they are portable to most *nix flavors. But a drawback of sh scripts is no tables/arrays. I can think of a primary example of a utility script that I need to write where Perl is just standing at the line going, lol, look to the rest, then come back to the best. It is a backup script for CVS repositories on a Unix box, where the backups will be saved on a network drive that the CVS admin will have access to via Windows. She also has mapped the CVS repository drive (which is Unix) so that it looks like a regular folder. I want to encrypt this data after I compress it, and the backup needs to run via crontab at midnight or something like that. I want the backup and restore processes to be as simple as possible: one backup, one restore. Sooooooo: back up with Perl using a CPAN module for making .zip files (found it, it's there), and use OpenSSL to encrypt it. That is the Unix Perl script. Restore with Perl from the desktop, again using the CPAN module for the .zip file and again using OpenSSL, this time for the decrypt process. I know, I have to install OpenSSL on her desktop, but no biggie. If I do this correctly, the lady should be able to restore the repositories just by launching the Perl script from her desktop. All procedures will be in the same scripting language, and this actually should be pretty straightforward and small for both scripts. It depends how creative I get with qualifiers in the backup names, i.e.: 20061020-cvssource-dailybk.zip.enc, or something like that anyway. dan
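A sketch of two pieces of that plan, using only core Perl: building the dated file name in the format from the example above, and building the `openssl enc` argument list for encryption or decryption. The sub names, the cipher choice (`aes-256-cbc`), the `-pbkdf2` flag (which needs OpenSSL 1.1.1 or later), and reading the passphrase from a file via `-pass file:` are all assumptions, not part of the original plan; the zip step via a CPAN module is left out.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build a name like 20061020-cvssource-dailybk.zip.enc.
# @tm is a broken-down time as returned by localtime() in list context.
sub backup_name {
    my ($label, @tm) = @_;
    return strftime('%Y%m%d', @tm) . "-$label-dailybk.zip.enc";
}

# Return the argument list for encrypting (or, with $decrypt set, decrypting)
# a file with the openssl command-line tool. Passing it to system() as a list
# avoids any shell quoting problems.
sub openssl_cmd {
    my ($in, $out, $pass_file, $decrypt) = @_;
    return ('openssl', 'enc', '-aes-256-cbc', '-salt', '-pbkdf2',
            ($decrypt ? '-d' : ()),
            '-in', $in, '-out', $out, '-pass', "file:$pass_file");
}

# Hypothetical paths; only prints the command here.
my $name = backup_name('cvssource', localtime);
my @cmd  = openssl_cmd('cvssource.zip', $name, '/root/.backup_pass');
print "@cmd\n";    # to run it: system(@cmd) == 0 or die "openssl failed: $?\n";
```

The restore script on the desktop would call the same `openssl_cmd` with `$decrypt` set, which keeps both sides of the process in one language as planned.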

apotheon

Try carrying a serious piece of admin shell scripting from a default install of Linux to a default install of FreeBSD some time. You'll discover just how unportable a bash script can be when trying to run it in tcsh. Perl is far more portable than shell. Hell, you can carry a Perl admin script to Windows if you write it carefully for portability. Try that with a shell script, without essentially installing half of Unix on top of Windows via something like Cygwin.
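Part of what "written carefully for portability" means in practice is leaning on core modules such as File::Spec, File::Copy, File::Path, and File::Basename, which hide differences like path separators and `cp` versus `copy`. A minimal sketch; the `backup_file` helper is hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Basename qw(basename);

# Copy $file into $dest_dir, creating the directory (and parents) if needed.
# File::Spec builds the target path with the right separator for the current OS.
sub backup_file {
    my ($file, $dest_dir) = @_;
    make_path($dest_dir) unless -d $dest_dir;
    my $target = File::Spec->catfile($dest_dir, basename($file));
    copy($file, $target) or die "copy $file -> $target failed: $!\n";
    return $target;
}

# Example call (hypothetical paths):
#   backup_file('hosts.txt', File::Spec->catdir(File::Spec->tmpdir, 'admin-backup'));
```

The same script runs unchanged on Linux, FreeBSD, or Windows, because no external commands or shell-specific syntax are involved.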

jmgarvin

OK, with that little jibe out of the way, and given that I agree Perl is FAR more portable (although wonky things ALWAYS happen to Perl when you port to Windows), Bash is pretty portable between *nix systems (assuming you install bash on the Unix box). BSD supports bash scripting (portability-wise) pretty well *IF* you have managed to get the third-party bash shell installed, but I have found issues with Sun Unix and bash no matter what you do. However, porting bash to Windows is not only a nightmare but pretty pointless. It would be almost like porting Windows script to *nix.

Tony Hopkinson

clever. Too clever in some ways. Then again, I had a Linux box with MySQL on it, and I needed to FTP files off a Windows box that wasn't under my control and then import them into the database every five minutes. Even written by a complete noob in Perl, Linux, and MySQL, it was barely two pages long. Never heard a peep out of it for the next fifteen months I was there. An elegant solution to a very messy problem. It was full of comments as well. LOL

forhire

I hear this claim that Perl is obfuscated and difficult to maintain quite often. It doesn't have to be. You can write Perl in a way similar to C or C++. You can use subroutines and objects if you want, you can indent your text if you want. The $ indication for a scalar variable, @ for an array, etc. actually make the language a bit easier to understand in many ways. Rather than having to develop a linked list routine in C, you can just use an array, hash, or complex data structure to store whatever you need to store. It's simple and elegant. I know people get a bit confused by regular expressions, but with a bit of practice, they are easy to use as well. -- System Administration and Security information http://SecurityBulletins.com/
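To illustrate the point about not having to write a linked-list routine: Perl's built-in arrays already work as queues and stacks via `push`, `shift`, and `pop`, and a hash gives you lookup by key, with no node management. A small sketch with made-up data:

```perl
use strict;
use warnings;

# An array as a queue: add at the tail, take from the head.
my @queue = ('job1', 'job2');
push @queue, 'job3';
my $next = shift @queue;    # 'job1'

# An array as a stack: add and take at the same end.
my @stack = (1, 2);
push @stack, 3;
my $top = pop @stack;       # 3

# A hash as a lookup table: count occurrences of each word.
my %count;
$count{$_}++ for qw(error warn error info error);

print "$next $top $count{error}\n";    # prints "job1 3 3"
```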

Tony Hopkinson

Verbose obfuscation is what you don't want to see, you think you understand right up to the point where you didn't :D

apotheon

Minimizing the number of keystrokes in the program's source is not elegance. As I pointed out, that's just gratuitous typing efficiency, and leads to obfuscation. A language that supports typing efficiency without requiring cryptic one-character function names and variables (and the like) is a language that fosters elegance, however -- and writing code for succinctness ("terse enough, but no more terse than that") even in source code form lends itself to elegance as well. In other words, when deciding to write code that is not so terse as to be unreadable, don't throw the baby out with the bathwater by writing code so verbose nobody can read it anyway without refactoring it. I'm sure you already know this, of course. I just feel that making it explicit in this discussion is a net win.

Tony Hopkinson

succinctness and terseness aren't necessarily synonymous. cp vs copy is the difference I was trying to get across. One of my readability rules is I try to stay away from abbreviations, that necessarily makes my code look more verbose, in a compiled environment it doesn't make it less elegant though. That's why I separated functional elegance from lexical efficiency.

apotheon

Succinctness is a key element of elegance (though it's not the only element). Elegance, in fact, is something I might define as eschewing the gratuitous. Gratuitous code, repetition, unnecessary features, and so on are all enemies of elegance. There is a limit to how little you should be writing if you wish to achieve elegance, of course. That's the point where something stops being succinct and starts being terse. That's the point where you leave behind a language like Ruby (which lends itself to elegance) and find yourself staring at source code written in APL. That's also the point where you start using single-letter variable names, and start golfing your code to the point where it becomes obfuscated. Perl isn't exactly an elegant language, but it is a language that enables elegant programming exceedingly well. It also enables obfuscated, terse, inelegant programming, as well as unnecessarily verbose, overly commented, inelegant programming. Most people don't do the latter with Perl. A lot of people do the former (golfing down to obfuscated strings of special characters), however. The reason, I think, is simply that people think cleverness in the use of the language and as-short-as-possible programs are a good thing, and damn the consequences (such as obfuscation). In the attempt to capitalize on some of Perl's strength, they overdo it, and end up with something that has passed the realm of elegance and ended up looking like APL, or assembly. That doesn't mean that writing COBOL in Perl is the answer, though. So . . . "terseness" is something to approach in search of elegance, but don't quite get there. Gratuitous elimination of the volume of code is as inelegant as a bunch of gratuitous verboseness of code. Keeping your code succinct (which I might define to mean "as terse as necessary, and no more terse than that") is definitely an important characteristic of elegant source code. 
As I mentioned in the "Perl: My Secret Weapon" discussion, I've previously written at some length on the subject of elegance: http://sob.apotheon.org/?p=113

Tony Hopkinson

There are several paths to elegance, one of them is to make use of side effects, another is to make use of maths, another is domain based. Terseness has value in an interpreted environment because of parsing overhead, the elegance of the function has little or nothing to do with how terse it is.

forhire

I've found that no matter what the language, there are people who write very verbose and others who opt for an elegant solution. In shell scripting, I commonly see people write a whole page of code for something that can be easily done in 1 line using the 'xargs' command. I have rarely heard people complain that shell is obfuscated, though. Perl, in one sense, can be thought of as a standardized form of shell scripting. Rather than separate programs with obscure options and pipes to hook the pieces together, Perl has data structures and consistent syntax. Perl allows many variations in syntax, so you may run into the equivalent of local dialects when reading it. Rather than use an if..then statement, someone might opt for something like: print "What is the question\n" if $answer==42; -- System Administration and Security information http://SecurityBulletins.com/
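The statement-modifier form in that example is just another spelling of an ordinary `if` block. The two hypothetical subs below return the same text for any input; they return the string instead of printing it so the equivalence is easy to check.

```perl
use strict;
use warnings;

# Conventional block form.
sub block_form {
    my ($answer) = @_;
    if ($answer == 42) {
        return "What is the question\n";
    }
    return '';
}

# Statement-modifier form: the condition trails the statement.
sub modifier_form {
    my ($answer) = @_;
    return "What is the question\n" if $answer == 42;
    return '';
}

print modifier_form(42);    # prints "What is the question"
```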

Tony Hopkinson

They would be considered very verbose, like me :D A perl guru could probably do one of my 100 line scripts in about four, but I wouldn't be able to maintain it easily then. I can figure out anything eventually, but at work I can't afford the time to do so. Even less can I afford being the only one available who can do so, that would get me sideways looks from the boss.

Justin James

While Perl lends itself well to being obfuscated, it *is* possible to write clean code in it. And not only are regexes not hard to get the gist of, every language has more or less the same syntax for them anyway. And it isn't like Perl is worthless without regexes... it is still a good procedural, dynamic language. J.Ja

apotheon

Not only are regular expressions easy to use once you get the hang of 'em, but Perl's regular expressions are part of the reason that many programs involving several libraries and fifty lines of code just for handling text strings in C/C++ or Java can be whittled down to half a dozen lines of code in Perl. I've gotta agree with everything you said in that post, forhire.
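As an illustration of how much string handling one regular expression can absorb, here is a sketch that parses `key = value` lines (ignoring comments and blank lines) into a hash. The config format and the `parse_config` name are hypothetical; in C this would typically mean manual tokenizing and whitespace trimming.

```perl
use strict;
use warnings;

# Parse hypothetical "key = value" lines; '#' starts a comment
# (so a '#' inside a value is not supported in this sketch).
sub parse_config {
    my ($text) = @_;
    my %conf;
    for my $line (split /\n/, $text) {
        $line =~ s/\s*#.*//;                               # strip comments
        next unless $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;   # key, '=', trimmed value
        $conf{$1} = $2;
    }
    return \%conf;
}

my $conf = parse_config("host = db1  # primary\nport=5432\n\n# done\n");
print "$conf->{host}:$conf->{port}\n";    # prints "db1:5432"
```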

bigbigboss

I used APL in several lives past. It was worse than Perl. I couldn't understand what I wrote two days after I wrote it. It was a great example of a "write only" language. Perl is not as bad, but don't get me to read anyone else's program. Just tell me what it's supposed to do and I will rewrite it myself, my own way.

apotheon

Actually, implicit variables can [b]improve[/b] readability when used properly. In general, it's usually a good idea to declare and use variables explicitly, but sometimes it's a good idea to use them for readability purposes (among others). Not everyone uses them well, of course. As with any other language construct, misuse can lead to problems -- in this case, mostly with readability.

Justin James

When people write Perl with implicit variables, I agree that it is hard to read. When someone is declaring their variables, Perl is not hard to read. Then again, I feel that is true for any language. Declare your variables and make things easy on yourself or the next person to maintain the code. J.Ja
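The difference being debated looks like this in practice: the same filter written once with the implicit `$_` and once with a declared loop variable. Which one reads better is exactly the judgment call in this thread; the data is made up.

```perl
use strict;
use warnings;

my @lines = ("ok: disk", "ERROR: fan", "ok: cpu", "ERROR: psu");

# Implicit style: grep sets $_, and the bare regex matches against $_.
my @errors_implicit = grep { /^ERROR/ } @lines;

# Explicit style: a named loop variable and an explicit match target.
my @errors_explicit;
for my $line (@lines) {
    push @errors_explicit, $line if $line =~ /^ERROR/;
}

print join("\n", @errors_implicit), "\n";    # "ERROR: fan" then "ERROR: psu"
```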

Justin James

"Too clever by half," if I recall my Queen's English (and British history) correctly. Perl is incredibly terse, dense, and clever, and the clever tricks are what make it impossible to read. When people do a million tricks with the implicit variables, the code shrinks in size and uses fewer resources, and the next person to touch the code goes insane. :) J.Ja

The Lost Boy

I use it for nearly everything because it is so fast and easy. One example: identifying police system records with misspelled car manufacturers and auto-correcting them using Soundex.
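The commenter's actual script isn't shown, so this is only a sketch of the idea, assuming the standard American Soundex rules (keep the first letter, map consonants to digits, collapse repeats, pad to four characters). Soundex is implemented inline rather than via a CPAN module, and the `correct_make` rule (accept a correction only when exactly one known make shares the code) and the list of makes are hypothetical.

```perl
use strict;
use warnings;

# American Soundex digit classes.
my %CODE;
@CODE{qw(B F P V)}         = (1) x 4;
@CODE{qw(C G J K Q S X Z)} = (2) x 8;
@CODE{qw(D T)}             = (3) x 2;
$CODE{L}                   = 4;
@CODE{qw(M N)}             = (5) x 2;
$CODE{R}                   = 6;

sub soundex {
    my ($word) = @_;
    (my $s = uc $word) =~ s/[^A-Z]//g;
    return '' unless length $s;
    my ($first, @rest) = split //, $s;
    my $result = $first;
    my $prev   = $CODE{$first} // '';
    for my $c (@rest) {
        next if $c eq 'H' || $c eq 'W';            # H and W do not separate equal codes
        my $d = $CODE{$c};
        if (!defined $d) { $prev = ''; next }      # vowels and Y do separate them
        $result .= $d if $d ne $prev;
        $prev = $d;
        last if length($result) == 4;
    }
    return substr($result . '000', 0, 4);
}

# Hypothetical correction rule: replace the input only when exactly one
# known make shares its Soundex code; otherwise leave it for a human.
sub correct_make {
    my ($input, @known) = @_;
    my $code = soundex($input);
    my @hits = grep { soundex($_) eq $code } @known;
    return @hits == 1 ? $hits[0] : $input;
}

my @makes = qw(Chevrolet Ford Honda Toyota Volkswagen);
print correct_make($_, @makes), "\n" for qw(Toyta Frod Hnoda);    # Toyota, Ford, Honda
```

Soundex is coarse, so refusing to guess when several makes collide keeps it from silently "correcting" records into the wrong manufacturer.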

JIm.frazier

best thing about Perl is that it uses 'C' language. I wrote Perl from 1996 - 2000 on the Windows platform doing everything from Web Development (PerlIS, and ASP) to admin scripts, to database programming. Moved to C# in 2001 with .Net. Haven't used Perl since "no need"....

Justin James

... to dynamically generate and deploy configurations to over ten thousand routers, test the configuration, double check connectivity, and flag routers where the configuration change failed or (worse!) the router didn't respond after the update. Sure, I could have cobbled it together in csh with grep, sed, tee, and a few other parts, but with the CPAN telnet module just sitting there, how could I resist? It took a few hours to write and test the script, and the only thing I didn't do that I wanted to, was to multithread it. It is such an awesome little language. :) J.Ja
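The telnet and verification steps depend on the CPAN module and the devices, but the "dynamically generate configurations" part can be sketched with core Perl alone: fill placeholders in a template per router, and refuse to render if any value is missing. The `{{name}}` placeholder syntax, the `render_config` name, the inventory, and the IOS-style lines are all hypothetical illustrations.

```perl
use strict;
use warnings;

# Fill {{name}} placeholders in a config template; die on any missing value
# so a half-rendered config is never pushed to a device.
sub render_config {
    my ($template, %vars) = @_;
    (my $config = $template) =~
        s/\{\{(\w+)\}\}/exists $vars{$1} ? $vars{$1} : die "missing value for $1\n"/ge;
    return $config;
}

my $template = "hostname {{name}}\ninterface Loopback0\n ip address {{loopback}} 255.255.255.255\n";
my %routers  = (r1 => '10.0.0.1', r2 => '10.0.0.2');    # hypothetical inventory

for my $name (sort keys %routers) {
    print render_config($template, name => $name, loopback => $routers{$name});
}
```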