Parse data with split

By Mark W. Kaelin, Editor
Tell us how you prefer to parse data in Perl. Have you used split before?
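
Here is a minimal sketch of plain split, using a made-up sample line and hypothetical variable names. Splitting on a comma works until a quoted field contains a comma of its own, which is the limitation raised in the reply:

```perl
use strict;
use warnings;

# Hypothetical CSV-style line: the second field is quoted and contains a comma.
my $line = 'apple,"banana, ripe",cherry';

# split only knows about the delimiter, not about quoting.
my @fields = split /,/, $line;

print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

This prints "4 fields" rather than 3, because split cuts the quoted field into "banana and  ripe".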

Writing a lexer is easy in Perl

by swstephe, in reply to Parse data with split

Split is fine for simple jobs, but as the example showed, it doesn't handle quoted strings. More complex formats need a lexer that matches patterns and returns tokens, plus their values where they matter. After writing hundreds of lexers in Perl, I came up with a simple format that is fairly easy to read:

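use strict;
use warnings;

# Token table: [token name, pattern, keep-value flag].
# With the flag set, the value is the pattern's first capture group;
# otherwise it is simply the matched text.
my @lexdat =
(
    ['ADD', qr/\+/, 0],
    ['SUB', qr/\-/, 0],
    ['MUL', qr/\*/, 0],
    ['DIV', qr/\//, 0],
    ['INT', qr/(0|[1-9][0-9]*)/, 1],
);

# Returns the next token and the rest of the string.
sub lexer
{
    my $str = shift;
    $str =~ s/^\s+//;    # skip leading whitespace
    return (bless({type => 'EOS', value => ''}, 'token'), '') if $str eq '';    # end-of-string
    foreach my $lex (@lexdat)
    {
        my ($type, $re, $keep) = @$lex;
        if ($str =~ /^($re)/)    # $1 = whole match, $2 = the pattern's own first group
        {
            my $token = bless {type => $type, value => ($keep ? $2 : $1)}, 'token';
            return ($token, substr($str, length $1));
        }
    }
    die "syntax error: invalid token at '$str'";
}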

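The example above implements the lexer for a simple calculator syntax. Each call to "lexer" returns the next token (a hash with "type" and "value" fields, blessed into the "token" class) together with the rest of the string. Each table entry uses "qr" to quote the token's regular expression, plus a flag saying whether the value matters: for INT the value comes from the capture group, while for the operators it is just the matched text and can be thrown away.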
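
A short usage sketch may help show how the (token, rest) pair is meant to be consumed. It repeats the table and a lexer so it runs on its own; the "tokenize" driver and the sample input are hypothetical, not part of the original post:

```perl
use strict;
use warnings;

# Token table in the same format as above: [name, pattern, keep-value flag].
my @lexdat =
(
    ['ADD', qr/\+/, 0],
    ['SUB', qr/\-/, 0],
    ['MUL', qr/\*/, 0],
    ['DIV', qr/\//, 0],
    ['INT', qr/(0|[1-9][0-9]*)/, 1],
);

# Returns (token, rest-of-string); dies on an unrecognized character.
sub lexer
{
    my $str = shift;
    $str =~ s/^\s+//;    # skip leading whitespace
    return (bless({type => 'EOS', value => ''}, 'token'), '') if $str eq '';
    foreach my $lex (@lexdat)
    {
        my ($type, $re, $keep) = @$lex;
        if ($str =~ /^($re)/)    # $1 = whole match, $2 = inner capture group
        {
            my $token = bless {type => $type, value => ($keep ? $2 : $1)}, 'token';
            return ($token, substr($str, length $1));
        }
    }
    die "syntax error: invalid token at '$str'";
}

# Hypothetical driver: feed the rest of the string back into lexer
# until the end-of-string token comes back.
sub tokenize
{
    my $str = shift;
    my @tokens;
    while (1)
    {
        my $token;
        ($token, $str) = lexer($str);
        last if $token->{type} eq 'EOS';
        push @tokens, $token;
    }
    return @tokens;
}

my @tokens = tokenize('12 + 3 * 40');
print join(' ', map { $_->{type} . '(' . $_->{value} . ')' } @tokens), "\n";
# prints: INT(12) ADD(+) INT(3) MUL(*) INT(40)
```

Because lexer hands back the unconsumed text instead of keeping state internally, the caller decides when to stop, and can peek at the next token simply by not keeping the returned rest.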
