Now You Have Two Problems: Explaining Regular Expressions
From a post in The Perl Community, a Facebook Group:
Doubt 2: in perl:
s:\.(bat|pl)$::io;
s:^.*[\\/]::o;
what s the above code does especially what is the use of ::io and ::o, here $ means ends with .bat or pl right . simply it starts with ‘s:’ wr the output of the cmd will store pls explain. Sorry if it is not valuable question as a beginner i dont have much knowledge on perl and like this so many doubts are , if u people dont mind i would like to clarify all the doubts in this forum.
OK, I’m not in the forum here, but regular expressions are good and fine things, so I’ll explain here.
s means substitution, and is usually written like s/ / / or s{ }{ }, and the pattern matched in the left-hand section is replaced by what is in the right. Perl allows many things to be separators — too many? — but here, they’re using :. Don’t do that. I’ll rewrite with curly braces, or {}.
s{\.(bat|pl)$}{}io;
s{^.*[\\/]}{}o;
Both end with {}, which means that whatever matches is replaced with nothing, not even a space. Below, we match the first letter e in the string and remove it.
my $string = 'regular expressions';
$string =~ s/e// ;
print $string
>>> rgular expressions
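The delimiter is only punctuation; it does not change what the substitution does. Here is a minimal sketch of that, where the sub names are made up for illustration. It also shows that without the /g modifier only the first match is replaced.

```perl
use strict;
use warnings;

# Hypothetical helpers: the same substitution written with three delimiters.
sub strip_slashes { my ($s) = @_; $s =~ s/e//;  return $s }
sub strip_braces  { my ($s) = @_; $s =~ s{e}{}; return $s }
sub strip_colons  { my ($s) = @_; $s =~ s:e::;  return $s }

# With /g, every match is replaced instead of just the first.
sub strip_all     { my ($s) = @_; $s =~ s{e}{}g; return $s }

print strip_slashes('regular expressions'), "\n";   # rgular expressions
print strip_braces('regular expressions'),  "\n";   # rgular expressions
print strip_colons('regular expressions'),  "\n";   # rgular expressions
print strip_all('regular expressions'),     "\n";   # rgular xprssions
```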
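Then there are the modifiers, /i and /o. /o is often read as “optimize”; what it actually does is tell Perl to compile the pattern only once, which only matters when the pattern interpolates a variable. It often doesn’t work as well as you might want.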
To prove it, here’s a benchmark you can run:
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw{ say signatures state };
no warnings qw{ experimental::signatures };
use Benchmark qw{:all};
cmpthese(
10_000_000,
{
'Nonoptimized' => sub {
my $string = 'Regular';
$string =~ s/e//;
},
'Optimized' => sub {
my $string = 'Regular';
$string =~ s/e//o;
}
}
);
And a few results:
$ for i in {1..5} ; do ./benchmark.pl ; done
                  Rate    Optimized Nonoptimized
Optimized    1373626/s           --          -8%
Nonoptimized 1494768/s           9%           --
                  Rate Nonoptimized    Optimized
Nonoptimized 1245330/s           --         -11%
Optimized    1394700/s          12%           --
                  Rate Nonoptimized    Optimized
Nonoptimized 1461988/s           --          -0%
Optimized    1466276/s           0%           --
                  Rate Nonoptimized    Optimized
Nonoptimized 1497006/s           --          -8%
Optimized    1623377/s           8%           --
                  Rate    Optimized Nonoptimized
Optimized    1615509/s           --          -3%
Nonoptimized 1658375/s           3%           --
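Yeah, sometimes it comes out faster and sometimes slower: /o won three of those five runs and lost the other two, so the difference looks like noise. I used to use /o all the time, but I never use it any more.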
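There is a second reason to be wary of /o. Once a pattern that interpolates a variable has been compiled with /o, Perl keeps using that compiled pattern even if the variable changes later. A small sketch of that behavior follows; the two helper subs are hypothetical and differ only in the modifier.

```perl
use strict;
use warnings;

# Hypothetical helpers, identical except for the /o modifier.
sub match_once  { my ($pattern, $text) = @_; return $text =~ /$pattern/o ? 1 : 0 }
sub match_fresh { my ($pattern, $text) = @_; return $text =~ /$pattern/  ? 1 : 0 }

print match_once('a', 'apple'), "\n";   # 1: the first call compiles /a/
print match_once('b', 'bbb'),   "\n";   # 0: still matching against /a/
print match_fresh('b', 'bbb'),  "\n";   # 1: pattern follows the variable
```

The benchmark above uses a literal pattern with nothing interpolated, so it would not run into this.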
The other modifier, /i, is case insensitive. m{e}i will match both e and E.
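A quick way to see /i at work, as a sketch with made-up sub names:

```perl
use strict;
use warnings;

# Hypothetical helpers: the same match with and without /i.
sub has_e_exact   { my ($text) = @_; return $text =~ m{e}  ? 1 : 0 }
sub has_e_anycase { my ($text) = @_; return $text =~ m{e}i ? 1 : 0 }

print has_e_exact('REGULAR'),   "\n";   # 0: no lowercase e in the string
print has_e_anycase('REGULAR'), "\n";   # 1: E matches because of /i
```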
s{\.(bat|pl)$}{}io;
The important part is {\.(bat|pl)$}, and we’ll break that up.
Within a regular expression, . is the wildcard. It matches any single character (except a newline, by default). \. escapes that, so here we’re looking for a literal period character, followed by (bat|pl), which is either the string bat or the string pl. Because of the parentheses, this regular expression fills $1 with either bat or pl, depending on what is in the string.
$_ = 'foo.pl';
s:\.(bat|pl):$1:io;
>>> foopl
We don’t necessarily want to capture the match, we just want to match it. Non-capturing groups are written like (?:bat|pl), which is another reason to not use : as your separator.
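To see the difference between the two kinds of group, here is a small sketch. The group_count helper is hypothetical; it reads $#+, which Perl sets to the number of capture groups in the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: how many capture groups a successful match had,
# or undef if the text did not match at all.
sub group_count {
    my ($text, $regex) = @_;
    return $text =~ $regex ? $#+ : undef;
}

my $capturing     = qr{\.(bat|pl)$}i;
my $non_capturing = qr{\.(?:bat|pl)$}i;

print group_count('foo.pl', $capturing),     "\n";   # 1: $1 holds "pl"
print group_count('foo.pl', $non_capturing), "\n";   # 0: nothing captured
```

Both patterns match exactly the same strings; the second one just doesn’t bother saving anything.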
Finally, there are two special characters to note: ^ is the start of the string, and $ represents the end of the string. So, if the string is vampire.bat.py, we don’t want to match .bat, because that’s not at the end of the string. So \.(?:bat|pl)$ only matches .pl and .bat (in any capitalization, thanks to /i) at the end of the string, for removal.
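Here is that anchored substitution run against a few file names, as a sketch. The sub name and the sample names other than vampire.bat.py are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing .bat or .pl, in any letter case.
sub strip_script_ext {
    my ($name) = @_;
    $name =~ s{\.(?:bat|pl)$}{}i;
    return $name;
}

for my $name (qw{ vampire.bat script.PL vampire.bat.py perl.pl.bak }) {
    print "$name -> ", strip_script_ext($name), "\n";
}
# vampire.bat and script.PL lose their extension;
# vampire.bat.py and perl.pl.bak come back unchanged.
```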
Anyway…
The other regex, which is, again:
s:^.*[\\/]::o;
It starts with the caret, ^, which matches the start of the string. This is followed by .*: . is the wildcard, and * means zero or more of whatever comes right before it, so .* is zero or more of any character. Then comes a character class, indicated by square brackets, containing a normal slash, /, and a backslash, \. Since we use the backslash to escape special characters, we have to escape the backslash with a backslash, so [\\/].
This regular expression matches everything up to and including the last slash or backslash, then replaces it with an absence. /usr/bin/perl would become perl, for example, but /usr/bin/ would just be an empty string.
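A short sketch of that behavior follows. The sub name and the Windows-style path are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: drop everything up to and including the last
# slash or backslash. The greedy .* is what makes it the *last* one.
sub strip_dirs {
    my ($path) = @_;
    $path =~ s{^.*[\\/]}{};
    return $path;
}

print strip_dirs('/usr/bin/perl'),     "\n";   # perl
print strip_dirs('/usr/bin/'),         "\n";   # (empty string)
print strip_dirs('C:\tools\run.bat'),  "\n";   # run.bat
print strip_dirs('plain.pl'),          "\n";   # plain.pl (no slash, no change)
```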
I think the point is to take /full/path/to/my/application/file.pl and turn it into file. And, if that was my goal, I would do something different.
my ( $output ) = $string =~ m{([^\\/]+)\.(?:bat|pl)$}i;
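Wrapped in a sub so it can be tried on a few paths, that looks something like the sketch below. The sub name and the sample paths other than the first are made up. One difference from the two substitutions: when the string is not a .bat or .pl file at all, the match fails and you get undef back instead of a half-trimmed string.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the file name without directories or
# a trailing .bat/.pl extension; returns undef when nothing matches.
sub script_name {
    my ($path) = @_;
    my ($name) = $path =~ m{([^\\/]+)\.(?:bat|pl)$}i;
    return $name;
}

for my $path ('/full/path/to/my/application/file.pl',
              'C:\scripts\RUN.BAT',
              '/usr/bin/perl') {
    my $name = script_name($path);
    print "$path -> ", defined $name ? $name : '(no match)', "\n";
}
# The first two give "file" and "RUN"; the last does not match.
```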
For more information, read Perldoc’s perlre, the official documentation for Perl’s regular expressions.