Perl

Programming Republic of Perl logo

Perl, also expanded as Practical Extraction and Report Language (a backronym, see below), is a programming language released by Larry Wall on December 18, 1987. It borrows features from C, sed, awk, shell scripting (sh), and, to a lesser extent, many other programming languages.

Rationale

Perl has been ported to over a hundred different platforms, and the mission of making the language available everywhere is commemorated in the name of the main mailing list for discussion of Perl development, "perl5-porters". Perl is widely used in web development, finance and bioinformatics, and indeed in most sectors where a premium is placed on rapid development and the availability of a large number of standard and third-party modules. Because of its wide availability, Perl, like Java, is often considered to be a platform in its own right, packaging a Unix-like environment in which software can be written once and generally run without modification almost everywhere. In addition to its numerous binary ports, Perl can, with only 6 reported exceptions, be compiled from source on all Unix-like, POSIX-compliant or otherwise Unix-compatible platforms, including AmigaOS, BeOS, Cygwin, and Mac OS X. Custom binary ports are available for Windows (ActivePerl) and Mac OS Classic (MacPerl). [1] http://www.perl.com/CPAN/ports/

Perl was designed to be a practical language for extracting information from text files and generating reports from that information. One of its mottos is "There's more than one way to do it" (TMTOWTDI, pronounced 'Tim Toady'). Another is "Perl: the Swiss Army Chainsaw of Programming Languages". One stated design goal is to make easy tasks easy and difficult tasks possible. Its versatility accommodates several programming paradigms: procedural, functional, and object-oriented (though some claim that Perl is not a cleanly designed language because of its multiple paradigms). Perl has a powerful regular expression engine built directly into its syntax. Perl is often considered the archetypal scripting language and has been called the "glue that holds the web together", as it is one of the most popular CGI languages. Its function as a "glue language" can be described broadly as its ability to tie together different systems and interfaces that were not designed to interoperate.

Perl is one of the programming language components of the popular LAMP free software platform for web development.

Perl is free software, available under a combination of the Artistic License and the GPL. It is available for most operating systems but is particularly prevalent on Unix and Unix-like systems (such as Linux, FreeBSD, and Mac OS X), and is growing in popularity on Microsoft Windows systems. As an example of Perl in action, Wikipedia itself was a CGI script written in Perl until January 2002. Another example is Slashdot, which runs on the Perl-based Slashcode software. When used on the web, Perl is often used in conjunction with the Apache web server and its mod_perl module.

Perl is regarded by both its proponents and detractors as something of a grab bag of features and syntax. The difference between the two camps lies in whether this is seen as a virtue or a vice. Perl votaries maintain that this varied heritage is what makes the language so useful. Reference is often made to natural languages such as English and to evolution. For example, Larry Wall has argued that:

... we often joke that a camel is a horse designed by a committee, but if you think about it, the camel is pretty well adapted for life in the desert. The camel has evolved to be relatively self-sufficient. On the other hand, the camel has not evolved to smell good. Neither has Perl.

In recognition of its ugly-but-useful nature, Perl has adopted the camel as its mascot; and the O'Reilly manual on Perl, Programming Perl, is known as the camel book: so named because of the camel that graces its cover.

Implementation

A huge collection of freely usable Perl modules, ranging from advanced mathematics to database connectivity, networking and more, can be downloaded from a network of sites called CPAN, an acronym for Comprehensive Perl Archive Network. Most or all of the software on CPAN is also available under either the Artistic License, the GPL, or both. As of January 1, 2005, CPAN includes more than 7,000 modules, contributed by nearly 4,000 authors.

CPAN is also the name of the Perl module (CPAN.pm) that downloads and installs other Perl modules from one of the CPAN mirror sites. Such installations can be done with interactive prompts, or can be fully automated.

A major advantage of Perl for those who work with large quantities of data is that it places no arbitrary limits on the sizes of its built-in data structures. Resources permitting, a Perl program can read an entire multi-gigabyte file into RAM. This flexibility in the face of bulky data dumps has made Perl popular among bioinformatics researchers, who routinely read substantial fractions of the human genomic sequence into Perl data structures.

Although Perl has most of the ease-of-use features of an interpreted language, it does not strictly interpret and execute source code one line at a time. Rather, perl (the program) first compiles an entire program into an internal form (a parse tree), which is then optimized before being run. This produces a number of differences from traditional interpreters. Syntax errors are caught during the compile stage instead of later during execution. Subroutine calls can be placed in the file before the subroutines themselves are defined. And long-running programs are rather fast and efficient compared to strictly interpreted languages, at the expense of short programs suffering the overhead of the compile-optimize stage. Perl's grammar is also unusual: it is not context-free, so although perl uses a Yacc-generated parser, that parser depends on a hand-written tokenizer that tracks context and guesses how ambiguous constructs should be read. Since version 5.005 it has been possible to compile a Perl program to byte code to save the compilation stage on later executions, though the "interpreter" is still needed to execute that code. This could be seen as a precursor to Parrot.
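Two consequences of the compile stage can be sketched as follows, assuming a stock perl 5 interpreter. The subroutine name area and the variables are illustrative only. The call to area appears above its definition, and the string passed to eval is compiled as a unit, so the syntax error in it prevents even its first statement from running.

```perl
use strict;
use warnings;

# The call appears before the definition; this works because perl
# compiles the whole file before executing any of it.
my $result = area(3, 4);

sub area {
    my ($width, $height) = @_;
    return $width * $height;
}

# A string eval is compiled as a unit. The syntax error in its second
# statement means the first statement never runs.
my $ran   = 0;
my $ok    = eval q{ $ran = 1; my $x = ; 1 };
my $error = $@;

print "area: $result\n";
print "eval body ran: ", ($ran ? "yes" : "no"), "\n";
```

Here eval returns undef and leaves the compiler's message in $@, which is the usual way to detect such a failure.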

Current version

The current version, 5.8.6, released on November 27, 2004, includes Unicode support. Development of the next major release, Perl 6, is also underway. It will run on Parrot, a virtual machine which is being developed as a possible multi-language target architecture.

Built-in data types

Perl has three built-in data types: scalars, arrays, and hashes. A scalar holds a single value, such as a string, number, or reference. Arrays are ordered lists of scalars indexed by number, starting at 0. Hashes, or associative arrays, are unordered collections of scalar values indexed by their associated string keys.

Scalars, arrays, and hashes can be assigned to named variables. The first character of the variable name identifies the type of data held within the variable. The remaining part identifies the particular value the variable refers to.

Names of scalar values always begin with '$', regardless of whether the variable referred to belongs to an array or hash. For example,

  $months[11]           # the 12th element of the array @months
  $address{'Jim'}       # the 'Jim' element from hash %address

Names beginning with '@' denote whole arrays and slices, indicating that multiple values are to be returned. For example,

  @months               # ( $months[0], $months[1], ..., $months[n] )
  @months[2,3,4]        # same as ( $months[2], $months[3], $months[4] )
  @address{'Jim','Bob'} # same as ( $address{'Jim'}, $address{'Bob'} )

Entire hashes begin with '%', as in %address.
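The sigil rules above can be seen together in a short sketch. The contents of @months and %address are made-up sample data.

```perl
use strict;
use warnings;

my @months  = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my %address = (Jim => '12 Elm St', Bob => '9 Oak Ave');   # sample data

my $last_month = $months[11];             # one element, so '$'
my @spring     = @months[2, 3, 4];        # array slice, so '@'
my @both       = @address{'Jim', 'Bob'};  # hash slice, also '@'
my $count      = @months;                 # array in scalar context: its length

print "$last_month\n";   # Dec
print "@spring\n";       # Mar Apr May
print "$count\n";        # 12
```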

Control structures

The basic control structures are similar to those used in the C or Java programming languages:

Loops

 label while ( expr ) block
 label while ( expr ) block continue block
 label for ( expr1 ; expr2 ; expr3 ) block
 label foreach var ( list ) block
 label foreach var ( list ) block continue block

where block is a sequence of one or more Perl statements surrounded by braces:

 { statement(s) }

The label, which is terminated by a colon, is optional, but, if present, can be used by loop control statements.

  • The next statement moves to the next iteration of the loop identified by the label.
  • The last statement immediately terminates execution of the loop identified by the label.
  • The redo statement restarts the current iteration of the loop identified by the label.

Within nested loops, the use of the label with next, redo and last enables control to move from an inner loop to an outer one, or out of the outer loop altogether.
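As an illustration, the following sketch uses a label (ROW, an arbitrary name) on the outer of two nested loops. The data in @rows is invented for the example.

```perl
use strict;
use warnings;

my @rows = ([1, 2, 3], [4, -1, 6], [7, 8, 9], [0, 0, 0]);
my @seen;

ROW: for my $row (@rows) {
    for my $cell (@$row) {
        next ROW if $cell < 0;    # abandon the rest of this row
        last ROW if $cell == 9;   # leave both loops entirely
        push @seen, $cell;
    }
}

print "@seen\n";   # 1 2 3 4 7 8
```

Without the label, next and last would apply only to the innermost loop.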

In the for statement, the semantics are similar to C. The first expression is evaluated prior to the first loop iteration; the second expression is evaluated prior to each iteration and the loop is terminated if it evaluates to boolean false; and the third expression is evaluated after each iteration, prior to deciding whether to perform the next.

The use of a continue block after a while or foreach statement allows code to be executed after each iteration - even if the current iteration has been cut short by a next statement.

In foreach, the var is a scalar variable. It is optional, and, if omitted, the default loop iterator variable $_ can be used instead.
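A brief sketch of both points: the loop below names no variable, so each value arrives in $_, and the continue block runs even for iterations cut short by next. The variable names are illustrative.

```perl
use strict;
use warnings;

my $iterations = 0;
my @odd;

foreach (1 .. 6) {          # no loop variable given, so $_ is used
    next if $_ % 2 == 0;    # skip even numbers
    push @odd, $_;
}
continue {
    $iterations++;          # runs after every pass, including skipped ones
}

print "odd: @odd, iterations: $iterations\n";   # odd: 1 3 5, iterations: 6
```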

In addition, any simple statement (that is, any non-block) can be executed repeatedly by following it with one of the qualifiers:

    while (expr)
    until (expr)

which have the expected meaning.

When combined with the do construct, this provides a form of looping:

  do block while (expr);
  do block until (expr);

The test is performed after each iteration, so that execution of the block is guaranteed to take place at least once.

Due to a linguistic foible of the Perl language, these are not regarded as true loops; the next, last and redo statements cannot be used inside a 'do' block.

By itself, do executes a sequence of statements once, and evaluates to the value of the last statement:

  do block;
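Both uses of do can be sketched briefly; the variable names are illustrative.

```perl
use strict;
use warnings;

my $n      = 10;
my $passes = 0;

# The condition is false from the start, but the block still runs once.
do {
    $passes++;
} while ($n < 5);

# Used as an expression, do yields the value of its last statement.
my $label = do {
    my $half = $n / 2;
    "half of $n is $half";
};

print "$passes\n";   # 1
print "$label\n";    # half of 10 is 5
```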

If-then statements

 if ( expr ) block
 unless ( expr ) block
 if ( expr ) block else block
 if ( expr ) block elsif ( expr ) block else block

where block is { statement(s) }.

The expression is evaluated in a boolean sense. If it is numeric, any non-zero value is true. If it is a string, any string of non-zero length except "0" is true. Thus "" and "0" are false, while "0.0", "00" and "-0" are all true.

An array or hash used as a condition is evaluated in scalar context; an empty one yields false:

 my @a=(); print "True" if @a;

Nothing is printed in this example.
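The truth rules above can be checked directly. This sketch tests each of the string values mentioned, plus an empty array and a non-empty hash with invented contents.

```perl
use strict;
use warnings;

my @report;
for my $value ('', '0', '0.0', '00', '-0') {
    push @report, "'$value' is " . ($value ? 'true' : 'false');
}

my @empty  = ();
my %filled = (key => 'value');
my $empty_array_is = @empty  ? 'true' : 'false';
my $filled_hash_is = %filled ? 'true' : 'false';

print "$_\n" for @report;
print "empty array is $empty_array_is\n";      # false
print "non-empty hash is $filled_hash_is\n";   # true
```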

Statement modifiers

For simple statements, while, until, if, unless and foreach can also be used as statement modifiers:

 statement while Boolean expression;
 statement until Boolean expression;
 statement if Boolean expression;
 statement unless Boolean expression;
 statement foreach list;

The above modifiers cannot be nested, so this would be incorrect:

 statement if expression for list;

and should be written as:

 expression and statement for list;

The keywords for and foreach are synonyms and are always interchangeable.
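A concrete sketch of the rewrite described above, using invented data: the test moves into the expression with and, leaving for as the statement's only modifier. A while modifier is shown as well.

```perl
use strict;
use warnings;

my @words = qw(apple Bob cherry Dave);
my @names;

# Not valid Perl:  push @names, $_ if /^[A-Z]/ for @words;
# Only one modifier is allowed, so the test becomes part of the expression:
/^[A-Z]/ and push @names, $_ for @words;

my $countdown = 3;
my @ticks;
push @ticks, $countdown-- while $countdown > 0;

print "@names\n";   # Bob Dave
print "@ticks\n";   # 3 2 1
```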

As noted earlier, the modifiers while and until can be combined with do to create a multiple-statement looping structure.

Switch statement

There is no explicit switch statement in Perl 5 similar to that used in C. Switch statements are built with general blocks:

 SWITCH: {
     if (expression) { statement(s); last SWITCH; }
     if (expression) { statement(s); last SWITCH; }
     default statement(s);
 }

or:

 SWITCH: {
     statement, statement, last SWITCH  if expression;
     statement, statement, last SWITCH  if expression;
     default statement(s);
 }

or using elsif:

 if (expression)
     { statement(s) }
 elsif (expression)
     { statement(s) }
 elsif (expression)
     { statement(s) }
 else
     { default statement(s) }
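As a runnable instance of the first idiom, the sketch below wraps a labelled bare block in a subroutine. The name classify and its cases are hypothetical.

```perl
use strict;
use warnings;

sub classify {
    my ($n) = @_;
    my $kind;
    SWITCH: {
        if ($n < 0)  { $kind = 'negative'; last SWITCH; }
        if ($n == 0) { $kind = 'zero';     last SWITCH; }
        $kind = 'positive';   # default case: reached only if no test matched
    }
    return $kind;
}

print classify($_), "\n" for (-5, 0, 7);
```

The idiom works because a bare block counts as a loop that executes once, so last may be used to leave it.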

In Perl 6 the switch statement is built with the given topicalizer and when "smart match" statements:

 given expression {
     when expression { statement(s) }
     when expression { statement(s) }
     default { statement(s) }
 }

Subroutines

Subroutines in Perl are defined with the keyword sub. Parameters passed to a subroutine appear inside it as elements of the array @_, which is local to the subroutine. Calling a subroutine with three scalar variables results in a @_ with three elements, usually referred to as the scalars $_[0], $_[1], and $_[2]. Alternatively, shift (borrowed from shell scripting) can be called without naming @_ to remove and return each value in turn.

Changes to elements in the @_ array within the subroutine are reflected in the elements in the calling program.
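This aliasing can be demonstrated with a pair of hypothetical subroutines: one modifies @_ directly and so changes the caller's variables, while the other copies its argument first and leaves the caller's variable alone.

```perl
use strict;
use warnings;

sub double_in_place {
    $_ *= 2 for @_;     # elements of @_ alias the caller's variables
    return;
}

sub double_copy {
    my ($value) = @_;   # copying out of @_ breaks the alias
    $value *= 2;
    return $value;
}

my ($x, $y) = (1, 2);
double_in_place($x, $y);

my $z       = 5;
my $doubled = double_copy($z);

print "$x $y\n";         # 2 4
print "$z $doubled\n";   # 5 10
```

Copying parameters into lexical variables, as in the cube example below, is the usual way to avoid modifying the caller's data by accident.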

Subroutines naturally return the value of the last expression evaluated, though explicit use of the return statement is often encouraged for clarity.

An example subroutine definition and call follows:

 sub cube
 {
   my $x = shift;
   return $x ** 3;
 }

 $z = -4;
 $y = cube($z);
 print "$y\n";

Named parameters are often simulated by passing a hash. For example:

 sub greeting
 {
   my %person = @_;
   return "Hello, $person{first} $person{last}!\n";
 }

 print greeting(
   first => 'Foo',
   last  => 'Bar'
 );

Perl and SQL databases

DBI/DBD modules can be used to access most ANSI SQL databases, including MySQL, PostgreSQL and Oracle.

Perl 5

Perl 5, the current production version of Perl, is implemented as an interpreter that processes the text of a Perl script at run time, so no separate build step is needed before debugging. The debugger is invoked directly from the command line with

  perl -dw ScriptName.pl Argument1 ... ...

Note that there is no limit to the number of arguments. Perl subroutines are likewise variadic: in general, any number of arguments can be passed to any Perl subroutine. This concept of "no arbitrary limits" is present in most other parts of the language as well. Perl can read an entire file into a variable, if the machine has the memory for it.
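Both points can be sketched in a few lines. The subroutine total is hypothetical, and an in-memory filehandle (available in perl 5.8 and later) stands in for a real file so that the example is self-contained.

```perl
use strict;
use warnings;

# Accepts any number of arguments.
sub total {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum;
}

# An in-memory filehandle stands in for a real file here.
my $data = "line one\nline two\nline three\n";
open my $fh, '<', \$data or die "cannot open: $!";

my $contents = do {
    local $/;    # no input record separator: one read returns everything
    <$fh>;
};
close $fh;

my @lines = split /\n/, $contents;
print total(1 .. 100), "\n";                                           # 5050
print scalar(@lines), " lines, ", length($contents), " characters\n";  # 3 lines, 29 characters
```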

Perl 6

Main article: Perl 6

Perl 6 is currently under development, and is planned to separate parsing and runtime, making a virtual machine that is more attractive to developers looking to port other languages to the architecture. Perl 6 plans to parse itself, and moreover expose its parser to the language itself. That is, a module could alter the grammar for the program that imported it.

Parrot is the Perl 6 runtime, and can be programmed at a low level in Parrot assembly language (PASM) or Intermediate Code (IMC or PIR, for Parrot Intermediate Representation).

Perl code samples

The canonical "hello world" program would be:

 #!/usr/bin/perl -w
 
 print "Hello, world!\n";

The first line is the shebang, which indicates the interpreter for Unix-like operating systems. (It is the most common, but not the only, way of ensuring that the perl interpreter runs the program.) The print statement outputs the string 'Hello, world!' followed by a newline (as if a person had pressed 'Return' or 'Enter').

Some people (including Larry Wall) humorously claim that Perl stands for "Pathologically Eclectic Rubbish Lister" due to its philosophy that there should be many ways to do the same thing, its growth by accretion, and its origins in report writing.

There are many other jokes, including the annual Obfuscated Perl contest, which makes an arch virtue of Perl's syntactical flexibility. The following program, which prints a greeting that is modified by a regular expression, is a mild example of this pastime:

 # A sample Perl program
 $_ = "Hello, world! The magic number is 234542354.\n";
 print;
 s/\d+/-1/;
 print;

Here is its output:

 Hello, world! The magic number is 234542354.
 Hello, world! The magic number is -1.

Regular expressions with Perl examples

Each entry below gives a regular expression element and its description, followed by an example. Note that the condition of every if statement shown evaluates to true.
. Matches an arbitrary character, but not a newline.
$string1 = "Hello World\n";
if ($string1 =~ m/...../) {
  print "$string1 has length >= 5\n";
}
( ) Groups a series of pattern elements to a single element. When you match a pattern within parentheses, you can use any of $1, $2, ... later to refer to the previously matched pattern.
$string1 = "Hello World\n";
if ($string1 =~ m/(H..).(o..)/) {
  print "We matched '$1' and '$2'\n";
}

Output:
We matched 'Hel' and 'o W'
+ Matches the preceding pattern element one or more times.
$string1 = "Hello World\n";
if ($string1 =~ m/l+/) {
  print "There are one or more consecutive l's in $string1\n";
}
? Matches zero or one times.
$string1 = "Hello World\n";
if ($string1 =~ m/H.?e/) {
  print "There is an 'H' and an 'e' separated by ";
  print "0-1 characters (Ex: He Hoe)\n";
}
? After *, +, ?, or {M,N}, makes the preceding quantifier non-greedy, so that it matches as few times as possible.
$string1 = "Hello World\n";
if ($string1 =~ m/(l.+?o)/) {
  print "The non-greedy match with 'l' followed by one or ";
  print "more characters is 'llo' rather than 'llo Wo'.\n";
}
* Matches zero or more times.
$string1 = "Hello World\n";
if ($string1 =~ m/el*o/) {
  print "There is an 'e' followed by zero to many ";
  print "'l' followed by 'o' (eo, elo, ello, elllo)\n";
}
{M,N} Denotes the minimum M and the maximum N match count.
$string1 = "Hello World\n";
if ($string1 =~ m/l{1,2}/) {
  print "There exists a substring with at least 1 ";
  print "and at most 2 l's in $string1\n";
}
[...] Denotes a set of possible character matches.
$string1 = "Hello World\n";
if ($string1 =~ m/[aeiou]+/) {
  print "$string1 contains one or more vowels.\n";
}
| Separates alternate possibilities.
$string1 = "Hello World\n";
if ($string1 =~ m/(Hello|Hi|Pogo)/) {
  print "At least one of Hello, Hi, or Pogo is ";
  print "contained in $string1.\n";
}
\b Matches a word boundary.
$string1 = "Hello World\n";
if ($string1 =~ m/llo\b/) {
  print "There is a word that ends with 'llo'\n";
} else {
  print "There are no words that end with 'llo'\n";
}
\w Matches alphanumeric, including "_".
$string1 = "Hello World\n";
if ($string1 =~ m/\w/) {
  print "There is at least one alphanumeric ";
  print "character in $string1 (A-Z, a-z, 0-9, _)\n";
}
\W Matches a non-alphanumeric character.
$string1 = "Hello World\n";
if ($string1 =~ m/\W/) {
  print "The space between Hello and ";
  print "World is not alphanumeric\n";
}
\s Matches a whitespace character (space, tab, newline, form feed)
$string1 = "Hello World\n";
if ($string1 =~ m/\s.*\s/) {
  print "There are TWO whitespace characters, which may";
  print " be separated by other characters, in $string1";
}
\S Matches anything BUT a whitespace.
$string1 = "Hello World\n";
if ($string1 =~ m/\S.*\S/) {
  print "There are TWO non-whitespace characters, which";
  print " may be separated by other characters, in $string1";
}
\d Matches a digit, same as [0-9].
$string1 = "99 bottles of beer on the wall.";
if ($string1 =~ m/(\d+)/) {
  print "$1 is the first number in '$string1'\n";
}

Output:
99 is the first number in '99 bottles of beer on the wall.'
\D Matches a non-digit.
$string1 = "Hello World\n";
if ($string1 =~ m/\D/) {
  print "There is at least one character in $string1";
  print " that is not a digit.\n";
}
^ Matches the beginning of a line or string.
$string1 = "Hello World\n";
if ($string1 =~ m/^He/) {
  print "$string1 starts with the characters 'He'\n";
}
$ Matches the end of a line or string.
$string1 = "Hello World\n";
if ($string1 =~ m/rld$/) {
  print "$string1 is a line or string ";
  print "that ends with 'rld'\n";
}
\A Matches the beginning of a string (but not an internal line).
$string1 = "Hello\nWorld\n";
if ($string1 =~ m/\AH/) {
  print "$string1 is a string ";
  print "that starts with 'H'\n";
}
\Z Matches the end of a string (but not an internal line).
$string1 = "Hello\nWorld\n";
if ($string1 =~ m/d\n\Z/) {
  print "$string1 is a string ";
  print "that ends with 'd\\n'\n";
}
[^...] Matches any single character except the ones inside the brackets.
$string1 = "Hello World\n";
if ($string1 =~ m/[^abc]/) {
  print "$string1 contains a character other than ";
  print "a, b, and c\n";
}

The 'm' in the above regular expressions, for example m/[^abc]/, is not required for perl to recognize the expression as a match (cf. substitute: s/a/b/); /[^abc]/ could just as easily be written without the leading 'm'. With the 'm' present, however, a different delimiting character can be chosen; for example, m{/} is more legible than /\//. See 'perldoc perlre' (http://www.perldoc.com/perl5.8.4/pod/perlre.html) for more details.
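The effect of choosing other delimiters can be sketched on a pattern that contains slashes. The path and the variable names are illustrative.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";

my $with_slashes = ($path =~ /\/bin\//) ? 1 : 0;   # default delimiters, escaped slashes
my $with_braces  = ($path =~ m{/bin/})  ? 1 : 0;   # same pattern, brace delimiters
my ($program)    = $path =~ m{([^/]+)\z};          # last path component

# The substitution operator accepts alternative delimiters too.
(my $backslashed = $path) =~ s{/}{\\}g;

print "$with_slashes $with_braces\n";   # 1 1
print "$program\n";                     # perl
print "$backslashed\n";                 # \usr\local\bin\perl
```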

Name

Perl was originally named "Pearl", after "the pearl of great price" of Matthew 13:46. Larry Wall wanted to give the language a short name with positive connotations, and claims he looked at (and rejected) every three- and four-letter word in the dictionary. He even thought of naming it after his wife Gloria. Before the language's official release, Wall discovered that there was already a programming language named Pearl, and changed the spelling of the name.

Several backronyms have been suggested, including the humorous Pathologically Eclectic Rubbish Lister. Practical Extraction and Report Language has prevailed in many of today's manuals, including the official Perl man page. It is also consistent with the old name "Pearl": Practical Extraction And Report Language.

The name is normally capitalized (Perl) when referring to the language, and uncapitalized (perl) when referring to the interpreter program itself. (There is a saying in the Perl community: "Only perl can parse Perl.") It is not appropriate to write "PERL" as it is not an acronym.

Fun with Perl

In common with C, obfuscated code competitions are a popular feature of Perl culture. Similar to obfuscated code but with a different purpose, Perl Poetry is the practice of writing poems that can actually be compiled by perl. This hobby is more or less unique to Perl, due to the large number of regular English words used in the language. New poems are regularly published in the Perl Monks site's Perl Poetry http://www.perlmonks.org/index.pl?node=Perl%20Poetry section.

Another popular pastime is Perl golf. As with the physical sport, the objective is to reduce the number of strokes that it takes to complete a particular objective, but here "strokes" refers to keystrokes rather than swings of a golf club. A task, such as "scan an input string and return the longest palindrome that it contains", is proposed, and participants try to outdo each other by writing solutions that require fewer and fewer characters of Perl source code.

Another tradition among Perl hackers is writing JAPHs, which are short obfuscated programs that print out the phrase "Just another Perl hacker," (including comma).

One of the most bizarre Perl modules is Lingua::Romana::Perligata. This module translates the source code of a script that uses it from Latin into Perl, allowing the programmer to write executable programs in Latin.

Perl humor

  • Perl humour on wikibooks http://wikibooks.org/wiki/Programming:Perl_humour
  • State of the Onion 2003 (Larry Wall on Perl 6) http://www.perl.com/pub/a/2003/07/16/soto2003.html
  • Larry Wall quotes http://www.cmpe.boun.edu.tr/~kosar/other/lwall.html
  • Lingua::Romana::Perligata - Write Perl in Latin! http://search.cpan.org/perldoc?Lingua::Romana::Perligata
  • A tutorial on Perligata http://www.perlmonks.org/index.pl?node_id=253797
  • Perl Purity Test http://www.softpanorama.org/Bulletin/Humor/humor092.html

External links

  • Perl.org http://www.perl.org/ – The Perl Directory
  • Perl.com http://www.perl.com/ – Perl on O'Reilly Network
  • Perldoc http://www.perldoc.com/ – online Perl POD documentation

User groups

  • Perl Mongers http://www.pm.org/ – local user groups in cities worldwide
  • PerlMonks http://www.perlmonks.org/ – an active and popular online user group and discussion forum
  • use Perl; http://use.perl.org/ – Perl news and community discussion

Distributions

  • CPAN http://www.cpan.org/ – Comprehensive Perl Archive Network, Perl source distribution
  • Search the Comprehensive Perl Archive Network http://search.cpan.org/
  • ActiveState http://www.activestate.com/ – Perl for Microsoft Windows platforms
  • IndigoPerl http://www.indigostar.com/indigoperl.htm – another distribution of Perl for Microsoft Windows
  • Windows ports on CPAN http://www.cpan.org/ports/index.html#win32 – more distributions of Perl for Microsoft Windows
  • CPAN ports http://www.cpan.org/ports/index.html – binary distributions for other platforms

Development

  • Perl 5 development http://dev.perl.org/perl5/
  • Perl 6 development http://dev.perl.org/perl6/
  • Parrot virtual machine http://www.parrotcode.org/
  • Project Ponie http://www.poniecode.org/ – Perl 5 running on top of Parrot

History

  • Perl Timeline http://history.perl.org/PerlTimeline.html
  • First reference to "Perl" on Usenet http://groups.google.com/groups?selm=4628%40sdcrdcf.UUCP
  • The origin of Perl http://dev.perl.org/perl1/ – "Stability. Speed. Simplicity. perl1 is here."

Miscellaneous

  • Perl related websites http://dmoz.org/Computers/Programming/Languages/Perl/ in the Open Directory Project
  • Scripting on the Lido Deck by Steve Silberman, Wired Magazine article about Perl Whirl 2000 http://www.wired.com/wired/archive/8.10/cruise_pr.html
  • Interview with Larry Wall on Perl (May 01, 1999) http://www.linuxjournal.com/article.php?sid=3394
  • An example of a golf contest http://www.perlmonks.org/index.pl?node=(Golf)%20Nearest%20Neighbors
  • A script written in Piratese http://www.perlmonks.org/index.pl?node_id=292135
  • Beginner's Tutorial for Perl Language http://www.geocities.com/binnyva/code/perl/tutorial/index.html

Books

Our sister project, Wikibooks, provides an electronic book on Perl.






Last updated: 02-08-2005 12:25:36
Last updated: 05-01-2005 23:37:46