Writing Your Own Form Handling Scripts (Perl CGI), Part I
If you've been wanting to write form handling scripts with Perl CGI, this article is for you.
For demonstration, we'll use a form with a hidden field, text field inputs, a textarea, radio buttons, a drop-down list box, and checkboxes.
This tutorial is in four parts:
- How to put information from the form into the script.
- The example form.
- Beginning the script.
- Getting the info into the script.
- Verifying email addresses are formatted correctly, displaying error messages, and an exit subroutine.
- Some security issues and how to handle them:
- Referrer check.
- GET versus POST.
- Form content length.
- Embedded HTML.
- The script up to this point.
- How to store the form information in a database file on your server -- in any plain text format, including tab- and comma-delimited formats that can be imported into Excel and other spreadsheet and database programs.
- How to send the form information to yourself in an email -- formatted however you please, including HTML.
- How to personalize the "thank you"/confirmation page for your form user.
Okay, let's get right to it.
I. How to put information from the form into the script.
a. The example form.
Below is the example form we'll use for this tutorial. (It is recommended, although not strictly necessary, that you know how to make forms before pursuing this tutorial. See the 2-part "HTML Form Tutorial")
<form method="POST" action="myscript.cgi"> <input type="hidden" name="Subject" value="testing..."> Name: <input type="text" name="username"> <br><br> Email: <input type="text" name="email"> <br><br> Gender: <input type="radio" name="gender" value="female">Female <input type="radio" name="gender" value="male">Male <br><br> Favorite Color (select one): <select name="favorite color"> <option value="white">White</option> <option value="red">Red</option> <option value="yellow">Yellow</option> <option value="blue">Blue</option> <option value="purple">Purple</option> <option value="orange">Orange</option> <option value="green">Green</option> <option value="black">Black</option> </select> <br><br> Favorite Color Runner-ups (select any):<br> <input type="checkbox" name="c2" value="white">White<br> <input type="checkbox" name="c2" value="red">Red<br> <input type="checkbox" name="c2" value="yellow">Yellow<br> <input type="checkbox" name="c2" value="blue">Blue<br> <input type="checkbox" name="c2" value="purple">Purple<br> <input type="checkbox" name="c2" value="orange">Orange<br> <input type="checkbox" name="c2" value="green">Green<br> <input type="checkbox" name="c2" value="black">Black<br> <br> Your Message:<br> <textarea name="message" cols="22" rows="5"></textarea> <br><br> <input type="submit" value="Send Form Info"> </form>
b. Beginning the script.
It is highly recommended that you have a Perl reference available. This tutorial will not attempt to explain basic Perl concepts except within the context of the subject at hand.
If you do not have a programming background, I recommend "Perl for Dummies (with CD ROM)" by Paul Hoffman. The CD contains a Perl interpreter to install on your computer. (A similar title by the same author, "Perl 5 for Dummies," is also on bookshelves. I've never read that one, so can't make a recommendation.)
For learning Perl, I also recommend "Learning Perl" by Randal L. Schwartz & Tom Christiansen.
As a reference, "Programming Perl" by Larry Wall, Tom Christiansen & Randal L. Schwartz can't be beat, at least in my opinion. My copy is dog-eared and the covers are curling. I use it every day.
Begin your form handling script with these four lines and save it as "myscript.cgi":
#!/usr/bin/perl # By [your name here] use strict; my %In = ();
The first line is the location of Perl on your server and the second line is a comment where you put your name as author, date, version number, whatever information you want recorded with the script. You may use additional comment lines as needed.
The third line specifies that only lexical variables are used in the script (unless specifically named with "use vars"). Lexical variables are preceded with the word "my" when declared or the first time they are used. "use strict" is a great typo catcher; if you mistype a variable name, it will generate an error message.
The fourth line declares the variable %In and ensures the variable is empty. This is the variable that will be used to hold the information submitted by the form.
Use "use strict" and "my" variables. It will save you a lot of grief from chasing elusive bugs.
c. Getting the info into the script.
When the form is submitted to your script, the variable %In will contain the form information. (%In could be named %FORM or any other name beginning with "%" -- I use %In because it represents information brought in.)
Here is a subroutine that will put form information into the variable %In
sub Parse { my $buffer; if ($ENV{REQUEST_METHOD} eq 'GET') { $buffer = $ENV{QUERY_STRING}; } else { read(STDIN,$buffer,$ENV{CONTENT_LENGTH}); } my @p = split(/&/,$buffer); foreach(@p) { $_ =~ tr/+/ /; my ($n,$v) = split(/=/,$_,2); $n =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg; $v =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg; $v =~ s/(\<.*?)(embed|object|script|applet)(.*?\>)/$1$3/gis; if($In{$n}) { $In{$n} .= "\t$v"; } else { $In{$n} = $v; } } } # sub Parse
To call the subroutine, use:
Parse();
The working program, complete with all of today's code, is printed near the end of this article.
d. Verifying email addresses are formatted correctly, displaying error messages, and an exit subroutine.
The subroutine ValidEmail returns 1 if the email address it was given is correctly formatted. Otherwise, it returns 0.
The subroutine ErrorHTML creates a web page with the error message(s) it was given, the messages in an unordered list.
The subroutine Exit merely calls Perl's built-in exit function. Exit is a placeholder in case your system requires a different exit routine (if your system is mod_perl, for example). Whenever your script needs to exit, call this Exit subroutine instead of the Perl exit function. That way, if you need a different exit routine sometime in the future, you do not have to search the entire script for exit calls; you merely modify this one Exit subroutine. (Note that the ErrorHTML subroutine calls the Exit subroutine.)
Here are the three subroutines:
sub ValidEmail { if($_[0]=~/([\.\-\_]{2,})|(@[\.\-\_])|([\.\-\_]@)|(\A\.)/) { return 0; } if($_[0]=~/^[\w\.\-\_]+\@\[?[\w\.\-\_]+\.([\w\.\-\_]{2,3}|[0-9])\]?$/) { return 1; } return 0; } # sub ValidEmail sub ErrorHTML { my $s = join("\n<li>",@_); print "Content-type: text/html\n\n"; print <<HTML; <html><body bgcolor="white"> <blockquote><blockquote> <h4>Message:</h4> <ul> <li>$s </ul> </blockquote></blockquote> </body></html> HTML Exit(); } # sub ErrorHTML sub Exit { exit; }
Here are examples of use:
unless($In{email}) { ErrorHTML('An email address is required.\ } unless(ValidEmail($In{email})) { ErrorHTML('Sorry, invalid email address format.\ }
Note: $In{email} contains the email address provided on the form.
e. Some security issues and how to handle them.
i. Referrer check.
You'll probably want to restrict your form's use to your own domain. The following code will present an error message unless the form being used is on your server with your domain's URL:
my $AuthorizedDomain = 'mydomain.com'; my $FormDomain = lc $ENV{HTTP_REFERER}; $FormDomain =~ s!^https?://(?:www\.)?(.*?)(?:/.*)$!$1!; unless($FormDomain eq lc $AuthorizedDomain) { ErrorHTML('Unauthorized access.\ }
Replace mydomain.com, above, with your own domain name.
ii. GET versus POST.
Method GET creates URLs with a ? followed by data, like http://domain.com/script.cgi?one=this&two=that
The Parse subroutine, above, accepts information with method GET as well as method POST (method POST transmits the information in a way not visible in the URL).
In some situations, GET is preferred, such as when you want the information visible within the URL, in which case you would change your example form's FORM tag from method="POST" to method="GET". However, if you won't need method GET, then replace the above Parse subroutine with the following ParsePost subroutine. ParsePost will process only method POST information. It will return a 1 if method POST information is available and return 0 otherwise.
sub ParsePost { return 0 unless $ENV{REQUEST_METHOD} =~ /POST/i; my $buffer; read(STDIN,$buffer,$ENV{CONTENT_LENGTH}); my @p = split(/&/,$buffer); foreach(@p) { $_ =~ tr/+/ /; my ($n,$v) = split(/=/,$_,2); $n =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg; $v =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg; $v =~ s/(\<.*?)(embed|object|script|applet)(.*?\>)/$1$3/gis; if($In{$n}) { $In{$n} .= "\t$v"; } else { $In{$n} = $v; } } return 1; } # sub ParsePost
You can call the above subroutine with:
unless(ParsePost()) { ErrorHTML('Unauthorized access.\ }
iii. Form content length.
Most browsers have a maximum content size of 32k. However, there may still be situations where you want to limit the length of the content. The following will keep the first 250 characters of the form textarea name="message" whenever more than 250 characters are entered:
if(length($In{message}) > 250) { $In{message} = substr($In{message},0,250); }
iv. Embedded HTML.
In the Parse and ParsePost subroutines, you'll find the line:
$v =~ s/(\<.*?)(embed|object|script|applet)(.*?\>)/$1$3/gis;
That line deletes the HTML tag names "embed," "object," "script," or "applet" whenever found between angle brackets. If you prefer to delete all HTML tags (everything between angle brackets, including the angle brackets themselves), use this line instead:
$v =~ s/(\<.*?\>)//gs;
f. The script up to this point.
Here is the form handling script up to this point:
#!/usr/bin/perl # By [your name here] use strict; my $AuthorizedDomain = 'mydomain.com'; my %In = (); my $FormDomain = lc $ENV{HTTP_REFERER}; $FormDomain =~ s!^https?://(?:www\.)?(.*?)(?:/.*)$!$1!; unless($FormDomain eq lc $AuthorizedDomain) { ErrorHTML('Unauthorized access.\ } unless(ParsePost()) { ErrorHTML('Unauthorized access.\ } unless($In{email}) { ErrorHTML('An email address is required.\ } unless(ValidEmail($In{email})) { ErrorHTML('Sorry, invalid email address format.\ } if(length($In{message}) > 250) { $In{message} = substr($In{message},0,250); } ErrorHTML('Script paused here.\ # temporary line Exit(); sub ParsePost { return 0 unless $ENV{REQUEST_METHOD} =~ /POST/i; my $buffer; read(STDIN,$buffer,$ENV{CONTENT_LENGTH}); my @p = split(/&/,$buffer); foreach(@p) { $_ =~ tr/+/ /; my ($n,$v) = split(/=/,$_,2); $n =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg; $v =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg; $v =~ s/(\<.*?)(embed|object|script|applet)(.*?\>)/$1$3/gis; if($In{$n}) { $In{$n} .= "\t$v"; } else { $In{$n} = $v; } } return 1; } # sub ParsePost sub ValidEmail { if($_[0]=~/([\.\-\_]{2,})|(@[\.\-\_])|([\.\-\_]@)|(\A\.)/) { return 0; } if($_[0]=~/^[\w\.\-\_]+\@\[?[\w\.\-\_]+\.([\w\.\-\_]{2,3}|[0-9])\]?$/) { return 1; } return 0; } # sub ValidEmail sub ErrorHTML { my $s = join("\n<li>",@_); print "Content-type: text/html\n\n"; print <<HTML; <html><body bgcolor="white"> <blockquote><blockquote> <h4>Message:</h4> <ul> <li>$s </ul> </blockquote></blockquote> </body></html> HTML Exit(); } # sub ErrorHTML sub Exit { exit; }
The "Script paused here." message in the above code will be removed when the script is completed.
Part II of this series shows you how to take the information in the %In variable and email it to yourself (or to anybody else).
See you then :)
Will Bontrager