CSE2325/3325: Lecture 18

CSE3325: Forms & CGI

In the previous lecture:

JavaScript Object Model

In this lecture:

Working with server scripts
Using forms to supply data

Reference:

Stein, L.D.: How To Set Up and Maintain a Web Site, 2nd edn, Addison Wesley 1997, Chapter 8.

What are scripts?

Scripts are external programs run by a server in response to a request from a web browser.
Scripts may accept input parameters from a web browser along with the request to be executed.
Scripts may return output to be displayed in the browser.
Scripts running on a web server add to the WWW the ability to synthesize responses to changing conditions... they add a dynamic aspect to the web.
Scripts can be written in any language, interpreted (eg. Perl) or compiled (eg. C).

Common Gateway Interface (CGI)

CGI is an interface or gateway between server and script.
A gateway may link between the web server and a database search engine for example.
CGI compliant scripts will run on CGI compliant servers.
Servers running on Unix, VMS, OS/2, Windows NT/95 are CGI compliant.
Old Macintosh web servers (pre OS X) are not CGI compliant, but simple measures allowed scripts to run on them. Current Macintoshes running OS X may also run the Apache web server, which is CGI compliant.
Many scripts are front ends to UNIX programs (such as emailers and text search engines) and so, even though the script itself may run on a non-Unix machine, the back end program doing the work may not run (or even exist)!
Have a look at FastCGI for an alternative that extends and enhances the CGI model:

Enables applications to persist between client requests, eliminating application start up overhead and allowing the application to maintain state between client calls.

Enables applications to reside on remote systems (rather than having to reside on the same system as the Web server)

A Few Examples

A counter telling me that I have re-loaded this page times since 14 Sept 98.
The interface on the Yellow Pages website.
The php web site.

Identifying Requests to Execute Scripts

When a user requests a URL pointing to a script, the server executes the script.
The server can identify the URL as a script rather than a document to be retrieved by
- the directory the URL indicates (frequently .../cgi-bin or a subdirectory)
  
  James, how long until the <A HREF="/cgi-bin/bombTimer"> bomb detonates? </A>
- a unique file extension (frequently .cgi)
  
  James, how long until the <A HREF="blah/bombTimer.cgi"> bomb detonates? </A>
Frequently, scripts are authorized and installed by the system administrator to prevent malicious, careless or ignorant folk from installing programs which may breach security.
Scripts may be run under a special username (eg. www) with no special privileges. This may help prevent inadvertent or malicious damage.
Scripts may be run within a wrapper as the user who owns the script. Special security checks ensure the server's security is not compromised (cgiwrap and cgiwrapd - see below for examples).

Passing 'Hard Coded' Parameters to Scripts

Parameters may be passed to a script directly through the URL.

<A HREF="/cgi-bin/search?James%20Bond">Where is 007?</A>
The ? is appended to the URL and precedes the query string which constitutes the argument list passed to the script.
The %20 escapes the space character in the search string.
Query strings usually (not necessarily) fall into one of two formats:
- Keyword list in the form:
  
  value1+value2+value3+...
  
  <A HREF="/cgi-bin/search?Secret+Agent+James+Bond">
  Where is 007?
  </A>
  
  This format is often used for scripts which do word searches.
- Named parameter list in the form:
  
  name1=value1&name2=value2&name3=value3...
  
  <A HREF = "/cgi-bin/search?job=Secret%20Agent&name1=James&name2=Bond">
  Where is 007?
  </A>
  
  This format is useful for complex data where various options may or may not be specified depending on conditions and a name must be associated with each datum to determine its meaning.
Path information (such as the path to a file to be searched by a script) may be incorporated into the URL of a script by appending it to the URL.

.../cgi-bin/bombTimer/james0/bombFiles/bomb.txt

After the server has decoded the URL of the script, the additional path information is passed to the script. (The ? and a query string may be appended following the additional path information as usual.)

Passing User-Specified Parameters to Scripts

User input may be gathered using fill-out forms containing text entry boxes, radio buttons etc.
Scripts may create their own fill out forms...
1. Script is called without parameters
2. Script requires parameters so it creates an input document which is despatched to the browser.
3. User enters required data to input document and submits it.
4. Browser calls the script, passing it the contents of the input document as parameters.
5. Script processes data and returns result.
A custom interface may be written to such scripts by creating a fill out form which collects the necessary data and sends it to the script as parameters directly.

Have a look at the interface to the Altavista search engine.

(Have a look at a query in the browser's URL entry box)

Front End to Named Parameter List Scripts

Secret agent name:

Which secret device do you want for your mission?

How many of this secret device did you destroy last mission?

Do you promise not to destroy any more of these devices? Yes No

Secret agent password:

Here's some of the HTML...

<P>Secret Agent Name:
<INPUT TYPE="text" NAME="name">

...

<INPUT TYPE="submit" VALUE="Transmit Order to HQ">
<INPUT TYPE="reset" VALUE="Eat Order">

</FORM>

The form is marked by <FORM> tags...
The ACTION attribute tells the browser where to send the submitted parameters.
The METHOD attribute specifies the means by which the browser submits information to the script. This can be one of two request methods implemented in HTTP (see Stein p47 for further details):
1. The GET command
  
  tells the server to return an entire document to the browser. This is the command most commonly used when retrieving data from the web. A script call using GET is made by appending the query string to the script's URL.
  
  In some cases, the URL may be truncated to 255 characters - do not use the GET method if you have a lot of parameters to pass or some of the information may not get through to the script.
2. The POST command
  
  tells the server to treat a document as an executable and pass it some information. Using this method, the parameters are sent in the body of the request rather than appended to the URL.
  
  The POST method does not suffer from the "truncation problem".
  
  A well written script should handle both POST and GET submissions.
The INPUT tags denote form elements (text entry boxes, push buttons etc)
The INPUT tag of type="submit" is a button which places the form data into a named parameter list. The parameter names are the names of the form elements, their values are the values of the respective input elements.
The INPUT tag of type="reset" is a button which... I wonder!?
Check out the document source to see how some of the other elements are described. There are more besides! (Refer to an HTML guide)
Netscape recognizes a mailto: URL in the ACTION attribute:

<FORM ACTION = "mailto:fox.mulder@fbi.org" METHOD = POST>

No prizes for guessing that on submission, this mails the contents of the form to the address given.

Remember Clickable Image Maps?

Originally clickable image maps were implemented using CGI scripts.
The user clicked on an image, the x,y coordinates of the click were sent to a script which read them and returned a URL which was then sent to the web browser which sent the URL back to the server which returned the requested document.
No wonder servers began incorporating the functionality of these scripts!
... a scheme which was further accelerated by the client side image map!

CGI Magic

really

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *query = getenv("QUERY_STRING"); // NULL if the server did not set it

    printf("Content-type: text/html\n"); // tell server MIME type of returned doc.
    printf("\n");                        // blank line ends the header

    printf("<HEAD><TITLE>\n");           // output HTML header info.
    printf("Echo Script Response\n");
    printf("</TITLE></HEAD>\n");

    printf("<BODY>\n<P>\n");             // output HTML body echoing the
    printf("%s", query ? query : "");    // environment variable QUERY_STRING
    printf("</BODY>\n");
    return 0;
}

The above script echoes the input sent to it.
The script receives its input in the environment variable QUERY_STRING after submission from a form using METHOD=GET.
The PATH_INFO environment variable contains any path information appended to the URL.

cgiwrap

Let's call the same script again (from the same form) but this time, the ACTION attribute of the FORM tag will call the script via cgiwrapd...

An additional note for completeness...

Parameters can be read into a C program where the form applies the POST method, like this (see the Stanford site from which this info. originates for details):

"The POST query string is encoded in precisely the same form as the GET query string, but instead of being passed in the URL and read into the QUERY_STRING variable, it is given to the CGI program as standard input, which you can thus read using ANSI functions or regular character reading functions. The only quirk is that the server will not send EOF at the end of the data. Instead, the size of the string is passed in the environment variable CONTENT_LENGTH, which can be accessed using the normal stdlib.h function:"

char *value;
int length;
value = getenv("CONTENT_LENGTH");
sscanf(value, "%d", &length);

This lecture's key point(s):

CGI defines a standard way for a web server to pass information from the client side (browser) to a script and to return the script's output.
Forms provide standard data entry and selection widgets for users to submit data.
CGI scripts can generate dynamic or query specific web pages 'on the fly'
