TECHNICAL EXPLANATION
So how do we obtain your information from regular HTML pages? Very simple. We make the HTML page executable (chmod u+x) on our server, and these variables are automatically substituted into the page as you access it.
An example variable in HTML code would look like:
<!--#ECHO VAR="REMOTE_HOST" -->
Any time your browser views this page, this tag is replaced with your REMOTE_HOST name (somewhere.somebody.net). When you view the source of the page, you will not see this variable. Instead you will see only the result (somewhere.somebody.net).
Technically, this page uses SSI (Server Side Includes) to substitute the information into the output on the fly. But you might ask: since a regular HTML page is normally sent to the browser as is, wouldn't the variable be displayed without substitution?
Quite true. But here is what we did:
On Apache servers, if the XBitHack option is available, you can set the execute bit on normal HTML files, and the variables in those files will be substituted the same way SSI pages are handled. Enabling this for the whole server, however, adds load, because regular HTML pages have to be parsed before they are sent to the client. Though nothing would change in the output of normal pages, the extra work might slow the server down. But no worries, there is a way around it. Here is what we put into our server configuration file (httpd.conf):
<Directory /www/htdocs/somebody>
    XBitHack on
</Directory>
This saves our server from the extra load, since it parses files this way only under our own directory, not across the whole server. (A <Directory> section also covers the subdirectories beneath it.)
Here is more explanation of what environment variables are, what they do, and how they are used. We will mainly refer to CGI (Common Gateway Interface) scripts, since this is where environment variables are mainly used.
When the Common Gateway Interface gathers information for you, the amount of information it gathers is extensive: not only the information that's directly related to your application, but also information about the current state of the session environment, such as who's executing the program, where they're doing it from, and how they're doing it. In fact, more than a dozen distinct pieces of environment information are available every time a CGI application executes.
To store all this information, the CGI functions of the server place it all into your system's environment variables, allowing persistent global access to this data from anything that cares to take a look at it. Just like you might have a PATH or a HOME environment variable, now you have environment variables telling you what script is being executed and from where it's being executed.
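As a minimal sketch of what "taking a look" can mean in practice, the Python snippet below reads a few CGI variables from the process environment. The function name `describe_request` and the "(not set)" fallback are assumptions made for illustration; only the variable names come from the tables on this page.

```python
import os


def describe_request(environ=os.environ):
    """Return a few CGI variables, with a fallback for any that are not set."""
    names = ["SERVER_NAME", "SCRIPT_NAME", "REMOTE_ADDR"]
    return {name: environ.get(name, "(not set)") for name in names}


if __name__ == "__main__":
    # A CGI response is a header block, a blank line, then the body.
    print("Content-Type: text/plain")
    print()
    for name, value in describe_request().items():
        print(f"{name} = {value}")
```

Passing the environment in as a parameter keeps the function easy to try outside a web server, where none of these variables are set.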
Three distinct sets of environment variables exist, if grouped by purpose. The first one of these groups is called server-specific variables.
Server-Specific Environment Variables
When it records information, the server starts with itself. Server-specific variables record information such as the port the server is running on, the name of the server software, the protocol being used to process requests, and the version of the CGI specification the server conforms to.
Variable and purpose:
- GATEWAY_INTERFACE: CGI version that the server complies with. Example: CGI/1.1
- SERVER_NAME: Server's IP address or host name. Example: www.somebody.net
- SERVER_PORT: Port on the server that received the HTTP request. Example: 80 (most servers).
- SERVER_PROTOCOL: Name and version of the protocol being used by the server to process requests. Example: HTTP/1.0.
- SERVER_SOFTWARE: Name (and, normally, version) of the server software being run. Example: Apache 1.3.b5
In general, the information provided by the server-specific environment variables isn't going to be of much use to your application because it almost always is the same. The real exception to the rule takes place when you have a script that can be accessed by multiple servers or by a server that supports virtual addressing (one server responding to multiple IP addresses). For instance, if your server is of a commercial nature, you might have one machine running several virtual servers on different IP addresses to provide a unique server name for each customer.
After the server has the chance to describe itself to your program, it moves on to the meat of the information: the components directly related to the user's request.
Request-Specific Environment Variables
Unlike the information about the server, which rarely changes, the information for each request is dynamic, varying not only by which script is called but also by data sent and the user who sent it. At one point or another, all this information may be of use to a script you write, but three basic environment variables are always important to any script: REQUEST_METHOD, CONTENT_LENGTH, and QUERY_STRING. The latter two are used in different situations:
- CONTENT_LENGTH is used with POST requests to determine the input size.
- QUERY_STRING is the data passed when a GET request is used.
Together, these variables tell you how the request was sent and how much information is available, and they can provide you with the information itself. Unless your script accepts no input, you'll be using these three variables quite a bit.
Variable and purpose:
- AUTH_TYPE: Authentication scheme used by the server (NULL if no authentication is present). Example: Basic or Digest.
- CONTENT_FILE: File used to pass data to a CGI program (Windows HTTPd/WinCGI only). Example: c:\temp\324513.dat.
- CONTENT_LENGTH: Number of bytes passed to Standard Input (STDIN) as content from a POST request. Example: 9.
- CONTENT_TYPE: Type of data being sent to the server. Example: text/plain.
- OUTPUT_FILE: Filename to be used as the location for expected output (Windows HTTPd/WinCGI only). Example: c:\temp\132984.dat.
- PATH_INFO: Additional relative path information passed to the server after the script name, but before any query data. Example: /scripts/forms.
- PATH_TRANSLATED: Same information as PATH_INFO, but with virtual paths translated into absolute directory information. Example: /www/htdocs/somebody/forms.
- QUERY_STRING: Data passed as part of the URL, consisting of anything after the ? in the URL. Example: part1=hi&part2=there.
- REMOTE_ADDR: End user's IP address. Example: 207.213.172.210
- REMOTE_USER: User name, if authorization was used. Example: john
- REQUEST_LINE: The full HTTP request line provided to the server. (Availability varies by server.) Example: GET /info.html HTTP/1.0.
- REQUEST_METHOD: Specifies whether data for the HTTP request was sent as part of the URL (GET) or directly to STDIN (POST).
- SCRIPT_NAME: Name of the CGI script being run. Example: log.cgi.
Of all these variables, REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH, and PATH_INFO are the most commonly used. They determine how you get your information, what it is, and where to get it, and they pass on locations that may be needed for processing that data. The following sections cover them in a rough order of how often they're used.
REQUEST_METHOD
When you try to determine how data has been sent to your application, the method of the request is the first thing you need to identify. If you're using a form, you can choose which data-sending method is used; if you're using a direct link such as <a href="/scripts/myscript.pl?data"></a>, your script is invoked with the GET method.
Identifying REQUEST_METHOD is necessary for any application except one type: a program that requires no input. If your application is a random-link generator or a link to output a dynamically generated file that doesn't depend on what the user inputs, you don't need to know whether it was sent via GET or POST because your program doesn't require any input. It might want to read the other environment variables, but no input data exists to be parsed, just a semi-fixed output: the end result doesn't depend on any data from the user, just their action of executing it.
Assuming that your CGI application is like many, though, getting the data from the link or user is the next thing on your list of processes. Then you need either QUERY_STRING or CONTENT_LENGTH.
Other possible selections are available for the REQUEST_METHOD value besides just GET and POST, including DELETE, HEAD, LINK, and UNLINK. The use of these other values isn't as common.
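The decision described in this section can be sketched in a few lines of Python. The helper name `input_source` and its return values are hypothetical; the sketch only shows checking REQUEST_METHOD first and then choosing where to look for the data.

```python
import os


def input_source(environ=os.environ):
    """Say where the request data lives, based on REQUEST_METHOD."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "GET":
        return "QUERY_STRING"  # data arrives as part of the URL
    if method == "POST":
        return "STDIN"         # data arrives on standard input
    return None                # other methods are not handled in this sketch
```

A script that takes no input can skip this check entirely, as noted above.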
QUERY_STRING
The data that's passed when using the GET method is normally designed to be somewhat limited in size, because QUERY_STRING holds all of it in the environment space of the server. When your application receives this data, it comes URL encoded. That means it's in the form of ordered pairs of information, with an equal sign (=) tying together two elements of a pair, an ampersand (&) tying pairs together, a plus (+) sign taking the place of spaces, and special characters that have been encoded as hexadecimal values. A sample from a form with multiple named elements might produce a full request that looks like this:
http://anyone.net/log.cgi?field1=data1&field2=data2+more+data+from+field2&field3=data3
The part that comprises QUERY_STRING is automatically chopped to include only that information after the question mark (?). So, for that request, the QUERY_STRING would be as follows:
field1=data1&field2=data2+more+data+from+field2&field3=data3
Interpreting this URL-encoded information is easy and just requires a parsing routine in your script to break up these pairs.
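One possible parsing routine, sketched in Python with the standard library's `urllib.parse.parse_qs`, which already handles the pair splitting, the plus signs, and the hexadecimal escapes. The wrapper name `parse_query` is an assumption, and keeping only the first value of a repeated field is a simplification chosen for this example.

```python
from urllib.parse import parse_qs


def parse_query(query_string):
    """Break a URL-encoded query string into a name -> value dictionary."""
    # parse_qs splits on "&" and "=", turns "+" into spaces, decodes %XX
    # escapes, and returns a list of values for every field name.
    pairs = parse_qs(query_string, keep_blank_values=True)
    # Keep only the first value of each field (a simplification).
    return {name: values[0] for name, values in pairs.items()}
```

Applied to the sample QUERY_STRING above, this yields field2 as "data2 more data from field2", with the plus signs turned back into spaces.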
CONTENT_LENGTH
When the POST method is used, CONTENT_LENGTH is set to the number of URL-encoded bytes being sent to the standard input (STDIN) stream. This value is useful to your application because no end of file (EOF) is sent as part of the input stream. If you were to look for EOF in your script, you would just continue to loop, never knowing when you were supposed to stop processing, unless you put other checks in place. If you use CONTENT_LENGTH, an application can loop until that number of bytes has been read and then stop gracefully. The data read from STDIN follows the same URL-encoding rules of ordered pairs and character replacement as QUERY_STRING and can be parsed the same way.
When considering what method (GET or POST) is best suited to your application, consider the amount of data being passed. GET relies on passing all data through QUERY_STRING and thus can be limited in size. For large amounts of data, the STDIN buffer has a virtually unlimited capacity and makes a much better choice.
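A minimal sketch of reading POST data by length rather than waiting for EOF, again in Python. The function name `read_post_body` and its `stream` parameter are assumptions for illustration; a missing or malformed CONTENT_LENGTH is treated here as "no input", which is one possible policy rather than a required one.

```python
import io
import os
import sys


def read_post_body(environ=os.environ, stream=None):
    """Read exactly CONTENT_LENGTH bytes from the input stream."""
    if stream is None:
        stream = sys.stdin.buffer  # raw bytes, as sent by the server
    try:
        length = int(environ.get("CONTENT_LENGTH", "0"))
    except ValueError:
        length = 0  # malformed value: treat as no input (an assumption)
    if length <= 0:
        return b""
    # Stop after `length` bytes instead of looking for an end of file.
    return stream.read(length)
```

For example, with CONTENT_LENGTH set to 9 and an input stream starting with `name=john`, the function returns those nine bytes and leaves anything after them unread.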
PATH_INFO
Another thing that you can include in the URL sent to the server is path information. If you place this data after the script but before the query string, your application can use this additional information to access files in alternate locations.
For instance, if you have a script that might need to search in either /docs/november or /docs/december, you can pass in the different paths, and the server automatically knows the location of these files relative to the root data directory for your server. So if you use the URL http://anyone.net/scripts/search.cgi/docs/december?value=abc, the PATH_INFO would be /docs/december. The companion variable PATH_TRANSLATED can give you the actual path to the files based on PATH_INFO, instead of just the relative path. So /docs/december might translate on your server as /www/htdocs/somebody/docs/december. Using this variable saves you the work of having to figure out the path for yourself.
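To make the split concrete, the Python sketch below imitates what the server does with the example URL. The function `split_cgi_url` is hypothetical: a real server works this out itself and simply hands the script PATH_INFO and PATH_TRANSLATED. The script name and document root are passed in as assumed values taken from the example above.

```python
from urllib.parse import urlsplit


def split_cgi_url(url, script_name, document_root):
    """Split a CGI URL into script name, extra path, and query string."""
    parts = urlsplit(url)
    if not parts.path.startswith(script_name):
        raise ValueError("URL does not point at the given script")
    # Everything after the script name, up to the "?", is the extra path.
    path_info = parts.path[len(script_name):]
    # The translated form maps the extra path under the document root.
    path_translated = document_root.rstrip("/") + path_info if path_info else ""
    return {
        "SCRIPT_NAME": script_name,
        "PATH_INFO": path_info,
        "PATH_TRANSLATED": path_translated,
        "QUERY_STRING": parts.query,
    }
```

Inside an actual CGI script, the same values would be read directly from the environment variables rather than computed.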
Other Variables
In addition to the primary variables, some other data could come in quite handy in your application. Looking at each individual environment variable is a good idea because you'll become familiar with just what purpose the variable is designed for, as well as what other purpose you could find for it.
You'll automatically know where a user is calling you from because REMOTE_ADDR provides his or her IP address. In case your script forgot, you can see what its name is (by using SCRIPT_NAME). Path information can be passed to your program to reference data files in alternate locations, and you can see the full request line that led someone to the script (by using REQUEST_LINE). Whether you use the information is up to you, but it's there for the taking.
Client-Specific Environment Variables
Last but not least is information that comes from the software from which the user accessed the script. To identify these pieces of information uniquely, the variables are all prefixed with HTTP_. This information gives you background details about the type of software the user used, where he or she accessed it from, and so on. The most commonly used client-specific variables are HTTP_ACCEPT, HTTP_REFERER, and HTTP_USER_AGENT.
Variable and purpose:
- ACCEPT: Lists what kinds of response schemes are accepted by this request
- REFERER: Identifies the URL of the document that gave the link to the current document
- USER_AGENT: Identifies the client software, normally including version information
The formats of these HTTP header variables look like the following:
- HTTP_ACCEPT: */*,image/gif,image/x-xbitmap
- HTTP_REFERER: http://anyone.net/before.html
- HTTP_USER_AGENT: Netscape/4.03 (WinNT, I 32-bit)
These variables open up some interesting possibilities. For instance, certain browsers support special formatting (tables, backgrounds, and so on) that you might want to take advantage of to make your output look its best. You can use the HTTP_USER_AGENT value, for example, to determine whether your script has been accessed using one of those browsers, and modify the output accordingly. However, because some browsers accessing your script may not set the HTTP_USER_AGENT field to a value you're expecting, make sure that you include a default case that will apply if you can't isolate what type of browser is being used.
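The advice about a default case might look like the following Python sketch. The function name `pick_layout`, the layout labels, and the substring being matched are all assumptions for illustration; real browsers report many different HTTP_USER_AGENT strings, so any such list needs checking against the clients you care about.

```python
import os


def pick_layout(environ=os.environ):
    """Choose an output style from HTTP_USER_AGENT, with a safe default."""
    agent = environ.get("HTTP_USER_AGENT", "")
    # Hypothetical rule: treat agents naming "Netscape" as table-capable.
    if "Netscape" in agent:
        return "tables"
    # Default case: unknown or missing agent gets the plainest output.
    return "plain"
```

Because the variable is read with a default of an empty string, a client that sends no user agent at all falls through to the plain layout instead of causing an error.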
In addition to the variables listed in the table above, there are other HTTP_ environment variables, but you're much less likely to run into browsers that set these fields with any regularity until newer browsers integrate them and people migrate to those browsers. For reference, here are the additional client-specific (HTTP_) environment variables:
Variable and purpose:
- ACCEPT_ENCODING: Lists what types of encoding schemes are supported by the client
- ACCEPT_LANGUAGE: Identifies the ISO code for the language that the client is looking to receive
- AUTHORIZATION: Identifies verified users
- CHARGE_TO: Sets up automatic billing (for future use)
- FROM: Lists the client's e-mail address
- IF_MODIFIED_SINCE: Accompanies GET request to return data only if the document is newer than the date specified
- PRAGMA: Sets up server directives or proxies for future use
Not every browser fills out the same HTTP_ variables. If you make your application dependent on any of them, you can run into problems. Be sure to verify support of HTTP_ environment variables for the browsers you're concerned about.
1996-2010 © Somebody Inc