Page Load Count Accuracy
One would think counting page loads is a trivial task. Simply increment a count whenever a browser loads the page.
A little thought, however, reveals that any mechanism for recording page loads is likely to be less than 100% accurate.
What is a Page Load?
To add to the uncertainty, the very definition of a page load is nebulous.
When a page begins to load and is then interrupted (the browser window closed or a different page starts to load), was it a page load? Some say yes. Others say no. Still others say it depends.
If almost all of the content loads before the interruption, is it then a page load?
What if the page content has just barely started to appear when the user clicks a link that causes a different page to start loading? Was the first page then a page load?
Then there are page loads by software other than browsers (like content grabbers and search engine spiders). Is it a page load only when a human is present to see it?
Some people consider page reloads to be page loads. Others do not.
This is a situation where you'll need to determine for yourself what a page load means to you. Deciding what a page load is and using that definition consistently will allow you to see trends and to spot significant achievements.
Several Ways to Count Page Loads
All of the methods described below rely on logging page requests or page load events. Some are more reliable than others. Some are also easier to implement than others.
Whichever method you decide to use can, when used consistently, provide valuable trend information even if the numbers aren't entirely accurate according to your definition of page load.
Scanning Server Access Logs
Scanning the server access log can provide accurate counts of requests for pages.
Every request to load or reload a page is recorded in the log. The exception is pages loaded into the browser from the cache on the local hard drive; those are not recorded. For this article, a "request" is the log entry the server makes when any software asks it for a web page.
When logged page requests are used as a guide for determining the number of page loads, consider the following:
- Log entries for page requests that were not completed (the page load might be interrupted immediately after the request is made) could be considered invalid. Yet those entries are nearly impossible to identify.
- Pages loaded from cache produce no log entry.
- Spiders and other page-requesting software cause log entries to be made.
Whatever software scans the server access log will need to filter out all requests that are not for web pages (images, style sheets, scripts, and so on).
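The filtering described above can be sketched in Python. This is a minimal sketch, not the Awstats software or any other tool mentioned in this article. It assumes the log uses the Apache/Nginx "combined" format, treats paths ending in .html, .htm, .shtml, .php, or / as web pages, and skips user agents containing a few common spider words. Those extension and spider lists are illustrative assumptions; a real site would need its own.

```python
import re
from collections import Counter

# One line of the Apache/Nginx "combined" log format:
# client ident user [time] "METHOD path protocol" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Illustrative assumptions: what counts as a web page, and which agents are spiders.
PAGE_EXTENSIONS = (".html", ".htm", ".shtml", ".php")
SPIDER_HINTS = ("bot", "spider", "crawl", "slurp")


def is_page(path):
    """Treat directory indexes and the extensions above as web pages."""
    path = path.split("?", 1)[0].lower()
    return path.endswith("/") or path.endswith(PAGE_EXTENSIONS)


def count_page_requests(lines):
    """Count logged requests for web pages, per path, skipping likely spiders."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # not a combined-format line
        # Keep successful fetches; 304 means the browser asked whether its cached copy changed.
        if m["method"] != "GET" or m["status"] not in ("200", "304"):
            continue
        if not is_page(m["path"]):
            continue  # images, style sheets, scripts, and so on
        agent = m["agent"].lower()
        if any(hint in agent for hint in SPIDER_HINTS):
            continue  # likely a spider or robot
        counts[m["path"].split("?", 1)[0]] += 1
    return counts
```

Because it accepts any iterable of lines, an open log file can be passed to count_page_requests directly.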
The access log can provide a page request count, not a page load count. A page request is when a browser (or other software) asks for a web page. A page load is when the page is received (or whatever definition you've decided upon).
A real-world example: About half a year ago, willmaster.com went past an average of 20,000 pages per day, according to the Awstats software (which filtered out known spiders and robots). Yet when I scanned the logs, I realized that at least 10% of the page request entries were our own software retrieving templates and other files to create and deliver composite web pages on the fly.
I did not want to count those retrievals as page loads. Therefore, the "20,000 celebration" had to wait.
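One way to keep a site's own retrievals out of the count, assuming those requests arrive from addresses you know (such as the server's own IP address or 127.0.0.1), is to separate log lines by client address. The sketch below is hypothetical: the addresses are placeholders, and it relies only on the client address being the first field of each log line, as it is in the common and combined log formats.

```python
# Hypothetical addresses the site's own scripts make requests from
# (loopback plus a documentation-range placeholder IP).
OWN_ADDRESSES = {"127.0.0.1", "192.0.2.10"}


def split_by_client(lines, own_addresses):
    """Separate log lines made by the site's own software from everyone else's."""
    visitors, own = [], []
    for line in lines:
        client = line.split(" ", 1)[0]  # first field of common/combined log format
        (own if client in own_addresses else visitors).append(line)
    return visitors, own


def own_share(lines, own_addresses):
    """Fraction of the given log lines that came from the site's own addresses."""
    lines = list(lines)
    if not lines:
        return 0.0
    _, own = split_by_client(lines, own_addresses)
    return len(own) / len(lines)
```

With the figures from the example above, removing at least 10% from an average of about 20,000 page requests per day leaves roughly 18,000 or fewer (20,000 x 0.90 = 18,000).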
PHP Page Load Logs
When PHP code within the web page is used to record when that page loads, the log is updated before the page is delivered to the browser. The PHP code is run every time the page is loaded from the server.
Because the PHP code runs immediately after the server receives the page request, the log will show nearly the same count as the web page request count from the server access log.
This method is subject to the same three accuracy considerations as counting requests in the server access log (above).
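The article's method uses PHP. As an illustration of the same idea, here is a minimal Python WSGI analogue (not PHP): the page-serving code appends a line to a log file before any of the page is sent to the browser. The log file name, the page content, and the make_page_app name are hypothetical placeholders.

```python
import time

# Hypothetical page content; a real site would build or read its page here.
PAGE = b"<html><body><h1>Welcome</h1></body></html>"


def make_page_app(log_path):
    """Return a WSGI app that records each page load before sending the page."""
    def application(environ, start_response):
        # Log first, so the record exists before any of the page reaches the browser.
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{int(time.time())}\t{environ.get('REMOTE_ADDR', '-')}\t{environ.get('PATH_INFO', '/')}\n")
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [PAGE]
    return application
```

For local testing, Python's built-in server can run it: wsgiref.simple_server.make_server("", 8000, make_page_app("page_loads.log")).serve_forever().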
SSI Launches a Counter Script
Using SSI to launch a counter script to record the page load has the same effect as using PHP code. The script is run before the page is delivered to the browser, every time the page is loaded from the server.
Like the PHP method, this page load log will show nearly the same count as the web page request count from the server access log, because the script runs immediately after the server receives the page request.
This method is also subject to the same three accuracy considerations as counting requests in the server access log.
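With Apache's mod_include, a directive such as <!--#include virtual="/cgi-bin/count.py?page=/index.shtml" --> in the page runs the counter script each time the server delivers the page. The script name, its location, and the page parameter are hypothetical. Below is a minimal sketch of such a counter as a Python CGI script: it logs the load and then prints only a header, so it adds nothing visible to the page.

```python
#!/usr/bin/env python3
"""Hypothetical SSI-launched page load counter (CGI script)."""
import os
import sys
import tempfile
import time

# Hypothetical log location; a real server needs a directory the web server can write to.
LOG_FILE = os.path.join(tempfile.gettempdir(), "ssi_page_loads.log")


def record_load(environ, log_path):
    """Append one tab-separated line: timestamp, client address, query string."""
    fields = (str(int(time.time())), environ.get("REMOTE_ADDR", "-"), environ.get("QUERY_STRING", ""))
    with open(log_path, "a", encoding="utf-8") as log:
        log.write("\t".join(fields) + "\n")


def main(environ=os.environ, log_path=LOG_FILE, out=sys.stdout):
    record_load(environ, log_path)
    # A CGI reply is headers, a blank line, then the body. The body is empty,
    # so the SSI include adds nothing visible to the page.
    out.write("Content-Type: text/plain\n\n")


if __name__ == "__main__":
    main()
```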
Image Launches a Counter Script
An image tag can be used to launch a page log counter script. The script's URL is the value of the <img> tag's src attribute. After logging the page load, the script returns an image.
This method kicks in only after the web page arrives at the browser. The location of the image tag in the web page source code determines how soon during the page load the counting script will run.
The count misses browsers that have images turned off. Also, if the browser caches the image, reloads will not affect the count. (Having the counter script send no-cache HTTP headers with the image might prevent it from being cached, causing otherwise missed reloads to be counted.)
The placement of the image, near the top of the page or near the bottom, can have an effect on the count when a page load is interrupted.
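A minimal sketch of an image-launched counter, written as a Python WSGI application, follows. It assumes the page contains a tag such as <img src="/count.gif" width="1" height="1" alt=""> routed to this application (the URL and the make_image_counter name are hypothetical). It logs the load, using the Referer header to identify the page when the browser sends one, and replies with a 1x1 transparent GIF. The Cache-Control header asks the browser not to store the image, which may help reloads get counted.

```python
import base64
import time

# A commonly used 1x1 transparent GIF.
TRANSPARENT_GIF = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")


def make_image_counter(log_path):
    """Return a WSGI app that logs one page load per request and replies with a GIF."""
    def application(environ, start_response):
        # The page containing the <img> tag, when the browser sends a Referer header.
        page = environ.get("HTTP_REFERER", "-")
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{int(time.time())}\t{environ.get('REMOTE_ADDR', '-')}\t{page}\n")
        start_response("200 OK", [
            ("Content-Type", "image/gif"),
            ("Content-Length", str(len(TRANSPARENT_GIF))),
            ("Cache-Control", "no-store"),  # ask the browser not to cache the image
        ])
        return [TRANSPARENT_GIF]
    return application
```

Returning a real image, rather than an empty reply, keeps browsers from displaying a broken-image placeholder.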
JavaScript Launches a Counter Script
With this method, JavaScript launches the page log counter script. A src attribute in the script tag specifies the counter script's URL.
Like the image-launched script, this method kicks in only after the web page arrives at the browser. The location of the script tag in the web page source code determines how soon during the page load the counting script will run.
The placement of the JavaScript, near the top of the page or near the bottom, can have an effect on the count when a page load is interrupted.
This method requires browsers to have JavaScript enabled. Otherwise, no page load is counted.
Also, reloads probably will not be counted because many browsers default to caching JavaScript.
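The JavaScript version needs no image. In this sketch (again a Python WSGI application, assuming a hypothetical tag such as <script src="/count.js"></script> in the page), the counter logs the load and replies with a tiny JavaScript comment. Sending a no-store Cache-Control header may reduce the caching problem mentioned above, though browsers are not guaranteed to honor it in every case. The make_script_counter name is hypothetical.

```python
import time

# Minimal valid JavaScript reply; it does nothing when the browser runs it.
COUNTER_JS = b"/* page load recorded */\n"


def make_script_counter(log_path):
    """Return a WSGI app that logs one page load per request and replies with JavaScript."""
    def application(environ, start_response):
        # The page containing the script tag, when the browser sends a Referer header.
        page = environ.get("HTTP_REFERER", "-")
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{int(time.time())}\t{environ.get('REMOTE_ADDR', '-')}\t{page}\n")
        start_response("200 OK", [
            ("Content-Type", "text/javascript; charset=utf-8"),
            ("Content-Length", str(len(COUNTER_JS))),
            ("Cache-Control", "no-store"),  # ask the browser not to cache the script
        ])
        return [COUNTER_JS]
    return application
```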
The Best Method
Scanning server logs, the method that may be the hardest and most expensive to implement, may also be the most accurate. It takes sophisticated software to scan the logs and extract only the pertinent information.
The PHP and SSI methods are also highly accurate, depending on your definition of page load.
If your definition of page load says a page load is counted only when the page is completely loaded, then the image launch method may be best.
The method that may be the easiest to implement, launching the counter with JavaScript, is also the least accurate. A small percentage of browsers have JavaScript turned off, making this easy method less accurate by that percentage than an image-launched counter might be.
The JavaScript method may be the easiest because, unlike SSI and PHP, it requires no attention to special file name extensions for web pages. And the counter script does not need to reply with an image, as an image-launched counter script must.
In this case, the most accurate method and the easiest method are at opposite ends of the spectrum.
Will Bontrager