Ignoring Bots

What if bots contaminated your statistics?

You would be uncertain about which pages are the most popular with real people. Page load counts would be bloated, perhaps providing a false sense of security. You would be uncertain about which links real people actually click.

(It is understood that statistics software can filter out bots that identify themselves as such, but generally not those that masquerade as a browser.)

What if every bot that accessed your web site had a real out-of-pocket cost?

That out-of-pocket cost thing is what precipitated the comprehensive bot filter software this article has for you. Virtually all bots are ignored, whether or not they identify themselves as such.

The Situation That Precipitated the Bot Filter

A website uses a certain IP address geolocation information service. Suddenly, the service drastically reduced the number of free lookups a site can make. The site's traffic, with bots included, is higher than the new free threshold.

That's why the bot filter was built in the first place, to keep the number of IP address lookups for that site as low as possible.

At Willmaster, we use the filter for logging web page loads. And that is what the filter in this article does.

How the Bot Filter Works

A lot of bots visit websites, more bots than a person would think, many masquerading as a real browser.

Site scrapers come by once in a while, sometimes also pretending to be a real browser. I frequently have the thought that everybody and their aunt must be building databases: for ad delivery, for selling statistics, for deep data mining, or perhaps with hopes of building another Google.

And, of course, there are legitimate search engine spiders.

The filtering challenge was to ignore all bots while letting through all real browsers used by real people. And to do it silently (no CAPTCHA stuff).

The newly created bot filter assumes a page request is made by a real browser when these two conditions are both true:

  1. The browser handles cookies.

    Some specialized bots handle cookies. Most don't.

  2. The browser runs JavaScript.

    Most bots don't, although some may parse web page source code for URLs within JavaScript.

When both tests are passed, the client is assumed to be a real browser used by a real person. All other clients are assumed to be bots and are ignored.
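
Here is a minimal sketch of that two-test decision, pulled out on its own. The function name passesBotTests and its two parameters are made up for illustration; in a browser you would hand it the navigator and document objects.

```javascript
// Hypothetical helper (not part of the filter itself).
// "nav" stands in for navigator, "doc" stands in for document.
function passesBotTests(nav, doc) {
    // Test 1, first half: the client says it accepts cookies.
    if (!nav.cookieEnabled) { return false; }

    // Test 1, second half: the client really keeps a cookie when given one.
    doc.cookie = "Test_cookie=TESTING";
    if (doc.cookie.indexOf("Test_cookie") < 0) { return false; }

    // Test 2 is implicit: this function only runs at all
    //    when the client runs JavaScript.
    return true;
}

// In a browser:
//    if (passesBotTests(navigator, document)) { /* treat as a real browser */ }
```

Passing the two objects in as parameters, instead of reading the globals directly, is only a convenience so the logic can be tried outside a browser.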

The filter is a JavaScript function that uses Ajax.

When the JavaScript filtering tests are passed, Ajax calls the PHP script, which for this example updates a log file. If you wish, the PHP script can be changed to do something else entirely.

The Bot Filter Source Code

As noted, the filter is JavaScript. Put it anywhere on your page. If it must run before the page is nearly finished loading, put it near the top of the page. Otherwise, put it near the bottom, such as immediately above the closing </body> tag.

<script type="text/javascript">
function BotFilter()
{
/*
Bot Filter, Version 1.0, 5 April 2018
Will Bontrager Software LLC
http://www.willmaster.com/
*/
    // Check if client accepts cookies.
    if( ! navigator.cookieEnabled ) { return; }

    // Because some bots may spoof that they accept 
    //    cookies, we'll do an actual cookie test.
    document.cookie = "Test_cookie=TESTING";
    if( document.cookie.indexOf("Test_cookie") < 0 ) { return; }

    // At this point, we know the client accepts cookies.

    // Try to open connection to the server.
    //    ("var" keeps http local instead of creating a global variable.)
    var http = new XMLHttpRequest();
    if( ! http ) { return; }

    // What to do if the server request is successful 
    //    and the PHP script responds.
    http.onreadystatechange = function()
    {
        if(http.readyState == 4)
        {
            if(http.status == 200)
            {
                if(http.responseText=="OK")
                {
                    // Optionally, have the JavaScript do something after 
                    //    tests are passed and the PHP script returns "OK".
                    //    Setting a cookie comes to mind.
                }
            }
            else { alert('\n\nContent request error, status code:\n'+http.status+' '+http.statusText); }
        }
    }

    // Make the request for the PHP script.
    http.open("POST","/location/of/PHP/logger.php",true);
    //    Send the correct header line for a POST request.
    http.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
    //    Send "thispage" with current web page URL.
    http.send("thispage="+encodeURIComponent(document.URL));
}
// Call the BotFilter() function.
BotFilter();
</script>

The JavaScript is well commented so you can see what different sections of the code are doing.

You'll see where you can optionally add JavaScript to do things after Ajax calls the PHP script and the PHP script returns "OK".
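
As one illustration of that optional step, here is a sketch that builds a cookie marking the visitor as having passed the tests. The helper name buildLoggedCookie, the cookie name PageLogged, and the one-day lifetime are all assumptions for the example. They are not part of the filter.

```javascript
// Hypothetical helper: returns a cookie string that expires
//    "days" days after the Date passed in as "now".
function buildLoggedCookie(name, days, now) {
    var expires = new Date(now.getTime() + days * 24 * 60 * 60 * 1000);
    return name + "=1; expires=" + expires.toUTCString() + "; path=/";
}

// Inside the "OK" branch of the filter, in a browser:
//    document.cookie = buildLoggedCookie("PageLogged", 1, new Date());
```

What you do with such a cookie afterward is up to your implementation.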

Near the bottom of the JavaScript, in the http.open() line, you'll see the /location/of/PHP/logger.php URL. Replace that URL with the URL to your PHP script (see next section).

The PHP Script

The JavaScript calls this PHP script. Put it anywhere on the server that's accessible by URL. Name it logger.php or whatever file name is best for your implementation.

<?php
/*
Page Access Logger, Version 1.0, 5 April 2018
Will Bontrager Software LLC
http://www.willmaster.com/
*/

// If no $_POST['thispage'] value, it's a direct access 
//   of the script, perhaps by a bot. Print 0 and exit.
if( empty($_POST['thispage']) ) { echo(0); exit; }

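// Log the page access with date-time stamp.
//   Tabs and line breaks are stripped from the submitted value 
//   so one request can't add extra lines or fields to the log.
$LogFileName = $_SERVER['DOCUMENT_ROOT'] . '/subdirectory/logfile.txt';
$ThisPage = str_replace(array("\t","\r","\n"),' ',$_POST['thispage']);
file_put_contents($LogFileName,date('r')."\t$ThisPage\n",FILE_APPEND|LOCK_EX);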

// Print "OK" and exit.
echo "OK";
exit;
?>

The PHP script is also well commented so you can see what different sections of the code are doing.

Near the bottom of the PHP script, in the $LogFileName line, you'll see the /subdirectory/logfile.txt specification for the log file location. Update that location with where your log file is to be.
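
Each line the PHP script writes to the log file is a date-time stamp, a tab, and the page URL. If you want a quick tally of page loads per URL, something like this sketch could do it. The function name countPageLoads is made up for the example, and it assumes you already have the log file's content in a string.

```javascript
// Hypothetical helper: counts how many log lines there are for each URL.
//    Expects lines in the form: date-time stamp, tab, page URL.
function countPageLoads(logText) {
    var counts = Object.create(null);
    logText.split("\n").forEach(function (line) {
        var tab = line.indexOf("\t");
        // Skip blank lines and lines without a tab.
        if (tab < 0) { return; }
        var url = line.slice(tab + 1).trim();
        if (url === "") { return; }
        counts[url] = (counts[url] || 0) + 1;
    });
    return counts;
}

// With Node.js, the log content could be read with:
//    var logText = require("fs").readFileSync("logfile.txt", "utf8");
//    console.log(countPageLoads(logText));
```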

Optionally, the PHP script can be modified to do other things than, or in addition to, logging the URL of the web page that called it via the Ajax function.

You now have the entire filter system for ignoring bots. As mentioned, functionality can be added and changed according to your requirements.

(This article first appeared with an issue of the Possibilities newsletter.)

Will Bontrager


© 1998-2001 William and Mari Bontrager
© 2001-2011 Bontrager Connection, LLC
© 2011-2024 Will Bontrager Software LLC