Spider Spoof Detection

Many spiders identify themselves. Quite a few others spoof their identity and pretend to be a regular browser.

The spoofers can create havoc with site visitor page view and click-through statistics, not to mention following no-follow links, submitting forms and, in general, snooping where they have no business being.

This article will help you identify spiders, even spiders that pretend to be a regular browser. It does not describe how to put the kibosh on them (misdirect, ban, …) because every site and situation is different. But it does identify all who wander into a special link trap.

The special link trap is in a div or paragraph with a CSS display:none declaration. To soothe the suspicion of especially wary spiders, JavaScript is present that can change the CSS display property, albeit only in a highly unlikely situation.

The special link, which no site visitor with a modern browser will ever see (unless they view source code like a spider does), leads to a script that logs the IP address of the spider and the user-agent string it is presenting as its identity.

This is the special link, which can be put anywhere on a PHP web page (with correct link URL):

<p id="link-display" style="display:none;">
<a href="https://example.com/special.php?<?php echo rawurlencode($_SERVER['PHP_SELF']); ?>">A special link</a>
</p>
<script type="text/javascript">
if(location.search=="?unlikely-to-ever-happen") { document.getElementById("link-display").style.display="block"; }
</script>

You'll see the link to the special.php logging script within a p tag that has a display:none; CSS declaration. A bit of PHP code appends the path of the current web page to the link as its query string, so the log shows which page the spider came from.

Below the paragraph is the suspicion-soothing JavaScript. It will actually display the special link to the site visitor if they arrive with ?unlikely-to-ever-happen appended to the URL in the browser's address bar. That is very unlikely to happen unintentionally.
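
The JavaScript check is an exact string comparison, so the link appears only when the query string is precisely ?unlikely-to-ever-happen and nothing else. Here is a minimal sketch of that comparison as a standalone function. The name shouldRevealLink is hypothetical and is not part of the special link code.

```javascript
// Hypothetical helper mirroring the inline check in the special link code.
// `search` is the value of location.search, including the leading "?".
function shouldRevealLink(search) {
  return search === "?unlikely-to-ever-happen";
}

// Only the exact query string reveals the link.
console.log(shouldRevealLink("?unlikely-to-ever-happen"));     // true
console.log(shouldRevealLink(""));                             // false
console.log(shouldRevealLink("?unlikely-to-ever-happen&x=1")); // false
```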

The special.php logging script needs to be installed so the special link can identify spiders.

Here is the source code for the PHP script. Comments follow.

<?php
/*
Access Log Intended to Identify Spiders
Version 1.0
April 25, 2020
Will Bontrager Software LLC
https://www.willmaster.com/
*/

/* CUSTOMIZATIONS */
/* Two places to customize. */

// Place 1 --
// Specify the location of the log file 
//    to record access to this script. 
//    The file is CSV formatted.

$LogFileLocation = "spiderTrapLog.csv";


// Place 2 --
// Between the lines containing the PAGECONTENT 
//    word, specify the content to show to 
//    the spider when it follows the special 
//    link. HTML markup may be used.

$PageContent = <<<PAGECONTENT
<p>
This is a page.
</p>
PAGECONTENT;

/* END OF CUSTOMIZATION */

$LogLine = array();
$LogLine[] = date('r');
$LogLine[] = $_SERVER['REMOTE_ADDR'];
$LogLine[] = str_replace('"','""', (isset($_SERVER['QUERY_STRING']) ? rawurldecode($_SERVER['QUERY_STRING']) : '') );
// Some spiders send no user-agent header at all.
$LogLine[] = str_replace('"','""', (isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '') );
// LOCK_EX keeps simultaneous requests from interleaving log lines.
file_put_contents( $LogFileLocation, '"'.implode('","',$LogLine)."\"\n", FILE_APPEND | LOCK_EX );
echo $PageContent;
exit;
?>

Customizations —

Two places need to be customized.

1. Replace spiderTrapLog.csv with the location for your log file. It is a CSV file, so give it a .csv file name extension.

2. Between the lines with the word PAGECONTENT, replace
<p>
This is a page.
</p>

with whatever content you wish the spiders to see when they follow the link trap. If you wish, you can leave it as is.

Upload the customized PHP script to your server in a place where a browser can access it. Name it special.php or any other .php name you prefer.

Replace https://example.com/special.php in the special link code with the URL to the customized PHP script you uploaded.

Put the special link code on one or more pages of your website.
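
To confirm the link made it into a page's served markup (which is what a spider reads), you can scan the HTML for links that point at the logging script. The following is a rough sketch, not part of the article's code. The function name findTrapLinks, the sample markup, and the simple regular expression are assumptions for illustration, and the expression only handles double-quoted href attributes.

```javascript
// Return every href in `html` that starts with `trapUrl`.
// `trapUrl` is the URL of your uploaded logging script.
function findTrapLinks(html, trapUrl) {
  // Simplified pattern: <a ... href="..."> with a double-quoted value.
  const hrefPattern = /<a\s[^>]*?href\s*=\s*"([^"]*)"/gi;
  return [...html.matchAll(hrefPattern)]
    .map((match) => match[1])
    .filter((href) => href.startsWith(trapUrl));
}

// Made-up sample of what a served page might contain after PHP has run.
// The part after "?" is whatever the PHP code inserted for that page.
const renderedHtml = `
<p id="link-display" style="display:none;">
<a href="https://example.com/special.php?/index.php">A special link</a>
</p>
<a href="https://example.com/contact.php">Contact</a>`;

console.log(findTrapLinks(renderedHtml, "https://example.com/special.php"));
```

An empty result for a page would suggest the special link code is missing from it or the URL was not updated.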

The special links can be tested by loading the web page in your browser. Append ?unlikely-to-ever-happen to the URL in the browser's address bar. When the page reloads, the special link should be visible and can be clicked for testing.

You are now in a position to detect spiders that visit the pages with the special link code. The user-agent string in the log file tells you which of them present themselves as a regular web browser.
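
Each line special.php writes has four quoted fields: the date, the IP address, the query string (the page the link was on), and the user-agent string. Here is a minimal sketch of reading that log and sorting the entries. The function names, the sample rows, and especially the patterns used to guess whether a user-agent admits to being a spider are assumptions for illustration. Treat the verdicts as hints, not proof.

```javascript
// Parse CSV text in the format special.php writes: every field is wrapped
// in double quotes and embedded quotes are doubled. Returns an array of rows.
function parseLog(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') { field += '"'; i++; } // doubled quote
      else if (ch === '"') { inQuotes = false; }                    // closing quote
      else { field += ch; }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field); field = "";
    } else if (ch === "\n") {
      row.push(field); rows.push(row); row = []; field = "";
    } else if (ch !== "\r") {
      field += ch;
    }
  }
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); }
  return rows;
}

// Assumed heuristic: words that self-identifying spiders and tools tend to use.
const BOT_HINT = /bot|crawl|spider|slurp|curl|wget|https?:/i;

function classifyUserAgent(ua) {
  if (ua.trim() === "") return "no user-agent";
  if (BOT_HINT.test(ua)) return "self-identified";
  if (/^Mozilla\/\d/.test(ua)) return "browser-like (possible spoof)";
  return "unrecognized";
}

function summarizeLog(text) {
  return parseLog(text)
    .filter((row) => row.length >= 4)
    .map(([date, ip, page, ua]) => ({ date, ip, page, verdict: classifyUserAgent(ua) }));
}

// Made-up sample rows using documentation-only IP addresses.
const sample =
  '"Sat, 25 Apr 2020 10:00:00 +0000","192.0.2.10","/index.php","Mozilla/5.0 (compatible; ExampleBot/1.0; +http://example.com/bot)"\n' +
  '"Sat, 25 Apr 2020 10:05:00 +0000","198.51.100.7","/about.php","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"\n';

for (const entry of summarizeLog(sample)) {
  console.log(entry.ip, entry.page, entry.verdict);
}
```

To run it against the real log under Node.js, read the file with fs.readFileSync(path, "utf8") and pass the text to summarizeLog. Since no regular visitor should ever reach the trap, a browser-like entry is a candidate spoofer, unless it is your own test click.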

(This article first appeared with an issue of the Possibilities newsletter.)

Will Bontrager

© 1998-2001 William and Mari Bontrager
© 2001-2011 Bontrager Connection, LLC
© 2011-2024 Will Bontrager Software LLC