Directory Blocked to All but Specific Software
It's possible to give your software access to directories that are blocked to everybody else.
The link checker you run from your desktop computer is an example. It may need to get into the protected directory to do its job.
Your own scripts are another example. If you can set the user-agent string your script sends, you can use the method described here.
A Python script on my desktop downloads backup files from our server. It needs access to an otherwise locked directory on the server to download the files. The script uses the method described in this article, which also can be used by PHP and other programming languages.
The method uses the .htaccess file to allow access for bots and browsers that identify themselves in a certain way.
Articles in the Willmaster library describe other .htaccess-related restrictions and authorizations — access to a specific file, blocking a specific file, access restricted by IP address, and blocking all. In case one of those better fits your needs, here are links to the articles.
- Access to One File in Locked Directory (access to a specific file): Lock all browsers and bots out of the directory except for access to a specific file or files.
- Blocking WP Login Snooping Bots (blocking a specific file): Any particular file or files can be blocked. The article talks about the file for WordPress login. Access to any file in the directory can be blocked with the same method.
- Uncrackable Directory Access (access restricted by IP address): All browsers and bots are locked out of the directory except those with the IP address or IP addresses you specify.
- Effective Block for Browsers and Bots (blocking all): This can be used to block all access to all files in the directory.
The method described in this article allows access to a directory by software providing specific identification. An example of how to make a PHP script identify itself in a certain way is further below.
The .htaccess File Code
Here is the content for the .htaccess file of the directory being restricted to software with specific identification.
SetEnvIf User-Agent ^My_special_agent$ you_passed=1
Order Deny,Allow
Deny from all
Allow from env=you_passed
In the above code, My_special_agent is the identification and you_passed is the name of a variable.
Here is how it works: when the software identifies itself as My_special_agent, the you_passed variable is set to 1 (which, in software, can also mean set to true). The next two lines deny access to everybody. The last line bypasses the previous restrictions and allows access if the you_passed variable is true.
Several things about My_special_agent in the .htaccess file:
- Immediately before My_special_agent is a ^ caret symbol. It makes identification start there. It isn't required. However, without the caret symbol in that position, there would be no beginning marker and Another_My_special_agent would also match My_special_agent.
- Immediately after My_special_agent is a $ dollar symbol. It makes identification end there. It isn't required. However, without the dollar symbol in that position, there would be no ending marker and My_special_agent_here would also match My_special_agent.
- There are no spaces in the My_special_agent identification. If spaces are desired instead of underscore characters, the two-character \s set represents a space. My_special_agent would then become My\sspecial\sagent.
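The effect of the ^ and $ anchors can be checked outside of Apache. The following is a minimal Python sketch, assuming that Python's re module treats these simple patterns the same way Apache's regular expression matching does. The matches function is a hypothetical helper used only for illustration.

```python
import re


def matches(pattern: str, user_agent: str) -> bool:
    """Return True if the pattern is found in the user-agent string.

    re.search looks for the pattern anywhere in the string, so only
    the ^ and $ anchors pin it to the start and the end.
    """
    return re.search(pattern, user_agent) is not None


# Anchored at both ends: only the exact identification matches.
print(matches(r"^My_special_agent$", "My_special_agent"))          # True
print(matches(r"^My_special_agent$", "Another_My_special_agent"))  # False
print(matches(r"^My_special_agent$", "My_special_agent_here"))     # False

# Without anchors, longer strings containing the identification also match.
print(matches(r"My_special_agent", "Another_My_special_agent"))    # True
print(matches(r"My_special_agent", "My_special_agent_here"))       # True

# \s stands in for a space in the identification.
print(matches(r"^My\sspecial\sagent$", "My special agent"))        # True
```

Trying a pattern this way before uploading the .htaccess file can catch a missing anchor early.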
The variable name you_passed may be changed so long as it is changed at every place it exists in the .htaccess code.
With the above, the user-agent identification of HTTP software can be specified to allow that software to access the directory. Change My_special_agent as needed.
If more than one user-agent identification needs to be specified, repeat the first line of the .htaccess code once for each identification, updating the identification accordingly. Example:
SetEnvIf User-Agent ^My_special_agent$ you_passed=1
SetEnvIf User-Agent ^Another\sagent$ you_passed=1
SetEnvIf User-Agent ^Special_4_Will$ you_passed=1
Order Deny,Allow
Deny from all
Allow from env=you_passed
With the above in the .htaccess file, software with any one of the three specified user-agent identifications will have access to the directory.
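The decision those rules make can be modeled in a few lines of Python. This is only a sketch of the logic, not how Apache implements it. The names ALLOWED_PATTERNS and is_allowed are hypothetical, and it assumes Python's re module handles these simple patterns the way Apache does.

```python
import re

# The three identifications from the .htaccess example.
ALLOWED_PATTERNS = [
    r"^My_special_agent$",
    r"^Another\sagent$",
    r"^Special_4_Will$",
]


def is_allowed(user_agent: str) -> bool:
    """Deny everybody unless at least one pattern matched the user agent."""
    # Each matching pattern plays the role of a SetEnvIf line setting you_passed.
    you_passed = any(re.search(p, user_agent) for p in ALLOWED_PATTERNS)
    return you_passed


print(is_allowed("My_special_agent"))  # True
print(is_allowed("Another agent"))     # True
print(is_allowed("Special_4_Will"))    # True
print(is_allowed("Mozilla/5.0"))       # False
```

Any one match is enough, which is why the SetEnvIf lines can simply be repeated.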
PHP Script Identifying Itself
PHP can identify itself by specifying a user agent string as its identification.
This PHP script identifies itself as My_special_agent when it asks for the http://example.com/secret/test.php document.
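<?php
// Identification sent in the User-Agent request header.
$UserAgent = "My_special_agent";
$URL = "http://example.com/secret/test.php";
$ch = curl_init($URL);
curl_setopt_array($ch, array(
    CURLOPT_USERAGENT => $UserAgent,
    // Return the response as a string so echo prints only the content.
    CURLOPT_RETURNTRANSFER => true
));
echo curl_exec($ch);
?>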
Two conditions need to exist so the above script can access the document it wants.
- URL http://example.com/secret/test.php needs to be a valid URL, not a 404 Not Found.
- The .htaccess file in the /secret directory at domain example.com needs to allow access to software with My_special_agent identification.
When those two conditions are met, the above PHP script will print the content it receives from http://example.com/secret/test.php.
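The same identification can be sent from Python. Below is a minimal sketch using only the standard library. The function names build_request and fetch are hypothetical, and the URL is the placeholder used in this article, not a working address.

```python
import urllib.request

USER_AGENT = "My_special_agent"
URL = "http://example.com/secret/test.php"  # placeholder URL from the article


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, user_agent: str) -> str:
    """Request the URL and return the response body as text.

    urlopen raises urllib.error.HTTPError (for example 403 Forbidden)
    if the server refuses the request.
    """
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


# Usage (makes a real network request, so it is left commented out):
# print(fetch(URL, USER_AGENT))
```

If the .htaccess rules do not recognize the identification, the server responds with 403 Forbidden and fetch raises an HTTPError instead of returning content.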
Implementing an Example
To test the above, do these steps:
1. Create a subdirectory on your server. Upload an .htaccess file into the subdirectory that is composed of the code from the first code box in the .htaccess File Code section of the article.
2. Upload this PHP script into the restricted subdirectory you created in step 1 and name the script test.php
   <?php
   echo $_SERVER['PHP_SELF'];
   ?>
   (As a test to verify the restricted directory is indeed restricted, try to access the URL of test.php with your browser. If the directory is restricted, you will get a "Forbidden" message.)
3. Copy the PHP code in the PHP Script Identifying Itself section further above.
   - Change http://example.com/secret/test.php to the URL of test.php you uploaded in step 2.
   - Name the script testpoint.php and upload it to your server into any directory other than the restricted directory you created in step 1.
4. Everything is ready to test. Load the URL of testpoint.php into your browser. It should print the document location of the test.php file in the restricted directory.
Any software you can modify to send specific identification when requesting pages with HTTP or HTTPS can access a URL to your restricted directory — if you so allow it.
For other software, like link checkers, you'll need to know how they identify themselves. One way to find out is to check your server's access logs. Find a URL that your link checker accessed. The line in the log is likely to contain the user-agent string the software identifies itself as.
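Pulling the user-agent string out of a log line can be scripted. The sketch below assumes the server writes its access log in Apache's combined log format, where the user agent is the last quoted field. Your server's format may differ. The sample line, the ExampleLinkChecker/1.0 name, and the function name are made up for illustration.

```python
import re
from typing import Optional

# Combined log format ends with: "request" status bytes "referer" "user agent"
LOG_TAIL = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$'
)


def user_agent_from_log_line(line: str) -> Optional[str]:
    """Return the user-agent field of a combined-format log line, or None."""
    found = LOG_TAIL.search(line)
    return found.group("agent") if found else None


# Hypothetical log line, invented for this example.
SAMPLE = (
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /secret/test.php HTTP/1.1" 403 199 "-" "ExampleLinkChecker/1.0"'
)

print(user_agent_from_log_line(SAMPLE))  # ExampleLinkChecker/1.0
```

The string it prints is what would go into the SetEnvIf line, with ^ and $ around it and any spaces replaced by \s.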
(This article first appeared in Possibilities newsletter.)
Will Bontrager