Identifying Orphan Files
During the time we've been doing business full time online, we've switched hosting companies a few times.
Each time, we notice many files that we're not sure are still needed.
We do a lot of development, a lot of testing, and sometimes we neglect to remove the test files. This has been going on for 26 years.
The possibly orphan files are brought to our attention when we move our sites to the new hosting account. Are the files adrift with no purpose or do we need them?
To answer that question, we put some JavaScript into each of the possible orphan files. The JavaScript launches a CGI script that logs its use.
If an orphan doesn't show up on the log after a reasonable amount of time, it can be removed from the server. With many thousands of files on our server, removing these reduces clutter.
I'll show you how to do it.
Note: This can be done only with text files, i.e., web pages and files that might be included in web pages. Image files can't be tagged this way.
Here is the Perl script. Install it on your server with any file name that makes sense to you. The example JavaScript assumes "logger.cgi", but you may use any legal name.
#!/usr/bin/perl
use strict;

# $FileName may include directory path.
my $FileName = 'filename.txt';

sub GetDateTime
{
   my ($second,$minute,$hour,$day,$month,$year) = localtime;
   $year += 1900;
   $month++;
   # Pad single-digit values with a leading zero.
   $month = "0$month" if $month < 10;
   $day = "0$day" if $day < 10;
   $hour = "0$hour" if $hour < 10;
   $minute = "0$minute" if $minute < 10;
   $second = "0$second" if $second < 10;
   # Return value can be changed to your preferred format.
   return "$year/$month/$day at $hour:$minute:$second";
} # sub GetDateTime

# Decode the query string sent by the JavaScript.
my $logline = $ENV{QUERY_STRING};
$logline =~ tr/+/ /;
$logline =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;

# Append to the log file; create it if appending fails.
open FILE,">$FileName" unless open FILE,">>$FileName";
print FILE GetDateTime() . "\t$logline\n";
close FILE;

# Reply with a harmless line of JavaScript.
print "Content-type: text/javascript\n\nvar X".time.';';
# end of file
The above Perl script prints the date and time according to the server and then prints any information sent to it by the JavaScript. The log file is tab-delimited so most popular desktop spreadsheet programs can import it.
You'll notice the Perl script allows you to specify the log file name. Also, the place where the date formatting occurs is marked in case you want to change it.
The JavaScript (see below) needs to be pasted into each orphan file you want to monitor.
It has a place where you specify the URL of the above Perl script. And it also has a place where you can identify the individual orphan file the JavaScript is pasted into.
<script type="text/javascript" language="JavaScript">
<!-- Copyright 2006 Bontrager Connection, LLC
// First published in Possibilities ezine,
// /library/
//
// Put identity of this file between the quotes. ('This
// file' being the file where this JavaScript is at.)
var thisfile = "menu.txt in the root";

// Put URL of logging script between quotes.
var url = "http://example.com/cgi-bin/logger.cgi";

// No other modifications needed.
thisfile = escape(thisfile);
var here = escape(document.URL);
document.write('<sc'+'ript ');
document.write(' type="text/javascript" ');
document.write(' language="JavaScript" ');
// The trailing ">" closes the opening script tag.
document.write(' src="'+url+'?'+thisfile+'%09'+here+'">');
document.write('</sc'+'ript>');
//-->
</script>
The JavaScript sends your identifying information about the orphan file to the logging script, along with the URL of the web page where the file was used.
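To make the request format concrete, here is a minimal standalone sketch of how the two values travel in the query string and how they come apart again on the other end. It is an illustration only, not part of the scripts above: the helper names buildLoggerUrl and decodeQuery and the index.html page are assumptions, and it uses encodeURIComponent in place of the tag's escape().

```javascript
// Sketch only: buildLoggerUrl and decodeQuery are hypothetical helper names.

function buildLoggerUrl(loggerUrl, fileIdentity, pageUrl) {
  // %09 is an encoded tab; it separates the two fields in the log line.
  return loggerUrl + '?' +
    encodeURIComponent(fileIdentity) + '%09' + encodeURIComponent(pageUrl);
}

function decodeQuery(requestUrl) {
  // Same order as the Perl script: "+" becomes a space,
  // then the %XX sequences are unescaped.
  const query = requestUrl.slice(requestUrl.indexOf('?') + 1);
  return decodeURIComponent(query.replace(/\+/g, ' '));
}

const requestUrl = buildLoggerUrl(
  'http://example.com/cgi-bin/logger.cgi',
  'menu.txt in the root',
  'http://example.com/index.html' // assumed page that includes the file
);

// Splitting on the tab recovers the two logged columns.
const [identity, page] = decodeQuery(requestUrl).split('\t');
console.log(requestUrl);
console.log(identity);
console.log(page);
```

The encoded tab is what makes the log line tab-delimited: the logging script adds the date and a tab of its own, so each entry ends up with three columns.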
This provides more than just a heads up for you. If the file was included in a web page, you also know the URL of the web page it was included in.
Files whose identification does not show up in the log after a reasonable period of time can be considered orphans.
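If you keep a list of the identities you pasted into the tagged files, checking the log can be automated. The following is a rough sketch under stated assumptions, not something from the original scripts: findSilentFiles is a hypothetical helper, the sample log lines are made up, and the log is assumed to have the three tab-separated columns the Perl script writes (date and time, file identity, page URL).

```javascript
// Sketch only: findSilentFiles is a hypothetical helper name.

function findSilentFiles(logText, taggedIdentities) {
  const seen = new Set();
  for (const line of logText.split('\n')) {
    const fields = line.split('\t');
    if (fields.length >= 2) {
      seen.add(fields[1]); // second column is the value set in "thisfile"
    }
  }
  // Keep only the identities that never appeared in the log.
  return taggedIdentities.filter((identity) => !seen.has(identity));
}

// Made-up sample data for illustration only.
const sampleLog =
  '2006/05/01 at 13:04:09\tmenu.txt in the root\thttp://example.com/index.html\n' +
  '2006/05/02 at 08:15:30\tmenu.txt in the root\thttp://example.com/about.html\n';
const tagged = ['menu.txt in the root', 'oldtest.html in /tests'];

console.log(findSilentFiles(sampleLog, tagged));
```

With this sample data, only "oldtest.html in /tests" is reported, since it never appears in the log. How long to wait before trusting such a report is still your call.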
Will Bontrager