Keeping Test Pages Out Of Search Engine Indexes
If you develop web pages for clients or friends, you'll want to keep the development and test pages out of search engine indexes. There are a couple of reasons for this.
- The search engine's algorithms may get the idea your website is about, or at least partly about, the focus of the pages you are developing. If your website is about developing websites and you are creating pages for a site related to flying squirrels, getting the flying squirrel pages indexed could well dilute your website's focus (from a search engine's point of view).
- If the development pages on your website are indexed, then later moved to the client's or friend's website, that site may be perceived as containing duplicate content.
There is an easy solution. Two steps:
- Create a directory wherein all your development and testing will take place. (Let's assume that directory is named "/dev/".)
- Put two lines into the robots.txt file. (The example assumes the development directory is "/dev/".)

User-agent: *
Disallow: /dev/
A robots.txt file is used to control web robot and spider access to certain directories and files. It is a plain text file, publicly available at the root of the website. Web robot and spider compliance with robots.txt directives is voluntary, but I believe all popular search engine spiders do comply. See About /robots.txt for more information.
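If you want to verify that those two lines behave as intended, Python's standard library ships a robots.txt parser. Here is a minimal sketch; "example.com" is a placeholder for your own domain:

from urllib.robotparser import RobotFileParser

# The two robots.txt lines from this article.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /dev/"])

# A compliant spider may fetch the home page...
print(rp.can_fetch("*", "https://example.com/"))               # True

# ...but not anything under the disallowed /dev/ directory.
print(rp.can_fetch("*", "https://example.com/dev/test.html"))  # False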
Do all development and testing in the robots.txt-disallowed directory or its subdirectories. Search engines that respect robots.txt (all popular search engines) will not index files in the disallowed directory or its subdirectories.
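One nuance worth knowing: Disallow works by simple path-prefix matching, which is why subdirectories are covered automatically. A second sketch (again with "example.com" standing in for your domain) demonstrates the prefix behavior:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /dev/"])

# Paths beginning with the "/dev/" prefix are blocked,
# including files in subdirectories of /dev/.
print(rp.can_fetch("*", "https://example.com/dev/client/page.html"))  # False

# A path that merely starts with "/dev" (no trailing slash)
# does not match the "/dev/" prefix, so it remains crawlable.
print(rp.can_fetch("*", "https://example.com/developer.html"))        # True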
Will Bontrager