Inadvertent Duplicate Content
URLs with query string parameters can result in the same page being in a search engine index more than once – duplicate content.
I'll show you how to use PHP to remove the parameter from the URL with a 301 redirect. (A 301 status means "Moved Permanently," telling search engine spiders that the page has moved permanently to the new location and that the old URL should be replaced with the new one in their index.)
URLs with query string parameters have a "?" character followed by data. It may look like http://example.com/page.html?name=Will where "name=Will" is the URL parameter.
URLs without parameters are considered a different page than URLs with parameters, even if the main part of the URL is identical. Similarly, the same URL with different parameters is considered a different page. The reason is that URL parameter data can cause pages to have different content.
Examples:
http://example.com/page.html
http://example.com/page.html?name=Will
http://example.com/page.html?color=green
http://example.com/page.html?color=green&size=large
If the content is the same at those various URLs, it is duplicate content.
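As a sketch of why search engines see those as separate pages: each URL requests the same path, and only the query string differs. PHP's parse_url() makes the split visible (the URLs are the examples from above):

```php
<?php
// Each of the example URLs requests the same path; only the
// query string after "?" differs.
$urls = array(
    'http://example.com/page.html',
    'http://example.com/page.html?name=Will',
    'http://example.com/page.html?color=green',
    'http://example.com/page.html?color=green&size=large',
);
foreach( $urls as $url ) {
    $parts = parse_url($url);
    $query = isset($parts['query']) ? $parts['query'] : '(none)';
    echo $parts['path'] . ' with query: ' . $query . "\n";
}
?>
```

All four lines print the same path, /page.html, which is why identical content at those URLs counts as duplicate content.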
How Duplicate Content Can Happen
Some websites use parameter information for session IDs, product display, page display, tracking, and other purposes.
Further, other site owners can create URLs with parameters and use them to link to your page from their site – a site that search engines will spider. There are good reasons for linking with such URLs. Examples:
- The URL has an affiliate ID. An affiliate ID as a URL parameter is common. The browser goes to the same page as it does for a URL without an affiliate ID. The affiliate ID serves to set a cookie in case the person buys a product while at the website.
- The URL has a source ID. Some websites append a source ID parameter to URLs they link to. The source ID lets the destination site know where the visitor comes from. There are two reasons for using a source ID instead of browser referral data:
  - Browser referral data can't be relied on: (a) Referral data can easily be spoofed. (b) Some browsers don't provide referral data.
  - Browser referral data varies by the page where the link is located. A source ID provides a fixed ID value site-wide, easily searched for in server request logs. Interested site managers can scan their request logs for the source ID to see how many referrals were made.
- The parameter has custom information for the destination site. A website can give another website pertinent information, perhaps about the site visitor. It is a way to exchange information between websites, something for which cookies can't reliably be used. Generally, the two websites are owned by the same entity, or the owners of the two sites have an information exchange agreement.
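As a minimal sketch of the source ID idea, the destination site could pull the ID out of the query parameters like this. (The parameter name "source" and the function name are hypothetical examples for illustration, not something a linking site is required to use.)

```php
<?php
// Hypothetical helper: extract a source ID from an array of query
// parameters, such as $_GET. Returns null when no source ID is present.
// The parameter name "source" is an assumption for illustration.
function get_source_id($params) {
    if( isset($params['source']) && $params['source'] !== '' ) {
        return $params['source'];
    }
    return null;
}

// Example: a visitor arrives via a link like
// http://example.com/page.html?source=newsletter
echo get_source_id(array('source' => 'newsletter')); // newsletter
?>
```

In practice the site might log the returned value, or simply rely on scanning the raw request logs for the source ID as described above.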
Removing Parameters from URLs for Search Engine Spiders
The code below will remove the parameter information from URLs for, and only for, spiders that identify themselves as
- googlebot,
- Yahoo! Slurp, or
- msnbot,
and 301 redirect the spider to the changed URL. For all other browsers and spiders, the URL won't be changed and the redirect won't take place.
The code is PHP. It must be placed at the very top of the source code of a PHP page, before anything is output. If any output is sent before the code runs, PHP can't send the redirect header and the redirect will fail. (Customization notes follow.)
<?php
// This code needs to be at the very top of the page, before any output.
if( isset($_SERVER['QUERY_STRING']) and strlen($_SERVER['QUERY_STRING']) ) {
    // Match spider user agents case-insensitively; real spider
    // user-agent strings vary in capitalization (e.g. "Googlebot").
    if( stripos($_SERVER['HTTP_USER_AGENT'],'googlebot') !== false
        or stripos($_SERVER['HTTP_USER_AGENT'],'Yahoo! Slurp') !== false
        or stripos($_SERVER['HTTP_USER_AGENT'],'msnbot') !== false ) {
        // PHP_SELF doesn't include the query string; the preg_replace
        // is a safety measure. The 301 status marks the move permanent.
        header('Location: http://example.com' . preg_replace('/\?.*$/','',$_SERVER['PHP_SELF']), true, 301);
        exit;
    }
}
?>
Customization:
Replace the URL http://example.com with the URL of your own domain.
Optionally, the spider identification section may be changed to remove spiders or add additional spiders.
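If you would rather not hard-code the domain, one possible variation is to build the redirect target from the Host header. This is a sketch under the assumption that $_SERVER['HTTP_HOST'] can be trusted on your server; the helper function name is mine, not part of the code above.

```php
<?php
// Variation: build the redirect target from the request's Host header
// instead of hard-coding the domain. Assumes $_SERVER['HTTP_HOST'] is
// trustworthy on your server; if not, keep the hard-coded domain.
function redirect_target($host, $php_self) {
    // Strip any query string, matching the original code's safety measure.
    return 'http://' . $host . preg_replace('/\?.*$/', '', $php_self);
}

// In the spider-detection block, the header() line would become:
// header('Location: ' . redirect_target($_SERVER['HTTP_HOST'], $_SERVER['PHP_SELF']), true, 301);
?>
```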
Testing
When the code is in place, test it.
The page should work just like normal with your browser. Test it with and without parameter information.
Next, test it by spoofing the user agent as googlebot or one of the other spiders. Test it with and without parameter information.
If you don't have the means for user-agent spoofing, a search for "spoof user agent" should provide something you can use.
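If you want to sanity-check the spider detection itself before deploying, the matching logic can be exercised in isolation from the command line. This is a sketch; the function is a standalone copy of the condition in the code above, and the function name is mine.

```php
<?php
// Standalone copy of the spider user-agent test so it can be checked
// without a web server. Matches case-insensitively, since real spider
// user-agent strings vary in capitalization (e.g. "Googlebot").
function is_listed_spider($user_agent) {
    return stripos($user_agent, 'googlebot') !== false
        || stripos($user_agent, 'Yahoo! Slurp') !== false
        || stripos($user_agent, 'msnbot') !== false;
}

var_dump( is_listed_spider('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)') ); // bool(true)
var_dump( is_listed_spider('Mozilla/5.0 (Windows NT 10.0; rv:115.0) Firefox/115.0') ); // bool(false)
?>
```

A spider user agent should return true and an ordinary browser user agent false; if so, the live test with spoofed user agents should redirect as expected.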
If you are a WebSite's Secret member, you are in luck. Download Snooper 3-Pack and install it on your server. It's a handy tool, for much more than just spoofing, as you'll see when you read about it.
Use PHP to send a 301 redirect when a spider requests a URL with a parameter, preventing duplicate content. The PHP code provided in this article can help keep duplicate content from your own website out of search engine indexes.
Will Bontrager