{infiniteZest}
Search engine issues with RewritePath
Summary
RewritePath, when used to serve search-engine-friendly URLs, has problems with search engines. While rewritten URLs work fine in regular browsers, the request fails for the user agents of most search engines. This article looks at the workarounds (a fix from Microsoft was not available as of early 2007).
 
Table of Contents

What’s the Issue?

The Fix

Test with Fiddler using several commonly used User-Agent strings

The first step is to get an HTTP debugging tool like Fiddler.

The second step is to get some user-agent strings.

And the third step is to use Fiddler to test with the above User-Agent strings.

Figure 1. Google’s User Agent String inside Fiddler

Figure 2: Success and Failure results from Fiddler

Additional Resources mentioned in this article

 

What’s the Issue?

Rewriting URLs to make them friendlier to search engines and end users is a pretty common practice. You see this in the blogging world and on other content-based web sites everywhere. This approach for ASP.NET sites is described in the following article:

Some Quick and Easy Ways to Rewrite URLs

The above approach works fine in almost all browsers, but Googlebot and other search spiders have problems with it under ASP.NET 2.0. The problem is in the RewritePath method; this is clearly a problem with ASP.NET 2.0 and RewritePath, NOT with Googlebot or the other search engines.

Several solutions have been offered on the web. The one fix that I think is simple, and that works without making any changes to the existing content or even the web.config file, is discussed here. Google’s thoughts on the matter are discussed here. This article discusses some additional aspects of the issue.
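For context, the URL rewriting that triggers the issue is just a mapping from a friendly URL to the real page with its query-string parameters (in ASP.NET this is done with Context.RewritePath). A minimal, language-neutral sketch of that mapping, in Python with made-up paths and IDs for illustration:

```python
import re

# Hypothetical data: friendly article slugs mapped to internal article IDs.
ARTICLE_IDS = {"rewritepath-issues": 42}

def rewrite_url(path):
    """Map a search-engine-friendly URL to the internal URL the site
    actually serves, or return the path unchanged if it doesn't match."""
    m = re.match(r"^/articles/([a-z0-9-]+)\.aspx$", path)
    if m and m.group(1) in ARTICLE_IDS:
        # In ASP.NET 2.0 this is where Context.RewritePath would be called.
        return "/article.aspx?ID=%d" % ARTICLE_IDS[m.group(1)]
    return path

print(rewrite_url("/articles/rewritepath-issues.aspx"))  # /article.aspx?ID=42
print(rewrite_url("/about.aspx"))                        # unchanged
```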

The Fix

Basically, the fix involves creating a .browser file. In the above article, a .browser file for Yahoo! Slurp was created. You can use the exact same file for Googlebot and make the following changes:

....
<browser id="Googlebot" parentID="Mozilla">
  <userAgent match="Googlebot" />
  <capability name="browser" value="Googlebot" />
....

In essence, you will create a file called googlebot.browser in the App_Browsers directory of your web site. The content of this file will be something like:

<!--
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
-->

<browsers>
  <browser id="Googlebot" parentID="Mozilla">
    <identification>
      <userAgent match="Googlebot" />
    </identification>
    <capabilities>
      <capability name="browser" value="Googlebot" />
      <capability name="Version" value="4.0" />
      <capability name="MajorVersion" value="4" />
      <capability name="MinorVersionString" value="" />
      <capability name="MinorVersion" value=".0" />
      <capability name="activexcontrols" value="true" />
      <capability name="backgroundsounds" value="true" />
      <capability name="cookies" value="true" />
      <capability name="css1" value="true" />
      <capability name="css2" value="true" />
      <capability name="ecmascriptversion" value="1.2" />
      <capability name="frames" value="true" />
      <capability name="javaapplets" value="true" />
      <capability name="javascript" value="true" />
      <capability name="jscriptversion" value="5.0" />
      <capability name="supportsCallback" value="true" />
      <capability name="supportsFileUpload" value="true" />
      <capability name="supportsMultilineTextBoxDisplay" value="true" />
      <capability name="supportsMaintainScrollPositionOnPostback" value="true" />
      <capability name="supportsVCard" value="true" />
      <capability name="supportsXmlHttp" value="true" />
      <capability name="tables" value="true" />
      <capability name="vbscript" value="true" />
      <capability name="w3cdomversion" value="1.0" />
      <capability name="xml" value="true" />
      <capability name="tagwriter" value="System.Web.UI.HtmlTextWriter" />
    </capabilities>
  </browser>
</browsers>

I don’t think you actually need to worry about exactly which JScript version Googlebot supports in the capability list above. Google’s interest is in pulling in the text and making sense of the images, etc.; I don’t think it cares about executing the JavaScript you embedded on your page. Here you are simply trying to ’trick’ ASP.NET. With the addition of this .browser file for Googlebot, the server errors (error code 500) will go away, and you will notice in Google’s Webmaster Tools area that the URL errors for these rewritten URLs go down.
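The userAgent element is the part doing the work: its match attribute is a regular expression tested against the incoming User-Agent header, and the matching browser definition determines the capabilities ASP.NET assumes. A simplified sketch of that matching (the helper name and the first-match-wins rule are illustrative, not ASP.NET's exact selection logic):

```python
import re

def matching_browser_id(user_agent):
    """Return the id of the first .browser definition whose userAgent
    'match' pattern fires against the User-Agent string (simplified)."""
    # Patterns taken from the .browser file above; Mozilla is the parent fallback.
    definitions = [("Googlebot", r"Googlebot"), ("Mozilla", r"Mozilla")]
    for browser_id, pattern in definitions:
        if re.search(pattern, user_agent):
            return browser_id
    return "Default"

ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(matching_browser_id(ua))                              # Googlebot
print(matching_browser_id("Mozilla/4.0 (compatible; MSIE 6.0)"))  # Mozilla
```

Because the Googlebot definition is listed with parentID="Mozilla", anything not caught by the more specific pattern falls back to the generic Mozilla capabilities.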

Test with Fiddler using several commonly used User-Agent strings

Now, how can you make sure the above fix works?

The first step is to get an HTTP debugging tool like Fiddler.

If you have not used Fiddler before, some of the details are discussed in the following article:

Using Fiddler for debugging HTTP issues (including RewritePath issues)

Fiddler can be used to quickly test various pages on your site, including the URL-rewritten pages, using various user-agent strings. Your goal is to get the result 200 (the HTTP status code for OK). The problem pages will come back with a result of 500 (the status code for Server Error). You might actually find that there are issues with other (non-rewritten) pages as well.

The second step is to get some user-agent strings.

Using Fiddler, you are fooling ASP.NET on your web server into believing that the user agent that sent the request is, for example, Googlebot. For this you need a set of user-agent strings as used by browsers (IE, Firefox, Opera, etc.) and search engines (Google’s Googlebot, Yahoo!’s Slurp, and so on). You can get a set of commonly used User-Agent strings (for both browsers and search bots) from this article:

Important User Agent Strings
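For a quick start, here are a few era-appropriate examples collected as data (these are strings I believe to be accurate for the period; treat the article above as the authoritative list):

```python
# A small, illustrative set of User-Agent strings for browsers and search bots.
USER_AGENTS = {
    "IE 6":         "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
    "Firefox 2":    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0",
    "Googlebot":    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Yahoo! Slurp": "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "msnbot":       "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
}

for name, ua in USER_AGENTS.items():
    print("%-12s %s" % (name, ua))
```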

And the third step is to use Fiddler to test with the above User-Agent strings.

For example, to test how ASP.NET on your web server behaves when Googlebot accesses it, you can use the following request headers:

Accept: */*
Accept-Encoding: gzip, x-gzip
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
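If you prefer a scripted check over Fiddler's request builder, the same probing can be sketched in Python. The little server below is a stand-in for your site (it fakes the unpatched ASP.NET 2.0 behavior by returning 500 to Googlebot); the client loop plays the role of Fiddler:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class FakeSite(BaseHTTPRequestHandler):
    """Stand-in for an unpatched site: 500 for Googlebot, 200 for everyone else."""
    def do_GET(self):
        status = 500 if "Googlebot" in self.headers.get("User-Agent", "") else 200
        body = b"error page" if status == 500 else b"article body"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), FakeSite)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/article.aspx" % server.server_port

def probe(user_agent):
    """Request the page with the given User-Agent and return the HTTP status code."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        return urllib.request.urlopen(req).status
    except urllib.error.HTTPError as e:
        return e.code

print(probe("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))  # 200
print(probe("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # 500
```

Against your real site, you would point url at the rewritten pages and expect 200 for every user agent once the .browser fix is in place.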

The following image shows the Fiddler request builder for testing a particular page with a particular user agent:

Figure 1. Google’s User Agent String inside Fiddler


In the above box, we are trying to GET a path-rewritten (RewritePath) URL while pretending to be the user agent Googlebot. If everything is working properly, the result will be status code 200 and the body of the article will be returned.

The OK and the error will look like the following (two different user-agents are used below):

Figure 2: Success and Failure results from Fiddler


The OK (result 200) should be the case for all the important user-agent strings (for both browsers and search engines). But, in some cases, ASP.NET 2.0 returns an error code (500) and does not return any body at all.

The body you see in the case of error code 500 is the error message. If you did not turn custom errors off (and you would leave them on in a production environment), you will see the following message:

Description: An application error occurred on the server. The current custom error settings for this application prevent the details of the application error from being viewed remotely (for security reasons). It could, however, be viewed by browsers running on the local server machine.

Details: To enable the details of this specific error message to be viewable on remote machines, please create a <customErrors> tag within a "web.config" configuration file located in the root directory of the current web application. This <customErrors> tag should then have its "mode" attribute set to "Off".

<!-- Web.Config Configuration File -->
<configuration>
    <system.web>
        <customErrors mode="Off"/>
    </system.web>
</configuration>

Additional Resources mentioned in this article

Some Quick and Easy Ways to Rewrite URLs

Using Fiddler for debugging HTTP issues (including RewritePath issues)

Important User Agent Strings
