URL (Uniform Resource Locator) Rewriting

Introduction

This article demonstrates URL rewriting in ASP.NET using regular expressions: how to set up predefined rewrite rules, and how to handle the postback issue that arises when a page is requested through a virtual path.

Why does URL rewriting matter?

URL rewriting is one of the most common techniques for providing search-engine-friendly URLs. As developers, we want code that is flexible and that makes such URLs easy to implement.

It also helps when you restructure the pages within your website: old URLs that users have kept as bookmarks should not break when pages are relocated.

URL rewriting can also improve the search relevancy of the pages on your site in common search engines such as Google, Yahoo and Bing. Specifically, it often makes it easier to embed common keywords into the URLs of your pages, which can increase the chance of someone clicking your link. Moving from querystring arguments to fully qualified URLs can also, in some cases, increase your priority in search engine results. Techniques that force referring links to use the same case and URL entry point (for example, weblogs.asp.net/scottgu instead of weblogs.asp.net/scottgu/default.aspx) also avoid diluting your PageRank across multiple URLs.

Native URL mapping

Old URLs can be mapped to new URLs without writing a single line of code by using the URL mapping feature of ASP.NET. To use it, create a urlMappings section under the system.web section of your web.config file, as shown below.

<urlMappings enabled="true">
  <add url="~/Info/Copywrite.aspx" mappedUrl="~/Help/CopyWrite.aspx" />
  <add url="~/Info/Contact.aspx" mappedUrl="~/Help/Contact.aspx" />
</urlMappings>

Note: "~/" represents the root directory of the application.

This solution works and lets the user reach the new location of the page. However, the user may be surprised that after a postback the URL changes from "http://www.metlab.com/Info/Copywrite.aspx" to "http://www.metlab.com/Help/CopyWrite.aspx". This happens because the ASP.NET engine fills the action attribute of the form with the new URL.

<form action="http://www.metlab.com/Help/CopyWrite.aspx" method="post" name="form1">
</form>

This approach works well when only a few pages are relocated. If you have many files and URLs, it is not recommended, because every mapping has to be listed individually; urlMappings does not support patterns.

URL Rewriting

The best way to implement a URL rewriting solution is to create reusable and easily configurable modules, so the obvious decision is to create an HTTP Module (for details on HTTP Modules see MSDN Magazine) and implement it as an individual assembly. To make this assembly as easy to use as possible, we need to implement the ability to configure the rewrite engine and specify rules in the web.config file.

During the development process we need to be able to turn the rewriting module on or off (for example if you have a bug that is difficult to catch, and which may have been caused by incorrect rewriting rules). There should, therefore, be an option in the rewriting module configuration section in web.config to turn the module on or off. So, a sample configuration section within web.config can go like this:

<rewriteModule>
  <rewriteOn>true</rewriteOn>
  <rewriteRules>
    <rule source="(\d+)/(\d+)/(\d+)/" destination="Post.aspx?Year=$1&amp;Month=$2&amp;Day=$3"/>
    <rule source="(.*)/Default.aspx" destination="Default.aspx?Folder=$1"/>
    <rule source="Directory/(.*)/(.*)/(.*)/(.*).aspx" destination="Directory/Item.aspx?Source=$1&amp;Year=$2&amp;ValidTill=$3&amp;Sales=$4"/>
    <rule source="Directory/(.*)/(.*)/(.*).aspx" destination="Directory/Items.aspx?Source=$1&amp;Year=$2&amp;ValidTill=$3"/>
    <rule source="Directory/(.*)/(.*).aspx" destination="Directory/SourceYear.aspx?Source=$1&amp;Year=$2"/>
    <rule source="Directory/(.*).aspx"  destination="Directory/Source.aspx?Source=$1"/>
  </rewriteRules>
</rewriteModule>

This means that a request such as "http://localhost/Web/2006/12/10/" is rewritten to the page Post.aspx, with the year, month and day passed as query string parameters.
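
The substitution that such a rule performs can be tried outside ASP.NET. The following is a minimal sketch, not part of the module itself: the class name RuleDemo and the method name Apply are hypothetical, and the base path "/Web/" is assumed from the example URL above. It relies only on the Regex class from System.Text.RegularExpressions.

```csharp
using System.Text.RegularExpressions;

public static class RuleDemo
{
    // Applies one rewrite rule to a request path.
    // Returns null when the rule does not match.
    public static string Apply(string basePath, string source, string destination, string path)
    {
        // Pattern = base path + rule source, matched case-insensitively
        Regex re = new Regex(basePath + source, RegexOptions.IgnoreCase);
        if (!re.IsMatch(path))
            return null;

        // $1, $2, $3 in the destination are replaced by the captured groups
        return re.Replace(path, destination);
    }
}
```

Calling RuleDemo.Apply("/Web/", @"(\d+)/(\d+)/(\d+)/", "Post.aspx?Year=$1&Month=$2&Day=$3", "/Web/2006/12/10/") should return "Post.aspx?Year=2006&Month=12&Day=10". Note that in C# the destination uses a plain "&"; the "&amp;" form is only the XML escaping required inside web.config.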

For the section above to be usable in the web.config file, the developer must register a section name and a section handler for it. To do this, add a configSections section to web.config.

<configSections>
  <sectionGroup name="modulesSection">
    <section name="rewriteModule" type="UrlRewriteModule.UrlRewrittingModuleHandler"/>
  </sectionGroup>
</configSections>

This means you can place the following section after the configSections section:

<modulesSection>
  <rewriteModule>
    <rewriteOn>true</rewriteOn>
    <rewriteRules>
      <rule source="(\d+)/(\d+)/(\d+)/" destination="Post.aspx?Year=$1&amp;Month=$2&amp;Day=$3"/>
      <rule source="(.*)/Default.aspx" destination="Default.aspx?Folder=$1"/>
      <rule source="Directory/(.*)/(.*)/(.*)/(.*).aspx" destination="Directory/Item.aspx?Source=$1&amp;Year=$2&amp;ValidTill=$3&amp;Sales=$4"/>
      <rule source="Directory/(.*)/(.*)/(.*).aspx" destination="Directory/Items.aspx?Source=$1&amp;Year=$2&amp;ValidTill=$3"/>
      <rule source="Directory/(.*)/(.*).aspx" destination="Directory/SourceYear.aspx?Source=$1&amp;Year=$2"/>
      <rule source="Directory/(.*).aspx" destination="Directory/Source.aspx?Source=$1"/>
    </rewriteRules>
  </rewriteModule>
</modulesSection>
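
The order of the Directory rules matters, because the module presented later in this article stops at the first rule whose pattern matches, and "(.*)" also matches slashes. The sketch below illustrates this under simplifying assumptions: RuleOrderDemo and FirstMatch are hypothetical names, the base path is left out, and each rule is given as a { source, destination } pair.

```csharp
using System.Text.RegularExpressions;

public static class RuleOrderDemo
{
    // Returns the rewritten path produced by the first matching rule,
    // or null when no rule matches. Each rule is { source, destination }.
    public static string FirstMatch(string[][] rules, string path)
    {
        foreach (string[] rule in rules)
        {
            Regex re = new Regex(rule[0], RegexOptions.IgnoreCase);
            if (re.IsMatch(path))
                return re.Replace(path, rule[1]);
        }
        return null;
    }
}
```

With the two-segment rule listed before the one-segment rule, the path "Directory/Books/2006.aspx" should be rewritten to "Directory/SourceYear.aspx?Source=Books&Year=2006". With the order reversed, the more general rule wins and the result becomes "Directory/Source.aspx?Source=Books/2006", which is why the most specific rules are listed first.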

Another thing to bear in mind while developing the rewriting module is that it should be possible to use a virtual URL together with query string parameters, as in the following:

"http://www.someblog.com/2006/12/10/?Sort=Dec&SortBy=Date"

Thus we have to develop a solution that can detect parameters passed via the query string as well as via the virtual URL in the web application.
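
One way to keep both kinds of parameters is to append the original query string to the path produced by the rewrite rule. The following is a minimal sketch of that step only; QueryDemo and AppendQuery are hypothetical names used for illustration.

```csharp
public static class QueryDemo
{
    // Appends the original query string to an already rewritten path.
    public static string AppendQuery(string rewrittenPath, string originalQuery)
    {
        if (string.IsNullOrEmpty(originalQuery))
            return rewrittenPath;

        // Use "?" when the path has no query string yet, "&" otherwise
        string sign = rewrittenPath.IndexOf("?") == -1 ? "?" : "&";
        return rewrittenPath + sign + originalQuery;
    }
}
```

For the URL above, QueryDemo.AppendQuery("Post.aspx?Year=2006&Month=12&Day=10", "Sort=Dec&SortBy=Date") should give "Post.aspx?Year=2006&Month=12&Day=10&Sort=Dec&SortBy=Date".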

So, let's start by building a new Class Library. We need to add a reference to the System.Web assembly, as we want this library to be used within an ASP.NET application and we also want to implement some web-specific functions at the same time. If we want our module to be able to read web.config, we need to add a reference to the "System.Configuration" assembly.

Handling the Configuration Section

To read the configuration section in the web.config file, we have to create a class that implements the IConfigurationSectionHandler interface, as shown below.

namespace UrlRewriteModule
{
    #region[Directive]
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Configuration;
    using System.Xml;
    using System.Web;
    #endregion[Directive]

    /// <summary>
    /// This class implements IConfigurationSectionHandler to read the
    /// configuration section and return it as an object.
    /// </summary>
    public class UrlRewrittingModuleHandler:IConfigurationSectionHandler
    {
        #region[PrivateVariable]
        private XmlNode _xmlSection;
        private string _rewriteBase;
        private bool _rewriteOn;
        #endregion[PrivateVariable]

        #region[Property]

        /// <summary>
        /// gets or sets xmlSection
        /// </summary>
        public XmlNode XmlSection
        {
            get { return _xmlSection; }
            set { _xmlSection = value; }
        }

        /// <summary>
        /// gets or sets RewriteBase
        /// </summary>
        public string RewriteBase
        {
            get { return _rewriteBase; }
            set { _rewriteBase = value; }
        }

        /// <summary>
        /// gets or sets RewriteOn
        /// </summary>
        public bool RewriteOn
        {
            get { return _rewriteOn; }
            set { _rewriteOn = value; }
        }
        #endregion[Property]

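        /// <summary>
        /// Called by the configuration system to build the section object
        /// from the rewriteModule node in web.config.
        /// </summary>
        /// <param name="parent">The parent configuration object (not used).</param>
        /// <param name="configContext">The configuration context (not used).</param>
        /// <param name="section">The XML node of the rewriteModule section.</param>
        /// <returns>This handler, populated with the section settings.</returns>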
        public object Create(object parent, object configContext, XmlNode section)
        {
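            // set base path for rewriting module to the
            // application root, always ending with "/" so that
            // it can be prefixed to the rule patterns and targets
            _rewriteBase = HttpContext.Current.Request.ApplicationPath;
            if (!_rewriteBase.EndsWith("/"))
                _rewriteBase += "/";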

            // process configuration section
            // from web.config
            try
            {
                XmlSection = section;
                RewriteOn = Convert.ToBoolean(section.SelectSingleNode("rewriteOn").InnerText);
            }
            catch (Exception ex)
            {
                throw (new Exception("Error while processing RewriteModule configuration section.", ex));
            }

            return this;
        }
    }
}

Maintain Original URL

When handling virtual URLs such as http://www.somebloghost.com/Blogs/gaidar/?Sort=Asc (that is, a virtual URL with query string parameters), it is important that you clearly distinguish parameters that were passed via the query string from parameters that were passed as virtual directories. Using the rewriting rule specified below:

     <rule source="(.*)/Default.aspx" destination="Default.aspx?Folder=$1"/>

You can use the following URL:

http://www.somebloghost.com/gaidar/?Folder=Blogs

And the result will be the same as if you used this URL:

http://www.somebloghost.com/Blogs/gaidar/

In other words, the page cannot tell a parameter taken from the virtual path from one supplied in the query string. To resolve this issue, we have to create some kind of wrapper for 'virtual path parameters'. This could be a collection with a static member to access the current parameter set:

namespace UrlRewriteModule
{
    #region[Directive]
    using System;
    using System.Collections.Generic;
    using System.Collections.Specialized;
    using System.Linq;
    using System.Text;
    using System.Web;
    #endregion[Directive]
    /// <summary>
    /// This class is a wrapper for the virtual path parameters.
    /// It provides a static property to access the current parameter set.
    /// </summary>
    public class ReWriteContext
    {
        #region[PublicField]
        /// <summary>
        /// Get the RewriteContext
        /// </summary>
        public static ReWriteContext Current
        {
            get
            {
                if (HttpContext.Current.Items.Contains("RewriteContextInfo"))
                {
                    return HttpContext.Current.Items["RewriteContextInfo"] as ReWriteContext;
                }
                else
                    return new ReWriteContext();
            }
        }

        /// <summary>
        /// Initializes the object with a copy of the given parameters
        /// </summary>
        /// <param name="param">The parameter collection to copy</param>
        /// <param name="baseUrl">The URL that was originally requested</param>
        public ReWriteContext(NameValueCollection param, string baseUrl)
        {
            Params = new NameValueCollection(param);
            InitialUrl = baseUrl;
        }

        /// <summary>
        /// Initialize object with the default setting
        /// </summary>
        public ReWriteContext()
        {
            _Params = new NameValueCollection();
            _initialUrl = string.Empty;
        }

        public NameValueCollection Params
        {
            get { return _Params; }
            set { _Params = value; }
        }

        public string InitialUrl
        {
            get { return _initialUrl; }
            set { _initialUrl = value; }
        }

        #endregion[PublicField]

        #region[PrivateField]
        private NameValueCollection _Params;
        private string _initialUrl;
        #endregion[PrivateField]

    }
}
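
Once the module described in the next section has populated the context, a page can compare ReWriteContext.Current.Params with Request.QueryString to see which parameters came only from the virtual path. The helper below is a hedged sketch of that comparison and is not part of the original module: VirtualParams and OnlyFromPath are hypothetical names, and it assumes that the first collection holds all parameters (virtual path plus query string) while the second holds only the real query string. It does not handle a key that is supplied both ways.

```csharp
using System.Collections.Specialized;

public static class VirtualParams
{
    // Returns the entries of 'all' whose keys do not appear in 'queryString'.
    public static NameValueCollection OnlyFromPath(NameValueCollection all, NameValueCollection queryString)
    {
        NameValueCollection result = new NameValueCollection();
        foreach (string key in all.AllKeys)
        {
            // Keep a parameter only if the real query string does not contain it
            if (key != null && queryString[key] == null)
                result.Add(key, all[key]);
        }
        return result;
    }
}
```

For the request http://www.somebloghost.com/Blogs/gaidar/?Sort=Asc, the combined set would contain Folder and Sort, the query string only Sort, so the helper would return just Folder=Blogs.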

Rewriting URL

To rewrite the URL we implement an HTTP module, which takes care of all the URL rewriting. The code is given below.

namespace UrlRewriteModule
{
    #region[Directives]
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Web;
    using System.Configuration;
    using System.Xml;
    #endregion[Directives]

    public class ReWriteModule:IHttpModule
    {

        public void Dispose()
        {
            ////Nothing to release. ASP.NET calls Dispose when the module
            ////is unloaded, so this method must not throw.
        }

        public void Init(HttpApplication context)
        {
            ////This handler runs at the beginning of every request
            context.BeginRequest += new EventHandler(RewriteModule_BeginRequest);
            ////This handler runs just before the request handler (the page) executes
            context.PreRequestHandlerExecute += new EventHandler(RewriteModule_PreRequestHandlerExecute);
        }

        /// <summary>
        /// Runs just before ASP.NET hands the request to its handler.
        /// </summary>
        /// <remarks>
        /// This method checks whether the user requested a normal ASP.NET
        /// page and, if so, adds a handler for the PreInit event of the page lifecycle.
        /// </remarks>
        /// <param name="sender">The HttpApplication that raised the event.</param>
        /// <param name="e">Event arguments (not used).</param>
        void RewriteModule_PreRequestHandlerExecute(object sender, EventArgs e)
        {
            HttpApplication application = (HttpApplication)sender;
            ////"as" yields null when the handler is missing or is not a Page
            System.Web.UI.Page page = application.Context.CurrentHandler as System.Web.UI.Page;
            if (page != null)
            {
                page.PreInit += new EventHandler(page_PreInit);
            }
        }

        /// <summary>
        /// Runs during the pre-initialization stage of the page.
        /// </summary>
        /// <remarks>
        /// This is where RewriteContext is populated with the actual parameters and
        /// a second URL rewriting is performed. The second rewriting is necessary to make
        /// ASP.NET use the virtual path in the action attribute of the HTML form.
        /// </remarks>
        /// <param name="sender">The page being initialized.</param>
        /// <param name="e">Event arguments (not used).</param>
        void page_PreInit(object sender, EventArgs e)
        {
            if (HttpContext.Current.Items["OriginalUrl"] != null)
            {
                string path = (string)HttpContext.Current.Items["OriginalUrl"];

                ////Save the rewritten query string parameters so that
                ////ReWriteContext.Current can return them later
                ReWriteContext reWriteContext = new ReWriteContext(HttpContext.Current.Request.QueryString, path);
                HttpContext.Current.Items["RewriteContextInfo"] = reWriteContext;

                ////Restore the original (virtual) path so that postbacks target it
                if (path.IndexOf("?") == -1)
                    path = path + "?";
                HttpContext.Current.RewritePath(path);
            }
        }

        /// <summary>
        /// Runs at the beginning of every request and rewrites the path
        /// when one of the configured rules matches.
        /// </summary>
        /// <param name="sender">The HttpApplication that raised the event.</param>
        /// <param name="e">Event arguments (not used).</param>
        void RewriteModule_BeginRequest(object sender, EventArgs e)
        {
            ////Read the configuration section registered as modulesSection/rewriteModule
            UrlRewrittingModuleHandler rewriteModuleSection = (UrlRewrittingModuleHandler)ConfigurationManager.GetSection("modulesSection/rewriteModule");

            ////Do nothing if the section is missing or rewriting is switched off
            if (rewriteModuleSection == null || !rewriteModuleSection.RewriteOn)
                return;

            ////Keep the current request path
            string path = HttpContext.Current.Request.Path;

            ////Check for path length
            if (path.Length == 0)
                return;

            ////Get the rule section from the xmlconfig file
            XmlNode rules = rewriteModuleSection.XmlSection;
            foreach (XmlNode node in rules.SelectNodes("rewriteRules/rule"))
            {
                try
                {
                    ////Create the regular expression for this rule
                    Regex re = new Regex(rewriteModuleSection.RewriteBase + node.Attributes["source"].InnerText, RegexOptions.IgnoreCase);
                    ////Try to match the requested path
                    Match match = re.Match(path);
                    ////Was the match successful?
                    if (match.Success)
                    {
                        ////Replace the old url with the destination one
                        path = re.Replace(path, node.Attributes["destination"].InnerText);

                        ////Check path length
                        if (path.Length != 0)
                        {
                            ////Does the original request carry a query string?
                            if (HttpContext.Current.Request.QueryString.Count != 0)
                            {
                                ////Choose the separator: "?" if the path has no query string yet, "&" otherwise
                                string sign = path.IndexOf("?") == -1 ? "?" : "&";
                                ////Append the original query string to the rewritten path
                                path = path + sign + HttpContext.Current.Request.QueryString;
                            }

                            ////Build the new URL
                            string newUrl = rewriteModuleSection.RewriteBase + path;

                            ////Store the original URL for later use in PreInit
                            HttpContext.Current.Items.Add("OriginalUrl", HttpContext.Current.Request.RawUrl);

                            ////Rewrite the URL
                            HttpContext.Current.RewritePath(newUrl);
                        }

                        return;
                    }
                }
                catch (System.Exception ex)
                {
                    throw (new Exception("Incorrect rule.", ex));
                }
            }
            return;
        }
    }
}
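
For the module to run, ASP.NET also has to be told to load it. A minimal sketch of the registration in web.config follows. The assembly name UrlRewriteModule after the comma is an assumption (the code above gives only the namespace), and the value of the name attribute is arbitrary.

```xml
<system.web>
  <httpModules>
    <!-- type = "Namespace.Class, AssemblyName"; the assembly name is assumed here -->
    <add name="ReWriteModule" type="UrlRewriteModule.ReWriteModule, UrlRewriteModule"/>
  </httpModules>
</system.web>
```

The httpModules element applies to IIS 6 and to the classic pipeline mode of IIS 7. Under the IIS 7 integrated pipeline, the same add element goes into the modules element of the system.webServer section instead.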

Summary

In this article, we learned the concept of URL rewriting in ASP.NET and how to implement it using regular expressions.
