Apache mod_rewrite (.htaccess) - Rewrite or Redirect Request URLs if page not found, or no www - Code-Tips.com - Web Development, Programming, SEO

Sunday, January 17, 2010

Apache mod_rewrite (.htaccess) - Request rewrites & redirects to site root if page or directory not found

mod_rewrite can be used on a web server in many scenarios: to improve usability, optimise URLs for search engines, perform redirects based on specific criteria, and much more. The following examples show how to configure .htaccess files on an Apache web server to alter requests and the actions the server takes. They include redirecting to the site root (or another specified page) if a page or directory is not found, and converting components of an HTTP request into query string parameters that are passed to a specific page or script for processing and display, without altering the address entered in the browser.



Other examples demonstrate common uses of mod_rewrite, such as configuring conditions and rules so that the rewrite engine only affects requests that meet specific criteria, and handling multiple domain names using .htaccess. This allows you to configure redirects or rewrites based on the domain or subdomain requested, such as forwarding "host.com" to "www.host.com" if the www was not included in the request. The special characters used to build the expressions in conditions and rules are also explained.

Apache mod_rewrite Examples:


  • Redirect to site root if page not found
  • Convert part of a request to Query String Parameters to pass to a different page on the server
  • Redirect to include www (HTTP/1.1 301 Moved Permanently)

mod_rewrite - Background & General Information

Operators:
< : is lexically lower
> : is lexically greater
= : is lexically equal
! : not

CondPatterns
-d : is a Directory
-f : is a regular file
-s : is a regular file with size greater than zero
-l : is a symbolic link
-F : is existing file via subrequest
-U : is existing URL via subrequest

e.g.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

The above conditions are both true if the requested filename does not exist as a file or directory on the web server. Consecutive conditions are combined with AND, and they apply only to the RewriteRule that immediately follows them, so that rule is executed only when the file or directory is not found. If it is found, the conditions prevent the rule from running and the request is left unchanged, allowing the page to load.
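
As a rough illustration of what these two conditions test, the following Python sketch models them with filesystem checks. The function name conditions_pass and the temporary directory standing in for the document root are assumptions made for the example; Apache performs the equivalent checks internally against %{REQUEST_FILENAME}.

```python
import os
import tempfile


def conditions_pass(request_filename: str) -> bool:
    """Model of the two RewriteCond lines (hypothetical helper).

    Returns True only when the path is neither an existing regular
    file (!-f) nor an existing directory (!-d).
    """
    not_a_file = not os.path.isfile(request_filename)
    not_a_dir = not os.path.isdir(request_filename)
    # Consecutive RewriteCond lines are combined with AND.
    return not_a_file and not_a_dir


if __name__ == "__main__":
    # A temporary directory stands in for the document root.
    with tempfile.TemporaryDirectory() as docroot:
        existing = os.path.join(docroot, "index.html")
        open(existing, "w").close()
        for path in (existing, docroot, os.path.join(docroot, "missing.html")):
            print(os.path.basename(path) or path, conditions_pass(path))
```

Only the missing path satisfies both conditions, so only that request would reach the rule that follows them.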

Regular Expressions

Text:
. : any single character
[chars] : one of the chars from the set
[^chars] : not any of the chars from the set
choice1|choice2 : Alternative - choice1 or choice2


Quantifiers:
? : 0 or 1
* : 0 to N (many)
+ : at least 1 to N (many)

Grouping:
(text) : allows a string of characters to be grouped, captured and quantified if required. e.g. ^(www)+(.*) requires that the request string starts with "www" one or more times, followed by anything.

Anchors:
^ : Start of line
$ : End of line

Escape Special Characters:
\char : Escape special characters for use explicitly in a string.
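
One case where escaping matters is the dot in a domain name. The Python sketch below assumes that Python's re module behaves like Apache's regular expression engine for these simple patterns, and the host names are hypothetical. It shows that an unescaped dot also matches characters other than a dot.

```python
import re

# "." matches any single character; "\." matches only a literal dot.
UNESCAPED = re.compile(r"^host.com$")
ESCAPED = re.compile(r"^host\.com$")


def matches(pattern: "re.Pattern[str]", text: str) -> bool:
    """Return True if the pattern is found in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for host in ("host.com", "hostXcom"):
        print(host, matches(UNESCAPED, host), matches(ESCAPED, host))
```

Both patterns accept "host.com", but only the unescaped one also accepts "hostXcom".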


Regular Expression Examples:

Expression                         | Input                            | Result
^blog(.*).com$                     | blog.master-sharepoint.com       | true
^blog(.*).com$                     | blog.master-sharepoint.net       | false
^blog(.*).com$                     | www.master-sharepoint.com        | false
^blog(.*).com$                     | blog.master-sharepoint.com/about | false
!^(www.)+master-sharepoint.com(.*) | master-sharepoint.com/about      | true
!^(www.)+master-sharepoint.com(.*) | www.master-sharepoint.com/about  | false
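
The results in the table can be checked with a short Python sketch. It assumes that Python's re module treats these simple patterns the same way as mod_rewrite's regular expression engine, and the helper cond_matches is a hypothetical name. The leading "!" is handled separately because it is mod_rewrite's negation operator rather than part of the regular expression.

```python
import re


def cond_matches(cond_pattern: str, test_string: str) -> bool:
    """Evaluate a pattern the way the table does (hypothetical helper).

    A leading "!" inverts the result of the regular expression match.
    """
    negate = cond_pattern.startswith("!")
    regex = cond_pattern[1:] if negate else cond_pattern
    found = re.search(regex, test_string) is not None
    return found != negate


# (expression, input) pairs taken from the table.
ROWS = [
    ("^blog(.*).com$", "blog.master-sharepoint.com"),
    ("^blog(.*).com$", "blog.master-sharepoint.net"),
    ("^blog(.*).com$", "www.master-sharepoint.com"),
    ("^blog(.*).com$", "blog.master-sharepoint.com/about"),
    ("!^(www.)+master-sharepoint.com(.*)", "master-sharepoint.com/about"),
    ("!^(www.)+master-sharepoint.com(.*)", "www.master-sharepoint.com/about"),
]

if __name__ == "__main__":
    for expression, text in ROWS:
        print(expression, text, cond_matches(expression, text))
```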

For more information about mod_rewrite conditions, regular expressions and server variables available for use by the mod_rewrite engine, see Module mod_rewrite URL Rewriting Engine.

Enable the mod_rewrite engine:

RewriteEngine On

Set the base location:
RewriteBase /



Redirect to site root if page not found:

#if page or directory on website is not found, external redirect to site root

RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*) http://%{HTTP_HOST} [R]

!-f : Request filename is not a file
!-d : Request filename is not a directory
[R] : External redirect

Rule Breakdown:

^ - Start of string

(.*) - Captured group: any sequence of zero or more characters

http://%{HTTP_HOST} - Substitution: uses the HTTP_HOST request variable to redirect the user to the site root, ignoring the requested page that was not found. To redirect to a specific page, such as a custom not found page, append the required page to the end of the address ( http://%{HTTP_HOST}/custom_error.php )

[R] - Tells the server to send an external redirect to the generated address (the address bar of the browser will display the generated address after the page has loaded). With no status code given, [R] issues a 302 temporary redirect.



Convert part of a request to Query String Parameters to pass to a different page on the server:

#if request not found, rewrite to specific page/script. Convert request details (directory/filename) into query string parameters

RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]+)/?$ show-page.php?page=$1 [L]

Example:
Browser Request: "http://host/free-php-scripts/"

Loads the following page on the web server: "http://host/show-page.php?page=free-php-scripts" (but will still display "http://host/free-php-scripts/" in the browser)
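
The mapping in this example can be modelled in a few lines of Python. This is only a sketch: the pattern used here captures the first path segment, which is an assumption chosen to reproduce the example above, and the function name rewrite is hypothetical. The file and directory existence checks made by the conditions are not modelled.

```python
import re
from typing import Optional

# Assumed pattern: one path segment, with or without a trailing slash.
PATTERN = re.compile(r"^([^/]+)/?$")


def rewrite(request_path: str) -> Optional[str]:
    """Return the internally rewritten target, or None if nothing matches."""
    # In .htaccess (per-directory) context the leading slash is removed
    # before the rule pattern is applied.
    relative = request_path.lstrip("/")
    match = PATTERN.match(relative)
    if match is None:
        return None
    # $1 in the substitution corresponds to the first captured group.
    return "show-page.php?page=" + match.group(1)


if __name__ == "__main__":
    print(rewrite("/free-php-scripts/"))  # show-page.php?page=free-php-scripts
    print(rewrite("/a/b"))                # None
```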

The browser will load the show-page.php file with the directory/filename details as the query string parameter "page". The address bar in the browser will still display "http://host/free-php-scripts/" even though "http://host/show-page.php?page=free-php-scripts" was the request processed by the server to display the page.

You will need to make sure that paths (images, links, stylesheets, JavaScript files, etc.) are relative to the site root and not the current directory. For example, suppose a directory called "images" at the root of the website stores the images displayed on the site. On the "http://host/show-page.php?page=free-php-scripts" page, the address (src) of an image in that directory might be "images/logo.jpg". If you use mod_rewrite to access the same page by requesting "http://host/free-php-scripts/", the browser will try to load the image from "http://host/free-php-scripts/images/logo.jpg", which is no longer correct.

One solution when linking images, stylesheets or JavaScript files from a webpage whose address is generated by mod_rewrite rules is to use the full absolute URL ( "http://host/images/logo.jpg" ), or to make all paths relative to the site root ( "/images/logo.jpg" ). Another solution is to determine the depth of the requested page within the directory structure from the request data, then prepend the path back to the site root to the page's URLs dynamically. For example, URLs on the "http://host/free-php-scripts/" page pointing to "images/" would have "../" added at the beginning, giving "../images/" when the HTML of the webpage is generated. The second method may be useful when the directories and files linked from the web page are relative to the current page and not the site root.
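
The effect on relative paths can be reproduced with Python's urllib.parse.urljoin, which resolves a link against a page address following the same rules a browser uses. This is only a sketch using the example addresses from above; the helper name resolve is hypothetical.

```python
from urllib.parse import urljoin

REAL_PAGE = "http://host/show-page.php?page=free-php-scripts"
REWRITTEN_PAGE = "http://host/free-php-scripts/"


def resolve(page_url: str, link: str) -> str:
    """Resolve a link found on a page against that page's address."""
    return urljoin(page_url, link)


if __name__ == "__main__":
    # A path relative to the current directory breaks under the rewritten address.
    print(resolve(REAL_PAGE, "images/logo.jpg"))
    print(resolve(REWRITTEN_PAGE, "images/logo.jpg"))
    # Root-relative and "../" paths both lead back to the real image.
    print(resolve(REWRITTEN_PAGE, "/images/logo.jpg"))
    print(resolve(REWRITTEN_PAGE, "../images/logo.jpg"))
```

The second line printed is the broken address "http://host/free-php-scripts/images/logo.jpg"; the other three all resolve to "http://host/images/logo.jpg".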



Redirect to include www (HTTP/1.1 301 Moved Permanently):

RewriteBase /
RewriteCond %{HTTP_HOST} !^www(.*)
RewriteRule ^(.*) http://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

This will redirect any request that is missing the www to the same host including www. For example, a request to "http://host.com" will be redirected to "http://www.host.com". If you have a "blog" subdomain ( http://blog.host.com ), to prevent mod_rewrite from redirecting this request to "http://www.blog.host.com/" the following conditions and rules could be used:

RewriteBase /
RewriteCond %{HTTP_HOST} !^www(.*)
RewriteCond %{HTTP_HOST} !^blog(.*)
RewriteRule ^(.*) http://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

This will redirect any request that doesn't start with "blog..." and that is missing the www, to the equivalent request including the www. If you have many subdomains, it may be easier to test the domain name explicitly and redirect to include the www if required:

RewriteBase /
RewriteCond %{HTTP_HOST} ^host\.com
RewriteRule ^(.*) http://www.host.com%{REQUEST_URI} [R=301,L]

The condition above tests for the domain explicitly, without the www. The condition is satisfied, allowing the rule to be executed, only when the bare domain is requested without www or any other subdomain in the address. When this is the case, the Apache mod_rewrite engine redirects the request to the host with www included. The originally requested path is preserved in the redirect URL.
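
To compare the approaches, the following Python sketch models the host checks and returns the redirect address, or None when no redirect would happen. The function names and the excluded_prefixes parameter are hypothetical, Python's re module is assumed to match these simple patterns the same way Apache does, and the escaped dot in the explicit pattern is an assumption about the intended match.

```python
import re
from typing import Optional, Sequence


def add_www(host: str, request_uri: str,
            excluded_prefixes: Sequence[str] = ("www",)) -> Optional[str]:
    """Prefix-based variant: redirect unless the host starts with an excluded prefix."""
    for prefix in excluded_prefixes:
        # Each prefix models one negated condition such as !^www(.*)
        if re.search("^" + re.escape(prefix), host):
            return None
    return "http://www." + host + request_uri


def add_www_explicit(host: str, request_uri: str) -> Optional[str]:
    """Explicit-domain variant: redirect only when the host starts with host.com."""
    if re.search(r"^host\.com", host):
        return "http://www.host.com" + request_uri
    return None


if __name__ == "__main__":
    print(add_www("host.com", "/about"))
    print(add_www("blog.host.com", "/"))
    print(add_www("blog.host.com", "/", excluded_prefixes=("www", "blog")))
    print(add_www_explicit("blog.host.com", "/"))
    print(add_www_explicit("host.com", "/about"))
```

With only the www check, "blog.host.com" is sent to "http://www.blog.host.com/"; adding the blog exclusion or testing the domain explicitly leaves that subdomain alone.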



Mod_Rewrite References:
  • Mod_Rewrite - Apache 1.3 Documentation - This module provides a rule-based rewriting engine to rewrite requested URLs on the fly.
  • Apache Module mod_rewrite - Apache 2.0 Documentation - "This module uses a rule-based rewriting engine (based on a regular-expression parser) to rewrite requested URLs on the fly. It supports an unlimited number of rules and an unlimited number of attached rule conditions for each rule, to provide a really flexible and powerful URL manipulation mechanism. The URL manipulations can depend on various tests, of server variables, environment variables, HTTP headers, or time stamps. Even external database lookups in various formats can be used to achieve highly granular URL matching".
  • URL Rewriting (Ross Shannon) - "The Apache server’s mod_rewrite module gives you the ability to transparently redirect one URL to another, without the user’s knowledge. This opens up all sorts of possibilities, from simply redirecting old URLs to new addresses, to cleaning up the ‘dirty’ URLs coming from a poor publishing system — giving you URLs that are friendlier to both readers and search engines."
  • Learn Apache mod_rewrite: 13 Real-world Examples - "Apache's low-cost, powerful set of features make it the server of choice for organizations around the world. One of its most valuable treasures is the mod_rewrite module, the purpose of which is to rewrite a visitor's request URI in the manner specified by a set of rules."

2 comments:

  1. how can we navigate this in to
    http://example.domain.com/example/folder/page.php

    =>>
    http://example.domain.com/folder/page.php

    by htaccess rule

  2. URL rewriting through .htaccess is a good way to make the urls SEO friendly. url rewriting is one of the task involved in web development. Sometimes some SEO people also do this.
