Improve Performance for Websites with a Database Back-end & Dynamic Content - Code-Tips.com - Web Development, Programming, SEO

Wednesday, December 23, 2009

Improve Performance for Websites with a Database Back-end & Dynamic Content


How to improve the performance of individual web pages, and of a website overall, by minimising front-end processing and the additional server/client DNS requests required to fully load each page.

  1. Identify the areas/scripts for a website which are frequently used
  2. Identify scripts which take the longest time to process or load
  3. Identify DNS Requests that can be removed from front-end pages
  4. Create scripts to perform background processing, then store website content (HTML) in a database table.
  5. Replace code in scripts to retrieve the pre-formatted content from a database

The techniques described below focus on reducing the processing and DNS requests required to generate and load a page, on both the server side and the client side of a web application or website. The steps explain how to identify and reduce the amount of processing performed by the front-end of a website by moving long-running work into background tasks. This includes identifying frequently used scripts/pages, and reducing their loading time by retrieving pre-prepared HTML from a database instead of generating it in the front-end script. Background tasks periodically execute scripts which perform the heavy processing, generating pre-formatted HTML and storing it in a database table ready for retrieval when the page is loaded.



The techniques explained below will be most effective when the databases for a website are hosted on the same server or farm.


1. Identify the areas/scripts for a website which are frequently used

Identifying the areas of a website which are most used allows you to prioritise pages in the order in which development time will be most beneficial. Spending a lot of time optimising scripts that are rarely used is not a good use of your time.

Website Traffic Statistics
There are many common (and popular) free methods which can be used to analyse page and visitor statistics for a website. Most of these allow you to view traffic information in terms of the content (the pages being visited), the visitors (country/region/city, browser, OS, host, etc.) and the source of the traffic (referral, search engine query, direct). From this information you can see which pages on your site are most used, as well as the location of the users most frequently accessing them (see Geographical Targeting below).

Google Analytics
Google Analytics is a free service that tracks information about the content and users of a website in great detail. This information can be viewed using the selection of core reports, but functionality is also provided to track custom data and generate reports using custom criteria.

Site Meter
Site Meter is a free or paid service that tracks statistics for websites. The statistics can be made public if required, allowing anyone to view all or some of the details about your site.

Website Traffic Stats
Website Traffic Stats provides reports similar to the above, but also includes reports detailing the navigation structure of a website, which can come in handy when optimising a website for search engines (SEO).


2. Identify scripts which take the longest time to process or load

Identifying and optimising specific scripts that take a long time to process is a good method for improving the performance of a website. Many factors affect the performance of a website, some of which are out of your control, or not easy to control. A large amount of processing is generally required to generate and display information when a page involves many complex database queries, calculations, comparisons and graphical representations.

In many cases it will be possible to generate that information and those graphics in background scripts executed periodically by the server. A common scenario is when statistics are generated and displayed for content on a website, which may require thousands or millions of database records to be retrieved and processed. Here, background scripts could generate the statistical information and store it in a separate database or table, to be retrieved and displayed by the website's front-end pages. Retrieving the pre-processed data and displaying it directly on the page with minimal further processing reduces the time required to display the page.

When deciding which scripts and pages to optimise, you need to find a balance between the frequently used scripts identified in Step 1 and the scripts identified in this step that take a long time to process or load.

There are tools and software available to help identify scripts and pages that take a long time to process and display. Alternatively, you could build page-load timing into your own scripts: record the current time at the beginning and end of processing a script/page, then calculate the duration from the difference between the two values.
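The start/end timing approach just described can be sketched as follows. This is a minimal Python illustration (the `SectionTimer` class and section names are invented for the example; the same record-the-clock-twice idea applies in PHP or any server-side language):

```python
import time

class SectionTimer:
    """Accumulates wall-clock durations for named sections of a page script."""
    def __init__(self):
        self.durations = {}

    def measure(self, name):
        # Returns a context manager so each section can be wrapped in `with`.
        timer = self
        class _Span:
            def __enter__(self):
                self.start = time.perf_counter()
            def __exit__(self, *exc):
                timer.durations[name] = timer.durations.get(name, 0.0) + (
                    time.perf_counter() - self.start)
        return _Span()

timer = SectionTimer()
with timer.measure("database"):
    time.sleep(0.05)   # stands in for slow database queries
with timer.measure("render"):
    time.sleep(0.01)   # stands in for template rendering

# Print the slowest sections first, so you know what to optimise.
for name, seconds in sorted(timer.durations.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Running this on real pages quickly shows whether the database work or the rendering dominates the load time.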

Web Page Analyzer

'Try our free web site speed test to improve website performance. Enter a URL below to calculate page size, composition, and download time.'

'Find out how fast your website loads. Too slow? Perhaps you need to optimize the page or move to a faster server.'

Website speed check

'The website speed tester shows the load duration of a given website. This value can be used to show how long a website takes to load, and whether it is better to optimise the website or change a (slow) ISP.'



3. Identify DNS Requests that can be removed from front-end pages

Reports generated using tools such as the website speed checks listed above will often tell you how many DNS requests were required to load the page. It is quite common for a web page to require multiple DNS requests (more than 3 or 4) in order to retrieve information from external websites and web services. Common scenarios are a page that is served ads from a public ad server, a page that displays the content (or titles) of an RSS feed, or a page with content scraped from other websites. Reducing these types of requests can have a significant effect on the performance of a website, as external dependencies will be eliminated or at least reduced.

Cache RSS Feed Data
A method for improving the performance of a page which displays the content of one or more RSS feeds is caching the feed data, and displaying it from a local database instead of directly from the external location. A background script could retrieve and format the feed content, then update/store it in a local database (see Step 4 below for details about running background scripts, or 'cron jobs'). Many feeds remain unchanged for long periods, making it unnecessary to re-load a fresh copy from the original (external) source every time a page is loaded. A simple database query, with no additional processing or formatting, will be much more efficient than retrieving and processing the feed each time a page is requested.
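The feed-caching idea can be sketched with the Python standard library. SQLite stands in for the site's database, and a literal XML string stands in for the external feed (a real cron job would fetch it over HTTP with urllib); the table and function names are assumptions for illustration:

```python
import sqlite3
import xml.etree.ElementTree as ET

# In production the XML would come from urllib.request.urlopen(feed_url);
# a literal string is used here so the sketch is self-contained.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example Feed</title>
  <item><title>First post</title><link>http://example.com/1</link></item>
  <item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""

def refresh_feed_cache(db, feed_id, rss_xml):
    """Cron-job step: parse a feed and replace its cached items."""
    items = [(feed_id, item.findtext("title"), item.findtext("link"))
             for item in ET.fromstring(rss_xml).iter("item")]
    db.execute("DELETE FROM feed_items WHERE feed_id = ?", (feed_id,))
    db.executemany(
        "INSERT INTO feed_items (feed_id, title, link) VALUES (?, ?, ?)", items)
    db.commit()

def cached_feed_items(db, feed_id):
    """Front-end pages call this instead of fetching the external feed."""
    return db.execute(
        "SELECT title, link FROM feed_items WHERE feed_id = ?",
        (feed_id,)).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE feed_items (feed_id INTEGER, title TEXT, link TEXT)")
refresh_feed_cache(db, 1, SAMPLE_RSS)
print(cached_feed_items(db, 1))
```

The page itself now performs only the cheap local SELECT; the slow external request happens on the cron job's schedule.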

If an RSS feed needs to be displayed using JavaScript by the client, you should include the script close to the bottom of the page, so that the majority of the content will have loaded before it tries to retrieve the RSS feed data. This also helps if the external site the RSS feed comes from is down or running slowly, which would otherwise affect the performance of pages on your website.

Queue Connections to Web Services
It is often a requirement of a website or web page to use web services on an external website or server on the internet. This is usually server-side processing, which can be quite slow when multiple external hosts are being connected to; XML-RPC, which provides a schema for web services and their methods, is a common example used around the internet. If one or more external hosts are slow or not accessible, the script will take longer to complete, or time out, depending on your web server configuration.

One possible method to improve the performance of pages that connect to web services on multiple external hosts is to store the information for each individual connection and request in a database table, which can then be processed as a queue by a background script/cron job. An example may be a website that allows users to submit details of multiple sites to many directories using an XML-RPC web service. A separate connection is required for each external host and for each site being submitted, which could quite easily reach the thousands. To manage such a large number of connections to external hosts without making the user wait for each to complete, the details of the site being submitted can be added to a separate database table, which is processed by a script executed by the server.

For a service such as this, it is essential that the queue is processed as close to real-time as possible, so the interval between executions of the background script should be short. The script would need to be written so that multiple instances can run in parallel without conflicting with each other. This caters for scripts that take a long time to complete, and also allows you to execute more than one instance of the script at a time. Each instance then processes a small number of rows from the queue table, minimising the chance of any one instance timing out if some external servers are not accessible.
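A minimal sketch of this queue approach, assuming a SQLite `submission_queue` table and made-up host names. The key detail is that each worker claims a small batch of rows atomically (by stamping them with its own ID inside one transaction), so several parallel instances never pick up the same row twice:

```python
import sqlite3

def enqueue_submission(db, site_url, directory_host):
    # The front-end only does this cheap INSERT, so the page returns quickly.
    db.execute(
        "INSERT INTO submission_queue (site_url, directory_host, status) "
        "VALUES (?, ?, 'pending')", (site_url, directory_host))
    db.commit()

def claim_batch(db, worker_id, batch_size=2):
    """Atomically mark a few pending rows as owned by this worker instance."""
    with db:  # single transaction, so the claim is all-or-nothing
        db.execute(
            "UPDATE submission_queue SET status = ? WHERE id IN ("
            " SELECT id FROM submission_queue WHERE status = 'pending'"
            " LIMIT ?)", (worker_id, batch_size))
    return db.execute(
        "SELECT id, site_url, directory_host FROM submission_queue "
        "WHERE status = ?", (worker_id,)).fetchall()

def process_batch(db, worker_id):
    for row_id, site_url, host in claim_batch(db, worker_id):
        # Here the real cron job would make the slow XML-RPC call to `host`.
        db.execute("UPDATE submission_queue SET status = 'done' WHERE id = ?",
                   (row_id,))
    db.commit()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE submission_queue ("
           "id INTEGER PRIMARY KEY, site_url TEXT, "
           "directory_host TEXT, status TEXT)")
for host in ("dir-a.example", "dir-b.example", "dir-c.example"):
    enqueue_submission(db, "http://mysite.example", host)
process_batch(db, "worker-1")   # claims and handles two rows
process_batch(db, "worker-2")   # a parallel instance handles the remainder
print(db.execute("SELECT COUNT(*) FROM submission_queue "
                 "WHERE status = 'done'").fetchone()[0])
```

With a real database server the same claim-then-process pattern applies; the small batch size keeps any single run short enough to avoid timeouts.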

When an end-user executes a transaction that would usually take over 30 seconds for the front-end script to complete, the page can now load almost instantly, as it is much quicker to add the details to a local database table than to complete a connection to each individual external host. The background processing required to work through the queue has a minimal effect on the server's processing requirements and, in turn, almost no effect on the end user's experience of the website's performance.

Script Organisation
If you need to connect to external hosts to get information required to display a page, you should send the majority of the page content to the browser before attempting to connect to the external hosts. This can usually be done by including the script close to the bottom of the page if using JavaScript, or by buffering and flushing output before connecting to external hosts if using a server-side programming or scripting language. This is a perceived improvement in performance, as it is clear to the user that their action is being processed even when the page takes a long time to load fully.
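The server-side ordering can be pictured with a small Python sketch, where a generator yields the main content before the slow external call is made (the function names are invented; in PHP the equivalent is flushing buffered output before contacting the external host):

```python
import time

def fetch_external_widget():
    """Stands in for a slow call to an external web service."""
    time.sleep(0.01)
    return "widget"

def render_page():
    # The bulk of the page is yielded (sent to the browser) first...
    yield "<html><body><h1>My page</h1><p>Main content...</p>"
    # ...and only then does the slow external request happen, so the user
    # already sees the page rendering while it runs.
    slow_part = fetch_external_widget()
    yield f"<div>{slow_part}</div>"
    yield "</body></html>"

chunks = list(render_page())
print(chunks[0][:30])
```

The user perceives the page as fast because the visible content arrives before the external dependency is touched.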


4. Create scripts to perform background processing, then store website content (HTML) in a database table.

Once you have determined which scripts to focus on to optimise your website's performance, you need to write separate scripts to complete the demanding processing and the requests for information from external sources. These scripts then populate database tables with pre-formatted HTML, which is later displayed on pages when called from front-end scripts.

A common background script is one which retrieves updated information from multiple feeds, which can then be used to display the feed content on web pages in full, or as line items (feed item or category titles) used as shortcuts to categories or individual articles, posts and pages (for example, displaying Related Posts dynamically on blogs). Because the request for the feed data, including the process of inserting the feed content into an HTML template (which usually requires further reads from the web server's file system), is completed by the background scripts or cron jobs, the pages on the website are able to load very quickly: only a single database query is required, returning a small set of rows containing fields with the pre-generated HTML. In most cases this will significantly improve the performance of your website, as you have eliminated the need for additional DNS requests every time a page is loaded.
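The generate-and-store step might look like this sketch (SQLite, the `feed_html` table and the template string are assumptions for illustration; the rebuild function is what the cron job runs, while the front-end function is all a page needs to call):

```python
import sqlite3
import html

# Hypothetical template; a real site would read this from the file system.
ITEM_TEMPLATE = '<li><a href="{link}">{title}</a></li>'

def rebuild_feed_html(db, feed_id, items):
    """Background/cron step: render the items into one HTML fragment and
    store it, so page loads need only a single-row SELECT."""
    fragment = "<ul>" + "".join(
        ITEM_TEMPLATE.format(link=html.escape(link, quote=True),
                             title=html.escape(title))
        for title, link in items) + "</ul>"
    db.execute("REPLACE INTO feed_html (feed_id, fragment) VALUES (?, ?)",
               (feed_id, fragment))
    db.commit()

def feed_html(db, feed_id):
    """Front-end step: fetch the ready-made HTML with one cheap query."""
    row = db.execute("SELECT fragment FROM feed_html WHERE feed_id = ?",
                     (feed_id,)).fetchone()
    return row[0] if row else ""

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE feed_html (feed_id INTEGER PRIMARY KEY, fragment TEXT)")
rebuild_feed_html(db, 1, [("Hello", "http://example.com/hello")])
print(feed_html(db, 1))
```

All of the template reading, escaping and formatting cost is paid by the background script; the page load is reduced to a single indexed lookup.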

Cron Jobs
Cron jobs are scripts or commands that are executed by the server at a specific time, or repeatedly at specified intervals. In this case, the cron jobs perform the in-depth processing and the external requests. As long as the cron jobs have been configured correctly and are working properly, the pre-formatted HTML in the database will always be up-to-date. You will need to find a balance between the frequency at which the scripts run, which may affect the overall performance of the web server, and the importance of having up-to-date information. For example, if an RSS feed is only updated once every few months, there is no need to run a script that updates that particular feed every minute.

Many popular web hosting services, including many free ones, offer solutions which include the ability to incorporate cron jobs into a web application. Many hosts using Unix-based servers provide this feature with their hosting solutions, making it useful to have at least a basic knowledge of the Unix commands relevant to web and database applications. The hosts below offer free web hosting with many features, including the ability to execute cron jobs, or background scripts, on a periodic basis.

Free Web Hosting Allowing Cron Jobs

A comparison of approximately 30 free Web Hosting solutions that allow Cron Jobs.


000WebHost.com - Free Web Hosting With Cron Jobs



5. Replace code in scripts to retrieve the pre-formatted content from a database

The final step, once you have completed and tested the cron job scripts that run in the background, is to modify the front-end scripts or pages on your website to retrieve the pre-formatted HTML from the database. In some cases this will require you to modify the HTML or client-side code, as JavaScript is often used to insert content from an external location during or after page load. In other cases you will need to modify the server-side (PHP, ASP.NET) scripts to retrieve the pre-formatted HTML instead of performing calculations and retrieving the information from external sources.

You should write a set of functions that can be reused throughout the scripts and pages on your website. Some examples for a feed cache may be getAllFeeds(), getFeed(ID), updateFeedData(ID), updateAllFeeds(), getFeedSummaryHTML(ID), getFeedHTML(ID), etc.
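A sketch of such a function set, using the article's example names against an assumed SQLite schema (in PHP the structure would be the same, with the database extension's calls in place of sqlite3; the camelCase names are kept to match the examples above):

```python
import sqlite3

# Assumed schema: one row per feed, with the pre-generated HTML alongside.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE feeds (id INTEGER PRIMARY KEY, url TEXT, html TEXT)")
db.execute("INSERT INTO feeds VALUES (1, 'http://example.com/rss', "
           "'<ul><li>cached item</li></ul>')")
db.commit()

def getAllFeeds():
    """List every cached feed (id, source url)."""
    return db.execute("SELECT id, url FROM feeds").fetchall()

def getFeedHTML(feed_id):
    """Return the pre-formatted HTML for one feed, or '' if unknown."""
    row = db.execute("SELECT html FROM feeds WHERE id = ?",
                     (feed_id,)).fetchone()
    return row[0] if row else ""

def updateFeedData(feed_id, fresh_html):
    """Called from the cron job; `fresh_html` would be rendered from the
    live feed in the real script."""
    db.execute("UPDATE feeds SET html = ? WHERE id = ?", (fresh_html, feed_id))
    db.commit()

print(getAllFeeds())
print(getFeedHTML(1))
```

Front-end pages then only ever call the get* functions, while the update* functions are reserved for the background scripts.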

Other Performance Considerations

Relational Database Design
- If not already applied: you can reduce the amount of data retrieved by database queries by separating different categories or types of data into separate tables. For example, a website which started off with a small number of users, each assigned to one group and one permission level, may store foreign keys in the users table indicating the group and permission. As usage and membership grow, the ability to assign users to multiple groups may be required, along with a larger amount of data for each user. Storing all user, group and permission information in a single "Users" table would mean that a query retrieving data from all columns takes a long time to complete and, in most cases, returns data which is irrelevant to the page or script that requested it. An alternative approach is to have separate tables for group information, permissions and basic user information, plus one each for the user-group and user-permission assignments. Database queries can then be constructed to retrieve all of the information if required, but in most cases will retrieve only the data needed to complete a specific task, which can significantly reduce the amount of data read from the database. Another benefit is that when operations require rows or tables to be locked, the locking can be much more granular, allowing other scripts to access or modify data in other tables and rows.
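The table split described above can be illustrated with a small SQLite sketch (the table names and data are invented for the example):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Instead of one wide Users table, group data lives in its own table,
# linked through an assignment (junction) table.
db.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE groups (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_groups (user_id INTEGER, group_id INTEGER);

INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
INSERT INTO groups VALUES (10, 'editors'), (11, 'admins');
INSERT INTO user_groups VALUES (1, 10), (1, 11), (2, 10);
""")

# A page that only needs a user's name touches one narrow table...
name = db.execute("SELECT name FROM users WHERE id = 1").fetchone()[0]

# ...while group membership is joined in only when actually required.
groups = [g for (g,) in db.execute(
    "SELECT groups.name FROM groups "
    "JOIN user_groups ON groups.id = user_groups.group_id "
    "WHERE user_groups.user_id = 1 ORDER BY groups.name")]

print(name, groups)
```

Note how the junction table also gives users multiple groups without widening the users table at all, which is exactly the growth problem described above.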

Geographical Targeting
If the majority of users are from a specific country or region, you should consider hosting your website on a server in that country, or configuring your DNS to point to a mirror hosted in the country closest to each user. This will reduce page load times for most of your users, as requests should not have to leave the country, or travel far at all.

Images and other Rich Media
Images and other media are often the reason a page takes a long time to load. A page consisting mainly of text will load quickly and much more consistently than a page with many images or other media, even on a slower web server or when the web server is under heavy load. If you do have images on your site, you should try to minimise the file size and dimensions of images as much as possible without losing information. There are also many image formats offering various methods of compression, which are often configurable. You should use formats that you have the rights to use, that compress well (small in size), and that maintain an acceptable quality level. Where possible, use 1xN or Nx1 sized images that can be stretched horizontally or vertically when used as the background of elements on a web page.

Allow users to choose whether their browser starts downloading or buffering rich media such as streaming video, as this can consume a large amount of the user's bandwidth, which will reduce the perceived performance of your website if they continue browsing without closing or stopping the video/media.

