| | | | | servicing. |
| Title: Clustering Solutions and Zero Downtime Hosting | | | | As such, failover software is an important function |
| Pitfalls | | | | of mission-critical systems that rely on |
| Author: Godfrey Heron | | | | constant accessibility. |
| Email: Word Count: 1452 | | | | One of the inherent difficulties with failover for Web |
| Copyright: © 2005 by Godfrey Heron | | | | Hosting companies operating on different networks is |
| Article URL: | | | | the limitation imposed by the DNS caching system. |
| Publishing Guidelines: You may publish this article in | | | | As DNS records are passed from the original |
| your newsletter, on your web site, or in your print | | | | DNS servers (i.e., ns1/ns2.your-domain.com), they |
| publication provided you include the resource box at | | | | are cached or stored at several different ISPs |
| the end. Notification would be appreciated but is not | | | | along the way. |
| required. | | | | This is why it takes a while for a newly registered |
| Clustering Solutions and Zero Downtime Hosting | | | | domain name to resolve to its IP address. |
| Pitfalls | | | | Each DNS record has a TTL (time to live) setting |
| There are a number of benchmarks that we may | | | | assigned. |
| use to evaluate hosting companies. One of these is | | | | By manipulating this value, it is possible to alter |
| reliability. | | | | how long that particular IP address/DNS record |
| Like most things in this life, reliability in web hosting | | | | combo is stored. If your site is on 2 different servers |
| is typically a function of how much we are willing to | | | | with 2 different IP addresses, you could set the |
| spend for it. In essence, a cost-effectiveness | | | | ‘time to live’ to a value of, say, 2 minutes. |
| equation needs to be determined and solved. | | | | The failover software would check server |
| Reliability can be measured in terms of | | | | availability by pinging the web server every |
| percentage availability. Industry personnel will talk of | | | | few minutes to determine whether its IP |
| reliability in terms of system availability with three | | | | address is responding appropriately (perhaps by |
| (99.9%), four | | | | looking for a particular text string in a web page). |
| (99.99%) or five nines (99.999%). | | | | If a failure is detected, then the software would |
| Typically, web-hosting availability exceeding three | | | | pull the non-working web server IP address out of |
| nines was the purview of extremely large companies | | | | the list of IP addresses assigned to your web |
| with multiple layers of redundancy built into their | | | | site’s domain name. If/when your web server |
| network and software systems. However, technology | | | | IP comes back online, it would be restored to the list. |
| has now brought high-availability theory and | | | | With a TTL setting of 2 minutes, theoretically, |
| cost-effective reality into alignment. | | | | your web site should be down for just 2 minutes, |
| High availability can be achieved by removing, as far | | | | while switching |
| as possible, any single points of failure, or, | | | | DNS information to the other web server. |
| where this is not altogether possible, minimizing the | | | | The problem with this scenario is that, while some |
| time spent in a failure situation. | | | | ISPs’ caches might honor such low |
| One of the ways in which small businesses and | | | | figures, other ISPs may decide to ignore (to |
| ISPs can reasonably avoid single points of | | | | save on bandwidth utilization) any TTLs below |
| failure is by employing server farm clustering and | | | | a certain value, say, 60 minutes. So it is entirely |
| load-balancing solutions. | | | | possible that some of your visitors would see your |
| Webopedia defines server farm clustering as follows: | | | | website while, for others, your site would be down for |
| A server farm is a group of networked servers | | | | 1 hour or more, even though one of your servers |
| that are housed in one location. A server farm | | | | was operating perfectly. |
| streamlines internal processes by distributing the | | | | Static, non-interactive web sites are great |
| workload between the individual components of the | | | | candidates for server clustering, but the wicket |
| farm and expedites computing processes by | | | | becomes a bit sticky for dynamically generated sites. |
| harnessing the power of multiple servers. | | | | Most database application software, |
| The farms rely on load-balancing software | | | | although having some replication capabilities, is not |
| that accomplishes such tasks as tracking demand | | | | happy with multiple-server master/slave relationships |
| for processing power from different machines, | | | | and real-time updating between servers. The issue |
| prioritizing the tasks and scheduling and rescheduling | | | | can become very problematic if your site |
| them depending on priority and demand that users | | | | requires frequent updates. |
| put on the network. When one server in the farm | | | | Then there is the problem of how to keep your |
| fails, another can step in as a backup. | | | | websites synchronized. Unix/Linux servers typically |
| It is important to note that, typically, web | | | | include a synchronizing tool called rsync. You |
| servers that are load-balanced in such a manner | | | | can also automate the synchronizing process by |
| display one external IP address to the public Internet, | | | | setting up a cron job to run periodically. |
| while using internal network IPs to | | | | DNS caching and synchronizing issues can be |
| communicate between the clustered servers and load | | | | so problematic as to nullify the advantages of |
| balancer. | | | | server clustering. For example, a cron job to |
| Now this is indeed fantastic! Not only do you | | | | synchronize your servers every few minutes might |
| receive web site peak-demand scalability with web | | | | very well use up your server capacity. |
| server clusters, but you also have the built-in | | | | Your customers will also have to contend with |
| high-uptime availability component which is | | | | their desktop email client software having dual |
| so important. | | | | email addresses for each email account on each web |
| However, this is only half of the picture. | | | | server. e.g. , . |
| There are very important cautionary notes to keep | | | | It is important to realize that DNS operates by default |
| in mind. | | | | in a round-robin manner, so that, if you have the |
| Where web hosting is concerned, availability | | | | same web site on 2 separate servers, it is very likely |
| depends on two things: | | | | that server 1 will get 50% of all the web traffic. |
| 1. Hardware reliability (RAID drives, server clustering | | | | Now, this is important for a number of reasons, but |
| etc.) within the Data Center; | | | | one of the principal reasons to keep this in mind is |
| 2. High-bandwidth Internet connectivity to the | | | | that you will not be able to effectively keep a |
| Data Center / Network Operations Center (NOC). | | | | back-up site (as some providers would |
| Now, with all your well-thought-out server | | | | have you believe) which will only be used when the |
| clustering solutions, what would be the result if (as | | | | primary server goes down, e.g. a site saying |
| recently occurred at a very high-profile web | | | | we’re sorry, our main server is down but you |
| company) a fire in the network vicinity caused | | | | may contact us at: |
| the entire | | | | On a final note, hardware-based load |
| Data Center to shut down power for hours? Or if | | | | balancing solutions tend to be quite expensive and |
| a bandwidth provider to the NOC had router problems? | | | | also introduce a potential single point of failure into the |
| All your websites would be showing the dreaded | | | | system: the load balancer itself. There is a very |
| “Page | | | | prominent Data Center that began offering load |
| Cannot be Displayed” page. | | | | balanced hosting solutions, where the load balancer |
| The ideal solution therefore would be to | | | | itself failed on several occasions, although the web |
| employ clustering solutions with servers in | | | | servers were operating perfectly. The net effect to |
| entirely different Data Centers with different | | | | the public, however, was that the sites were |
| bandwidth providers. Redundant Data Centers | | | | unavailable. |
| eliminate the NOC itself as a single point of failure. | | | | Reasonably cost-effective software-based |
| This scenario becomes interesting at this point, | | | | solutions may be obtained as a service model or by |
| because the difficulty of addressing the potential | | | | purchasing the software yourself. Zoneedit is an |
| problems now increases exponentially. | | | | example of a service model, and Simplefailover is an |
| We now have to deal with DNS caching, the concept | | | | example of a software-based model which may be |
| of failover, and how static and dynamic | | | | purchased on a server license basis. |
| web applications respond to failure events. | | | | In conclusion, at this point in time, there are several |
| Failover and Load Balancing are frequently | | | | limiting factors to successfully implementing a |
| used interchangeably; however, they are in fact | | | | true high-availability, multiple-server web |
| quite different. | | | | hosting system. Depending on your clientele and the |
| · Load Balancing refers to physically sharing server | | | | nature of their web sites, this may indeed be a very |
| capacity, so that one server is not overloaded and | | | | viable alternative. |
| swamped with requests. | | | | For others, simply setting up a server with high-quality |
| · Failover, however, is the process that manually or | | | | components, redundant RAID hard drives and a good |
| automatically switches from a failed server or bandwidth | | | | supply of server spare parts may be the best way to |
| provider to a standby server or network if the | | | | ensure high availability. |
| primary system fails or is temporarily shut down for | | | | |
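
To put the three, four and five nines mentioned in the article into concrete terms, the short Python sketch below converts an availability percentage into a yearly downtime budget. It is only an illustration: the function name is hypothetical, and the calculation assumes a 365-day year.

```python
# Yearly downtime budget for a given availability percentage.
# Assumption: a 365-day year (525,600 minutes); leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes per year a site may be down at this availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    levels = [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]
    for label, pct in levels:
        minutes = downtime_minutes_per_year(pct)
        print(f"{label} ({pct}%): about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, three nines allows roughly 8.8 hours of downtime a year, while five nines allows a little over 5 minutes, which is why each extra nine tends to cost so much more to deliver.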
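
The DNS failover scenario described in the article (check each server, pull a failed IP from the published list, restore it on recovery, and hope that caches honor the TTL) can be sketched roughly as follows. This is a simplified model, not the behavior of any particular failover product: the class and function names are hypothetical, the IP addresses come from reserved documentation ranges, and no real DNS or HTTP calls are made.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverPool:
    """Tracks which server IPs are currently published for a domain."""
    all_ips: list[str]
    ttl_seconds: int = 120  # the 2-minute TTL from the article's example
    active_ips: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start with every server published.
        self.active_ips = list(self.all_ips)

    def apply_health_check(self, healthy: dict[str, bool]) -> list[str]:
        """Publish only the IPs that passed the latest check.

        A server that recovers is restored automatically on the next check.
        """
        self.active_ips = [ip for ip in self.all_ips if healthy.get(ip, False)]
        return self.active_ips


def page_is_healthy(body: str, marker: str) -> bool:
    """The check suggested in the article: look for a known text string."""
    return marker in body


def worst_case_stale_seconds(record_ttl: int, resolver_min_ttl: int) -> int:
    """How long a caching resolver may keep handing out a removed IP.

    A resolver that enforces its own minimum ignores any smaller record TTL.
    """
    return max(record_ttl, resolver_min_ttl)


if __name__ == "__main__":
    pool = FailoverPool(["192.0.2.10", "198.51.100.20"])
    pool.apply_health_check({"192.0.2.10": False, "198.51.100.20": True})
    print("published IPs:", pool.active_ips)
    # An ISP that honors the TTL versus one that enforces a 60-minute floor.
    print("honoring resolver:", worst_case_stale_seconds(pool.ttl_seconds, 0), "s")
    print("clamping resolver:", worst_case_stale_seconds(pool.ttl_seconds, 3600), "s")
```

The last two lines illustrate the pitfall the article warns about: the failover software can update the record within minutes, but a visitor behind a resolver with a 60-minute floor may still be sent to the dead server for up to an hour.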