Clustering Solutions and Zero Downtime Hosting Pitfalls

Title: Clustering Solutions and Zero Downtime Hosting Pitfalls
Author: Godfrey Heron
Email:
Word Count: 1452
Copyright: © 2005 by Godfrey Heron
Article URL:
Publishing Guidelines: You may publish this article in your newsletter, on your web site, or in your print publication provided you include the resource box at the end. Notification would be appreciated but is not required.

There are a number of benchmarks that we may use to evaluate hosting companies. One of these is reliability.

Like most things in this life, reliability in web hosting is typically a function of how much we are willing to spend for it. In essence, a “cost-effectiveness” equation needs to be determined and solved.

Reliability can be measured in terms of percentage availability. Industry personnel will talk of reliability in terms of system availability with three (99.9%), four (99.99%) or five nines (99.999%).

Typically, web-hosting availability exceeding three nines was the purview of extremely large companies with multiple layers of redundancy built into their network and software systems. However, technology has now brought high-availability theory and cost-effective reality into alignment.

High availability can be achieved by removing, as far as possible, any single points of failure or, where this is not altogether possible, minimizing the time spent in a “failure” situation.

One of the ways in which small businesses and ISPs can reasonably avoid single points of failure is by employing server farm clustering and load-balancing solutions.

Webopedia defines server farm clustering as follows:

“A server farm is a group of networked servers that are housed in one location. A server farm streamlines internal processes by distributing the workload between the individual components of the farm and expedites computing processes by harnessing the power of multiple servers. The farms rely on load-balancing software that accomplishes such tasks as tracking demand for processing power from different machines, prioritizing the tasks and scheduling and rescheduling them depending on priority and demand that users put on the network. When one server in the farm fails, another can step in as a backup.”

It is important to note that web servers which are load-balanced in such a manner typically display one external IP address to the public Internet, while using internal network IPs to communicate between the clustered servers and the load balancer.

Now this is indeed fantastic! Not only do you receive web site peak demand scalability with web server clusters, but you also have the built-in “high uptime availability” component which is so important.

However, this is only half of the picture. There are very important cautionary notes to keep in mind.

Where web hosting is concerned, availability depends on two things:

1. Hardware reliability (RAID drives, server clustering, etc.) within the Data Center;
2. High Bandwidth Internet Connectivity to the Data Center / Network Operations Center (NOC).

Now, with all your well thought out server clustering solutions, what would be the result if (as recently occurred at a very high profile web company) a fire in the vicinity of the network caused the entire Data Center to shut down power for hours? Or if a bandwidth provider to the NOC had router problems? All your websites would be showing the dreaded “Page Cannot be Displayed” page.

The ideal solution therefore would be to employ clustering solutions with servers in entirely different Data Centers with different bandwidth providers. Redundant Data Centers eliminate the NOC itself as a single point of failure.

The scenario becomes interesting at this point, because the difficulty of addressing the potential problems now increases sharply. We now have to deal with DNS caching, the concept of failover, and how static and dynamic web applications respond to failure events.

Failover and load balancing are frequently used interchangeably; however, they are in fact quite different.

· Load balancing refers to physically sharing server capacity, so that one server is not overloaded and swamped with requests.
· Failover, however, is the process that manually or automatically switches a failed server or bandwidth provider to a standby server or network if the primary system fails or is temporarily shut down for servicing.

As such, failover software is an important function of mission-critical systems that rely on constant accessibility.

One of the inherent difficulties with failover for Web Hosting companies operating on different networks is the limitations imposed by the DNS caching system. As DNS records are passed from the original DNS servers (i.e., ns1/ns2.your-domain.com), they are cached or stored at several different ISPs along the way. This is why it takes a while for a newly registered domain name to resolve to its IP address.

Each DNS record has a TTL (time to live) setting assigned. By manipulating this value, it is possible to alter how long that particular IP address/DNS record combination is stored. If your site is on 2 different servers with 2 different IP addresses, you could set the “time to live” to a value of, say, 2 minutes.

The failover software would check server availability by “pinging” the web server every few minutes to determine whether its IP address is responding appropriately (perhaps by looking for a particular text string in a web page). If a failure is detected, the software would pull the non-working web server IP address out of the list of IP addresses assigned to your web site’s domain name. If and when your web server IP comes back online, it would be restored to the list.

With a TTL setting of 2 minutes, theoretically, your web site should be down for just 2 minutes while DNS information switches to the other web server.

The problem with this scenario is that, while some ISPs’ caching might respond to such low figures, other ISPs may decide to ignore (to save on bandwidth utilization) any TTLs below a certain value, say, 60 minutes. So it is entirely possible that some of your visitors would see your web site while, for others, your site would be down for 1 hour or more, even though one of your servers was operating perfectly.

Static, non-interactive web sites are great candidates for server clustering, but the wicket becomes a bit sticky for dynamically generated sites. Most database application software, although having some replication capabilities, is not happy with multi-server master/slave relationships and real-time updating between servers. The issue can become very problematic if your site requires frequent updates.

Then there is the problem of how to keep your websites synchronized. Unix/Linux servers have a built-in synchronizing software tool called rsync. You can also automate the synchronizing process by setting up a cron job to run periodically.

DNS caching and synchronizing issues can be so problematic as to nullify the advantages of server clustering. For example, a cron job to synchronize your servers every few minutes might very well use up your server capacity.

Your customers will also have to contend with their desktop email client software having dual email addresses for each email account, one on each web server.

It is important to realize that DNS operates by default in a round robin manner, so that, if you have the same web site on 2 separate servers, it is very likely that server 1 will get 50% of all the web traffic. This is important for a number of reasons, but one of the principal reasons to keep it in mind is that you will not be able to effectively keep a “back-up” site (as some providers would have you believe) which will only be used when the primary server goes down. An example would be a site saying “We’re sorry, our main server is down, but you may contact us at:”.

On a final note, hardware based load balancing solutions tend to be quite expensive and also introduce a potential single point of failure into the system: the load balancer itself. There is a very prominent Data Center that began offering load balanced hosting solutions, where the load balancer itself failed on several occasions although the web servers were operating perfectly. The net effect to the public, however, was that the sites were unavailable.

Reasonably cost-effective software based solutions may be obtained as a service model or by purchasing the software yourself. Zoneedit is an example of a service model, and Simplefailover is an example of a software based model which may be purchased on a server license basis.

In conclusion, at this point in time, there are several limiting factors to successfully implementing a “true” high availability multiple server web hosting system. Depending on your clientele and the nature of their web sites, this may indeed be a very viable alternative.

For others, simply setting up a server with high quality components, redundant RAID hard drives and a good supply of server spare parts may be the best way to ensure high availability.
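
To put the “nines” mentioned in this article into perspective, the short Python sketch below converts an availability percentage into the downtime it allows per year. The function name and the 365-day year are assumptions made for illustration only; they do not come from any hosting provider’s definition of availability.

```python
# Convert an availability percentage ("nines") into allowed downtime per year.
# Assumption: a 365-day year, so leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes per year a site may be down at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    levels = [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]
    for label, percent in levels:
        minutes = downtime_minutes_per_year(percent)
        print(f"{label} ({percent}%): about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, three nines allows roughly 8.8 hours of downtime a year, while five nines allows only a little over 5 minutes.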
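
The failover check described in this article (request a page from each web server, look for a known text string, and keep only the responding IP addresses in the DNS list) might be sketched roughly as follows. This is a minimal illustration and not how Zoneedit, Simplefailover or any other product works: the function names, the marker string and the example IP addresses are all hypothetical, and the step of actually publishing the surviving IPs to DNS is left out.

```python
import http.client
from typing import Callable, Iterable, List
from urllib.request import urlopen


def fetch_page(ip: str, timeout: float = 5.0) -> str:
    """Fetch the home page served at the given IP address over plain HTTP."""
    with urlopen(f"http://{ip}/", timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def is_healthy(ip: str, marker: str, fetch: Callable[[str], str] = fetch_page) -> bool:
    """A server counts as healthy if its page loads and contains the marker text."""
    try:
        return marker in fetch(ip)
    except (OSError, http.client.HTTPException):
        # Timeouts, refused connections and HTTP errors all count as a failure.
        return False


def active_ips(all_ips: Iterable[str], marker: str,
               fetch: Callable[[str], str] = fetch_page) -> List[str]:
    """Return the IPs that should stay in the domain's DNS record list."""
    return [ip for ip in all_ips if is_healthy(ip, marker, fetch)]


if __name__ == "__main__":
    # Hypothetical stand-in for real servers: only the first IP "responds".
    pages = {"192.0.2.10": "<html>site-ok</html>"}

    def fake_fetch(ip: str) -> str:
        if ip not in pages:
            raise OSError("server unreachable")
        return pages[ip]

    print(active_ips(["192.0.2.10", "192.0.2.20"], "site-ok", fetch=fake_fetch))
```

Run on a schedule, a check like this drops a failed server from the list and restores it automatically once its page responds with the marker again.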
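
The TTL pitfall can also be put into numbers. The sketch below uses a deliberately simplified model, which is an assumption of this example rather than a rule of DNS: a visitor’s worst-case outage is taken to be the interval between failover checks plus whichever is larger, the TTL you set or the minimum TTL that the visitor’s ISP enforces. The function and parameter names are hypothetical.

```python
def worst_case_outage_minutes(record_ttl: float,
                              check_interval: float,
                              resolver_min_ttl: float = 0.0) -> float:
    """Estimate the longest time a visitor could keep hitting the failed server.

    Simplified model: the failure is noticed at the next check, and the stale
    record is then cached for the larger of the record's TTL and the ISP's floor.
    """
    effective_ttl = max(record_ttl, resolver_min_ttl)
    return check_interval + effective_ttl


if __name__ == "__main__":
    # An ISP that honours the 2-minute TTL, with checks every 3 minutes.
    print(worst_case_outage_minutes(record_ttl=2, check_interval=3))
    # An ISP that ignores any TTL below 60 minutes.
    print(worst_case_outage_minutes(record_ttl=2, check_interval=3, resolver_min_ttl=60))
```

The second case shows why a low TTL alone does not guarantee a short outage: the ISP’s floor, not your setting, dominates the result.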