Server Hardware Testing and Burn-in - Detailed Stress Testing and Fault Detection on New Hardware

Go on, admit it, you've thought about it yourself. Wouldn't it be satisfying to set your computer alight? Sadly, that is not what this article is about. "Burning in" is the term used to describe the process of testing new managed server hardware for faults before putting it to use in a live environment. This is done by running stress-testing software for some period of time.

Whenever we get new server hardware, we always do a complete burn-in to ensure that the server hardware is up to our high standards. If the hardware fails at any point, we send it back to the supplier. The actual process is easy, although setting it up isn't.

Memory
First, when the new server is turned on, we boot it off the network, which allows us to boot multiple machines at once without needing 20+ bootable disks. The first test run is the well-known Memtest (you'll find it in Google), which thoroughly checks the computer's memory and runs for about one day.

If the computer passes the Memtest, it is restarted and booted into a custom Red Hat kickstart install that will install a bare Red Hat environment and the Cerberus Test Control System, special software that runs numerous tests on all the hardware in the system.

CPU
Cerberus performs several tasks to test the CPU. It compiles the Linux kernel over and over again, runs complicated mathematical problems (how long does it take you to work out if 3214235409234472020393848453 is prime?), and runs some code specifically written to run the CPU at its hottest.

Hard Drive
Cerberus writes large volumes of data to the hard drives over and over again to ensure that the drive platters are functional, and it will also delete and move files, and check the disks for errors.

If after a week the server is still running (not smoking) and hasn't crashed, it is considered good enough for use as a production machine. If it fails the tests anywhere along the way, it is packed up and returned to be replaced. Web servers that have survived this process will certainly survive anything you can throw at them.

You would normally expect that this level of testing would be completed by the hardware manufacturers, and so these tests shouldn't show up any faults. In our experience testing hundreds of machines, we do regularly find faults, and we do send components back.

The reason it is so important to perform this level of testing on computers that will be used as servers is that the uptime demands are so high. The slightest faults will cause outages and downtime. Once a web server is deployed, never again will you have the opportunity to take it offline and perform such detailed testing. Even if it were to crash, there is always a demand that it be put back online as quickly as possible, not left offline whilst thorough diagnostics are completed.