Information Technology Problem Solving - The 6 Principles of Scientific Problem Solving

This paper explains a scientific approach to problem solving. Although it is written to address Information Technology related problems, the concepts might also be applicable in other disciplines. The methods, concepts, and techniques described here are nothing new, but it is shocking how many "problem solvers" fail to use them. In between I will include some real-life examples.

Why do problem solvers guess instead of following a scientific approach to problem solving? Maybe because it feels quicker? Maybe because of a lack of experience in efficient problem solving? Or maybe because it feels like hard work to do it scientifically? Maybe while you keep on guessing and not really solving, you generate more income and add some job security? Or maybe because you violate the first principle of problem solving: understand the problem.

Principle #1. Understand the *real* problem.

Isn't it obvious that before you can solve, you need to understand the problem? Maybe. But most of the time the solver will start solving without knowing the real problem. What the client or user describes as "The Problem" is normally only the symptom! "My computer does not want to switch on" is the symptom. The real problem could be that the whole building is without power. "Every time I try to add a new product, I get an error message" is the symptom. Here the real problem could be "Only the last 2 products I tried to add gave a 'Product already exists' error". Another classic example: "Nothing is working"...

You start your investigation by defining the "real problem". This will entail asking questions (and sometimes verifying the answers), and doing some basic testing. Ask the user questions like "When was the last time it worked successfully?", "How long have you been using the system?", "Does it work on another PC or for another user?", "What is the exact error message?" etc. Ask for a screen-print of the error if possible. Your basic testing will be to ensure the end-to-end equipment is up and running. Check the user's PC, the network, the Web Server, Firewalls, the File Server, the Database back-end, etc. Best-case you will pinpoint the problem already. Worst-case you can eliminate a lot of areas as the cause of the problem.

A real-life example. The symptom according to the user: "The system hangs up at random times when I place orders". The environment: The user enters the order detail on a form in a mainframe application. When all the detail is completed, the user will tab off the form. The mainframe then sends this detail via communication software to an Oracle Client/Server system at the plant. The Oracle system will do capacity planning and return either an error or an expected order date to the mainframe system. This problem is quite serious, because you can lose clients if they try to place orders and the system does not accept them! To attempt to solve this problem, people started by: 1) investigating the load and capacity of the mainframe hardware; 2) monitoring the network load between the mainframe and the Oracle system; 3) hiring consultants to debug the communication software; 4) debugging the Oracle capacity planning system. After spending a couple of months they could not solve the problem.

The "Scientific Problem Solver" was called in. It took less than a day and the problem was solved! How? The solver spent the day with the user to see what the "real problem" was. It was found that the problem only occurred with export orders. By investigating the capture screen and user actions, it was found that with export orders the last field on the form was always left blank and the user did not tab off this field. The system was not hanging; it was waiting for the user to press "tab" another time. Problem solved. It can be noted that the "Scientific Problem Solver" had very limited knowledge of the mainframe, of the order capturing system, of the communication software, and of the Oracle capacity planning system. And this brings us to Principle #2.

Principle #2. Do not be afraid to start the solving process, even if you do not understand the system.

How many times have you heard "I cannot touch that code, because it was developed by someone else!", or "I cannot help because I am an HR Consultant and that is a Finance problem"? If your washing machine does not want to switch on, you do not need to be an Electrical Engineer, Washing Machine Repair Specialist, Technician, or whatever specialist to do some basic fault finding. Make sure the plug is working. Check the trip-switch, etc. "I have never seen this error before" should not stop you from attempting to solve. With the error message and an Internet search engine, you can get lots of starting points.

In every complex system there are a couple of basic working principles. System A that reads data from System B can be horribly complex (maybe a Laboratory Spectrometer that reads data from a Programmable Logic Controller via an RS-232 port). But there are some basics to test for: Do both systems have power? Is there an error message in the event log on one of these systems? Can you "ping" or trace a network packet from the one system to the other? Try a different communication cable. Search the internet for the error message.

Once you have established what the problem is, you need to start solving it. Sometimes the initial investigation will point you directly to the solution (switch the power on, replace the faulty cable, etc). But sometimes the real problem is complex in itself, so the next principle is to solve it simple.

Principle #3. Conquer it simple.

Let's start this section with a real-life example. Under certain conditions, a stored procedure would hang. The stored procedure normally took about an hour to run (when it was not hanging). So the developer tried to debug: make some changes and then wait another hour or so to see if the problem was solved. After some days the developer gave up and the "Problem Solver" took over. The "Problem Solver" had at his disposal the knowledge of the conditions under which the stored procedure would hang. So it was a simple exercise to make a copy of the procedure, and then to strip all unnecessary code from this copy. All parameters were replaced with hard-coded values. Bits of code were executed at a time and the result-sets were then again hard-coded into the copy of the procedure. Within 3 hours the problem was solved. An infinite loop was discovered.

What the "Problem Solver" did was to replicate the problem and at the same time try to isolate the code that caused the problem. In doing so, the complex (and time-consuming) stored procedure became something fast and simple.

If the problem is inside an application, create a new application and try to simulate the problem inside the new application as simply as possible. If the problem occurs when a certain method for a certain control gets called, then try to include only this control in the empty application and call that method with hard-coded values. If the problem is with embedded SQL inside a C# application, then try to simulate the SQL inside a database query tool (like SQL*Plus for Oracle, Query Analyzer for SQL Server, or use the code in MS Excel via ODBC to the database).

The moment you can replicate the problem in a simple way, you are more than 80% of the way to solving it.

If you do not know where in the program the problem is, then use DEBUG.

Principle #4. Debug.

Most application development tools come standard with a debugger. Whether it is Macromedia Flash, Microsoft Dot Net, Delphi, or whatever development environment, there will be some sort of debugger. If the tool does not come standard with a debugger, then you can simulate one.

The first thing you want to do with the debugger is to determine where the problem is. You do this by adding breakpoints at key areas. Then you run the program in debug mode and you will know between which breakpoints the problem occurred. Drill down and you will find the spot. Now that you know where the problem is, you can "conquer it simple".

Another nice feature of most debuggers is the facility to watch variables, values, parameters, etc. as you step through the program. With these values known at certain steps, you can hard-code them into your "simplified version" of the program.

If a development tool does not support debugging, then you can simulate it. Put steps in the program that output variable values and "hello I am here" messages either to the screen, to a log file, or to a database table. Remember to take them out when the problem is resolved... you don't want your file system to be cluttered or filled up with log files!

Principle #5. There is a wealth of information on the database back-end that will help to solve a problem.

The "Problem Solver" was called to help solve a very tricky problem. A project was migrating a system from a mainframe to client-server technology. All went well during testing, but when the systems went live, all of a sudden there were quite a few, and quite random, "General Protection Faults". (The GPF-error was the general error trap in Windows 95 and 98.) Attempts were made to simplify the code and to debug it, but the problem was impossible to replicate. In the LAB environment, the problem would not occur! Debugging trace messages written to log files indicated that the problem occurred very randomly. Some users experienced it more than others, but eventually all users would get it! Interesting problem.

The "Problem Solver" solved this after he started to analyze the database back-end. It is not certain whether it was by chance or because he systematically moved in the right direction because of a scientific approach. Through tracing what was happening on the back-end level, it was found that all these applications were creating more and more connections to the database. Every time a user started a new transaction, another connection was established to the database. The connections were only released when the application was closed. As the user navigated to new windows inside the same application, more and more connections were opened, and after a specific number of connections, the application would have had enough and then crash. This was a programming fault in a template that was used by all the developers. The solution was to first test if a cursor to the database is already open, before opening it again.

How do you trace on the back-end database what is happening? The main database providers have GUI tools that help you to trace or analyze what queries are fired against the database. They will also show you when people connect, disconnect, or were unable to connect because of security violations. Most databases also include some system dictionary tables that can be queried to get this information. These traces can sometimes tell a whole story of why something is failing. The query code you retrieve from the trace can help to "simplify the search". You can see from the trace if the program makes successful contact with the database. You can see how long it takes for a query to execute.

To add to Principle #2 (do not be afraid to start...): you can analyze this trace information even though you might not know anything about the detail of the application.

Remember though that these back-end traces can put a strain on the back-end resources. Do not leave them running for unnecessarily long.

Principle #6. Use fresh eyes.

This is the last principle. Do not spend too much time on the problem before you ask for assistance. The assistance does not have to be from someone more senior than you. The principle is that you need a pair of fresh eyes for a fresh perspective, and sometimes a bit of fresh air by taking a break. The other person will look and then ask a question or two. Sometimes it is something very obvious that was missed. Sometimes just answering the question makes you think in a new direction. Also, if you spend hours looking at the same piece of code, it is very easy to start overlooking a silly mistake. A lot of finance balancing problems get solved over a beer. It could be a change of scenery and/or the relaxed atmosphere that makes the solution pop out. Maybe it is the fresh oxygen that went to the brain while walking to the pub. Maybe it is because the problem got discussed with someone else.

Conclusion

After reading this paper, the author hopes that you will try these principles the next time you encounter a problem to solve. Hopefully by applying these six principles you will realize the advantages they bring, rather than trying to "guess" your way to a solution.
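
As an illustration of the fix described under Principle #5 (first test whether the database resource is already open before opening it again), here is a minimal sketch in Python. It is not the code from the original project: the class name ConnectionHolder, the open_count counter, and the use of an in-memory SQLite database are all assumptions made for the example, and it works at the level of a connection rather than a cursor.

```python
import sqlite3


class ConnectionHolder:
    """Hypothetical helper: opens one database connection and reuses it."""

    def __init__(self, database=":memory:"):
        self._database = database
        self._connection = None
        self.open_count = 0  # counts how many real connections were opened

    def get(self):
        # Test whether a connection is already open before opening a new one.
        if self._connection is None:
            self._connection = sqlite3.connect(self._database)
            self.open_count += 1
        return self._connection

    def close(self):
        # Release the connection explicitly instead of waiting for exit.
        if self._connection is not None:
            self._connection.close()
            self._connection = None


holder = ConnectionHolder()
for _ in range(5):
    # Five "transactions" share the single open connection.
    holder.get().execute("SELECT 1").fetchone()
print("connections opened:", holder.open_count)
holder.close()
```

Without the "is it already open?" test in get(), every transaction would open another connection, which is the kind of leak the back-end trace revealed.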