The Information Explosion - The Sorcerer Goes to Lunch

The other day I decided it was finally time to clean (really clean) my desk. As I worked through years of accumulated clutter, I noticed an actual pencil lying forgotten on an old pad of paper. The pencil was dusty and unused, and the pad had yellowed, curled edges. That got me thinking about how long it has been since pen and paper were used for daily communication and record keeping. Now, some of you may brand me an old codger for being able to remember that at all, but bear in mind that the first usable version of Microsoft Windows, version 3.0, shipped just 18 years ago. At that time, many pencil pushers resisted transferring everything to the "computer," but even the most stubborn holdouts gave way within a year or two.

Oh, some of us might have used an email system back then. At Hewlett-Packard, I remember sending emails using the company's proprietary HP Desk mail system back in the mid-1980s. However, at the time, the majority of office workers were... actual pencil pushers.

Can you find anyone today who doesn't compulsively check their email or scroll through file listings to find what they need? OK, John McCain doesn't, but he has "people." I don't know about you, but I don't have "people." However, I do have my fully networked system with Internet access, Instant Messaging, stunning graphics, oodles of productivity, and regular backups. And I love it!

In those eighteen years, the world changed faster and more completely than it ever had before. The Internet (once the exclusive domain of the nerdiest of nerds, cosmological physicists) became everyone's instant window to the world. Easy-to-use authoring tools allowed everyone to be productive. It's as if the sorcerer's apprentice were replicating legions of pencils instead of brooms, generating a catastrophic data deluge. Today, corporations are literally drowning in it.

In 2006, IDC conducted an exhaustive study (Source: The Expanding Digital Universe, IDC, March 2007) and forecast a 57% year-over-year growth rate, from 2006 through 2010, in the amount of information created, captured and replicated. Compounded over those four years, that is roughly a sixfold increase (1.57^4 ≈ 6.1).

So, where is all this information coming from, and why aren't companies able to deal with it? Well, it comes from everyone, and it's a problem because most of it is unstructured. Most people aren't aware of this, but the Enterprise Strategy Group estimates that between 80% and 85% of all business data is unstructured (Source: Extending Discovery to All Corporate Information, Enterprise Strategy Group, December 2007).

What is unstructured data? It consists of emails, reports, all user files (documents, spreadsheets, PPTs, PDFs), images, video, HTML/XML, MP3s, etc. It varies in importance, too. The average user will save pictures of their children, emails about what a good job they are doing, CYA "email trails," work-related spreadsheets, thick Word documents, and so on.

In the book "Tapping into unstructured data: Integrating unstructured data and structural analytics into business intelligence" (Bill Inmon and Anthony Nesavich, Prentice Hall, 2008), the authors describe the various types of unstructured data created by the typical departments in a corporation: Accounting, Call Centers, Engineering, Finance, Human Resources, Legal, Marketing, Sales, Shipping and Operations. In other words, everyone is contributing to the challenge, while they all look to the data center to control it.

The Challenges of Unbridled Information Growth

Let's take a look at some of the major challenges in dealing with this unbridled growth of information.

Factor #1: Information must be stored

The more data we generate, the more storage is required. This need opened up tremendous opportunities for storage vendors as customers sought to purchase more and more equipment. The storage industry introduced Information Lifecycle Management (ILM) as a more cost-effective way to deal with this growth. Vendors also introduced the concept of tiered storage, allowing companies to manage data along various dimensions: price, performance, capacity and function.

Initially, storage cost was the biggest impact this growth had on corporations. However, as storage costs quickly declined, cost was dwarfed by other factors.

Factor #2: Information can be sensitive and needs to be protected

As companies created more and more information, the importance of protecting that information and ensuring the proper level of access became more apparent. While it sounds easy (making sure the right people have access to the right information), it is not so easy to actually do, and the costs of failing to secure data can be astounding. Examples include:
- Hefty fines under PCI, SOX and HIPAA for breaches and noncompliance
- Bad PR and damage to the corporate brand due to the need to publicly disclose privacy breaches
- Outright IP theft, in which trade secrets and proprietary information fall into the hands of a competitor and materially damage the company's business prospects

Factor #3: Information must be preserved for regulatory reasons

Every company is governed by a set of regulations that determine how long information must be stored. There is a slew of regulations governing information retention. The more familiar ones include:
- Health Insurance Portability and Accountability Act (HIPAA) of 1996
- Sarbanes-Oxley Act of 2002
- SEC Rules 17a-3 and 17a-4

There are countless more. Some industries (e.g., pharmaceuticals and finance) are more regulated than others. And, of course, with the recent credit crisis, we expect the number of regulations to skyrocket in the coming years.

In the good old days, retaining this information was simple. We put everything in a box and stored that box in a warehouse for as long as required. Given the explosive growth of easily replicated electronic information, it is much more challenging today.

Factor #4: Information is subject to electronic discovery

A critical event occurred in December 2006, when amendments to the Federal Rules of Civil Procedure (FRCP) took effect. The FRCP govern procedures for civil suits in United States district (federal) courts. The amendments outline how electronic documents can be used to support litigation proceedings, and they define how electronic documents should be handled during litigation search and discovery.

Essentially, this means that all information is discoverable, which presents a problem. Companies are required to keep information for a particular period of time (for regulatory purposes), but they also have an incentive to get rid of it as soon as possible, because anything they keep can be swept into discovery. It simply isn't practical for a company to pay an attorney $400/hour to perform discovery across all of its information.

The Challenges of Managing Unbridled Information Growth (the Buckets Multiply Geometrically)

So, where does this leave us? We have too much information today. Some of this data needs to be protected because it contains sensitive information. Some of it needs to be retained for certain periods of time due to regulatory constraints. And it is all discoverable. We are creating new information at an alarming pace, and, as with the sorcerer's apprentice and his bucket brigade, it's frighteningly out of control.

I attended last month's ARMA (Association of Records Managers and Administrators) conference in Las Vegas to get more perspective on the information conundrum. After all, records managers have dealt with information management for many years, initially in physical form and more recently in electronic form. Their mantra is simple. They need to know what they have and where they have it. They need to make certain that only the right people have access to the information. They need to know what to keep, and they need to keep it as long as they have to. They need to get rid of everything else. It's a simple matter of setting up policies across the enterprise and enforcing them.

It sounds so simple. But is it? Do we know what we have? Do we know where we have it?

Unfortunately, it is easier said than done. The information we create is vast, and it is stored in heterogeneous formats throughout the world.

I spoke with one Records Manager at a mid-sized company who told me, "Yes, I know what we have. I have two file shares in Des Moines with my finance, marketing and sales files. I have a user share in our corporate office with personal files. At corporate, I also have my web farm. I have an Exchange Server, one Personnel Database, one Accounting Database, and one Documentum System. Oh, and twelve SharePoint sites."

Her problem is typical. With information stored everywhere, how can she manage it across heterogeneous systems? How can she set up consistent policies to ensure the right access? How can she ensure that the right data is retained? How can she ensure that she gets rid of what she does not need?

Realistically, she is not able to address these problems, which could place her company out of compliance and put it at risk of heavy fines. The problem, of course, is even worse for larger companies that have petabytes of information stored everywhere.

One Certified Records Manager I spoke with likes to categorize information as follows:

Type of Information: Where It Exists
- Unstructured: Data File Shares, Desktops, Laptops
- Enterprise Content Management Systems: SharePoint, FileNet, Documentum
- Messaging: Email, Voice Mail, IM, etc.
- Databases: Human Resources, Order Processing, etc.

He explained, "The problem is that a single, universal system for managing information does not exist." I visited vendors, both big and small, and confirmed what every Records Manager has known for quite some time: effectively managing information according to the Records Management mantra is truly a Herculean task.

Managing Information in a Cloud (the Sorcerer Returns from Lunch)

There is a sorcerer who can clean up, organize, and control the deluge of information, and its name is Classification Management. Only when information is classified can it be effectively managed to address the concerns of sensitivity, retention and destruction.

The magic wand of this Classification Management sorcerer is a sophisticated Policy Engine. It is sophisticated because it can apply different Policies to different information sources (databases, email systems, etc.) and to different regional regulations, and it is flexible enough to deal not only with today's regulations but also with future ones.

Scalability is the other side of this magic wand, because classification alone is not sufficient to deal with the scale of information in the average enterprise. That information is geographically distributed across heterogeneous systems, and a centralized information management scheme will not and cannot scale.

Because of this, we see Enterprise Information Management as the first enterprise application that requires a form of cloud computing, one that allows all information, wherever it resides, to be managed.

What's Next

As we proceed into 2009, only a couple of things are certain: companies will create more information, governments will create more regulations, and the sorcerer will have more and more data to manage.