Top 10 Concepts That Every Software Engineer Should Know

The future of software development is about good craftsmen. With infrastructure like Amazon Web Services and an abundance of basic libraries, it no longer takes a village to build a good piece of software.

These days, a couple of engineers who know what they are doing can deliver complete systems. In this post, we discuss the top 10 concepts software engineers should know to achieve that.

A successful software engineer knows and uses design patterns, actively refactors code, writes unit tests and religiously seeks simplicity. Beyond the basic methods, there are concepts that good software engineers know about. These transcend programming languages and projects; they are not design patterns, but rather broad areas that you need to be familiar with. The top 10 concepts are:
1. Interfaces
2. Conventions and Templates
3. Layering
4. Algorithmic Complexity
5. Hashing
6. Caching
7. Concurrency
8. Cloud Computing
9. Security
10. Relational Databases

1.INTERFACE:
The most important concept in software is the interface. Any good software is a model of a real (or imaginary) system. Understanding how to model the problem in terms of correct and simple interfaces is crucial. Lots of systems suffer from one of two extremes: clumped, lengthy code with little abstraction, or an overly designed system with unnecessary complexity and unused code.

Among the many books, Agile Programming by Dr Robert Martin stands out because of its focus on modeling correct interfaces.

2.CONVENTIONS AND TEMPLATES:
Naming conventions and basic templates are the most overlooked software patterns, yet probably the most powerful.

Naming conventions enable software automation. For example, the JavaBeans framework is based on a simple naming convention for getters and setters. Likewise, canonical URLs in del.icio.us, such as del.icio.us/tag/software, take the user to the page that has all items tagged software.

Many social software applications utilise naming conventions in a similar way. For example, if your user name is johnsmith then likely your avatar is johnsmith.jpg and your RSS feed is johnsmith.xml.

Naming conventions are also used in testing; for example, JUnit automatically recognizes all the methods in a class that start with the prefix test.

The templates are not C++ or Java language constructs. We're talking about template files that contain variables and then allow binding of objects, resolution, and rendering of the result for the client.

3.LAYERING:
Layering is probably the simplest way to discuss software architecture. It first got serious attention when John Lakos published his book about large-scale C++ systems. Lakos argued that software consists of layers, and the book introduced the concept of layering. The method is this: for each software component, count the number of other components it relies on. That count is the metric of how complex the component is.

Lakos contended that good software follows the shape of a pyramid; i.e., there's a progressive increase in the cumulative complexity of each component, but not in the immediate complexity. Put differently, a good software system consists of small, reusable building blocks, each carrying its own responsibility. In a good system, there are no cyclic dependencies between components, and the whole system is a stack of layers of functionality, forming a pyramid.

Lakos's work was a precursor to many developments in software engineering, most notably refactoring. The idea behind refactoring is continuously sculpting the software to ensure it is structurally sound and flexible. Another major contribution was by Dr Robert Martin from Object Mentor, who wrote about dependencies and acyclic architectures.

Among the tools that help engineers deal with system architecture are Structure 101, developed by Headway Software, and SA4J, developed by my former company, Information Laboratory, and now available from IBM.

4.ALGORITHMIC COMPLEXITY:
There are just a handful of things engineers must know about algorithmic complexity. First is big O notation. If something takes O(n), it's linear in the size of the data. O(n^2) is quadratic. Using this notation, you should know that searching through a list is O(n) and binary search (through a sorted list) is O(log n). Sorting n items takes O(n log n) time.

Your code should (almost) never have multiple nested loops (a loop inside a loop inside a loop). Most of the code written today should use hash tables, simple lists and singly nested loops.

Due to the abundance of excellent libraries, we are not as focused on efficiency these days. That's fine, as tuning can happen later on, after you get the design right.

Elegant algorithms and performance are things you shouldn't ignore. Writing compact and readable code helps ensure your algorithms are clean and simple.

5.HASHING:
The idea behind hashing is fast access to data. If the data is stored sequentially, the time to find an item is proportional to the size of the list. With hashing, a hash function calculates a number for each element, and that number is used as an index into a table. Given a good hash function that uniformly spreads data along the table, the look-up time is constant. Perfect hashing is difficult to achieve, so hash table implementations support collision resolution.

Beyond the basic storage of data, hashes are also important in distributed systems. The so-called uniform hash is used to evenly allocate data among computers in a cloud database. A flavor of this technique is part of Google's indexing service; each URL is hashed to a particular computer. Memcached similarly uses a hash function.

Hash functions can be complex and sophisticated, but modern libraries have good defaults. The important thing is to know how hashes work and how to tune them for maximum performance benefit.

6.CACHING:
No modern web system runs without a cache, which is an in-memory store that holds a subset of the information typically stored in the database. The need for a cache comes from the fact that generating results from the database is costly. For example, if you have a website that lists books that were popular last week, you'd want to compute this information once and place it into the cache. User requests then fetch data from the cache instead of hitting the database and regenerating the same information.

Caching comes with a cost. Only some subsets of information can be stored in memory. The most common data pruning strategy is to evict the items that are least recently used (LRU). The pruning needs to be efficient so that it does not slow down the application.

A lot of modern web applications, including Facebook, rely on a distributed caching system called Memcached, developed by Brad Fitzpatrick when working on LiveJournal. The idea was to create a caching system that utilises spare memory capacity on the network. Today, there are Memcached libraries for many popular languages, including Java and PHP.

7.CONCURRENCY:
Concurrency is one topic engineers notoriously get wrong, and understandably so: although the brain does juggle many things at a time, schools emphasize linear thinking. Yet concurrency is important in any modern system.

Concurrency is about parallelism, but inside the application. Most modern languages have a built-in concept of concurrency; in Java, it's implemented using Threads.

A classic concurrency example is the producer/consumer, where the producer generates data or tasks and places them for worker threads to consume and execute. The complexity in concurrent programming stems from the fact that Threads often need to operate on common data. Each Thread has its own sequence of execution, but accesses common data. One of the most sophisticated concurrency libraries was developed by Doug Lea and is now part of core Java.

8.CLOUD COMPUTING:
In our recent post Reaching For The Sky Through Compute Clouds we talked about how commodity cloud computing is changing the way we deliver large-scale web applications. Massively parallel, cheap cloud computing reduces both costs and time to market.

Cloud computing grew out of parallel computing, the concept that many problems can be solved faster by running the computations in parallel. After parallel algorithms came grid computing, which ran parallel computations on idle desktops. One of the first examples was the SETI@home project out of Berkeley, which used spare CPU cycles to crunch data coming from space. Grid computing is widely adopted by financial companies, which run massive risk calculations. The concept of under-utilized resources, together with the rise of the J2EE platform, gave rise to the precursor of cloud computing: application server virtualization. The idea was to run applications on demand and change what is available depending on the time of day and user activity.

Today's most vivid example of cloud computing is Amazon Web Services, a package available via API. Amazon's offering includes a cloud service (EC2), a database for storing and serving large media files (S3), an indexing service (SimpleDB), and the Queue service (SQS). These first blocks already empower an unprecedented way of doing large-scale computing, and surely the best is yet to come.

9.SECURITY:
With the rise of hacking and data sensitivity, security is paramount. Security is a broad topic that includes authentication, authorization, and information transmission.

Authentication is about verifying user identity. A typical website prompts for a password. The authentication typically happens over SSL (Secure Sockets Layer), a way to transmit encrypted information over HTTP. Authorization is about permissions and is important in corporate systems, particularly those that define workflows. The recently developed OAuth protocol helps web services enable users to open access to their private information. This is how Flickr permits access to individual photos or data sets.

Another security area is network protection. This concerns operating systems, configuration and monitoring to thwart hackers. Not only is the network vulnerable; any piece of software is. The Firefox browser, marketed as the most secure, has to patch its code continuously. Writing secure code for your system requires understanding the specifics and the potential problems.

10.RELATIONAL DATABASES:
Relational databases have recently been getting a bad name because they cannot scale well to support massive web services. Yet the relational database is one of the most fundamental achievements in computing; it has carried us for two decades and will remain for a long time. Relational databases are excellent for order management systems, corporate databases and P&L data.

At the core of the relational database is the concept of representing information in records. Each record is added to a table, which defines the type of information. The database offers a way to search the records using a query language, nowadays SQL. The database also offers a way to correlate information from multiple tables.

The technique of data normalization is about correct ways of partitioning the data among tables to minimize data redundancy and maximize the speed of retrieval.
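
To make the relational ideas of records, tables, SQL queries and correlating information across tables more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (customers, orders), the sample rows and the helper functions are assumptions made up for illustration; they do not come from any real system. Each customer name is stored once and orders refer to it by id, which is the kind of partitioning that normalization aims for.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two small, normalized tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            item        TEXT NOT NULL
        );
        """
    )
    # Hypothetical sample records: each customer name is stored exactly once.
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
        [(1, "keyboard"), (1, "monitor"), (2, "laptop")],
    )
    return conn


def orders_by_customer(conn: sqlite3.Connection, name: str) -> list:
    """Correlate the two tables with a JOIN to list one customer's items."""
    rows = conn.execute(
        """
        SELECT o.item
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        WHERE c.name = ?
        ORDER BY o.item
        """,
        (name,),
    )
    return [item for (item,) in rows]
```

Calling orders_by_customer(build_demo_db(), "Ada") returns that customer's items in alphabetical order. Renaming a customer would touch a single row in customers, not every order.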
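
Looking back at the caching section, the least-recently-used (LRU) eviction strategy can be sketched in a few lines. This is only an illustration and not how Memcached is implemented; the class name LRUCache and its get/put methods are assumptions chosen for this example. It relies on Python's collections.OrderedDict, which is backed by a hash table, so both look-up and eviction stay cheap.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny in-memory cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default  # cache miss: the caller would query the database
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used
```

With a capacity of 2, storing "a" and "b", reading "a", and then storing "c" evicts "b", because "a" was touched more recently.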
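
The producer/consumer pattern from the concurrency section can be sketched as follows. The article discusses Java Threads; this example uses Python's threading and queue modules instead, purely as an illustration. The function name run_pipeline, the squaring "task" and the STOP sentinel are all assumptions made for this sketch. The shared results list is the common data, so access to it is guarded by a lock.

```python
import queue
import threading


def run_pipeline(tasks, num_workers: int = 3) -> list:
    """Produce tasks onto a queue and let worker threads consume them."""
    work = queue.Queue()            # thread-safe hand-off between threads
    results = []                    # common data shared by all workers
    results_lock = threading.Lock()
    stop = object()                 # sentinel telling a worker to exit

    def worker():
        while True:
            item = work.get()
            if item is stop:
                break
            squared = item * item   # stand-in for real work
            with results_lock:      # protect the shared list
                results.append(squared)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in workers:
        thread.start()

    for task in tasks:              # the producer side
        work.put(task)
    for _ in workers:               # one sentinel per worker
        work.put(stop)
    for thread in workers:
        thread.join()

    return sorted(results)          # completion order is not deterministic
```

The results are sorted before returning because the order in which workers finish varies from run to run, which is exactly the kind of detail that makes concurrent code easy to get wrong.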