| The future of software development is about good | | | | uniform hash is used to evenly allocate data among |
| craftsmen. With infrastructure like Amazon Web | | | | computers in a cloud database. A flavor of this |
| Services and an abundance of basic libraries, it no | | | | technique is part of Google's indexing service; each |
| longer takes a village to build a good piece of | | | | URL is hashed to a particular computer. Memcached |
| software. | | | | similarly uses a hash function. Hash functions can be |
| These days, a couple of engineers who know what | | | | complex and sophisticated, but modern libraries have |
| they are doing can deliver complete systems. In this | | | | good defaults. The important thing is how hashes |
| post, we discuss the top 10 concepts software | | | | work and how to tune them for maximum |
| engineers should know to achieve that. | | | | performance benefit. |
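The idea from the Hashing section of hashing each URL to a particular computer can be sketched in Java as below. This is a minimal illustration, not how Google's indexing service or Memcached actually assign keys; the class name `HashRouter`, the method `serverFor`, and the server names are all hypothetical.

```java
import java.util.List;

/** Hypothetical helper: routes a key to one of a fixed set of servers by hashing it. */
final class HashRouter {
    private final List<String> servers;

    HashRouter(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = List.copyOf(servers);
    }

    /** The same key always maps to the same server. */
    String serverFor(String key) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(key.hashCode(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        // Server names are placeholders for illustration only.
        HashRouter router = new HashRouter(List.of("node-a", "node-b", "node-c"));
        for (String url : List.of("example.com/a", "example.com/b", "example.com/c")) {
            System.out.println(url + " -> " + router.serverFor(url));
        }
    }
}
```

One limitation of this simple modulo scheme is that changing the number of servers remaps most keys, which is one reason tuning the hashing strategy matters in a distributed setting.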
| A successful software engineer knows and uses | | | | 6.CACHING: |
| design patterns, actively refactors code, writes unit | | | | No modern web system runs without a cache, which |
| tests and religiously seeks simplicity. Beyond the basic | | | | is an in-memory store that holds a subset of |
| methods, there are concepts that good software | | | | information typically stored in the database. The need |
| engineers know about. These transcend programming | | | | for cache comes from the fact that generating |
| languages and projects - they are not design | | | | results based on the database is costly. For example, |
| patterns, but rather broad areas that you need to be | | | | if you have a website that lists books that were |
| familiar with. The top 10 concepts are: | | | | popular last week, you'd want to compute this |
| 1. Interfaces | | | | information once and place it into cache. User |
| 2. Conventions and Templates | | | | requests fetch data from the cache instead of hitting |
| 3. Layering | | | | the database and regenerating the same information. |
| 4. Algorithmic Complexity | | | | Caching comes with a cost. Only some subsets of |
| 5. Hashing | | | | information can be stored in memory. The most |
| 6. Caching | | | | common data pruning strategy is to evict items that |
| 7. Concurrency | | | | are least recently used (LRU). The pruning needs to |
| 8. Cloud Computing | | | | be efficient so it does not slow down the application. A lot of |
| 9. Security | | | | modern web applications, including Facebook, rely on |
| 10. Relational Databases | | | | a distributed caching system called Memcached, |
| 1.INTERFACE: | | | | developed by Brad Fitzpatrick while working on |
| The most important concept in software is the interface. | | | | LiveJournal. The idea was to create a caching system |
| Any good software is a model of a real (or | | | | that utilises spare memory capacity on the network. |
| imaginary) system. Understanding how to model the | | | | Today, there are Memcached libraries for many |
| problem in terms of correct and simple interfaces is | | | | popular languages, including Java and PHP. |
| crucial. Lots of systems suffer from the extremes: | | | | 7.CONCURRENCY: |
| clumped, lengthy code with little abstraction, or an | | | | Concurrency is one topic engineers notoriously get |
| overly designed system with unnecessary complexity | | | | wrong, and understandably so, because the brain does |
| and unused code. | | | | juggle many things at a time and in schools linear |
| Among the many books, Agile Programming by Dr | | | | thinking is emphasized. Yet concurrency is important |
| Robert Martin stands out because of its focus on | | | | in any modern system. Concurrency is about |
| modeling correct interfaces. | | | | parallelism, but inside the application. Most modern |
| 2.CONVENTIONS AND TEMPLATES: | | | | languages have a built-in concept of concurrency; in |
| Naming conventions and basic templates are the | | | | Java, it's implemented using Threads. |
| most overlooked software patterns, yet probably | | | | A classic concurrency example is the producer-consumer |
| the most powerful. | | | | pattern, where the producer generates data or |
| Naming conventions enable software automation. For | | | | tasks and places them for worker threads to consume |
| example, the JavaBeans framework is based on a simple | | | | and execute. The complexity in concurrency |
| naming convention for getters and setters. And | | | | programming stems from the fact that threads often |
| canonical URLs in del.icio.us: del.icio.us/tag/software | | | | need to operate on common data. Each thread |
| take the user to the page that has all items tagged | | | | has its own sequence of execution, but accesses |
| software. | | | | common data. One of the most sophisticated |
| Many social software applications utilise naming conventions in a | | | | concurrency libraries has been developed by Doug |
| similar way. For example, if your user name is | | | | Lea and is now part of core Java. |
| johnsmith then likely your avatar is johnsmith.jpg and | | | | 8.CLOUD COMPUTING: |
| your rss feed is johnsmith.xml. | | | | In our recent post Reaching For The Sky Through |
| Naming conventions are also used in testing, for | | | | Compute Clouds we talked about how commodity |
| example JUnit automatically recognizes all the | | | | cloud computing is changing the way we deliver |
| methods in the class that start with the prefix test. The | | | | large-scale web applications. Massively parallel, cheap |
| templates are not C++ or Java language constructs. | | | | cloud computing reduces both costs and time to |
| We're talking about template files that contain | | | | market. Cloud computing grew out of parallel |
| variables and then allow binding of objects, resolution, | | | | computing, the idea that many problems can be |
| and rendering the result for the client. | | | | solved faster by running the computations in parallel. |
| 3.LAYERING: | | | | After parallel algorithms came grid computing, which |
| Layering is probably the simplest way to discuss | | | | ran parallel computations on idle desktops. One of the |
| software architecture. It first got serious attention | | | | first examples was the SETI@home project out of |
| when John Lakos published his book about | | | | Berkeley, which used spare CPU cycles to crunch data |
| Large-scale C++ systems. Lakos argued that | | | | coming from space. Grid computing is widely adopted |
| software consists of layers. The book introduced the | | | | by financial companies, which run massive risk |
| concept of layering. The method is this. For each | | | | calculations. The concept of under-utilized resources, |
| software component, count the number of other | | | | together with the rise of the J2EE platform, gave rise to |
| components it relies on. That is the metric of how | | | | the precursor of cloud computing: application server |
| complex the component is. | | | | virtualization. The idea was to run applications on |
| Lakos contended that good software follows the shape | | | | demand and change what is available depending on |
| of a pyramid; i.e., there's a progressive increase in the | | | | the time of day and user activity. |
| cumulative complexity of each component, but not in | | | | Today's most vivid example of cloud computing is |
| the immediate complexity. Put differently, a good | | | | Amazon Web Services, a package available via API. |
| software system consists of small, reusable building | | | | Amazon's offering includes a cloud service (EC2), a |
| blocks, each carrying its own responsibility. In a good | | | | database for storing and serving large media files |
| system, no cyclic dependencies between | | | | (S3), an indexing service (SimpleDB), and the Queue |
| components are present and the whole system is a | | | | service (SQS). These first blocks already empower |
| stack of layers of functionality, forming a pyramid. | | | | an unprecedented way of doing large-scale |
| Lakos's work was a precursor to many | | | | computing, and surely the best is yet to come. |
| developments in software engineering, most notably | | | | 9.SECURITY: |
| Refactoring. The idea behind refactoring is | | | | With the rise of hacking and data sensitivity, |
| continuously sculpting the software to ensure it is | | | | security is paramount. Security is a broad topic that |
| structurally sound and flexible. Another major | | | | includes authentication, authorization, and information |
| contribution was by Dr Robert Martin from Object | | | | transmission. Authentication is about verifying user |
| Mentor, who wrote about dependencies and acyclic | | | | identity. A typical website prompts for a password. |
| architectures. | | | | Authentication typically happens over SSL |
| Among the tools that help engineers deal with system | | | | (Secure Sockets Layer), a way to transmit encrypted |
| architecture are Structure 101 developed by | | | | information over HTTP. Authorization is about |
| Headway software, and SA4J developed by my | | | | permissions and is important in corporate systems, |
| former company, Information Laboratory, and now | | | | particularly those that define workflows. The recently |
| available from IBM. | | | | developed OAuth protocol helps web services to |
| 4.ALGORITHMIC COMPLEXITY: | | | | enable users to open access to their private |
| There are just a handful of things engineers must | | | | information. This is how Flickr permits access to |
| know about algorithmic complexity. First is big O | | | | individual photos or data sets. |
| notation. If something takes O(n) it's linear in the size | | | | Another security area is network protection. This |
| of data. O(n^2) is quadratic. Using this notation, you | | | | concerns operating systems, configuration and |
| should know that search through a list is O(n) and | | | | monitoring to thwart hackers. Not only is the network |
| binary search (through a sorted list) is O(log n). And | | | | vulnerable; any piece of software is. The Firefox browser, |
| sorting n items takes O(n log n) time. | | | | marketed as the most secure, has to patch the code |
| Your code should (almost) never have multiple | | | | continuously. Writing secure code for your system |
| nested loops (a loop inside a loop inside a loop). Most | | | | requires understanding specifics and potential |
| of the code written today should use Hashtables, | | | | problems. |
| simple lists and singly nested loops. Due to the abundance | | | | 10.RELATIONAL DATABASES: |
| of excellent libraries, we are not as focused on | | | | Relational Databases have recently been getting a |
| efficiency these days. That's fine, as tuning can | | | | bad name because they cannot scale well to support |
| happen later on, after you get the design | | | | massive web services. Yet this was one of the most |
| right. Elegant algorithms and performance are something | | | | fundamental achievements in computing that has |
| you shouldn't ignore. Writing compact and readable | | | | carried us for two decades and will remain for a long |
| code helps ensure your algorithms are clean and | | | | time. Relational databases are excellent for order |
| simple. | | | | management systems, corporate databases and |
| 5.HASHING: | | | | P&L data. |
| The idea behind hashing is fast access to data. If the | | | | At the core of the relational database is the concept |
| data is stored sequentially, the time to find the item | | | | of representing information in records. Each record is |
| is proportional to the size of the list. For each | | | | added to a table, which defines the type of |
| element, a hash function calculates a number, which is | | | | information. The database offers a way to search |
| used as an index into the table. Given a good hash | | | | the records using a query language, nowadays SQL. |
| function that uniformly spreads data along the table, | | | | The database offers a way to correlate information |
| the look-up time is constant. Perfect hashing is | | | | from multiple tables. The technique of data |
| difficult to achieve, so hashtable | | | | normalization is about correct ways of partitioning the |
| implementations support collision resolution. | | | | data among tables to minimize data redundancy and |
| Beyond the basic storage of data, hashes are also | | | | maximize the speed of retrieval. |
| important in distributed systems. The so-called | | | | |
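The least-recently-used (LRU) eviction strategy described in the Caching section can be sketched in Java with the standard `LinkedHashMap`, which supports access ordering. This is a small single-process illustration, not how Memcached is implemented; the class name `LruCache` and the capacity value are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory cache that evicts the least recently used entry when full. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // The third argument (accessOrder = true) orders entries by most recent access.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the eldest entry.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "first");
        cache.put("b", "second");
        cache.get("a");           // "a" becomes the most recently used entry
        cache.put("c", "third");  // cache is over capacity, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Eviction here is constant-time per insertion, which matches the requirement that pruning be efficient. `LinkedHashMap` is not thread-safe, so a shared cache would need external synchronization.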
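The producer-consumer arrangement from the Concurrency section can be sketched with a `BlockingQueue` from `java.util.concurrent`. This is one possible illustration under simple assumptions: a single worker thread, integer tasks, and a hypothetical sentinel value (`POISON`) that tells the worker to stop. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical demo: one producer hands tasks to one worker thread through a queue. */
final class ProducerConsumerDemo {
    private static final int POISON = -1; // assumed sentinel meaning "no more tasks"

    /** Produces the tasks 1..count; the worker squares each one. */
    static List<Integer> run(int count) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
        List<Integer> results = new ArrayList<>(); // written only by the worker thread

        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    int task = queue.take(); // blocks until a task is available
                    if (task == POISON) {
                        break;
                    }
                    results.add(task * task);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        try {
            for (int i = 1; i <= count; i++) {
                queue.put(i); // blocks while the queue is full
            }
            queue.put(POISON);
            worker.join(); // after join, the worker's writes to results are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while producing", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(5)); // prints [1, 4, 9, 16, 25]
    }
}
```

The queue is the only data the two threads share while running, which sidesteps most of the common-data problems the section mentions; the bounded capacity also keeps a fast producer from outrunning the worker.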