Sunday, July 24, 2011

Why Software Architecture Fails

I have worked on very old software systems. Some are over twenty years old. Many software engineers I know work on older systems. What usually happens is that subsystems get replaced over time by new subsystems, fundamentally changing the architecture. Sometimes the architectural decay is so great that the entire system must be replaced.

Not so with one system I worked on. It remains a workhorse, complex as it now is, with hundreds of engineers contributing to it.

There is one piece of this system I remember well. It is a communications system that allows subsystems to exchange requests and data with ease. It has changed very little over the many years it has existed. The architecture has stood the test of time. Even new users praise its simplicity. Over the years we added many new subsystems, and they all use this communications system.

But this is a rarity in the software world. George Fairbanks, in his book Just Enough Software Architecture, discusses a software system that manages hosted e-mail servers. Customers call support for help, and support must search log files to see what happened so they can help the customer. The first version was simple: a grep-style search was performed on the log files on each of the servers. But as the system grew, these searches took longer, slowed down the servers, and engineers were needed to run them. This architecture had failed.

So a new architecture for logs was needed. In the next version, the log files were copied to a central server every few minutes and the data was loaded into a central database. Support techs could access this database via a web interface. But the central database got slow as the amount of log data grew even more, and sometimes the centralized server failed, losing data. This architecture had failed too.

So a new system was created that saved the log data into a distributed file system and parallelized the indexing of log data. Ten commodity machines were used rather than one single powerful server using a third-party file system. This system was finally able to accommodate their growing data needs and has remained in place.
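To make the difference between the first and third versions concrete, here is a minimal sketch in Python. It is not taken from Fairbanks' book: the function names, the sample log lines, and the use of a thread pool as a stand-in for separate commodity machines are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def grep_logs(logs_by_server, term):
    """Version 1 in spirit: scan every line on every server at query time."""
    return [
        (server, line)
        for server, lines in logs_by_server.items()
        for line in lines
        if term in line
    ]


def index_chunk(item):
    """Build a word -> [(server, line number)] index for one server's log."""
    server, lines = item
    index = defaultdict(list)
    for number, line in enumerate(lines):
        for word in set(line.split()):
            index[word].append((server, number))
    return index


def build_index(logs_by_server, workers=4):
    """Version 3 in spirit: index each chunk concurrently, then merge."""
    merged = defaultdict(list)
    # Threads stand in for the separate machines of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(index_chunk, logs_by_server.items()):
            for word, hits in partial.items():
                merged[word].extend(hits)
    return merged


# Hypothetical sample data, invented for this example.
logs = {
    "mail1": ["login ok alice", "delivery failed bob"],
    "mail2": ["delivery ok carol", "login failed bob"],
}
index = build_index(logs)
```

With the index built ahead of time, a support query such as `index["bob"]` becomes a lookup instead of a scan of every log on every server, which is roughly the tradeoff the later architecture made: more work up front, spread over many machines, in exchange for fast searches.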

In his book Why Buildings Fall Down, Matthys Levy says “A building is conceived when designed, born when built, alive while standing, dead from old age or an unexpected accident.” At first glance, this doesn’t seem to apply to software. Software doesn’t die from old age. It can run over and over again and not wear out. That said, software and the design behind it do seem to have a lifecycle. And when the load on it becomes too heavy, change must come or the system will not run, as in the e-mail server example.

The book has many sections that, when read, appear to be talking about software. It has a chapter on redundancy and theorizes that all structural failures are due to a lack of redundancy. The e-mail server system had a redundancy failure as well, as the centralized server showed.

Most software systems are not made to withstand what the book describes as Big Bangs. Whether deliberately or accidentally, software architectures make tradeoffs based on risks. If the risk of receiving huge amounts of log data is low, as it was in the beginning of the e-mail server project, then a simple architecture will work. But as the risk grows higher, a system that can withstand the big bang of data is required. Most buildings were also not created to withstand a big bang. An analysis of why the World Trade Center towers fell on 9/11 (http://en.wikipedia.org/wiki/Collapse_of_the_World_Trade_Center) reveals that the buildings were not created to endure the impact and the heat of the fire.

Bridge failures caused by resonance are similar to infinite loops in software (http://www.youtube.com/watch?v=3mclp9QmCGs). When this kind of loop occurs in a system, the system will eventually shut down, just as the bridge broke.

Many buildings are covered with facades, or what the book calls “Structural Dermatology”. Software also has a design pattern called a facade. A facade is an object that provides a simplified interface to another software subsystem. Just as a building facade improves the look and feel of a building, a software facade can turn a poor interface into a great one. Building facades in New York City were causing a lot of problems: they were crumbling, water damage was creeping in, and there were not enough skilled artisans to maintain and repair them. Software facades can be just as problematic as the original interface and need to be added skillfully for the pattern to work.
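As a small illustration of the pattern, here is a sketch in Python. The subsystem, its method names, and the log entry it returns are all hypothetical, invented only to show the shape of a facade over an awkward interface.

```python
class LegacyMailLog:
    """Hypothetical subsystem with an awkward, multi-step interface."""

    def __init__(self):
        self._handle = None

    def open_handle(self, server):
        self._handle = server

    def fetch_raw(self, flags):
        if self._handle is None:
            raise RuntimeError("open_handle() must be called first")
        return [f"{self._handle}:{flags}:delivery failed bob"]

    def close_handle(self):
        self._handle = None


class LogFacade:
    """Facade: one simple call that hides the open/fetch/close sequence."""

    def __init__(self, subsystem):
        self._subsystem = subsystem

    def recent_entries(self, server):
        self._subsystem.open_handle(server)
        try:
            return self._subsystem.fetch_raw(flags=0)
        finally:
            # Always release the handle, even if the fetch fails.
            self._subsystem.close_handle()


facade = LogFacade(LegacyMailLog())
entries = facade.recent_entries("mail1")
```

The caller only ever sees `recent_entries`. The facade is still only as good as its own design, though: if it leaked handles or exposed the `flags` argument, it would carry the subsystem's problems straight through to its users.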

Software can technically last forever. Buildings, at the time they are erected, are also believed to last forever. But in both cases they often do not. Neglect, abandonment, and replacement often dismantle buildings, as the book points out. Neglect, abandonment, and replacement also dismantle software architectures. As with buildings, it is often more economically feasible to demolish what is there and create something new.

The book concludes with five factors that affect the longevity of a structure. They are:

1. Structural theories – this is the mathematics around load and structure.

2. Calculation techniques – the ability to compute various scenarios in the design process.

3. Material properties – how do the building blocks of the architecture work under various scenarios?

4. Communication procedures – how do all the people working on the architecture understand all aspects of the architecture?

5. Economic factors – what can be done within a particular budget?

In many ways, these are identical to the issues we deal with in software architecture.

1. Structural theories – what are the design patterns we are using?

2. Calculation techniques – what are our performance metrics?

3. Material properties – what are our assumptions about what data will be flowing through our system?

4. Communication procedures – how do we work together as a team? Do we all understand the system's architecture?

5. Economic factors – what can we do within the time we are given and with the people we have?

There are more similarities to buildings than I could have imagined.

For more information on building architecture: http://www.accreditedonlinecolleges.com/blog/2011/20-educational-architecture-books-anyone-can-enjoy/