
Complexity and the Unified Modelling Language

Originally published July 2001
by Carlo Kopp
© 2001, 2005 Carlo Kopp

One of the enduring arguments in the coding community is that of Object Oriented techniques vs classical procedural techniques. Whilst many programmers may correctly point out that the issue is in a sense a non-argument, since each model reflects a different aspect of a common reality, it is worth exploring some of the trends in current OO methodology. This month's feature will discuss some of the central issues and take a brief look at the Unified Modelling Language (UML) and its various implications.

The most fundamental issue which underpins the whole software engineering methodology debate is that of dealing with the extreme complexity of modern software products. Complexity is in many respects the barrier which limits achievable functionality and frequently also interoperability in software products.

This was not always the case. If we look back over the last two to three decades, the complexity of a software product would have been bounded, in practical terms, by factors which are external to the product itself. Key factors which evolution has since rendered either irrelevant or incidental are:

  1. Operating systems functionality. The advent of modern operating systems with powerful interprocess communications and scheduling facilities has removed what were previously difficult constraints on moving data or messages between processes in complex applications, and on controlling the manner in which processes interact.

  2. Runtime environments. Modern runtime environments provide powerful facilities such as multithreading, which allow for much more complex control flow management techniques within an application process.

  3. Common interfaces to services. The wide adoption and standardisation of POSIX interfaces to operating systems promotes the reuse of mature and proven pieces of code to provide low level services, especially basic libraries and I/O routines.

  4. Hardware memory size and cost. Whilst the advent of virtual memory allowed applications to occupy enormous address spaces, the performance penalties of swapping these in and out of memory presented a serious obstacle to building very large and complex programs. With commodity desktop machines now cheaply available with hundreds of Megabytes of memory, there are few obstacles to prevent the implementation of genuinely enormous programs. Moore's Law will continue to drive this process.

  5. Processor compute performance. Moore's Law has yielded remarkable results over the last decade. Flavour of the month desktop machines are now equipped with copper metallised microprocessors which run at clock speeds between 1.4 and 1.6 GHz, and deliver compute performance more than an order of magnitude better than a decade ago. Therefore even very large, memory and cache bound applications can execute quite quickly (more on this in next month's issue).

  6. Development tools. Tools for crafting, compiling and debugging code have improved significantly, generally in a manner which facilitates the development of large and complex code. With tools running on gigahertz class processors, compile times need not be the overnight batch jobs they once were, even for relatively large programs or systems.

  7. Languages. The extension of ubiquitous C into C with Classes, formalised into C++, provided a vehicle for a large scale shift in programming technique and methodology from the classical procedural approach to the OO approach, thus facilitating levels of complexity not manageable using older techniques (a short sketch combining this item with items 2 and 3 follows the list). The plethora of other mature or maturing OO languages in the market provides a developer with many choices.

  8. Object interface standards. The emergence of OMG's CORBA and its proprietary equivalent(s) provided a common and standardised interface model via which objects could be accessed and used. In a networked or even basic multiprocessing environment, the CORBA model allows the construction of very complex applications in which components interact in a well defined manner.
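
To make some of these points concrete, the short C++ sketch below combines items 2, 3 and 7: a standard POSIX thread wrapped behind a class interface, placing an OO veneer over a standardised, reusable operating system service. The sketch is purely illustrative and not drawn from any particular product; the names are invented.

    // Illustrative only: a C++ class encapsulating a POSIX thread.
    #include <pthread.h>
    #include <cstdio>

    class Worker {
    public:
        Worker() : running_(false) {}

        // Start the thread via the standard POSIX interface.
        bool start() {
            running_ = (pthread_create(&tid_, 0, &Worker::entry, this) == 0);
            return running_;
        }

        // Block until the thread has completed.
        void join() {
            if (running_) pthread_join(tid_, 0);
        }

    private:
        // Static trampoline required by the C-style pthreads interface.
        static void *entry(void *arg) {
            static_cast<Worker *>(arg)->run();
            return 0;
        }

        // The actual work performed by the thread.
        void run() { std::printf("worker thread running\n"); }

        pthread_t tid_;
        bool running_;
    };

    int main() {
        Worker w;
        if (w.start()) w.join();
        return 0;
    }

Compiled against a POSIX threads library (typically with -lpthread), the class hides the C-level thread creation machinery from its users, which is precisely the kind of complexity containment the OO approach is intended to provide.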

These eight items represent the results of almost two decades of focussed evolution in technology, all directly or indirectly aimed at facilitating the design and implementation of increasingly complex pieces of code or interacting systems of code. Applications which in previous years were uncompilable, unrunnable, undebuggable, unable to communicate reliably internally, or unaffordable due to limits in basic technology are now technically feasible and, in terms of basic technology, implementable.

Yet the collective experience is still that bugginess and poor reliability are endemic and expensive problems, whether we are observing the behaviour of a shrinkwrapped application or of a flight control system on a rocket booster (the only difference between a BSOD and an exploding Ariane booster being the scale of the outcome and its consequences).

Inevitably, any problem in a software product results in finger pointing. The code cutters got it wrong, the testers missed it, the user did something silly, the marketeers misunderstood the requirement; indeed the number of ways in which responsibility for an adverse outcome can be assigned is limited only by the imagination of the party seeking to assign it.

The root cause, in the most fundamental sense, is complexity.

The REAL Enemy - Complexity

In the broadest philosophical sense, the trend toward increasing complexity seems to be an artifact of evolution, be it biological or technological. Trends in software are no exception, and in recent times programs with sizes of the order of millions of lines of code are becoming common. This is not only true of shrinkwrapped commodity products, but also of large commercial products and larger embedded systems, such as those found in space vehicles, large industrial plants and military or commercial aircraft.

Complexity is thus unavoidable. Just as the strands of DNA which make up a more evolved mammalian species became more complex over time, code will simply get more complex over time.

The big difference between nature and man-made entities like software is that the former is subject to Darwinian evolution over enormous timescales. Software is driven by Lamarckian evolutionary behaviour, and time to market and use are thus do or die parameters in the evolutionary process of a software product.

The traditional programming model and software engineering approach involved some omniscient chief software engineer or programmer attempting to coordinate the activities of a small group, each member of which would craft his own component. With enough iterations and enough haggling over interfaces the system could be made to work.

In practice, this technique ran into difficulties with sizes of hundreds of thousands of lines. While a program of this size can be successfully developed and maintained by two dozen or perhaps fewer programmers, the odds are that all participants will need a solid depth of experience and preferably as much insight as possible into the specific product being maintained. Reduce the level of programmer experience and difficulties will arise very quickly.

As with all complex problems, the proven and most robust strategy is to divide and conquer. In the most fundamental sense, the problem is broken down into smaller chunks, ideally chunks which are small enough to be well understood by individual code cutters or small teams. Gigantic monolithic programs are not a very common sight.

Where extreme complexity bites hardest, even with a rigorous divide and conquer methodology, is in one key area - the definition of the interrelationships between the components in the program and the interfaces which support these interrelationships (a small sketch of an explicitly defined interface follows the list below).

This happens for both good reasons and bad ones:

  • An interrelationship between components of a program may be inherently difficult to define, or may indeed change depending on the state of the program.

  • Different components of a program may be implemented in different languages, with dissimilar parameter passing conventions.

  • Different components of a program may be implemented by different programmers or teams of programmers, who may interpret the intended interrelationship between these components differently.

  • Frequently, interrelationships between components of a program fall between areas of responsibility in program development, and thus less attention is paid to them in comparison with each component itself.

  • The intended function of the program may not be faithfully reflected in the formal definition, if one exists, of the interrelationships between its components.

  • The complexity of the application may be so great that no individual programmer can properly understand how all of the major components are intended to interact, let alone the lesser parts of the program.
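
One common mitigation for several of the problems listed above is to make the contract between components explicit, in a single shared definition, rather than leaving each team to interpret it independently. The C++ sketch below is purely illustrative and is not taken from the article; the names are hypothetical. The consuming component is written against an abstract interface, so the interrelationship is defined in exactly one place.

    // Illustrative only: an explicit, shared definition of the interface
    // between two components.
    #include <string>

    // The contract a logging component must honour, defined once and
    // visible to both the providing and the consuming teams.
    class LogSink {
    public:
        virtual ~LogSink() {}
        virtual void write(const std::string &message) = 0;
    };

    // The consumer depends only on the contract, not on any concrete
    // implementation of it.
    class TransactionProcessor {
    public:
        explicit TransactionProcessor(LogSink &sink) : sink_(sink) {}
        void process() { sink_.write("transaction processed"); }
    private:
        LogSink &sink_;
    };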

This problem of extreme complexity leading to severe difficulties, especially in integrating the various components of a product, is incidentally not confined to the software industry alone. The aerospace industry is replete with examples. Two notable case studies are the US 1960s TFX fighter development program, and the UK 1970s-1980s Nimrod AEW program. In both instances, the biggest problems arose in getting various major components to operate together in the manner intended. The first of these projects eventually succeeded, the second crashed and burned. Both incurred many times the development costs originally envisaged.

Dealing With Extreme Complexity

One might argue that with enough discipline and rigour applied in the development process, the spectre of component interrelationship mis-definition and interface failure can be avoided. This may well be true, but in practice the kind of regime required to impose that level of discipline and rigour upon a group of developers may not be managerially or politically implementable within an organisation. The natural human propensity to want to do things independently always works against an organisationally imposed scheme of straitjacketing how designs are put together.

The other difficulty which arises is that evolvability in the design may be lost in the process. Where the user requirements may evolve during and after the process of developing the design, whatever model is employed to define the structure of the design and the interrelationships between its components must be capable of evolving in step, preferably without unreasonable expense.

Ideally, the basic technology should impose the required quality of evolvability on the architecture of the design, yet also provide the framework for a rigorous and disciplined development process.

The OO paradigm developed in large part with these aims in mind. It is customary in many discussions of OO technology to focus on the details of implementation, rather than the broader systemic implications of this model. This distracts from a more fundamental issue, which is how the paradigm itself facilitates the design and implementation of highly complex programs.

OO programming languages provide the basic brick and mortar portion of the technology base, facilitating implementation. They do not implicitly provide a mechanism for formally representing the high level structure of large and complex programs.

That is the function of a higher level modelling language, which is used to capture the critical interrelationships between the components of the program. Such a language provides a means of describing these in a format which is both rigorous and evolvable.

The Unified Modelling Language (UML), devised primarily by Rational, is a product of the latter half of the nineties, and is now the OMG ratified industry standard for this purpose.

Unified Modelling Language

UML was created by the fusion of ideas developed in three second generation software engineering methodologies, Booch, Objectory, and OMT, devised by Grady Booch, Ivar Jacobson and Jim Rumbaugh, but also incorporates ideas produced by a large number of other CASE methodology theorists. The extended UML for Real-Time incorporates features from the Real-Time Object-Oriented Modeling language (ROOM).

The process of creating UML started in 1994 when Booch and Rumbaugh decided to unify their respective Booch and OMT methods. Ivar Jacobson's use cases were incorporated, and Jacobson soon after joined the unification effort which led to the current UML specification. The decision to unify the three established methods was based on the following criteria (Rational - UML FAQ by Booch, Rumbaugh and Jacobson, http://www.rational.com/):

First, these methods were already evolving toward each other independently. It made sense to continue that evolution together rather than apart, thus eliminating the potential for any unnecessary and gratuitous differences that would further confuse users.

Second, by unifying these methods now, we could bring some stability to the object-oriented marketplace, allowing projects to settle on one mature method and letting tool builders focus on delivering more useful features.

Third, we expected that our collaboration would yield improvements in all three earlier methods, helping us to capture lessons learned and to address problems that none of our methods currently handled well.

Booch, Rumbaugh and Jacobson describe their goals in devising UML thus:

  • To model systems (and not just software) using object-oriented concepts,

  • To establish an explicit coupling to conceptual as well as executable artifacts,

  • To address the issues of scale inherent in complex, mission-critical systems,

  • To create a method usable by both humans and machines.

These four goals, set out by the authors of UML, encapsulate very concisely much of the argument presented earlier. Importantly, the UML model is not unique to software, but provides a paradigm which is quite general and thus applicable to defining the attributes and behaviour of highly complex systems of any type.

UML comprises a number of components. A metamodel is used to describe the semantics and syntax of the elements of the language. The long term aim is to refine this using formal logic. A graphical notation is used to provide a graphical syntax which can be read by humans and by tools. The language also includes a set of idioms to describe usage.

UML employs a set of models which are used to describe the system:

  • Use-case diagrams, adopted from Objectory, are employed to describe use cases.

  • Class diagrams, a feature of Booch and OMT, are used to describe the static semantics of the classes in the system.

  • State-machine diagrams are used to describe the dynamic semantics of classes.

  • Message-trace diagrams, object-message diagrams, and process diagrams, adopted from the Booch, OMT, and Fusion schemes, describe the dynamic semantics of collaborations of objects.

  • Module diagrams are employed to describe the developer's view of the system.

  • Platform diagrams are used to describe the organisation and topology of the hardware upon which the system executes. 

  • Deployment diagrams show the configuration of the hardware and software components of the system at run-time.

UML for Real-Time, intended to describe mission-critical real-time systems, incorporates further models (a conceptual sketch in code follows the list):

  • Capsules are complex active components, which interact with the surrounding environment through boundary objects called ports.

  • Ports are objects which implement the interfaces between a capsule and the external world. Ports are signal based, to provide portability across platforms and distributed implementations, and implement protocols via which they communicate.

  • Connectors describe the communication relationships between capsules.

  • State Machines describe the functionality of a simple capsule. More complex capsules are described using internal sub-capsules, interacting through connectors.
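
To make these terms more concrete, the following fragment sketches in plain C++ what a trivial capsule with a single signal-based port and a two-state state machine might look like. It is a conceptual illustration only, not code generated by any UML for Real-Time toolset, and all names are hypothetical.

    // Conceptual sketch only: a capsule, a port and a two-state machine.
    #include <cstdio>

    // A signal delivered to a port.
    enum Signal { SIG_START, SIG_STOP };

    // A port: the boundary object through which a capsule exchanges
    // signals with its environment.
    class Port {
    public:
        explicit Port(class Capsule &owner) : owner_(owner) {}
        void receive(Signal s);   // forward an incoming signal to the capsule
    private:
        class Capsule &owner_;
    };

    // A simple capsule whose behaviour is a two-state state machine.
    class Capsule {
    public:
        Capsule() : state_(STOPPED), port_(*this) {}
        Port &port() { return port_; }

        // React to a signal according to the current state, then transition.
        void handle(Signal s) {
            switch (state_) {
            case STOPPED:
                if (s == SIG_START) { std::puts("starting"); state_ = RUNNING; }
                break;
            case RUNNING:
                if (s == SIG_STOP)  { std::puts("stopping"); state_ = STOPPED; }
                break;
            }
        }

    private:
        enum State { STOPPED, RUNNING } state_;
        Port port_;
    };

    void Port::receive(Signal s) { owner_.handle(s); }

    int main() {
        Capsule c;
        c.port().receive(SIG_START);  // the environment signals the capsule via its port
        c.port().receive(SIG_STOP);
        return 0;
    }

In a real toolset the state machine, ports and protocols would be captured graphically in the model and the corresponding code generated, rather than written by hand as here.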

A well implemented UML toolset will provide extensive facilities for binding the UML models to the object implementations in an OO programming language, and some toolsets also provide reverse engineering facilities which can produce UML descriptions of an existing program. Whether the code is implemented in C++, Ada, Smalltalk or any other applicable language, the toolset provides the means of transferring a definition into a framework for implementation in code.
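
As an illustration of what such a binding might look like (this is not the output of any particular toolset, and the class names are invented for the purpose), a UML class with an attribute, an operation and a one-to-many association to a second class could be forward-engineered into C++ skeletons along the following lines. The body of total() is filled in here only to show how the generated structure would be used; a toolset would normally leave method bodies to the programmer.

    // Illustrative only: a hand-written approximation of the C++ skeleton
    // a forward-engineering toolset might emit for two associated UML classes.
    #include <string>
    #include <vector>

    class LineItem;

    // UML class "Order", with attribute customer : String, operation
    // total() : double, and a one-to-many association to "LineItem".
    class Order {
    public:
        double total() const;                 // UML operation
        void add(LineItem *item) { items_.push_back(item); }
    private:
        std::string customer_;                // UML attribute
        std::vector<LineItem *> items_;       // association end: Order 1 --- * LineItem
    };

    // UML class "LineItem".
    class LineItem {
    public:
        explicit LineItem(double price) : price_(price) {}
        double price() const { return price_; }
    private:
        double price_;
    };

    // Method body supplied by the programmer, not by the toolset.
    double Order::total() const {
        double sum = 0.0;
        for (std::vector<LineItem *>::const_iterator it = items_.begin();
             it != items_.end(); ++it)
            sum += (*it)->price();
        return sum;
    }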

UML is not a panacea. It is a mechanism via which the behaviour of a complex system can be exactly described and defined, to facilitate the process of creating code. Even with a perfect UML description, poorly implemented and buggy code modules will cause difficulties. However, bugs of this ilk are much easier to identify and fix, typically, in comparison with bugs which arise at an architectural level in the product design.

In terms of dealing with complexity, the widespread adoption of UML will yield important benefits in the robustness and predictability of the development and maintenance process, compared with older techniques. A likely consequence, in coming years, is that this will push complexity up even further beyond current bounds, introducing difficulties which have yet to be seen.

Programs with tens of millions of lines of code will present some very interesting challenges.




Last Updated: Sun Apr 24 11:22:45 GMT 2005
Artwork and text © 2005 Carlo Kopp

