Complexity and the Unified Modelling Language
Originally published July, 2001
by Carlo Kopp
© 2001, 2005 Carlo Kopp
One of the enduring arguments in the coding community is that of Object Oriented techniques vs classical procedural techniques. Whilst many programmers may correctly point out that the issue is in a sense a non-argument, since each model reflects a different aspect of a common reality, it is worth exploring some of the trends in current OO methodology. This month's feature will discuss some of the central issues and take a brief look at the Unified Modelling Language (UML) and its various implications.

The most fundamental issue which underpins the whole software engineering methodology debate is that of dealing with the extreme complexity of modern software products. Complexity is in many respects the barrier which limits achievable functionality and frequently also interoperability in software products. This was not always the case. If we look back over the last two to three decades, the complexity of a software product would have been bounded, in practical terms, by factors which are external to the product itself. Key factors which evolution has since rendered either irrelevant or incidental are:
These eight items represent the results of almost two decades of focussed evolution in technology, all directly or indirectly aimed at facilitating the design and implementation of increasingly complex pieces, or interacting systems, of code. Applications which were uncompilable, unrunnable, undebuggable, unable to reliably communicate internally and unaffordable in previous years due to basic technology are now technically feasible and, in terms of basic technology, implementable. Yet the collective experience is still that bugginess and poor reliability are endemic and expensive problems, whether we are observing the behaviour of a shrinkwrapped application or a flight control system on a rocket booster (the only difference between a BSOD and an exploding Ariane booster being the scale of the outcome and its consequences).

Inevitably, any problem in a software product results in finger pointing. The code cutters got it wrong, the testers missed it, the user did something silly, the marketeers misunderstood the requirement; indeed the number of ways in which responsibility for an adverse outcome can be assigned is limited only by the imagination of the party seeking to assign it. The root cause, in the most fundamental sense, is complexity.

The REAL Enemy - Complexity

In the broadest philosophical sense, the trend toward increasing complexity seems to be an artifact of evolution, be it biological or technological. Trends in software are no exception, and in recent times programs with sizes of the order of millions of lines of code are becoming common. This is not only true of shrinkwrapped commodity products, but also of large commercial products and larger embedded systems, such as those found in space vehicles, large industrial plants and military or commercial aircraft. Complexity is thus unavoidable. Just as the strands of DNA which make up a more evolved mammalian species became more complex over time, code will simply get more complex over time. The big difference between nature and man-made entities like software is that the former is subject to Darwinian evolution over enormous timescales. Software is driven by Lamarckian evolutionary behaviour, and time to market and use are thus do or die parameters in the evolutionary process of a software product.

The traditional programming model and software engineering approach involved some omniscient chief software engineer or programmer attempting to coordinate the activities of a small group, each member of which would craft his own component. With enough iterations and enough haggling over interfaces the system could be made to work. In practice, this technique ran into difficulties with sizes of hundreds of thousands of lines. While a program of this size can be successfully developed and maintained by two dozen or perhaps fewer programmers, the odds are that all participants will need a solid depth of experience and preferably as much insight as possible into the specific product being maintained. Reduce the level of programmer experience and difficulties will arise very quickly.

As with all complex problems, the proven and most robust strategy is to divide and conquer. In the most fundamental sense, the problem is broken down into smaller chunks, ideally chunks which are small enough to be well understood by individual code cutters or small teams. Gigantic monolithic programs are not a very common sight.
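As a minimal illustration of this divide and conquer idea, consider the sketch below: one team publishes an abstract interface, another team implements it, and a third consumes it, so each chunk can be understood and maintained in isolation. All of the names used here (Logger, FileLogger, BillingEngine) are hypothetical, invented purely for this example, and the sketch is not drawn from any particular product.

```cpp
// Minimal sketch of divide-and-conquer through an explicit interface.
// All names here (Logger, FileLogger, BillingEngine) are hypothetical.
#include <iostream>
#include <memory>
#include <string>

// The agreed contract between two teams: one team implements it,
// the other programs against it, and neither needs the other's internals.
class Logger {
public:
    virtual ~Logger() = default;
    virtual void write(const std::string& message) = 0;
};

// One team's concrete component, hidden behind the interface.
class FileLogger : public Logger {
public:
    void write(const std::string& message) override {
        std::cout << "[log] " << message << "\n";  // stand-in for real file I/O
    }
};

// Another team's component depends only on the abstract contract.
class BillingEngine {
public:
    explicit BillingEngine(std::shared_ptr<Logger> log) : log_(std::move(log)) {}
    void issueInvoice(const std::string& customer) {
        log_->write("invoice issued for " + customer);
    }
private:
    std::shared_ptr<Logger> log_;
};

int main() {
    BillingEngine billing(std::make_shared<FileLogger>());
    billing.issueInvoice("ACME Pty Ltd");
    return 0;
}
```

The value of such a structure is precisely that the interface, rather than the internals of either component, carries the interrelationship between the two chunks of code.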
Where extreme complexity bites hardest, even with a rigorous divide and conquer methodology, is in one key area - the definition of the interrelationships between the components in the program and the interfaces which support these interrelationships. This happens for both good reasons and bad:
This problem of extreme complexity leading to severe difficulties, especially in integrating the various components of a product, is incidentally not confined to the software industry alone. The aerospace industry is replete with examples. Two notable case studies are the US 1960s TFX fighter development program, and the UK 1970s-1980s Nimrod AEW program. In both instances, the biggest problems arose in getting various major components to operate together in the manner intended. The first of these projects eventually succeeded, the second crashed and burned. Both incurred many times the development costs originally envisaged.

Dealing With Extreme Complexity

One might argue that with enough discipline and rigour applied in the development process, the spectre of component interrelationship mis-definition and interface failure can be avoided. This may well be true, but in practice the kind of regime required to impose that level of discipline and rigour upon a group of developers may not be either managerially or politically implementable within an organisation. The natural human propensity to want to do things independently always works against an organisationally imposed scheme of straitjacketing how designs are put together.

The other difficulty which arises is that evolvability in the design may be lost in the process. Where the user requirements for the function of the design may evolve during and after the process of developing the design, whatever model is employed to define the structure of the design and the interrelationships between components of a design must be capable of also evolving in step, preferably without unreasonable expense. Ideally, the basic technology should both impose the required quality of evolvability in the architecture of the design, yet also provide the framework for a rigorous and disciplined development process.

The OO paradigm developed in a large part with these aims in mind. It is customary in many discussions of OO technology to focus on the details of implementation, rather than the broader systemic implications of this model. This distracts from a more fundamental issue, which is that of how the paradigm itself facilitates the design and implementation of highly complex programs. OO programming languages provide the basic brick and mortar portion of the technology base, facilitating implementation. They do not implicitly provide a mechanism for formally representing the high level structure of large and complex programs. That is the function of a higher level modelling language, which is used to capture the critical interrelationships between the components of the program. Such a language provides a means of describing these in a format which is both rigorous and evolvable. The Unified Modelling Language (UML), devised primarily by Rational, is a product of the latter half of the nineties, and is now the OMG-ratified industry standard for this purpose.

Unified Modelling Language

UML was created by the fusion of ideas developed in three second generation software engineering methodologies, Booch, Objectory, and OMT, devised by Grady Booch, Ivar Jacobson and Jim Rumbaugh, but also incorporates ideas produced by a large number of other CASE methodology theorists. The extended UML for Real-Time incorporates features from the Real-Time Object-Oriented Modeling language (ROOM). The process of creating UML started in 1994 when Booch and Rumbaugh decided to unify their respective Booch and OMT methods.
Ivar Jacobson's use cases were incorporated, and Jacobson soon after joined the unification effort which led to the current UML specification. The decision to unify the three established methods was based on the following criteria (Rational - UML FAQ by Booch, Rumbaugh and Jacobson, http://www.rational.com/): First, these methods were already evolving toward each other independently. It made sense to continue that evolution together rather than apart, thus eliminating the potential for any unnecessary and gratuitous differences that would further confuse users. Second, by unifying these methods now, we could bring some stability to the object-oriented marketplace, allowing projects to settle on one mature method and letting tool builders focus on delivering more useful features. Third, we expected that our collaboration would yield improvements in all three earlier methods, helping us to capture lessons learned and to address problems that none of our methods currently handled well.

Booch, Rumbaugh and Jacobson describe their goals in devising UML thus:

To model systems (and not just software) using object-oriented concepts,
To establish an explicit coupling to conceptual as well as executable artifacts,
To address the issues of scale inherent in complex, mission-critical systems,
To create a method usable by both humans and machines.

These four lines by the authors of UML encapsulate, very concisely, much of the argument presented earlier. Importantly, the UML model is not unique to software, but provides a paradigm which is quite general and thus applicable to defining the attributes and behaviour of highly complex systems of any type.

UML comprises a number of components. A metamodel is used to describe the semantics and syntax of the elements of the language. The long term aim is to refine this using formal logic. A graphical notation is used to provide a graphical syntax which can be read by humans and by tools. The language also includes a set of idioms to describe usage. UML employs a set of models which are used to describe the system:
UML for Real-Time, intended to describe mission-critical realtime systems, incorporates further models:
A well implemented UML toolset will provide extensive facilities for binding the UML models to the object implementations in an OO programming language, and some toolsets also provide reverse engineering facilities which can produce UML descriptions of an existing program. Whether the code is implemented in C++, Ada, Smalltalk or any other applicable language, the toolset provides the means of transferring a definition into a framework for implementation in code.

UML is not a panacea. It is a mechanism via which the behaviour of a complex system can be exactly described and defined, to facilitate the process of creating code. Even with a perfect UML description, poorly implemented and buggy code modules will cause difficulties. However, bugs of this ilk are typically much easier to identify and fix than bugs which arise at an architectural level in the product design.

In terms of dealing with complexity, the widespread adoption of UML will yield important benefits in the robustness and predictability of the development and maintenance process, compared with older techniques. A likely consequence, in coming years, is that this will push complexity up even further beyond current bounds, introducing difficulties which have yet to be seen. Programs with tens of millions of lines of code will present some very interesting challenges.
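To make the binding between a UML model and code a little more concrete, the sketch below shows how a trivial class diagram fragment might be rendered as C++ skeleton code: a class with a one-to-many association, plus a simple two-state statechart. The names (Sensor, Controller) and the structure are hypothetical, invented for illustration only, and are not taken from any particular toolset's output, although generated skeletons tend to follow a broadly similar pattern.

```cpp
// Hypothetical mapping of a small UML fragment into C++ skeleton code:
//   - a Controller class with a one-to-many association to Sensor
//   - a simple two-state statechart (Idle -> Active) on the Controller
// Names and structure are illustrative only.
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// UML class "Sensor" with one attribute and one operation.
class Sensor {
public:
    explicit Sensor(std::string id) : id_(std::move(id)) {}
    double read() const { return 42.0; }  // placeholder measurement
    const std::string& id() const { return id_; }
private:
    std::string id_;  // UML attribute: id : String
};

// UML class "Controller"; the association "monitors 0..*" becomes a vector.
class Controller {
public:
    enum class State { Idle, Active };  // states from the simple statechart

    void attach(std::shared_ptr<Sensor> sensor) {
        sensors_.push_back(std::move(sensor));
    }

    // UML event "start" triggers the Idle -> Active transition.
    void start() {
        if (state_ == State::Idle) {
            state_ = State::Active;
        }
    }

    // Operation permitted only in the Active state.
    void poll() const {
        if (state_ != State::Active) return;
        for (const auto& s : sensors_) {
            std::cout << s->id() << " = " << s->read() << "\n";
        }
    }

private:
    State state_ = State::Idle;
    std::vector<std::shared_ptr<Sensor>> sensors_;  // association end
};

int main() {
    Controller controller;
    controller.attach(std::make_shared<Sensor>("temp-1"));
    controller.attach(std::make_shared<Sensor>("temp-2"));
    controller.start();
    controller.poll();
    return 0;
}
```

The point of such a mapping is not the code itself, which remains the programmer's responsibility, but that the associations and state behaviour declared in the model arrive in the implementation already defined, rather than being renegotiated between teams at integration time.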
Last Updated: Sun Apr 24 11:22:45 GMT 2005
Artwork and text © 2005 Carlo Kopp