Thursday, June 21, 2007

Use Cases On Steroids

Computer software development projects still often run late and over budget, and the people who commission them are still often surprised and disappointed by what they get at the end of the development process. Software development has been around for over 60 years now, and it should be a mature, reliable process, but some big gaps remain. I've been designing and writing software for over 40 years, and I have the scars to prove that I blew it often enough myself. I have been trying for a long time to find a way to make the development process more visible and easier to understand for the people who will eventually use what we build, so that they get advance warning when we're going wrong, and can help us to sort out our mistakes before they get cast in code, because that's even worse than being cast in concrete.

This paper discusses the biggest problems that happen time and again in the software development process:
  • We don't fully understand the users' requirements up front.
  • The users don't really understand the design that we put together.
  • It's only when we deliver code to the user that we all find out how much trouble we're in.
  • The users don't have sufficiently detailed plans to test the system before it goes live.
  • When the system does go live, we get into even more trouble.
I show ways in which the humble use case can be extended to better solve these problems, and to improve the quality of the solutions that we deliver.

Why do things keep going wrong?

If we have been developing computer software for so long, and if we know so much about how it should be done that our universities offer graduate courses in computer science and software engineering, how come we keep getting it wrong? In his book "Great Software Debates", Alan M. Davis states that "Requirements" are "The Missing Piece of Software Development". Usually the people developing software are not experts in the business that they're trying to automate, and the people in the business, who know it backwards, are not experts in software development. Both groups use their own language to talk about their area of expertise. Neither group understands the other particularly well. Communication is poor. Both groups hope that the problem will go away while they're developing the system. It does, but too late. By then the damage is done!

"The user requirements have changed"

First, let me dispose of the oldest, lamest excuse in the industry (I should know, I have used it often enough): "The user requirements have changed". Maybe we have to build an accounting system, or an order entry system, or whatever. People have been doing these things for centuries. Double-entry bookkeeping, for example, goes back to 12th century Italy, and hasn't changed much since. The people doing it get trained in schools, colleges, universities, and then get extensive on-the-job training before they start to practise their trade. So we build a system to meet what we think their needs are. We have many meetings with them, where we talk computer jargon and they try to pretend they understand what we're saying. But the first time they really understand how little we know about their business is when they try out the system that we have built for them. Then, suddenly, they communicate, long and loud. But we are smart. We have been burned before, so we got them to sign a contract in advance that says if they want anything different from what we build them, they pay. We say, "The user requirements have changed". That's usually not true. It's their understanding of what we're doing to them that has changed, and, after a lot of loud shouting, our understanding of their business processes and needs changes too. But then we get deployed on a new project in a different business where we're just as green as before, and the cycle repeats itself.

How can we discover the real requirements?

Lots of good ways have been developed to help computer people to talk with business people to discover what it is they do, and what the proposed new software should do in order to help them get their job done. These are generally called "methodologies" by the people who develop them. As the Wikipedia article on this points out, it would be more accurate to call them "methods". There used to be a whole lot of competing methods around, each with their own jargon. Mercifully, most have converged on a common jargon called the Unified Modelling Language (UML), together with common techniques and diagrams. The UML components that deal most directly with capturing and documenting user requirements are:
  • The Actors, who use a computer system or trigger actions within it.
  • Actions, which are the things that Actors and computer systems do together to achieve something useful.
  • Use Cases, which are short stories written in business language that describe what Actors do to perform Actions.
  • Use Case Diagrams, which present Use Cases and the Actors involved in them in diagram form.
In short, these UML elements deal with the question of who is going to do what, how, in order to get what done. This is the most important part of the project, because if we get it wrong, we will build the wrong solution. Many early methods used complex jargon and diagrams that made this part of the process very difficult for business people to understand and fully participate in. Long and hard experience has taught us the importance of keeping it simple, and current UML methods are good in this area. If we follow the UML guidelines, the users will generally be able to understand what's going on, participate fully, and we should end up with a set of use cases that accurately describe what's needed.

So why does development still go wrong?

In my experience, there are two major reasons why software development often delivers the wrong solution (or the right solution to the wrong problem):
  • The user requirements aren't detailed enough to fully specify the target system
  • The development process is invisible to the users; they can't identify errors as they arise
It's tricky to get the right level of detail in use cases. Here's the problem, and it afflicts all aspects of UML, not just use cases: in the books and training materials, the methods are used to describe activities at a trivially simple level. Here for example is a use case diagram that says that a patron orders food from a waiter in a restaurant; that the chef cooks it; that the waiter delivers it together with a drink; and that the patron pays for it. A nice, simple view of how things work. But we all know that reality is a lot more complex. The patron may have to be shown a seat, the waiter may want to talk about the specials, the patron may want to ask how various dishes are prepared. The chef has to plan the day's food, get appropriate recipes, obtain the required ingredients, peel, chop up, and otherwise prepare the food, cook various components in various ways, and combine the finished product on a plate. Eating itself is quite a procedure, which is maybe why it got left out of the use case diagram. Payment may be through cash, cheque, or credit or debit card, each of which has its own procedure. A credit card may be refused by a bank, and then the patron may pay with cash instead. And so forth.

If the use case is made detailed enough to be a useful and reliable guide to the programmer who needs to write the code, it will end up very big and bulky. It will take a lot of time and effort for the users to specify the use case at this level of detail, and they're "too busy". But this raises the ugly question, "if you're too busy to do it right, when will you find time to do it over?" When the project is planned, commitment must be made by suitably qualified users to spend the time needed to get the requirements right, and documented.

The bulk problem

The use case method, like most of the other methods in UML and all earlier modelling disciplines, was largely defined back in the days when modelling was done on large sheets of paper in a meeting room. Completed sheets of paper were stuck to the walls. Use cases with realistic levels of detail get so big and bulky that they won't fit onto a single page. They will spill over dozens, maybe hundreds, of pages, and there's no easy way to navigate reliably from one page to another.

While we're on this subject, let's note that use case diagrams are much more bulky than plain text use cases. If you open a text editor and type in the text that appears in the action diagram referenced here, in the same sized font, you will find that the diagram occupies ten times more space than the text version. If real-life use cases don't fit on a single page in text form, they won't fit on ten pages in diagram form.

Drill-down detail

Nowadays programmers mainly use computer-based modelling tools to create and edit the various diagrams that they use in support of the design effort. Some of these tools, like ArgoUML, are open source and free. They offer an elegant solution to the dilemma of simple versus detailed use cases. The tool user can specify a process as a set of simple steps, so that they fit on a single screen. Those steps that need more explanation can be given "children" steps, as many as are needed. The presence of these children can be signalled by prefixing the step with the familiar + icon that we see in file explorers. This tells us that there's more detail to see. If we click on the icon it becomes a − and its children appear below it. If any child action needs further explanation, we can repeat this procedure. Realistically, the requirements gathering phase needs to be supported by a tool like this. That way sufficient detail can be gathered over time without overwhelming the audience with unwanted detail. So requirements gathering should be based on a computer-based modelling tool, with the display projected onto a big screen so the participants can see what's happening.
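The drill-down idea above can be sketched as a simple tree of steps. This is an illustrative sketch, not any particular tool's data model; the class and step names are invented:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the drill-down idea: each use-case step can carry child steps,
// and the outline is rendered with "+" for collapsed detail and "-" for
// expanded detail, just like a file explorer.
public class Step {
    final String text;
    final List<Step> children = new ArrayList<>();
    boolean expanded = false;

    Step(String text) { this.text = text; }

    Step addChild(String childText) {
        Step child = new Step(childText);
        children.add(child);
        return child;
    }

    // Render this step and, if expanded, its children as an indented outline.
    void render(StringBuilder out, int depth) {
        String icon = children.isEmpty() ? " " : (expanded ? "-" : "+");
        out.append("  ".repeat(depth)).append(icon).append(" ").append(text).append("\n");
        if (expanded) {
            for (Step child : children) child.render(out, depth + 1);
        }
    }

    public static void main(String[] args) {
        Step pay = new Step("Patron pays for the meal");
        Step card = pay.addChild("Patron pays by credit card");
        card.addChild("Bank refuses the card; patron pays cash instead");
        pay.addChild("Patron pays cash");

        StringBuilder out = new StringBuilder();
        pay.render(out, 0);      // collapsed: only the top-level step shows
        pay.expanded = true;
        pay.render(out, 0);      // expanded: the children appear below it
        System.out.print(out);
    }
}
```

The same structure nests to any depth, so detail can be added over time without cluttering the top-level view.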

The devil is in the details

To be really unambiguous for programmers, use cases would have to describe every single field of data to be captured on every single form or panel, and the validation to be carried out upon these inputs, and the complete list of all outputs that will be displayed on each form or panel. If this were done, the labour required to produce and maintain the use cases would be almost as much as that needed to write the programs that do the work. Programmers claim, and often with justice, that they hardly have enough time allocated to them to write the programs once, and that they don't have the time to produce really detailed use cases. If they did so, they would end up writing every program twice, in two very different formats. On the other hand, the end users who must help to develop and then validate the use cases would find it difficult to understand how the finished software would look and behave if all they see is textual use case statements. But if use cases are not detailed down to the level of identifying the individual fields in each form or panel, they will be too high-level for the end users to assess their accuracy and relevance.

In my experience, the gap between use cases and running code is so big that users usually don't know enough about what is being developed to judge the proposed design before it has been turned into running code. By then it is too late to easily fix the problems that become visible.

Adding value to the use case

Perhaps the best way to make use cases detailed enough to keep the programmers honest, but at the same time meaningful to the users so that they can understand them, is to link each step in the use case to a separate mock-up panel which shows all the input fields required and all the output fields returned. Users can compare mock-up panels to the existing paper forms or legacy screen panels that they currently use to get the job done. They can compare the two field by field, and ensure that each field input into the paper form can be captured in the mock-up panel; or if not, ask why not. Creating the mock-up panel will not impose an unreasonable and irrelevant burden on the programmer providing each such panel is subsequently used as part of the system being developed. The mock-up panel can be refined through successive iterations to become the production panel.

In the case of web-based software, which is the popular paradigm today, the mock-up panels can be developed as HTML pages, because ultimately this is what they will have to be. The use cases can also be developed as HTML documents. An index can be developed in HTML to list the use cases in the same hierarchical structure that is used in the UML model. Each mock-up HTML panel can be given a short, unique ID (this is ultimately required if users are going to have useful conversations with help desk personnel over telephones). An extra column can be appended to the use case scripts to carry the ID of the mock-up panel to which each paragraph of the use case script refers. Each such ID can be made a hyperlink to the mock-up panel, pointing to a different target window, so that when the reader of the use case clicks on a panel ID, it appears in a different window, and the user does not lose their place in the use case script.

If a user has to perform a large number of use case validations, they could be given access to two physical screens, with the use case script window positioned on one screen and the target window that displays the mock-up panels on the other. With this sort of setup, the user can read the script in one window and swiftly see a mock-up of the panel that will be displayed when the function has been programmed. The user can then compare each mock-up to existing paper forms or legacy system screens, and ensure that it is complete and consistent.

A given panel will often appear more than once in a given use case script, and across different use case scripts. The user will be able to validate the mock-up panel once in exhaustive detail when it first appears, and to devote less attention to the panel on each subsequent reappearance. This approach requires far less writing of tedious detail than would a use case that contains a field-by-field narrative for every panel every time it is referenced: much less work for the person producing the use case, and similarly for the user who is validating it.
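The hyperlinking scheme described above needs nothing more than ordinary HTML link targets. As a small illustration (the panel ID, file layout, and method names are invented), a use-case step row might be generated like this:

```java
// Sketch of turning a use-case step plus a mock-up panel ID into an HTML
// table row whose panel link opens in a separate "panels" window, so the
// reader keeps their place in the use case script.
public class UseCaseHtml {
    static String stepRow(String stepText, String panelId) {
        return "<tr><td>" + stepText + "</td>"
             + "<td><a href=\"panels/" + panelId + ".html\" target=\"panels\">"
             + panelId + "</a></td></tr>";
    }

    public static void main(String[] args) {
        System.out.println(stepRow(
            "The searcher enters search criteria that identify the documents of interest.",
            "SRCH01"));
    }
}
```

Because every panel link names the same target window, each click replaces the previous mock-up in that window rather than opening yet another one.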

Here is a simple example of some steps in a use case that follow the method described above:

The searcher enters search criteria that identify the documents of interest.
The system presents a list of the titles of documents that meet the search criteria,
ordered with those that best meet the criteria first.
The searcher is able to click on the title of any of the documents listed to view its contents.

In a functional computer system, values entered as inputs in one panel will often appear as outputs in subsequent panels. A mock-up interface built of separate static HTML files will not behave in this way. It is possible to get some of this behaviour without having to write specific logic for each mock-up panel. Each panel can at some stage in the development cycle be morphed from a static HTML page to an active page such as a JSP or PHP page, which it will ultimately have to be. This can be done by adding some fixed wrapper lines to the file and renaming it. The fixed wrapper lines can include logic to harvest all of the inputs captured by the user in prior HTML forms, and to store them in a hashmap in the user's session, without any regard to the names or values of the various fields. When the next page is presented, each output field can contain a method invocation that passes the name of the output field to a standard output method. This method could check whether a value has been associated with the name passed, and if so return the value, else a question mark. To prime the pump, the hashmap could be loaded initially from a simple ASCII file that contains name/value pairs.
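The wrapper logic described above can be sketched in plain Java. This stand-alone sketch uses ordinary maps where a real JSP would use the request's parameter map and the user's HTTP session; the class and field names are invented:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch of the wrapper logic: harvest all inputs from prior
// forms into a session-like map, and answer output-field lookups with the
// stored value or a question mark.
public class MockPanelState {
    private final Map<String, String> session = new HashMap<>();

    // Harvest every input from the previous form, regardless of field names.
    void harvest(Map<String, String> params) {
        session.putAll(params);
    }

    // Standard output method: return the stored value for a field name,
    // or a question mark when nothing has been captured yet.
    String output(String fieldName) {
        return session.getOrDefault(fieldName, "?");
    }

    public static void main(String[] args) {
        MockPanelState state = new MockPanelState();

        // Prime the pump, as the text suggests, from simple name/value pairs.
        Map<String, String> seed = new HashMap<>();
        seed.put("customerName", "A. Patron");
        state.harvest(seed);

        System.out.println(state.output("customerName")); // A. Patron
        System.out.println(state.output("orderTotal"));   // ? (not captured yet)
    }
}
```

Because the harvesting ignores field names and values, the same fixed wrapper works unchanged for every mock-up panel.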

The UML community feel most comfortable with modelling when it is diagram-based. In order to gain their acceptance of mock-up panels, it may be best to give them a diagram-like name such as UIGrams.

Measuring the scope of work, and progress

One of the most vexing issues that face the owners of systems under development, and the developers of such systems, is the big disconnect between the specification of the system's requirements, in which the owners participate, and the production of a working, testable system. The owners have almost no way of knowing how much of the required work has been done, and whether the work is of adequate quality, until they see running code. Much the same dilemma may afflict the development project manager, unless he or she is an accomplished programmer as well as a project manager, a rare combination. By the time the code runs well enough to test, so much time and money have flowed that the owners may find themselves committed to using the final product even if it is not to their liking.

Many different approaches have been tried to provide system owners with an objective measurement of progress, which they and the development manager can use to check whether the project is on track. One of the earliest was to estimate the number of lines of program code that would be required to complete the project, and to count the number of lines coded on a regular basis. This turned out to be a poor metric for several reasons:
  • No one can accurately estimate how many lines of code will be required to complete a system, especially if other coders are involved.
  • Programmers typically write many lines of code quite quickly, but then spend a lot of time correcting errors, which may not result in much net growth in the number of lines of code.
  • Hard experience has shown that if programmers know that their progress is being measured in terms of the number of lines of code that they write, they will write more lines of code. They will tend to clone more sections of code rather than writing a function or subroutine or method which they invoke from different places. This proliferation of code will eventually make the system more difficult to fix and enhance once it has gone into production.
System developers do develop a lot of artefacts in the process of systems specification and development. These should ideally be captured into a formal computer-based development model. Many different models have been developed over the years, but fortunately there has been a convergence on a single group of standards called UML, the Unified Modelling Language. Unfortunately, one has to be an expert in the various models available within UML in order to assess the scope, completeness, and quality of the models developed by the system developers, so this isn't of much use to system owners in determining progress.

Function Point Analysis was then developed, and turned out to provide a far more reliable measure of the work to be done. It's probably the best system that we have, but has the drawback that it requires a lot of work from both the developers and the system owners to determine in advance how many function points of what complexity a new system will entail, and to measure progress against plan. And as classically conducted, function point analysis tends to be all overhead in the sense that it does not contribute directly to the design, development, or testing of the system.

For an online, interactive, transaction-based business or administrative system, it turns out that the function point metrics are largely determined by the number of panels that the users will interact with, and the number of input and output fields on those panels. This makes no provision for batch, but batch programs typically constitute a small fraction of the overall system development effort. The amount of work required to develop a system (excluding batch) can therefore be based on a count of the number of panels, together with their input and output fields, that will be needed to implement the required functionality; progress can then be measured against the agreed set of panels. This is a more rough-and-ready approach than function point analysis, but it has the great advantage that it does not require either the developers or the system owners to do work that does not contribute directly to the final system. These are the steps that are required:
  • During the requirements gathering phase, developers and owners work together to identify the elements listed below. Developers will capture them into the agreed modelling tool, and the owners will be asked to verify that this has been done accurately (this part of the model is easy for non-specialists to check):
    • The actors that will play a role in the system.
    • A hierarchical list of the major actions that these actors will perform.
    • Models of each panel that will be required to support the actions identified, in HTML if it is a browser-based system.
    • Use cases that describe in words the various actions identified, in HTML if it is a browser-based system, with links to the model panels.
    • A data model that contains all of the data items identified in the actions and panels.
  • During the design development phase:
    • The users can work their way through the use cases, viewing the model panels at each step of the process, and validate or correct them.
    • The developers refine the data model into normal form, then produce a database design.
    • The designers populate the model with the classes, attributes, and methods that will be required to implement the system.
  • During the code development phase:
    • The developers flesh out the model classes with the code required to implement the system.
    • The developers create the database and the classes required to manage the data in them.
    • Simple, standard logic can be added to the model panels to propagate inputs entered by users onto subsequent panels.
    • Other development team members refine the look and feel of the model panels until the users are comfortable with them.
    • Snapshots of key panels are taken and are signed off by the system owners as being the look and feel that they require.
    • The development team ensure that the agreed look and feel is applied uniformly across all panels, preferably via style sheets.
    • The system owners test and sign off (or critique) the modified panels to assert that they have the required function and appearance.
  • During the testing phase:
    • As the various parts of the system are implemented, the corresponding model panels are fleshed out with embedded logic as required.
    • The use cases now become the test scripts. The users use them as their guide for testing the system methodically, but now panel-to-panel navigation is achieved through software logic in the test system rather than by clicking links in the use case (although use case links may still be used to navigate to the appropriate software panels where this makes sense, i.e. input from a prior panel is not required).
    • Navigation across those sections of the system that have not yet been developed may still be done via the use cases so that users can assess the components under test in a plausible context rather than in isolation.
    • A systematic colour scheme convention should be implemented through style sheets to distinguish model panels from working panels.
With this approach to the development process, the users start to get exposure to a "straw man" model of the final system during the system specification phase. The model system gradually evolves and improves throughout the design and development phases, and becomes real during the testing phase. Almost from day one, the users have something tangible and comprehensible to work with. They can see and assess how well the finished system will meet their needs, and provide meaningful feedback to the developers while development is still underway, rather than only after the project has supposedly completed and the developers have been assigned to different projects.

Dated snapshots of the model and developed system should be taken weekly and archived by both the developers and the system owners, so that when (not if) disputes arise as to what was previously done and agreed or not agreed, evidence will be available to help resolve the disputes.
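As a rough illustration of the panel-and-field sizing idea described above, here is a toy calculation. The weights and panel names are invented for illustration; a real team would calibrate them against its own completed projects:

```java
import java.util.Map;

// Toy sketch of panel-based sizing: score each panel from a base cost plus
// a small cost per input and output field, then sum across the agreed set
// of panels. The weights below are invented, not calibrated.
public class PanelSizing {
    // Effort score for one panel: base cost plus per-field costs.
    static double panelScore(int inputFields, int outputFields) {
        return 2.0 + 0.2 * inputFields + 0.1 * outputFields;
    }

    public static void main(String[] args) {
        // panel name -> {input field count, output field count}
        Map<String, int[]> panels = Map.of(
            "Search",  new int[]{3, 1},
            "Results", new int[]{0, 12},
            "Detail",  new int[]{0, 20});

        double total = 0;
        for (int[] fields : panels.values()) {
            total += panelScore(fields[0], fields[1]);
        }
        System.out.printf("Estimated size: %.1f panel points%n", total);
    }
}
```

Progress can then be reported as the share of these panel points that have been validated by the users, which is a number the system owners can check for themselves.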

Linking use cases to Java code and documentation

If the software development takes place in Java then a further refinement is possible – use cases can be cross-linked to the source code once written, and to the Javadoc once generated (Javadoc is a set of HTML documents that list all of the classes, and for each class all of its methods and attributes in a Java system).

Once the requirements gathering phase is complete, the text of the use cases is supposedly fixed. Analysts should then study the use cases and from them work out what classes are needed to represent the objects that appear in the use cases. The classes should correspond to the nouns that appear in the use case. The possessive form (e.g. the dog's bone) suggests that the class bone is an attribute of the class dog. Verbs should suggest the methods that the various objects (nouns) will need to implement. Adjectives qualify nouns, and may suggest subclasses.
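Applied to the earlier search example, this noun/verb analysis might yield skeleton classes like these (all names are illustrative):

```java
import java.util.List;

// Skeletons derived from the nouns and verbs of the earlier search use case:
// "searcher", "search criteria", "document", "title" suggest classes and
// attributes; "enters", "view" suggest methods.
class Document {
    String title;        // "the title of any of the documents listed"
    String contents;

    String view() {      // the verb "view" becomes a method on Document
        return contents;
    }
}

class SearchCriteria {
    String keywords;     // qualifying phrases suggest attributes
}

class Searcher {
    // "enters search criteria" becomes a method taking the noun it acts on
    List<Document> search(SearchCriteria criteria) {
        // a real implementation would rank documents by how well they match
        return List.of();
    }
}

public class UseCaseSkeleton {
    public static void main(String[] args) {
        Document d = new Document();
        d.title = "Use Cases On Steroids";
        d.contents = "...";
        System.out.println(d.view().isEmpty() ? d.title : d.title);
    }
}
```

Note how the mapping keeps the vocabulary of the use case intact in the code, which makes the later cross-linking step straightforward.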

It would be nice if the modelling tool allowed the developer to highlight nouns, verbs, and adjectives found in the use case, and to indicate which objects, attributes, methods, and subclasses they correspond to. The text of the use case could be colour-coded to show these classifications. As the analysis proceeds, the system could recognise nouns, adjectives, and verbs that the analyst has previously categorised, and offer the link previously made by the analyst as the default interpretation of the new occurrence of that word. The analyst could accept the default, or create a new object or method.

Once this analysis is completed, simple source code skeletons could be created automatically from it, and the use case linked to the source, so that clicking on a noun takes the viewer to the corresponding source class or attribute definition. Once the programmers have fleshed out the generated code stubs with working code, they will normally generate Javadoc documentation from it. The use cases could also link to the places within the generated Javadoc where the corresponding classes, method and attribute definitions are defined. Missing links (e.g. a noun that doesn't link to a class, or a verb that doesn't link to a method) would suggest areas of the use case that have not yet been fleshed out with source code, and hence parts of the software that require further attention.
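Generating such links is straightforward string work. Here is a small sketch, assuming the method-anchor format that recent versions of the Javadoc tool emit (older versions use a slightly different anchor syntax), with an invented output directory name:

```java
// Sketch of turning a categorised use-case verb into a hyperlink to the
// corresponding entry in the generated Javadoc. The "apidocs" directory
// name is illustrative; the #method() anchor format is the one recent
// Javadoc versions generate, and older versions differ slightly.
public class JavadocLinker {
    static String methodLink(String className, String methodName) {
        return "<a href=\"apidocs/" + className + ".html#" + methodName + "()\">"
             + methodName + "</a>";
    }

    public static void main(String[] args) {
        // the verb "view" in the use case, previously mapped to Document.view()
        System.out.println(methodLink("Document", "view"));
    }
}
```

A word with no such mapping would simply be left unlinked, which is exactly the visual cue for a missing class or method that the text describes.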

Here is a simple example of how a marked-up step in a use case might appear. Classes have pink backgrounds, methods blue, both are underlined (would be hyperlinked in the real system), and tooltips may be added for extra information.

3. The quick brown fox jumps over the lazy dog.


  1. Alan M. Davis, Great Software Debates, Wiley-IEEE Computer Society Press, October 2004, pp. 125–128.