In the previous post I wrote about how we seem to forget most of our history when it comes to failed projects. Some projects create working conditions similar to those in a very messy kitchen, where the fridges stopped working ages ago, but nobody has noticed. The sad fact is that we already know some of the factors that will cause a project to fail, and we know them far too well for comfort. Ken Eason wrote about the problem as early as 1988, and unfortunately several of the reasons he lists are still recognizable in many of the projects that have failed since then. In the following, I will use numbers to refer to examples from some of the more recent software engineering failures, as follows:
1. Millennium (an administrative system for hospitals and other medical units; failed)
2. Blåljus (an administrative system for the Police; failed)
3. The move from Mellior to Cosmic (two systems for medical administration in Region Gävleborg; running, but with large problems)
4. Ladok (a joint administrative system for academic studies, students and examiners; running, and working after many small and large problems)
5. Nationella proven (the central administration system for the national school tests in Sweden; recalled two days before the day of the test)
The failures in these projects point to different but related problems in the development and introduction of the systems. The list is by no means complete, but these systems display some of the well-known factors leading to failures or inconveniences, and the failures would have been quite easy to predict from what we know about human factors and from earlier failed projects. It would be possible to write long reports about the reasons for the failure of each of these projects, but here I will just try to highlight some of the most evident ones.
What is the purpose?
The main document that guides the software development process is the requirements specification: a huge document that supposedly describes the complete functionality of the system in such detail that the system can be programmed from it. This document is also normally the basis for the contract between the stakeholders in the process. If a function is not in the requirements specification, it is not supposed to be there. Adding functionality outside the requirements specification is a big no-no, just as it is for functionality described in the document to be missing from the system.
This sounds both great and solid, but there are some caveats right at the beginning of the process. The first is getting an overview of the complexity of the specification. For larger systems, this becomes an overwhelming task that most humans can no longer perform. There are already software tools that help in the process, and I assume this is a task that can be well supported by systems based on artificial intelligence technology, since summarizing texts is what they are already supposed to be good at. More crucial, however, is that requirements specifications, despite their complexity, will often still be incomplete to a certain extent. What is missing? Quite simply, we often spend very little time finding out the purpose, or the goal, that the end users have in using the system. We can specify the central functionality to the extreme, but if we don’t know what the goal of using the system is, it will still not be well designed. In some cases, there are also unspoken, tacit or simply missed requirements that will affect the usability of the final system.
The Goals – Not the tools
To make matters worse, most systems today do not have one single goal but many, and sometimes they even contradict each other. An administrative system for the health services has one very clear overall goal, namely to store all the information about the patients in a secure, safe and still accessible manner. We may also have quite detailed requirements on security, on which items to store, on how they need to be stored, and so on. But the question is: do the requirements show the purpose of storing the data? Let us take the following example:
“An X-ray picture is taken of the knee of a patient. If the only purpose of taking the X-ray is to document the treatment of the patient, it might not matter so much if the image is cropped at the edges to fit the standard image size when it is saved in the journal system (1). But if the purpose instead is to make a diagnosis, some small details around the edges might be very important. If those details are missing, then in the best case the surgeon only needs to order a retake of the image, but in the worst case, the doctor might not know about the cropping and miss vital information for the further treatment of the patient. The quality of the data storage thus becomes very important for the health professionals using the system.”
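To make the failure mode concrete, here is a minimal Python sketch of a save routine that silently crops images to a standard frame. It is purely hypothetical: the function names, sizes and the print-based warning are my own inventions, not taken from any of the systems above.

```python
# Hypothetical sketch: a save routine that silently crops an image to a
# standard size. "Documentation-quality" storage like this destroys
# exactly the edge details that a diagnostic use would need.

STANDARD_WIDTH = 1024   # assumed fixed storage format
STANDARD_HEIGHT = 768

def save_to_journal(pixels):
    """Crop a 2D pixel grid to the standard size and return what is stored.

    Everything outside the standard frame is thrown away, and in the
    naive version nobody is told about it.
    """
    cropped = [row[:STANDARD_WIDTH] for row in pixels[:STANDARD_HEIGHT]]
    lost_rows = max(0, len(pixels) - STANDARD_HEIGHT)
    lost_cols = max(0, len(pixels[0]) - STANDARD_WIDTH) if pixels else 0
    # A purpose-aware system would at least flag the loss to the user:
    if lost_rows or lost_cols:
        print(f"WARNING: cropped away {lost_rows} rows and {lost_cols} columns")
    return cropped

# A 1200x900 X-ray loses 176 columns and 132 rows of potentially
# diagnostic detail when forced into the 1024x768 standard frame.
xray = [[0] * 1200 for _ in range(900)]
stored = save_to_journal(xray)
print(len(stored[0]), "x", len(stored))  # -> 1024 x 768
```

Whether the cropping matters depends entirely on the purpose of the image, which is exactly the kind of information a purely functional requirement ("store images in the standard format") does not capture.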
More diffuse, albeit obvious, goals of a system may not even be mentioned explicitly in the requirements. We can, for example, be sure that one of the main goals of introducing a (new) system is to make the work simpler, or at least more efficient, for the users. Thus, if a previously simple note-taking task now requires more than twenty interactions with the system, this definitely does not support that indirect goal (1, 2, 4). In Ladok, entering the final grade for a course now requires passing through at least five different screens, and the final step forces the examining teacher to log out from the system and then back in again. This is stated to be for “security reasons”. It is difficult to understand how this can be regarded as “efficient”.

Furthermore, most people today use some kind of password manager to save login identities and passwords, so that they don’t have to remember the login data. With such a program activated, the user only has to press “Enter” one extra time, and they are logged in again. Where is the security in this extra set of button presses? And what are the users’ goals and tasks in all this? Logging in one extra time is definitely not among them.
Open the door, Richard!
To make the general discussion a bit clearer, let’s take a detour through a very simple physical example that most people should recognize: “the door handling mechanism!” Normally this mechanism is referred to simply as “the door handle” (but there may also be a locking part). A door handle can have many different shapes, from the round door knob to the large push bar that stretches along the whole width of the door. Which design is the best? Someone might argue that the large push bar is the best, since it allows for the use of both hands. Others might instead consider the aesthetic design to be all-important, proposing the polished door knob as their favorite.
The discussion often ends in a verbal battle about “who is right here?”, and people with an HCI education behind them will commonly reply with the ID principle: “It Depends” (the principle holds that there is almost never a single true answer to such a question; instead there are many factors we need to contemplate before settling on a design). This principle is of course one way to look at it, but if we consider a kitchen door, for example, it may not be the best place for a polished door knob (as any chef or cook would immediately realize). A hand that has been handling different kinds of food will often be more or less covered in grease, or in remaining soap after washing, which makes a door knob impossible to twist. Better then to use a regular lever handle, with its Archimedean leverage (which also provides the necessary force for people with weak muscles, of course).
However, maybe we should look a bit further than the best specific design of the door handle? How often have you seen someone just standing in front of a door, only twisting or applying force to the handle? Isn’t there something more involved in the action? What is the goal of using a door handle? If we think a bit further, the goal is most of the time to open or close the door! Right! Now we know enough, then? Well, no: how often have you seen someone opening and closing a door just for fun? OK, some children might think it’s a good way to annoy their parents, but apart from that? What is the purpose of opening or closing a door? Of course, it is to get to the other side of the door opening, or to stop someone or something from coming in or out. So this is, in fact, (very close to) the final goal of using the door handle: to get out of or into a room, or at least to get through the door opening. Any solution that supports the user in handling the door in a way that achieves this goal will be acceptable, and there may even be some solutions that are really good (and not just usable).
Back on track… to the rotten parts…
Now, I assume that nobody would really forget that doors have the purpose mentioned above, but for other tasks the goals of using a system might not be so simple and clear. Even worse, we might forget that the same system may have different purposes depending on the user and his or her perspective. The main purpose of a system may be one thing, but for the individual user, the main purpose of using it may be very different depending on the user role, the assigned tasks and many other things. And here comes the big problem: we usually construct the system from the company or organizational perspective, where the purpose of the system is quite well specified, while the goals of its operators, the users, may be much less clear. And for the user it is not enough that a function is possible to use; it has to be better than the previous system, or better than doing the task by hand (1, 2, 3).
It has to be better than the previous method…
This is where at least some of the problems behind the software development failures are to be found. Usability is important, but the system also has to conform to the reality experienced by the users; it has to make their work more enjoyable, not more stressful or complicated. Just to give a few examples from failed systems:
The regional health care in Gävleborg has now replaced the old system Mellior (which was in itself not exactly a well-liked system) with a version of Cosmic (3). One would of course expect a system to be replaced with a better one? Unfortunately, the new system, and not least the transfer from the previous system, leaves a lot to be desired. Some of the problems relate to specific work tasks, whereas others affect more general aspects of the usage. At the child psychiatry units, it was soon found that the system was not at all designed for their work. For safety reasons, the staff are often required to work in pairs with some patients, which turned out to be impossible to administer in the new system (3). There were also no default templates for the specific tasks in the units, and when the staff asked, they were told that the templates would arrive about two years (!) after the new system had been introduced. Until then, the notes and other information had to be handled “ad hoc”, using templates that were aimed at other units.
After some “trying and terror”, more serious issues were discovered. If the wrong command was used (the one that most of the personnel felt to be the most natural), the patient records became immediately visible to anyone who had access to the system. Even worse, it also turned out that hidden identities were no longer… hidden. The names, personal identity numbers, addresses, telephone numbers and other sensitive data were all in plain sight (3). The same security and integrity problem was also found in the system for the administration of school tests (5). This happened although it would be quite natural to assume that there is a definite purpose in keeping people’s identities hidden and protected. Could it be that the specific requirements regulating the “hidden identity records” were forgotten or omitted?
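As a minimal, purely hypothetical sketch of how such a requirement can simply be absent from the code, compare a display routine that ignores a protection flag with one that honors it. All field names and the flag are invented for illustration; nothing here is taken from the actual systems.

```python
# Hypothetical sketch: the difference between a record display where the
# "protected identity" requirement was never implemented and one where it
# was. All field and flag names are invented.

patient = {
    "name": "Jane Doe",
    "personal_number": "19900101-1234",
    "address": "Hidden Street 1",
    "protected_identity": True,   # the flag the requirement hinges on
}

def show_record_as_shipped(record):
    # The flag exists in the data, but nothing in the display path uses it:
    return {k: v for k, v in record.items() if k != "protected_identity"}

def show_record_as_required(record):
    # What the (forgotten or omitted) requirement would have demanded:
    if record.get("protected_identity"):
        return {k: "*** PROTECTED ***" for k in record
                if k != "protected_identity"}
    return {k: v for k, v in record.items() if k != "protected_identity"}

print(show_record_as_shipped(patient))   # everything in plain sight
print(show_record_as_required(patient))  # identity fields masked
```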
Big Bang Introduction
One clear cause of system failure has been traced to the actual introduction of the system in the workplace. Ken Eason (1988) wrote about the different ways a new system can be introduced. The most common one he described under the quite accurate name “Big Bang Introduction”: at a certain date and time, the new system is started and the old one is closed down. It is still one of the most common ways we do it. Sometimes the old system has to keep running, since all the existing data may not have been transferred to the new system. This should not come as a surprise, since the transfer of data is often not regarded as “important”.
Data transfer
When the Cosmic system was introduced (3), the data was, fortunately, not transferred automatically. Instead, the data had to be transferred manually, but with a certain amount of extra work. The data records had to be “updated” with an additional tagging system before being transferred, because otherwise all the records would be dumped together in one unordered heap of data. That unordered heap then had to be re-sorted according to the previous, existing labels (which were in the records all along).
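As a minimal sketch of the difference between the two transfer strategies, assuming invented record fields and unit labels (the real record format is not public), consider:

```python
# Hypothetical sketch of the two migration strategies. The record fields
# and unit labels are invented for illustration.

old_records = [
    {"id": 1, "unit": "child_psychiatry", "text": "note A"},
    {"id": 2, "unit": "surgery",          "text": "note B"},
    {"id": 3, "unit": "child_psychiatry", "text": "note C"},
]

def migrate_as_heap(records):
    """The "unordered heap" transfer: the contents survive, but the
    existing labels are dropped and must be re-created by hand."""
    return [r["text"] for r in records]

def migrate_with_labels(records):
    """A transfer that carries the existing labels over, so the new
    system can group the records without any manual re-tagging."""
    grouped = {}
    for r in records:
        grouped.setdefault(r["unit"], []).append(r)
    return grouped

print(migrate_as_heap(old_records))      # ['note A', 'note B', 'note C']
print(migrate_with_labels(old_records))  # grouped per unit, labels intact
```

The labels were in the records all along; the migration simply failed to use them.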
The patient records are, among other things, also used for communication with the patients. However, it turns out that when messages are sent to patients through Cosmic, they do not get sent at all, although the sending is acknowledged by the system. The messages can concern anything from calls for patient visits to lab results or information about therapeutic meetings. Now the medical personnel have to revert to looking up the addresses manually in the old system and sending the messages to each patient directly.
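This is a classic failure mode: the system acknowledges that a message has been queued, and the user reads this as confirmation that it has been delivered. A minimal sketch (the function names and queue design are my own assumptions, not Cosmic’s actual architecture) shows how the two can diverge:

```python
# Hypothetical sketch of why "the system acknowledged the sending" does
# not mean "the message was delivered". Names and structure are invented.

import queue

outbox = queue.Queue()

def send_message(patient_id, text):
    """Returns True as soon as the message is queued -- this is the
    'acknowledgement' the user sees, long before any delivery attempt."""
    outbox.put((patient_id, text))
    return True  # acknowledges queueing, not delivery

def deliver(patient_id, text):
    # The step that can fail unnoticed, e.g. a misconfigured backend:
    raise RuntimeError("delivery backend not configured")

def delivery_worker():
    """If this worker crashes, is misconfigured, or never runs, every
    'acknowledged' message silently stays in the queue."""
    while not outbox.empty():
        patient_id, text = outbox.get()
        deliver(patient_id, text)

ok = send_message(42, "Your lab results are ready.")
print("acknowledged:", ok)  # True -- yet nothing has been delivered
# delivery_worker() would raise here; without end-to-end confirmation,
# the sender never learns that the message was lost.
```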
Training
I already mentioned above that one reason why a new computer system is developed is to make the work more efficient. As we have also seen, the new systems are not always flawless. But even if they were, there is the problem that the workplace may already be in an overstressed mode of working. The time needed for additional training is often simply not available when the new system is introduced. This means that the personnel either have to learn the new system in their spare time or at home after work, or do not get enough training. In some cases (3, 5), the responsibility for the training is instead handed over to the IT support groups, and this can become even worse if the time of introduction is badly chosen.
Cosmic (3) was introduced in January; Ladok (4) in the middle of the fall term. Other systems have been introduced during the summer, which might seem a good choice. However, December and January contain the Christmas and New Year breaks, when it is difficult to get enough personnel even to manage normal working conditions, and the same goes for the summer holidays. To imagine that it would be easy to get people to also train on new systems during those times is of course ridiculous.
But mid-term? The introduction of the Ladok (4) system was for some reason placed exactly when the people at the student office have the most to do, namely when the results of all courses for the first half of the term are to be reported, in Ladok, and all with very short deadlines. This is again a recipe for a bad start with a new system.
The fridge…?
If some food runs a risk of getting spoiled, we put it in the fridge, or even the freezer. But when software products run a risk of going bad, where is the fridge or freezer? Well, the first thing we have to do is to clean out the rotten stuff, even before we start finding ways of preserving the new projects that will be developed. Essentially, we have to start rethinking the usability requirements on the software we produce, and also look back, not only at what has worked before, but even more at what we can learn from the previous failures.
But most important of all is that we start to work with human factors as guiding principles, and not just as “explanations for when people make fatal mistakes”. We know a lot about human factors and how they shape our reactions. This post is already very long, so I guess I will have to come back with part 3 of 2, dealing with these factors as part of the failed projects. While you are waiting, I can recommend taking a look at the excellent book Human Error by James Reason (1990).
Illustrations were made by Lars Oestreicher, using MidJourney (v6 and v7).
References
Eason, Ken (1988). Information Technology and Organisational Change. London & New York: Taylor & Francis.
Reason, James (1990). Human Error. New York: Cambridge University Press.