On the morning of 12 December 1988, a crowded commuter train ran into the rear of a stationary train just outside Clapham Junction. The commuter train bounced off the tracks and into the path of a third, oncoming train. As a result, thirty-five people were killed, and nearly five hundred were injured.
The Hidden report states clearly that “The purpose of this Investigation was not to look for one simple, single solution to account for the tragedy, but to seek to establish both the immediate and the underlying causes of the accident and all the circumstances attending it”. Although he doesn’t mention the technique by name, what Anthony Hidden accomplishes in his report is a masterpiece of root-cause analysis.
Root-cause analysis has a somewhat deceptive name. Its purpose is not to find a single root cause of an accident. Rather, it begins with a simple explanation and then, like the roots of a tree, branches out into a network of causal circumstances. For any statement of fact about an accident, root-cause analysis prompts you to ask “How did this come to be?”, “How was this allowed to remain?” and “How did this come to result in an accident?”.
At Clapham Junction, the causal network starts with a piece of wire. It was not a useful piece of wire. It should not have been there. It was in fact part of an old relay circuit that had been disconnected. This piece of wire touched the circuit for a new signal that had been installed. This allowed electricity to flow across the circuit, and caused the stationary train to become invisible to the signalling system. Since the circuit could see no train, the signal light became green, when it should have been red. In safety language, this is called a wrong-side failure. Rail signals are designed to be fail-safe. If there is an error or failure in the system, it defaults to turning the signals red. The hopefully rare events which defeat a fail-safe strategy are called wrong-side failures.
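The fail-safe idea, and the way a stray feed defeats it, can be sketched in a few lines of Python. This is a deliberately simplified model and not the actual Clapham circuitry: the function names, the three boolean inputs and the assumption that green is shown only while a detection relay is energised are all illustrative. "Right-side failure" is the usual counterpart term for a failure that leaves the signal safely at red.

```python
from enum import Enum


class Aspect(Enum):
    RED = "red"
    GREEN = "green"


def relay_energised(train_present: bool, power_ok: bool, stray_feed: bool) -> bool:
    """Hypothetical, simplified detection circuit.

    Assumption: the relay only picks up when current reaches it. A train on
    the section, or a loss of power, cuts the normal feed. A stray wire can
    supply current by another route.
    """
    normal_feed = power_ok and not train_present
    return normal_feed or stray_feed


def signal_aspect(energised: bool) -> Aspect:
    # Fail-safe rule: green must be positively proven; anything else is red.
    return Aspect.GREEN if energised else Aspect.RED


def classify(train_present: bool, aspect: Aspect) -> str:
    if train_present and aspect is Aspect.GREEN:
        return "wrong-side failure"
    if not train_present and aspect is Aspect.RED:
        return "right-side failure"
    return "correct"


def outcome(train_present: bool, power_ok: bool = True, stray_feed: bool = False) -> str:
    aspect = signal_aspect(relay_energised(train_present, power_ok, stray_feed))
    return f"{aspect.value}: {classify(train_present, aspect)}"


print(outcome(train_present=True))                   # red: correct
print(outcome(train_present=False, power_ok=False))  # red: right-side failure
print(outcome(train_present=True, stray_feed=True))  # green: wrong-side failure
```

In this model a power failure can only ever produce a red signal. The stray feed is dangerous precisely because it supplies the one thing the fail-safe design treats as proof that the line is clear.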
So the accident was caused by a piece of wire that was not supposed to be there. We know exactly who was supposed to make sure that the wire wasn’t there, and that person took full responsibility afterwards, both for the wire and for the accident. But the Hidden Inquiry does not accept this claim of responsibility.
When an old circuit is disconnected, it isn’t always possible to completely remove the cabling. In these cases, the maintenance rules say that three precautions should be taken. The wire should be disconnected at both ends, it should be cut back as short as possible, and it should be secured in a position where it cannot touch other wires. The maintainer did none of these things.
This was a violation, but how did the violation come about? It turns out that this was what the maintainer usually did. He routinely broke the rules, and no one had ever noticed and corrected him. He hadn’t been trained to do the job properly, nor had he undertaken any test or assessment in electrical wiring. It wasn’t just him, either. This was the regular way maintainers worked.
How did the violation remain? Normal good practice would require a supervisor to check the work of the maintainer. The supervisor in this case was working with one of the track gangs. He never even entered the signal box, let alone checked the wiring work. The maintainer had been working on electrical wiring for sixteen years, and none of his supervisors had ever noticed that his work was unsafe.
How did the violation come to result in an accident? It’s a scary thought that a railway could be made unsafe by a single wiring error. In fact, there is an independent check called a wire count to prevent this single point of failure. The wire count should typically be the job of the supervisor, but he was unaware that this was part of his responsibility.
So now root-cause analysis gives us three more facts we need to explain. We have regular poor maintenance practices, a lack of adequate supervision, and a lack of knowledge about the importance of the wire count. There are explanations for each of these, but for the sake of simplicity, let’s focus on the wire count, which the Hidden report describes as the Last Defence. It may have been the supervisor’s job to actually do the count, but it was the job of the Testing and Commissioning Engineer to make sure that it got done.
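The network built up so far can be written down as a small graph. The sketch below is only an illustration: the fact labels are paraphrases of the account above, and the `network` dictionary and the `unexplained` helper are hypothetical names, not anything taken from the Hidden report. Each fact maps to the facts that explain it, and the helper lists the facts that still have no explanation of their own.

```python
# Each fact maps to the facts that help explain it (labels are paraphrases).
network = {
    "collision": ["signal showed green with a train ahead"],
    "signal showed green with a train ahead": ["stray wire touched the new circuit"],
    "stray wire touched the new circuit": [
        "regular poor maintenance practices",        # how did this come to be?
        "lack of adequate supervision",              # how was it allowed to remain?
        "importance of the wire count not understood",  # how did it result in an accident?
    ],
}


def unexplained(graph):
    """Return facts cited as causes that have no causes of their own yet."""
    open_facts = []
    for contributing in graph.values():
        for fact in contributing:
            if not graph.get(fact) and fact not in open_facts:
                open_facts.append(fact)
    return open_facts


for fact in unexplained(network):
    print("still to explain:", fact)
```

Run as written, this lists the three facts named above. Adding an explanation for any one of them removes it from the list and puts its own causes there instead, which is how the network keeps growing until the remaining questions are about the organisation and not about the wire.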
The Testing and Commissioning Engineer was only temporarily in that role. As a result of a reorganisation, his previous position had disappeared. He’d applied for four or five other jobs, but hadn’t got them, so he felt obliged to accept the temporary role. It was a temporary job he didn’t want, in a location where he didn’t want to work. Because it was a temporary job, he had no real induction or training for the work. The document describing the wire count was called SL-53. It was a formal work instruction titled “TESTING OF NEW AND ALTERED SIGNALLING”. The Testing and Commissioning Engineer had seen this document in draft form, but had never read it in its final version. It had arrived in his inbox, but no one drew it to his attention or told him that it was his job to make sure it got implemented. This didn’t stop him signing the box on the test certificate saying that all work had been completed in compliance with SL-53.
It doesn’t matter who this engineer was. His predecessor, who wasn’t temporary, wasn’t demotivated, and did understand the job, would have done exactly the same thing. The engineer in charge of improving the quality of testing for the whole region would have done the same thing. None of these engineers, with testing and commissioning in their job titles, was aware that a formal work instruction with testing in its title had been issued and was in force.
SL-53 didn’t appear from nowhere. It was produced in response to a series of wrong-side failures called the Oxted incidents and the Queenstown Road incident. All of these incidents would have been avoided by a proper wire count. The tester in charge of the Queenstown Road incident had been blamed and reprimanded, but no one had stopped to ask how a tester without training or awareness of the work instructions had been put into that job. As the Hidden report states, “the fault did not lie merely with the tester’s ‘shoddy work’, it lay equally with the person who had sent him to do the job.”
The Hidden report contains a fairly lengthy discussion of reorganisations within the Signals and Telecommunications Department of British Rail. This discussion fails to reach a strong conclusion. Hidden recognises that the regional testing team had three tasks: to conduct testing, to improve the standard of testing, and to improve the training and competence of testers. They were so busy doing the first of these that they failed in the other two. This was the case before the reorganisations, so the worst that can be said is that the reorganisations were missed opportunities for reform. It may be that the number and size of reorganisations had created reform fatigue. Staff were reluctant to undertake long-term initiatives because they expected any improvements to be wiped out by the next reorganisation.
For any safety engineer, there is a balancing act between the immediate needs of the tasks at hand and the long-term needs of the organisation. The goal is always to do a good job within existing constraints, but to make sure that it will be easier to do a better job next time. The Clapham Junction accident could have been avoided if the people who knew most about testing had been spending less time testing, and more time creating a better test organisation.
With all of the chaos in the causes and circumstances of an accident, it can be surprising that things do go right. The first person in charge of the accident scene was Temporary Station Officer Mills. He arrived four minutes after the crash, and was in charge for ten minutes. In that time, he had assessed the situation, begun to evacuate the passengers, worked out how many more ambulances and fire tenders were needed, declared a major incident so that the hospitals would be prepared, set up arrangements to direct the arriving police, ambulance and fire crews, set people to work to improve access to the crash site, and then joined in the rescue himself.
Another hero in the aftermath was a controller called Mr Ronald Reeves. When his alarm panel started to flash, before anyone even told him there was an accident, he had diagnosed that a major incident must have happened, and isolated traction current from the whole area. Without his actions the rescuers could have been in constant danger of electric power being restored throughout the first minutes of the evacuation.
To give an indication of the scale of the rescue effort required at an accident of this size, in addition to the police, fire brigade, ambulance, doctors and volunteers, there were one hundred and thirty-four council workers involved in the rescue. Why does a rescue need council workers? Accidents don’t tend to happen in convenient, easy-to-get-to places. Whilst the first responders were struggling over fences and down a steep embankment, the council workers were creating better ways to get supplies in and stretchers out. They removed fences, they cut down trees, and they built steps and paths for the other emergency workers. Other council workers were diverting traffic, carrying supplies or stretchers, or providing assistance to the uninjured survivors. When they were finished with that part of the job, they lined up to donate blood.
At 8:10 in the morning, a single out-of-place wire had killed 35 people. The Hidden report does in fact name names of those involved, and if you read the actual words of the report you’ll find that Hidden can be quite trenchant, but also understanding. He recognises that to stop an accident it isn’t enough to spot the violations; you also have to find explanations for them.