Tuesday, July 24, 2012

When Summing the Parts Doesn't Add Up


We work with companies to understand the value of discovering and influencing the uncertainties associated with their strategic and project decisions.  Usually there are a few key uncertainties whose ranges swing the value of the decision by a large margin.  Understanding those drivers, their impacts, and the result of influencing their outcomes is what we call the Insights to Success.

However, sometimes the number of uncertainties that must “go right” can be so large that even a small degree of risk for each one makes the project appear hopeless.

Consider this problem: for the Apollo spacecraft to successfully go to the moon and back, literally thousands of things had to go right.  If any single item failed, the launch, landing, or return would fail and disaster would ensue.  The problem comes when the many uncertainties are totaled into an outcome.  Ten uncertainties, each with a 99% chance of success, result in a 90% chance of overall success (0.99^10).  One hundred uncertainties, each with the same chance of success, result in only a 37% chance of overall success.  Increase the number of uncertainties to one thousand, and the flight is pretty much guaranteed to fail.

So we argue that the chance of success for each part is actually greater than 99%.  Let’s assume it is 99.999%.  The same problem recurs as the number of parts increases.  At 100,000 parts, each with a 99.999% chance of success, the overall chance of success is back down to roughly 37%, and at a million parts the mission is again almost certain to fail.
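The compounding arithmetic above is easy to verify; a few lines of Python reproduce the numbers for both the 99% and 99.999% cases:

```python
# Overall success requires every one of n independent parts to work,
# so the probability is p raised to the n-th power.
def overall_success(p: float, n: int) -> float:
    return p ** n

for p, counts in [(0.99, [10, 100, 1000]),
                  (0.99999, [100_000, 1_000_000])]:
    for n in counts:
        print(f"p = {p}, n = {n:>9,}: {overall_success(p, n):.6f}")
```

With p = 0.99 the chance of overall success drops from about 90% at ten parts to about 37% at a hundred and essentially zero at a thousand; with p = 0.99999, the same roughly 37% shows up at 100,000 parts.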

This problem also occurs in business uncertainty assessments.  When a business case is broken into a large number of uncertainties, the overall project can start to appear to have a high likelihood of failure.

I call this the U^n or “U to the n” problem.  The uncertainty, raised to the power of the number of uncertainties, quickly makes the resulting value small and can disconnect the problem’s relevance from the real world.

This is, in part, because we humans have a difficult time grasping very large or very small numbers.  Our reference points are often influenced by our macro experiences.  We know we are late for the airport about 1 in 100 times, and since that doesn’t seem like very often, a 1% failure “feels” like a low probability.  Trying to get our minds around a 0.0000001 chance of failure is almost impossible for most of us to grasp.

So how do we assess these very small probabilities of failure?  There are two approaches I’ve seen that work.  The first is to aggregate the uncertainties into groups or systems, and then assess the probability of each system succeeding or failing.  Compare this assessment to the value obtained by multiplying together the individual uncertainties, and see if the numbers are close.  If they are not, examine the logic behind the difference.  Why do they differ?  Which individual uncertainties are driving the calculated number away from the system assessment?  Note that it’s invalid at this point to declare one number or the other “right”.  What matters is resolving the differences to arrive at correct assessments.
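As a minimal sketch of that reconciliation step (the probabilities, the tolerance, and the helper name are illustrative assumptions, not part of any specific assessment method):

```python
from math import prod

def reconcile(individual_probs, system_assessment, tolerance=0.05):
    """Compare the product of individually assessed success probabilities
    against a directly assessed system-level success probability."""
    calculated = prod(individual_probs)
    gap = abs(calculated - system_assessment)
    # The smallest individual probabilities dominate the product, so they
    # are the first place to look when the two numbers disagree.
    drivers = sorted(individual_probs)[:3]
    return {"calculated": calculated,
            "assessed": system_assessment,
            "close": gap <= tolerance,
            "lowest_drivers": drivers}

result = reconcile([0.99, 0.97, 0.95, 0.99], system_assessment=0.97)
```

Here the calculated value (about 0.90) disagrees with the system-level assessment of 0.97, so the analyst would revisit the lowest individual assessments rather than declare either number right.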

The second approach uses comparative assessments.  Compare the likelihood of an individual uncertainty against one or more events with a known track record.  For example, rank the following in order of likelihood:
· A bolt shearing off from the reactor housing
· The fuel source becoming contaminated
· Being more than an hour late for work
· Finding a wrench forgotten in the fuselage
· Having your air conditioning fail in your house, car, and office at the same time

This approach helps you calibrate your assessments into the right range of numbers before assigning a set of probabilities.
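The same calibration idea can be sketched in code (the event names and frequencies below are purely illustrative guesses, not measured rates): slot a new assessment in among anchor events whose rough frequencies are known, and check whether its position feels plausible before committing to a number.

```python
# Illustrative, order-of-magnitude frequencies for anchor events.
anchors = {
    "more than an hour late for work": 1e-2,
    "wrench forgotten in the fuselage": 1e-4,
    "house, car, and office A/C all fail at once": 1e-6,
}

def rank_events(events: dict) -> list:
    """Order events from most to least likely given assessed probabilities."""
    return [name for name, _ in sorted(events.items(),
                                       key=lambda kv: kv[1],
                                       reverse=True)]

# Slot a new assessment in among the anchors to sanity-check its magnitude.
ranking = rank_events({**anchors, "bolt shears off the reactor housing": 1e-5})
```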

Sometimes, both approaches are necessary to fully grasp and analyze a set of uncertain events.  By testing both, you apply intuition as well as reasoning to the problem.  Some team members will be better at one than the other, but by testing both, you will have a more robust and valid assessment.

Of course, the prevention of failure is often the driver for redundant systems.  Each independent backup multiplies a component’s failure probability by itself: a part that fails 1% of the time fails only 0.01% of the time with one backup, which is roughly equivalent to removing that component from the U^n product, lowering n by one.  That multiplicative benefit is the mathematical basis for the high value of applying redundant safety systems.  But that’s a topic for another time.
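The redundancy arithmetic can be sketched directly (assuming fully independent backups, which real systems only approximate):

```python
def failure_with_backups(p_fail: float, backups: int) -> float:
    """Failure probability of a component with k independent backups:
    the component fails only if every copy fails, so the probability
    is p_fail raised to the (k + 1) power."""
    return p_fail ** (backups + 1)

# A part that fails 1% of the time fails 0.01% of the time with one
# backup, and about one in a million times with two.
```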


