Fat-Tailed Probability Distributions
Don't forget the corollary to Murphy's Law: Murphy was an optimist.
Given the recent negative turn of events in the economy as a whole, and on Wall Street in particular, it is critically important for decision makers to stay aware of the 'Black Swan' occurrences that may be lurking in the future, ready to derail the best-laid plans and forecasts. Such incidents are extreme and/or catastrophic in nature. They are the outliers we think we can safely ignore, the events we describe in hindsight by beginning, "Who would have thought that ..." We've all heard of Murphy's Law, "if it can go wrong, it will", but as recent events have shown us, Murphy was an optimist!
Although the use of the term 'black swan' probably originated in the 17th century, it has more recently come into vogue as a result of the book "The Black Swan: The Impact of the Highly Improbable" by Nassim Nicholas Taleb. Since Black Swan events are so very rare, we think we can forget about them when making plans and forecasts. The problem is that when they occur, they can wreak complete devastation on an organization or business. They cannot be ignored; they need to be properly evaluated and anticipated so that the organization will survive the event, and hopefully rebound and prosper in the future.
One example of a Black Swan event is the destruction and devastation that Hurricane Katrina brought to New Orleans. Another obvious one is the attacks of 9/11 and the collapse of the World Trade Center towers. And let's not forget the 1982 Tylenol tampering scare: at the time, a naive public was shocked that anybody would ever do such a thing. Nor are these examples meant to imply that a Black Swan must be something negative. There can be tectonic, unexpected events, such as the fall of the Berlin Wall and the sudden collapse of the U.S.S.R., that are generally considered positive.
Although various methods are available to analyze random or unpredictable processes, the analytical modeling techniques typically used today are frequently insufficient to meet the challenge of the Black Swan. To provide insight into the risks associated with a situation or decision, modelers have at their disposal tools ranging from "merely" solving a closed-form algebraic equation for a probability distribution to running a full-blown Monte Carlo simulation program. But fundamental to many analyses is the Normal probability density function, the familiar 'bell-shaped' curve whose two tails (at the far left and right) taper off toward 0 very quickly, without ever quite reaching it. The problem is that, since the late 1800s, researchers have recognized that this curve, with its near-0 tails, does not accurately model Black Swan events.
Such extreme circumstances demand the use of a group of probability functions called "fat-tailed" or stable-Paretian distributions. These functions, whose power-law tails trace back to the work of Vilfredo Pareto in the late 19th century, give higher probabilities of occurrence to events in the tails of the curve. A specific example is the Cauchy Distribution. Whereas the Normal curve is already very close to 0 at plus or minus 3.5 standard deviations, a Cauchy curve is still far from 0 at plus or minus 5 units of its scale parameter. (The Cauchy's tails are so heavy that its variance, and therefore its standard deviation, is undefined, which is why distances from the center are measured in scale units instead.)
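To put numbers on that comparison, here is a minimal sketch using only Python's standard library. It computes two-sided tail probabilities in closed form: for a standard Normal, P(|Z| > k) = erfc(k/√2); for a standard Cauchy (location 0, scale 1), P(|X| > k) = 1 - (2/π)·arctan(k). For the Cauchy, k is measured in scale units because it has no standard deviation. The function names are illustrative.

```python
import math

def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard Normal variable Z."""
    return math.erfc(k / math.sqrt(2))

def cauchy_two_sided_tail(k):
    """P(|X| > k) for a standard Cauchy variable X (location 0, scale 1)."""
    return 1 - (2 / math.pi) * math.atan(k)

for k in (3.5, 5.0, 10.0):
    print(f"k = {k:>4}: Normal {normal_two_sided_tail(k):.2e}   "
          f"Cauchy {cauchy_two_sided_tail(k):.3f}")
```

At k = 5 the Normal tail is roughly 6 in 10 million, while the Cauchy still puts about 12.6% of its probability beyond that distance.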
Gen. Carl Strock of the Army Corps of Engineers addressed a press conference shortly after Hurricane Katrina regarding the New Orleans levee system. He said, "… when the project was designed … we figured we had a 200 or 300 year level of protection. That means that the event we were protecting from might be exceeded every 200 or 300 years. That is a 0.5% likelihood. So we had an assurance that 99.5% of this would be okay. We unfortunately have had that 0.5% activity here." The General's analysis was based on a Normal distribution. If, however, a fat-tailed distribution had been used, that 300 years would have been much shorter, perhaps in the range of 60 to 80 years, and remedial action might have been taken to avert the disaster that nearly destroyed the city of New Orleans.
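The percentages in that quote describe a single year. A short sketch shows how such odds compound over the life of a project, assuming every year is independent and carries the same annual probability 1/T for a "T-year" event. The 50-year horizon and the 70-year return period (a midpoint of the 60 to 80 years suggested above) are illustrative assumptions.

```python
def prob_at_least_one(return_period_years, horizon_years):
    """Chance that a 'T-year' event occurs at least once within the horizon,
    assuming independent years with constant annual probability 1/T."""
    annual_p = 1.0 / return_period_years
    return 1.0 - (1.0 - annual_p) ** horizon_years

for period in (300, 200, 70):
    print(f"1-in-{period}-year event, 50-year horizon: "
          f"{prob_at_least_one(period, 50):.1%}")
```

Under these assumptions, a 1-in-200-year event has roughly a 22% chance of happening at least once in 50 years, while a 1-in-70-year event has slightly better than even odds (about 51%).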
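The most general version of this fat-tailed (stable-Paretian) family is usually written not as a density but as a characteristic function, the Fourier transform of the density. In one standard parameterization it is:
$$
\varphi(t) = \mathbb{E}\left[e^{itX}\right] =
\begin{cases}
\exp\left\{-\gamma^{\alpha}|t|^{\alpha}\left[1 - i\beta\,\operatorname{sign}(t)\tan\dfrac{\pi\alpha}{2}\right] + i\delta t\right\}, & \alpha \neq 1,\\[4pt]
\exp\left\{-\gamma|t|\left[1 + i\beta\,\dfrac{2}{\pi}\operatorname{sign}(t)\ln|t|\right] + i\delta t\right\}, & \alpha = 1,
\end{cases}
$$
where α (0 < α ≤ 2) is the characteristic exponent, β (-1 ≤ β ≤ 1) controls skewness, γ > 0 is a scale parameter and δ is a location parameter.
The parameter α is what determines the thickness of the two tails, which is often loosely called the kurtosis of the function (strictly speaking, for α < 2 the variance is infinite, so the ordinary kurtosis is undefined). Generally, as α decreases, tail thickness increases. In fact, the Normal Distribution is merely the special case α = 2 (which gives a Normal with mean δ and variance 2γ²), and the Cauchy Distribution is the case α = 1, β = 0.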
The problem with these fat-tailed distributions is that, depending on the specific values chosen for the parameters (which determine the exact shape of the graph), they may not be solvable algebraically; apart from a few special cases, such as the Normal and the Cauchy, their density functions cannot even be written down in closed form. With the Normal Distribution, it's possible to solve the equation (or look up a table) and state that the probability of a certain event is some specific value. Fat-tailed distributions, however, do not lend themselves to this kind of closed-form analysis. To estimate the chances of specific events, numerical methods such as Monte Carlo simulation or binomial decision trees are required.
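As an illustration of the numerical route, the sketch below estimates tail probabilities by Monte Carlo for symmetric stable distributions, drawing samples with the Chambers-Mallows-Stuck method. It assumes zero skewness, location 0 and scale 1, and writes α for the tail parameter (0 < α ≤ 2; with this scaling α = 2 is a Normal with variance 2 and α = 1 is the standard Cauchy). The function names and the 5-unit threshold are illustrative choices, not taken from the article.

```python
import math
import random

def symmetric_stable(alpha, rng):
    """One draw from a symmetric alpha-stable law (skew 0, scale 1, location 0)
    using the Chambers-Mallows-Stuck transformation."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    while w == 0.0:                      # guard against a (vanishingly rare) zero draw
        w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(v)               # alpha = 1 reduces to the standard Cauchy
    return (math.sin(alpha * v) / math.cos(v) ** (1 / alpha)
            * (math.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha))

def tail_frequency(alpha, threshold=5.0, n=100_000, seed=1):
    """Monte Carlo estimate of P(|X| > threshold)."""
    rng = random.Random(seed)
    hits = sum(abs(symmetric_stable(alpha, rng)) > threshold for _ in range(n))
    return hits / n

for alpha in (2.0, 1.5, 1.0):
    print(f"alpha = {alpha}: estimated P(|X| > 5) = {tail_frequency(alpha):.4f}")
```

The α = 1 estimate should land near the exact Cauchy value of about 0.126, and the tail frequency climbs steadily as α falls, which is the "thinner α, fatter tails" behavior described above.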
So the conclusion is that in analyzing any decision or plan that involves random processes, it is critical to recognize and anticipate both the worst-case and best-case scenarios. The business person or decision maker should discuss with the modeler/analyst what might really happen under a wide range of possible scenarios. Aware of the drawbacks of the Normal Distribution, together they need to decide whether a fat-tailed probability function (embodying this kind of "the sky's the limit" thinking) is appropriate. And if a fat-tailed distribution is required, the modeler/analyst must decide on the numerical technique that best addresses the needs of the decision maker.
This article was written by John Hughes, Profit Point's Production Scheduling Practice Leader.
Friday, June 05, 2009
Understanding Your Risks with Monte Carlos
What is a Monte Carlo model, and what good is it? We’re not talking about the car General Motors sold under the Chevy nameplate. “Monte Carlo” is the name of a type of mathematical computer model. A Monte Carlo is simply a tool for figuring out how risky a particular situation is. It is a way to answer a question like “What are the odds that such-and-such an event will happen?” A good statistician can calculate the answer to this kind of question when the circumstances are simple, or when the system you’re dealing with doesn’t have many forces working together to produce the final result. But when you’re faced with a complicated situation that has several interacting processes, where luck or chance determines the outcome of each, calculating the odds for how the whole system behaves can be a very difficult task.
Let’s get some jargon out of the way. To be a little more technical, any process that has a range of possible outcomes, and where luck ultimately determines the actual result, is called “stochastic”, “random” or “probabilistic”. Flipping a coin and rolling dice are simple examples. A “stochastic system” is two or more of these probabilistic processes interacting.
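A minimal sketch of the core idea: simulate a simple stochastic system many times and count how often the event of interest happens. Two dice are used here because the exact answer (6 of the 36 equally likely outcomes total 10 or more, i.e. 1/6) can be checked by hand; the function names and trial count are illustrative.

```python
import random

def estimate_odds(event, trials=100_000, seed=42):
    """Monte Carlo estimate of the probability that event(rng) returns True."""
    rng = random.Random(seed)
    return sum(event(rng) for _ in range(trials)) / trials

def two_dice_total_at_least_10(rng):
    # One "play" of the system: two independent dice, summed.
    return rng.randint(1, 6) + rng.randint(1, 6) >= 10

estimate = estimate_odds(two_dice_total_at_least_10)
print(f"Monte Carlo estimate: {estimate:.4f}   exact: {6 / 36:.4f}")
```

For a system this small the simulation is overkill, but exactly the same loop works when the "event" is a complicated model that nobody can solve by hand.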
Imagine that the system you’re interested in is a chemical or pharmaceutical plant where producing one batch of material requires a mixing step and a drying step. Suppose there are 3 mixers and 5 dryers that function completely independently of one another; the department uses a ‘pool concept’, where any batch can use any available mixer and any available dryer. However, since there is not enough room in the area, if a batch completes mixing and no dryer is available, the material must sit in the mixer and wait, so that mixer can’t be used for any other production. Finally, 20 different materials are produced in this department, and each of them can have a different average mixing and drying time.
Now assume that the graph of the process times for each of the 8 machines looks somewhat like what’s called a ‘bell-shaped curve’. This graph, with its highest point (at the average) right in the middle and its left and right sides mirror images of each other, is known as a Normal Distribution. But because of the nature of the technology and the differing ages of the machines, the “bells” aren’t really centered: their peaks are pulled to the left or right, so each bell is actually a little lopsided, or skewed, to one side or the other. (Therefore, these process times are not really Normally distributed.)
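The article doesn't say which skewed distribution the machines follow. A triangular distribution (minimum, most likely, maximum) is one simple, common stand-in for process times, and Python's standard library samples it with random.triangular(low, high, mode). The sketch below uses hypothetical times for a single mixer and shows that, once the curve is skewed, the most likely time, the median and the average no longer coincide as they do for a Normal bell.

```python
import random
import statistics

# Hypothetical mixer times for one product: fastest 2.0 h, most common 3.0 h, slowest 6.0 h.
LOW, MODE, HIGH = 2.0, 3.0, 6.0

rng = random.Random(7)
samples = [rng.triangular(LOW, HIGH, MODE) for _ in range(50_000)]

print(f"most likely time (mode): {MODE:.2f} h")
print(f"sample median:           {statistics.median(samples):.2f} h")
print(f"sample mean:             {statistics.fmean(samples):.2f} h")  # theory: (2+3+6)/3
```

Here the long tail toward slow batches pulls the average (about 3.67 h in theory) above both the median (about 3.55 h) and the most likely time (3.0 h).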
If you’re trying to analyze this department, the fact that the equipment is treated as a pooled resource means there is no straightforward calculation for the average length of time required to mix and dry one batch of a certain product. Complicating the effort further, the answer depends on how many other batches are in the department at the time and which products they are. If you’re trying to modify the configuration of the department, change its scheduling policies or procedures, or add or change the material handling equipment that moves supplies to and from it, a Monte Carlo model would be the best approach to performing the analysis.
In a Monte Carlo simulation of this manufacturing operation, the model would have a clock and a ‘to-do’ list of the next events that will occur as batches are processed through the unit. The first events to go onto this list would be requests to start a batch, i.e. the paperwork that directs or initiates production. The order and timing in which these batches arrive at the department’s front door could either be random or follow a predefined production schedule that is an input to the model.
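The “clock and to-do list” idea is commonly implemented as a priority queue of future events ordered by time. In this toy sketch, which assumes a fixed 2-hour mixing time and ignores mixer capacity, the clock jumps straight from one event to the next rather than ticking in fixed steps, and handling one event schedules another. Batch names and times are hypothetical.

```python
import heapq

MIX_HOURS = 2.0  # hypothetical fixed mixing time; capacity limits ignored in this toy

future_events = []  # the "to-do list": (time, sequence_no, kind, batch), ordered by time
_counter = 0

def schedule(time, kind, batch):
    global _counter
    heapq.heappush(future_events, (time, _counter, kind, batch))
    _counter += 1  # tie-breaker: events at the same time are handled first-in, first-out

# Batch requests arriving at the department's front door (hypothetical times, hours).
for t, batch in [(0.0, "B1"), (1.5, "B2"), (0.5, "B3")]:
    schedule(t, "request", batch)

log = []
while future_events:
    clock, _, kind, batch = heapq.heappop(future_events)  # jump to the next event
    log.append((clock, kind, batch))
    if kind == "request":
        schedule(clock + MIX_HOURS, "mixing done", batch)  # one event creates the next
    print(f"t = {clock:4.1f} h  {batch}: {kind}")
```

Even though B2 was put on the list before B3, events come out in time order: B1, B3 and B2 are requested at 0.0, 0.5 and 1.5 h, and their mixing finishes at 2.0, 2.5 and 3.5 h.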
The model “knows” the rules for how material is processed, from the command to produce through the various manufacturing steps, and it keeps track of the status of all the equipment (empty and available, busy mixing or drying, possibly blocked from emptying a finished batch, etc.). The program also follows the progress and location of each batch. The model has a simulated clock that keeps moving ahead, and as it does, batches move through the equipment according to the policies and logic the model has been given. Each batch moves from the initial request stage to being mixed, dried and then out the back door. At any given point in simulated time, if no equipment is available for the next step, the batch waits (and if it has just completed mixing, it may prevent another batch from being started).
What sets a Monte Carlo model apart, however, is that when the program needs to make a decision or perform an action whose outcome is a matter of chance, it essentially rolls a pair of dice (or flips a coin, or draws straws) to determine the specific outcome. But a fair die gives every face an equal chance of “coming up”, and real process times don’t behave that way, so a Monte Carlo model actually contains equations known as “probability distributions”, which pick results in which certain outcomes are more or less likely than others. It’s through the use of these distributions that we can accurately reflect the skewed, non-Normal process times of the equipment in the manufacturing department.
The really cool thing about these distributions is that if the Monte Carlo model uses the same distribution repeatedly, it can get a different result each time, simply because of the random nature of the process. Suppose that the graph below represents the range of values for the process time of material XYZ (one of the 20 products) in one of the mixers. Notice how the peak of the ‘bell’ sits off-center to the right, with the longer tail trailing off to the left (statisticians would call this skewed to the left, or negatively skewed).

So if the model makes several repeated calls to the probability distribution for this graph, sometimes the result will fall in the 2.0-2.5 hr band, other times in the 3.5-4.0 hr band, and on some occasions above 4 hrs. But in the long run, over many repetitions, the proportion of results in each time band will match the values in the graph (5%, 10%, 15%, 20%, etc.) that were used to define the distribution.
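One way a model can turn a histogram like this into a sampling “equation” is a two-stage draw: choose a time band with probability equal to its share, then pick a time uniformly within that band. The sketch below uses hypothetical bands (illustrative values, not read from the graph) and checks that, over many draws, the observed share of each band settles close to its target.

```python
import random
from collections import Counter

# Hypothetical process-time histogram for one product in one mixer: (low h, high h, share).
BANDS = [(2.0, 2.5, 0.05), (2.5, 3.0, 0.10), (3.0, 3.5, 0.20), (3.5, 4.0, 0.40), (4.0, 4.5, 0.25)]
WEIGHTS = [share for _, _, share in BANDS]

def sample_process_time(rng):
    """Pick a band with probability equal to its share, then a time uniformly inside it."""
    low, high, _ = rng.choices(BANDS, weights=WEIGHTS)[0]
    return rng.uniform(low, high)

def band_index(t):
    # First band whose upper edge lies above t (the top edge itself counts as the last band).
    return next((i for i, (_, hi, _) in enumerate(BANDS) if t < hi), len(BANDS) - 1)

rng = random.Random(2009)
draws = [sample_process_time(rng) for _ in range(100_000)]
print("first five draws:", [round(t, 2) for t in draws[:5]])

counts = Counter(band_index(t) for t in draws)
for i, (lo, hi, share) in enumerate(BANDS):
    print(f"{lo:.1f}-{hi:.1f} h: target {share:.0%}, observed {counts[i] / len(draws):.1%}")
```

Any single draw is unpredictable, but the band shares converge on the histogram, which is exactly the long-run behavior described above.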
So, to come back to the manufacturing simulation: as the model moves batches through production, whenever it needs to determine how much time a particular mixer or dryer will require, it runs the appropriate probability distribution and gets back a specific process time. In the computer’s memory, the batch will continue to occupy the machine (and the machine’s status will remain busy) until the simulation clock reaches the time at which the process completes. The model will then check the next step required for the batch and move it to the proper equipment (if one is available) or out of the department altogether.
In this way then, the model would continue to process batches until it either ran out of batches in the production schedule that was an input, or until the simulation clock reached some pre-set stopping point. During the course of one run, the computer would have been monitoring the process and recording in memory whatever statistics were relevant to the goal of the analysis. For example, the model might have kept track of the amount of time that certain equipment was blocked from emptying XYZ to the next step. Or if the aim of the project was to calculate the average length of time to produce a batch, the model would have been following the overall duration of each batch from start to finish in the simulated department.
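Putting these pieces together, here is a compact sketch of one simulation run of the department described above: 3 pooled mixers, 5 pooled dryers, and blocking when a finished batch finds no free dryer. It is an illustration under stated assumptions, not a production model: process times are triangular with made-up parameters for three hypothetical products (rather than 20), both queues are first-in first-out, and the function and field names are hypothetical.

```python
import heapq
import random

# Hypothetical (low, mode, high) process times in hours: (mixing, drying) per product.
PRODUCT_TIMES = {
    "XYZ": ((2.0, 3.5, 4.5), (4.0, 5.0, 9.0)),
    "ABC": ((1.0, 1.5, 3.0), (5.0, 6.0, 8.0)),
    "DEF": ((2.5, 3.0, 5.0), (3.0, 4.0, 7.0)),
}

def simulate_once(schedule, seed, n_mixers=3, n_dryers=5):
    """One run of the pooled mixer/dryer department.
    schedule: list of (release_time_h, product). Returns statistics for the run."""
    rng = random.Random(seed)
    events, seq = [], 0                      # future-event list ordered by time

    def push(time, kind, batch):
        nonlocal seq
        heapq.heappush(events, (time, seq, kind, batch))
        seq += 1                             # tie-breaker keeps equal times FIFO

    def process_time(batch, step):           # step 0 = mixing, 1 = drying
        low, mode, high = PRODUCT_TIMES[batch[1]][step]
        return rng.triangular(low, high, mode)

    free_mixers, free_dryers = n_mixers, n_dryers
    waiting, blocked = [], []                # FIFO queues
    released, blocked_since = {}, {}
    flow_times, blocked_hours = [], 0.0

    for i, (t, product) in enumerate(schedule):
        push(t, "release", (i, product))

    while events:
        now, _, kind, batch = heapq.heappop(events)
        if kind == "release":
            released[batch] = now
            waiting.append(batch)
        elif kind == "mix_done":
            if free_dryers > 0:              # hand the batch straight to a dryer
                free_mixers += 1
                free_dryers -= 1
                push(now + process_time(batch, 1), "dry_done", batch)
            else:                            # no dryer: material sits in the mixer
                blocked.append(batch)
                blocked_since[batch] = now
        elif kind == "dry_done":
            flow_times.append(now - released[batch])
            if blocked:                      # freed dryer goes to the longest-blocked batch
                b = blocked.pop(0)
                blocked_hours += now - blocked_since.pop(b)
                free_mixers += 1
                push(now + process_time(b, 1), "dry_done", b)
            else:
                free_dryers += 1
        while waiting and free_mixers > 0:   # start queued batches in any idle mixers
            free_mixers -= 1
            b = waiting.pop(0)
            push(now + process_time(b, 0), "mix_done", b)

    return {"batches": len(flow_times),
            "mean_flow_h": sum(flow_times) / len(flow_times),
            "max_flow_h": max(flow_times),
            "blocked_h": blocked_hours}

sched_rng = random.Random(0)
products = sorted(PRODUCT_TIMES)
schedule = [(1.5 * k, sched_rng.choice(products)) for k in range(40)]  # a batch every 1.5 h
print(simulate_once(schedule, seed=1))
```

Changing the seed replays the same schedule with different luck of the draw, and changing n_mixers or n_dryers is how a what-if on the equipment configuration would be tried.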
The results from just one run of the Monte Carlo model, however, are not sufficient as a basis for any decision. The reason is that this is a stochastic system, in which chance determines the outcome. We can’t really rely on just one set of results, because through the “luck of the draw” the process times picked by those probability distributions might have been generally on the high or low side. So the model is run a preset number of times, say 100 or 500, and the results of each run are saved.
Once all of the Monte Carlo runs have been accumulated, it’s possible to draw certain conclusions. For example, it might turn out that the overall process time through the department was 10 hrs or more 8% of the time. Or the average blocked time, when batches are prevented from moving to the next stage because no equipment is available, was 12 hrs; or the blocked time was 15 hrs or more in 15% of the simulations.
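The replication step can be sketched as follows. To keep the example self-contained, one_run is a hypothetical stand-in that returns a run’s average batch flow time from assumed mixing, drying and blocking times; in a real study it would call the full department model. Each replication gets its own random seed, and the saved results are summarized as an average with an approximate 95% confidence interval, a 90th percentile, and the share of runs averaging 10 hours or more.

```python
import random
import statistics

def one_run(seed, batches=30):
    """Hypothetical stand-in for one complete simulation run: returns the average
    batch flow time (hours) that this run's luck of the draw produced."""
    rng = random.Random(seed)
    flows = []
    for _ in range(batches):
        mix = rng.triangular(2.0, 4.5, 3.5)      # assumed mixing time, h
        dry = rng.triangular(4.0, 9.0, 5.0)      # assumed drying time, h
        # Assumed: 30% of batches are blocked, for an exponential time averaging 1.5 h.
        blocked = rng.expovariate(1 / 1.5) if rng.random() < 0.3 else 0.0
        flows.append(mix + dry + blocked)
    return statistics.fmean(flows)

REPLICATIONS = 500
results = [one_run(seed) for seed in range(REPLICATIONS)]  # a different seed per run

mean = statistics.fmean(results)
half_width = 1.96 * statistics.stdev(results) / REPLICATIONS ** 0.5  # approx. 95% CI
p90 = statistics.quantiles(results, n=10)[-1]                         # 90th percentile
share_10h = sum(r >= 10.0 for r in results) / REPLICATIONS

print(f"average flow time: {mean:.2f} h (+/- {half_width:.2f} h)")
print(f"90th percentile across runs: {p90:.2f} h")
print(f"runs averaging 10 h or more: {share_10h:.0%}")
```

The confidence interval says how precisely the replications pin down the average, while the percentile and the 10-hour share describe how much individual runs can swing, which is the kind of statement a decision maker can act on.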
With information like this, a decision maker would be able to weigh the advantages of adding or changing specific items of equipment, as well as modifying the department’s policies, procedures, or even computer systems. In a larger, more complicated system, a Monte Carlo model such as the one outlined here could help to decrease the overall plant throughput time significantly. At some pharmaceutical plants, for instance, where raw materials can be extremely high in value, decreasing the overall throughput time by 30% to 40% would represent a large and very real savings in the value of the work-in-process inventory.
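The link between throughput time and work-in-process value can be made explicit with Little’s Law, a standard queueing result that the article does not name. For a process in steady state,
$$
L = \lambda W
$$
where L is the average work in process (batches in the plant), λ is the throughput rate (batches per hour) and W is the average throughput time (hours per batch). Holding throughput fixed, WIP is proportional to throughput time, so cutting W by 30% cuts L, and the inventory value tied up in it, by 30%. With hypothetical figures of λ = 0.5 batches/h and W = 20 h:
$$
L = 0.5 \times 20 = 10 \text{ batches} \quad\longrightarrow\quad L' = 0.5 \times (0.7 \times 20) = 7 \text{ batches}.
$$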
Hopefully, this discussion has helped to clarify just what a Monte Carlo model is and how it is built. This kind of model accounts for the fundamental variability that is present in almost all decision making. It does not eliminate risk or prevent a worst-case scenario from actually occurring, nor does it guarantee a best-case outcome. But it does give the business manager added insight into what can go wrong or right, and into the best ways to handle the inherent variability of a process.
This article was written by John Hughes, Profit Point's Production Scheduling Practice Leader.





