# Pareto Analysis

How to separate the significant few from the trivial many

**Pareto Analysis is named after Vilfredo Pareto, an Italian sociologist and economist of the 19th century. In 1897, he argued that the distribution of income and wealth is uneven and follows a regular, mathematical pattern.**

In 1905, the American economist Max O. Lorenz expressed a similar idea diagrammatically. Both demonstrated that by far the largest share of a nation's wealth is owned by a very small proportion of the people. But it was Joseph Juran (one of the original quality gurus who went to Japan in the 1950s) who realised that Lorenz's diagram and Pareto's formula, also known as the 80-20 rule, could be observed in other fields. For example:

- 80 per cent of customer complaints could come from 20 per cent of the customers;
- 80 per cent of accidents could be generated by one age group;
- 80 per cent of the cost could be accounted for by 20 per cent of the parts.

The split does not have to be exactly 80-20 in every set of circumstances you investigate. It could be 90-10, 70-30 or 60-40. The important thing to remember is that where causes and effects are unevenly distributed, we can separate the 'significant few' from the 'trivial many' and put our efforts where they will have the biggest effect.

The basic concept behind a Pareto analysis is ranking data. Like a bar chart, a Pareto diagram shows a distribution, but it also orders the information from the largest to the smallest: from the most significant to the most trivial.

Often the raw data is recorded on the left vertical axis, with a percentage scale on the right vertical axis. Make sure the two axes line up: 100 per cent on the right-hand scale must sit level with the grand total on the left-hand scale.
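One way to keep the two axes aligned is to derive the right-hand tick positions from the grand total. The sketch below is only an illustration: the function name `percent_ticks` and the total of 50 are assumptions, not part of the method described here.

```python
def percent_ticks(total, step=20):
    """Return (per cent, left-axis value) pairs for the right-hand scale.

    Each percentage tick is placed at the left-axis height that represents
    that share of the grand total, so 100 per cent lands on the total itself.
    """
    return [(pct, total * pct / 100) for pct in range(0, 101, step)]


# Hypothetical grand total of 50 recorded events (illustrative only).
ticks = percent_ticks(50)
for pct, height in ticks:
    print(f"{pct:>3}% is drawn level with {height:g} on the left axis")
```

If the last pair does not equal the grand total, the two scales are out of step and the cumulative line would be misread.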

Pareto diagrams can be used with or without a cumulative line. When cumulative lines are used, they represent the sum of the vertical bars, as if they were stacked on each other going from left to right. In this way you can answer questions such as: 'Which causes, when taken together, make up 80 per cent of the problem?' or 'What percentage of the total is accounted for by the first three categories?'
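The two questions above can be answered directly from the cumulative figures. The following is a minimal sketch in Python; the function names and the complaint counts are made up for illustration.

```python
from itertools import accumulate


def cumulative_percent(values):
    """Cumulative share of the total, in per cent, with the largest value first."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    if total == 0:
        return []
    return [100 * running / total for running in accumulate(ordered)]


def categories_needed(values, target=80):
    """Number of the largest categories needed to reach `target` per cent."""
    for count, share in enumerate(cumulative_percent(values), start=1):
        if share >= target:
            return count
    return len(values)


# Hypothetical complaint counts for six categories (illustrative only).
complaints = [40, 25, 15, 10, 6, 4]
shares = cumulative_percent(complaints)
first_three = shares[2]                 # share covered by the three largest
needed = categories_needed(complaints)  # categories needed to reach 80 per cent
print(first_three, needed)
```

Each value in `shares` is the height of the cumulative line above the corresponding bar.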

**Constructing a Pareto diagram**

The Pareto diagram is constructed in five steps:

- Decide how the data should be classified
- Use a check chart to collect the data
- Summarise data from the check chart
- Construct a bar graph with the tallest bar on the left and the shortest on the right
- Plot cumulative amounts using a single line
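The five steps can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a prescribed implementation: the defect labels are invented, and the "chart" is drawn as text so that the example needs no plotting library.

```python
from collections import Counter


def pareto_table(observations):
    """Return rows of (category, count, cumulative per cent), largest first."""
    tally = Counter(observations)      # steps 2 and 3: collect and summarise
    ranked = tally.most_common()       # step 4: tallest bar on the left
    total = sum(tally.values())
    rows, running = [], 0
    for category, count in ranked:
        running += count               # step 5: cumulative amount
        rows.append((category, count, round(100 * running / total, 1)))
    return rows


def render(rows):
    """Draw a crude text chart: one '#' per observation, then the running share."""
    return "\n".join(
        f"{cat:<10}{'#' * n:<12}{n:>3}{cum:>7}%" for cat, n, cum in rows
    )


# Step 1 (classification) is assumed done: each tick on the check chart
# carries one of these hypothetical defect labels.
defects = ["scratch"] * 9 + ["dent"] * 6 + ["stain"] * 3 + ["crack"] * 2
rows = pareto_table(defects)
print(render(rows))
```

The last column printed for each row is the value the cumulative line would pass through above that bar.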

Let me show you an example of how this works. Can you imagine a computer network which keeps on breaking down? To investigate this problem you would have to start by collecting data. The tool that's usually used for this is called a check chart. It's just a form that has been ruled up so that you can collect the data you want.

Each breakdown is given an incident number in the first column. The time at which the network went down is shown in the second column, the time at which it came back up in the third, and the downtime for each outage, in minutes, in the fourth.

Incident | Off | On | Downtime (min) | A | B | C | D | E |
---|---|---|---|---|---|---|---|---|
1 | 08:00 | 08:02 | 2 | X | | | | |
2 | 09:16 | 09:20 | 4 | X | | | | |
3 | 10:10 | 10:13 | 3 | | X | | | |
4 | 11:05 | 11:06 | 1 | X | | | | |
5 | 11:15 | 11:18 | 3 | X | | | | |
6 | 12:00 | 12:02 | 2 | X | | | | |
7 | 12:45 | 12:46 | 1 | X | | | | |
8 | 13:09 | 13:11 | 2 | X | | | | |
9 | 14:40 | 14:46 | 6 | X | | | | |
10 | 14:50 | 14:51 | 1 | X | | | | |
11 | 15:00 | 15:01 | 1 | X | | | | |
12 | 15:19 | 15:22 | 3 | X | | | | |
13 | 16:04 | 16:06 | 2 | X | | | | |
14 | 16:32 | 16:34 | 2 | X | | | | |
15 | 16:50 | 16:52 | 2 | X | | | | |
16 | 17:01 | 17:02 | 1 | X | | | | |
17 | 17:59 | 18:01 | 2 | X | | | | |
18 | 18:28 | 18:30 | 2 | X | | | | |
19 | 19:03 | 19:09 | 6 | X | | | | |
20 | 20:23 | 20:25 | 2 | X | | | | |
21 | 22:01 | 22:46 | 45 | | | | | X |
22 | 23:17 | 23:20 | 3 | X | | | | |
23 | 01:00 | 01:06 | 6 | X | | | | |
24 | 02:02 | 02:03 | 1 | X | | | | |
25 | 06:18 | 06:20 | 2 | X | | | | |
26 | 07:45 | 07:46 | 1 | X | | | | |

Totals | A | B | C | D | E | All reasons |
---|---|---|---|---|---|---|
Downtime (min) | 30 | 24 | 4 | 3 | 45 | 106 |
Stoppages | 13 | 5 | 4 | 3 | 1 | 26 |

The columns headed by A, B, C, D and E represent the reasons for the breakdowns. So, the first time that the network broke down was for Reason A, the second time was also for Reason A, but the third stoppage was for Reason B. The network broke down a total of 26 times and the total downtime was 106 minutes.
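The Downtime column can be derived from the Off and On times instead of being worked out by hand. The sketch below assumes 24-hour `HH:MM` clock times and outages shorter than a day; the function name `downtime_minutes` is hypothetical, and the three sample rows are incidents 1, 2 and 21 from the check chart.

```python
from datetime import datetime, timedelta


def downtime_minutes(off, on):
    """Minutes between two HH:MM clock times.

    If the 'on' time is earlier than the 'off' time, the outage is assumed
    to have run past midnight into the next day.
    """
    start = datetime.strptime(off, "%H:%M")
    end = datetime.strptime(on, "%H:%M")
    if end < start:
        end += timedelta(days=1)
    return int((end - start).total_seconds() // 60)


# Off and On times for incidents 1, 2 and 21.
sample = [("08:00", "08:02"), ("09:16", "09:20"), ("22:01", "22:46")]
minutes = [downtime_minutes(off, on) for off, on in sample]
print(minutes)
```

Summing the result over every row of the check chart gives the total downtime, and counting the rows gives the number of stoppages.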

As a mass of figures does not mean much in itself, the next step is to display the data.

When you do a Pareto analysis, the most significant effect is shown on the left-hand side and the least significant effect is shown on the right-hand side.

A percentage scale is added to the chart so that the 26 stops correspond to 100 per cent. Reason A, with 13 stops, accounts for 50 per cent of them.

If you add Reason B to Reason A, you will have accounted for nearly 70 per cent of the problem (18 of the 26 stops). You can see that if you were to put as much effort into eliminating the two least significant causes as you do into the two most significant ones, you would solve only about 15 per cent of the problem (4 of the 26 stops) as opposed to nearly 70 per cent.
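These percentages can be checked from the stoppage totals in the check chart. A short sketch, in which the helper name `share` is an assumption made for illustration:

```python
# Stoppages per reason, from the totals row of the check chart.
stoppages = {"A": 13, "B": 5, "C": 4, "D": 3, "E": 1}
total = sum(stoppages.values())


def share(reasons):
    """Percentage of all stoppages accounted for by the given reasons."""
    return 100 * sum(stoppages[r] for r in reasons) / total


# Pareto order: the reason with the most stoppages comes first.
ranked = sorted(stoppages, key=stoppages.get, reverse=True)

top_one = share(ranked[:1])       # Reason A alone
top_two = share(ranked[:2])       # Reasons A and B together
bottom_two = share(ranked[-2:])   # the two least significant reasons
print(f"{top_one:.1f} {top_two:.1f} {bottom_two:.1f}")
```

The gap between `top_two` and `bottom_two` is the whole argument for working on the left-hand bars first.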

Deciding how the data should be classified is very important as it can make a great difference to the analysis. In the example above we decided that the data should be classified according to the number of breakdowns attributable to each of the causes. This classification is fine if the problem you are investigating is one of the computer network breaking down too often. But what if the problem were one of the network being down for too long? In this case, you would be interested in the duration of the breakdowns and in which causes account for the most downtime.

Ranked by downtime, Reason A (30 minutes) drops from first place and Reason E (45 minutes, all from a single outage) comes up from behind to take first place. This underlines the importance of working on the right problem because, depending on the problem, you would attack different sets of causes.
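The effect of the classification can be seen by ranking the same five reasons twice, once by number of stoppages and once by minutes of downtime. The sketch below uses the totals from the check chart; the function names are assumptions made for illustration.

```python
# Totals per reason, taken from the check chart.
stoppages = {"A": 13, "B": 5, "C": 4, "D": 3, "E": 1}
downtime = {"A": 30, "B": 24, "C": 4, "D": 3, "E": 45}   # minutes


def pareto_order(tally):
    """Reasons sorted from the largest total to the smallest."""
    return sorted(tally, key=tally.get, reverse=True)


def cumulative_shares(tally):
    """(reason, cumulative per cent) pairs in Pareto order."""
    total = sum(tally.values())
    running, pairs = 0, []
    for reason in pareto_order(tally):
        running += tally[reason]
        pairs.append((reason, round(100 * running / total, 1)))
    return pairs


by_count = pareto_order(stoppages)     # ['A', 'B', 'C', 'D', 'E']
by_minutes = pareto_order(downtime)    # ['E', 'A', 'B', 'C', 'D']
print(by_count, by_minutes)
print(cumulative_shares(downtime))
```

Only the measure changes between the two rankings, yet the first bar on the chart, and so the first cause to attack, is different.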