Do we pay too much attention to accuracy in forecasting?


It sounds like a nonsense question. We forecast in the hope of getting an accurate picture of what the future will be like. We’ll be tempted to tweet less-than-complimentary messages about weather forecasters if their predicted barbecue weekend turns into a drenching for guests at an outdoor party we’ve organized. Similarly, a forecast that tells us we can expect to sell 2000 units next week when we only sell 1500 will be galling if we wasted resources producing the surplus 500 units that now have to be dumped.

What I’m referring to is the use of accuracy to decide which forecasting method – or which human forecaster – we should employ in a particular circumstance. Typically, we look at the track record of the method or person. Or we provide them with some past data so that any patterns can be detected, but keep the latest data hidden and see how accurately they can forecast these unseen observations, which are referred to as holdout data. This raises three practical problems.
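
To make the idea concrete, here is a minimal sketch of a holdout evaluation. The monthly demand figures, the two candidate methods (a naive forecast and a simple moving average), the three-period holdout and the use of mean absolute error are all invented for illustration, not drawn from any real case.

```python
# A minimal, self-contained sketch of holdout evaluation.
# The demand figures, the two candidate methods and the three-period
# holdout are invented purely for illustration.

demand = [112, 118, 105, 121, 130, 128, 135, 141, 138, 150, 147, 155]

train, holdout = demand[:-3], demand[-3:]   # the last 3 observations stay hidden

def naive_forecast(history, horizon):
    """Repeat the last observed value for every future period."""
    return [history[-1]] * horizon

def moving_average_forecast(history, horizon, window=4):
    """Repeat the average of the last `window` observations."""
    level = sum(history[-window:]) / window
    return [level] * horizon

def mean_absolute_error(actuals, forecasts):
    """Average size of the misses on the holdout observations."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)

for name, method in [("naive", naive_forecast), ("moving average", moving_average_forecast)]:
    forecasts = method(train, len(holdout))
    print(f"{name:>14}: MAE on holdout = {mean_absolute_error(holdout, forecasts):.1f}")
```

With only three holdout observations, of course, the ‘winner’ in a comparison like this could easily flip if one more month of data arrived – which is precisely the first of the problems discussed below.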

First, to be confident in our choice of method or person we need to test a large number of their forecasts. One or two seemingly brilliant forecasts are not enough. Forecasters can be lucky – by chance an outrageous forecast coincides with what actually happens. A maverick analyst foresees a recession no one else had seen coming. Or a TV pundit risks ridicule to predict a stock market crash and a week later the market nosedives. In neither case can we conclude that the forecaster has some mystical powers of foresight. Research suggests that the opposite is more likely to be true.

But, if we need to assess accuracy over a large number of forecasts, where do we get them from? In many circumstances, there is a dearth of opportunities for evaluating forecast performance. Events like elections occur relatively infrequently, so we have few chances to assess how skilled a politics expert is in identifying the most likely winner. Product life cycles are getting shorter, so we usually have only a limited amount of past demand data. Once we’ve used some of this data to detect patterns, there is not much left for the holdout observations. This means that, when comparing competing methods, it’s tempting to use just one, two or three unseen observations and then declare one method as the clear winner.

Of course, we can test the expert on lots of elections in different countries, just as we can test a statistical forecasting method on lots of different short-life-cycle products, if they are available. But if the expert only claims knowledge of the political landscape of one country, or if the products have different demand characteristics, our testing is likely to mislead us.

This leads to a related problem. As they warn in investment advertisements: past performance is no guarantee of future performance. In a rapidly changing world, what worked in the past may be a poor guide to what will work in the future. Forecasting has been compared to steering a ship by studying its wake. Similarly, to focus on past accuracy is to focus on history in an exercise that should be all about the future.

The third problem is how we measure accuracy. There are a host of different measures, ranging from mean absolute errors to Brier scores, depending on the type of forecast being made. These make different – and often undeclared – assumptions about the seriousness of differences between the forecast and the outcome. As a result, they can lead to contradictory findings: Method A is more accurate than Method B on one accuracy measure, but B is more accurate than A on another. Moreover, the assumptions about the consequences of forecast-outcome discrepancies rarely coincide with the true consequences in a given situation – such as a soaking for my party guests, loss of customer goodwill through underproduction or the costs of surplus stocks.
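
As a small illustration of how measures can disagree, suppose (hypothetically) that Method A produces four consistently modest errors while Method B produces three tiny errors and one large miss. The numbers below are invented, and mean absolute error (MAE) versus root mean squared error (RMSE) is just one of many pairs of measures that can conflict in this way.

```python
import math

# Invented forecast errors for two hypothetical methods, chosen only to show
# that two common accuracy measures can disagree about which method is 'better'.
errors_a = [3, 3, 3, 3]    # consistently modest misses
errors_b = [1, 1, 1, 8]    # mostly tiny misses, plus one large one

def mae(errors):
    """Mean absolute error: every unit of error counts equally."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error: large errors are penalised disproportionately."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"MAE : A = {mae(errors_a):.2f}, B = {mae(errors_b):.2f}")    # B looks better (2.75 vs 3.00)
print(f"RMSE: A = {rmse(errors_a):.2f}, B = {rmse(errors_b):.2f}")  # A looks better (3.00 vs 4.09)
```

The disagreement arises because squaring the errors builds in an assumption that one big miss is worse than several small ones. Whether that assumption matches the real costs of a forecast error depends entirely on the situation.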

So what’s the answer? In decision making it’s often said that you should not judge the quality of a decision by its outcome. I might decide to gamble everything I own on a 500 to 1 outsider in a horse race and, incredibly, I win. That’s a great outcome. But most people would agree that it was an awful, reckless decision. We should judge the quality of a decision by the process that underpinned it. Was accurate, cost-effective information gathered? Were all stakeholders consulted? Were risks assessed? And so on. The same should be true of forecasting.

Nearly twenty years ago, the Wharton professor Scott Armstrong led the Forecasting Principles project, which was designed to identify the characteristics of a good forecasting process. The M-Competitions led by Spyros Makridakis have provided further guidance. Later work, such as the Good Judgment Project led by Philip Tetlock, has added to our knowledge of what makes a ‘good’ forecast – albeit in a more restricted range of contexts. Of course, the validity of the principles uncovered by these projects depends on their ability to improve the likelihood of an accurate or well-calibrated forecast. This validity is established by testing them on very large numbers of forecasts under different conditions and using a range of measures.

As we’ve seen, in many practical situations we don’t have access to this richness of data to test each of our candidates for the title of ‘Best Forecasting Method’. So we should spend more time comparing how well they adhere to principles of good forecasting and give less prominence to fortuitous short-term bursts of apparent accuracy or a few unlucky instances that seem to suggest poor performance.

Paul Goodwin

Read more in: Forewarned: A Sceptic’s Guide to Prediction (Biteback Publishing).


When is a ‘forecast’ not a forecast?

The answer is: when it’s a target or a decision.

A target is what we would like to achieve, even though we may think it’s unlikely. Companies often set sales targets for their staff to motivate them, not because they think the chosen level of sales is the most probable level that will ensue.

We make a decision when we choose a particular outcome from all those that might occur in the future because we think this choice will bring us the most benefit – not because we think it’s the most probable or the expected outcome. If I’m in a marketing department I might think that sales of 500 units are most likely next month, but I choose to present a ‘forecast’ of 400 units. By keeping the forecast low, it’s likely that I’ll be able to boast to senior managers that our brilliant marketing efforts have enabled us to exceed it. If I do this, I’m not forecasting. I’m decision making.

If I’m an economic forecaster and circumstances are changing, I might prefer to stick to my original ‘forecast’ of 2% growth, even though I think that 1.6% is now most likely. Changing my forecast too often might be seen as a sign of incompetence. Alternatively, I might play safe and stick to what others are forecasting – even though I think they are likely to be wrong. That way I won’t be exposed if I’m wrong. This is known as herding.

In other circumstances, it might pay me to deliberately make my ‘forecast’ different from everyone else’s. If I’m the one person who says there is going to be a recession, when everyone else is forecasting growth, I’ll be seen as a brilliant prophet if the economy goes into a slump. I reason that my ‘forecast’ will soon be forgotten if I’m wrong – and anyway I have a catalogue of excuses ready to explain away the blunder.

Decisions masquerading as forecasts are particularly prevalent when forecasting gets mixed up with politics. In organisations, people are often tempted to exaggerate their forecasts to obtain more funding for their departments. Even the International Monetary Fund (IMF) is not immune from political influence: there is evidence that governments of countries politically aligned with the US – the IMF’s major funder – tend to receive favourable ‘forecasts’ of growth and inflation when they are coming up for re-election.

Then there are those regular scary weather ‘forecasts’ in tabloid newspapers. ‘Forecasts’ of snowmageddons lasting for three months or summers that will be chillier than winter are outcomes chosen by editors to sell their papers. They know that readers will have long forgotten these headlines by the time the paper hits the recycling box.

The difference between forecasts, targets and decisions is more than a semantic quibble. It can cause confusion and inefficiency in organisations and mislead people in their decisions. Two eminent forecasters, Michael Clements and Sir David Hendry, define a forecast as simply “any statement about the future”. A more specific definition would be helpful. How about: “an honest expectation of what will occur at a specified time in the future based on information that is currently available”. I am sure this can be improved upon, but at least it’s a start.

Paul Goodwin