Becoming a better forecaster
In recent years I’ve increasingly been pondering the question of how to forecast. Philip Tetlock’s ‘Superforecasting’ has been a key reference; applying those lessons remains a work in progress.
My role for the last 15 years has been heavily geared towards making forecasts. It’s something that I kind of drifted into. My early years at the Reserve Bank were in financial markets rather than the economics department, and while I did a reasonable amount of econometrics at uni, I haven’t had experience in building or running large forecasting models. My approach tends to be piecemeal and based on intuition as much as anything else.
It was only after a long stretch of actually doing the job, including a stint at managing a team of forecasters, that I began to turn my mind to the question of how to do it well. That’s not just about making better forecasts – there’s never been a shortage of motivation for that – but also the process: whether I’m spending my limited time on the things that matter most.
My touchstone for this has been Philip Tetlock’s book Superforecasting: The Art and Science of Prediction. I first read this around the time it was published in 2015, and I’ve come back to it several times over the years as I’ve thought about how to apply it to my own work.
Firstly some background, for those who aren’t familiar with it. Tetlock’s work began in the 1980s, with a long-term study on political predictions. He gathered 284 people who were deemed to make a living from commenting or offering advice on political and economic trends, and asked them to give probabilities for a range of possible world events. (The study was anonymous but, perhaps tellingly, the big-name pundits at the time all declined to participate.) He also collected data on how they came to their predictions – their political leanings, how they incorporated new information, how they reacted to successes and failures, and more.
After 20 years and over 80,000 individual forecasts, his findings were published in 2005 as Expert Political Judgment. The results were just as you might have feared: on average the ‘experts’ had done about as well as random guessing, or as the cliché goes, they were no better than a dart-throwing chimp.
But the crucial part here is “on average”. Tetlock found that there was a subset of forecasters who consistently outperformed the rest. And the data revealed that what set them apart was not what they thought, but how they thought. Most importantly, he was convinced that the practices of the top forecasters were, at least to some degree, teachable.
The next step came a few years later. The US intelligence community, stung by the failures around Iraq’s non-existent weapons programme, set up IARPA (the Intelligence Advanced Research Projects Activity), an agency that funds cutting-edge research in support of intelligence work. In 2011 IARPA began to run prediction ‘tournaments’, aimed at developing and refining methods of predicting political and economic trends. It invited five academic teams to compete against the agencies’ internal control groups.
Tetlock’s Good Judgment Project was one of the five teams. He put out a call for thousands of volunteers from all walks of life, and after some screening for the traits of good forecasters, they were put to work on a range of questions. The team’s forecasts were an average of the individual predictions, drawing on ‘the wisdom of crowds’ but weighted more towards the top performers.
The results were decisive. In the first year, the GJP team substantially outperformed the others. In the second year it increased its winning margin, even beating the intelligence professionals who had access to classified information. By the third year, IARPA didn’t bother inviting the other teams back.
Superforecasting isn’t a recipe for success. The future is inherently uncertain, and even the best forecasters are going to spend a lot of time being wrong. But because of that, the difference between good and great really comes down to marginal improvements in accuracy in the short term, which in the long run can add up to something meaningful.
I’m not going to recap all of the book here; you can find some good summaries in places like this blog for example. What follows are the parts that have resonated with me the most, with some examples of how they’ve applied over the years.
Keep score
The hard truth is that most forecasters have no conception of what their track record looks like – against others, or even against the dart-throwing chimp. So the first obvious step to improving your performance is to measure it.
But that immediately runs into a problem that Tetlock encountered in his early work: in scoring the success of a forecast, it’s often not clear what the forecast actually was. Political commentators are especially adept at making ‘predictions’ that are so vague as to be meaningless. Or when their predictions don’t pan out, they’ll play semantics to argue that they were actually proven right, in a way.
Another hedge is to describe something as a risk: if it doesn’t eventuate, you can say “I only said it was a risk”. This may not even be intentionally misleading. For a time, the CIA used terms like ‘a serious possibility’ in its intelligence briefings, until it was discovered that readers had wildly different ideas about what that meant when asked to express it as a probability.
So to assess a forecast’s accuracy, you have to be very specific about what the forecast was. When it comes to economic forecasts that’s often self-evident, especially in the near term. A question like “what will the inflation rate be at the end of 2023?” has no room for obfuscation – it’s a discrete number at a fixed point in time.
But let’s take a broader prediction such as: “inflation will remain uncomfortably high for the foreseeable future”. That happens to fit with my current view, but at what point would I be proven wrong? What do I mean by ‘uncomfortably high’ – something like recent outturns, or above the top of the Reserve Bank’s 1-3% target range, or above the 2% midpoint? How long will it stay there? Is ‘declining slowly’ the same thing as ‘remaining high’? Some of these outcomes are much less worrying than others.
It’s fine to describe something as a risk to your central view. But it’s not all that informative if you’re just doing it to cover your bases. If there’s a really bad outcome that you think has a 10% chance of happening, you should be clear about those odds – and convey what the other 90% of outcomes look like.
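Tetlock scored his forecasters with the Brier score, which averages the squared gap between the probabilities you assigned and what actually happened. A minimal sketch of the binary version, with hypothetical forecasts:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and outcomes.
    0.0 is perfect; always guessing 50/50 scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical track record: probability assigned to each event,
# and whether it actually happened (1 = occurred, 0 = didn't).
probs    = [0.9, 0.7, 0.2, 0.6, 0.1]
happened = [1,   1,   0,   0,   0]
print(brier_score(probs, happened))  # lower is better
```

The score rewards both calibration (your 70% calls should come true about 70% of the time) and decisiveness (a confident correct call beats a timid one) – which is exactly why vague, unscoreable predictions escape it.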
Perform triage
Like a battlefield doctor, forecasters need to make decisions about where to devote their limited time and resources. Some questions are already answered well enough; some are beyond salvaging. The sweet spot is in those questions where a little extra effort will provide a meaningful payoff.
An example of a too-hard question is: will China invade Taiwan by 2027? We’d certainly like to know – there would be massive economic implications if it happened. But we’re just not going to make any headway on that question right now. There are too many factors that could intervene in the meantime – for one thing, it will depend on what US-China relations are like by that point, which in turn will depend on who wins the next presidential election.
At the other end of the scale is a question like: “what will inflation be for the September 2023 quarter?” Economists already have a decent track record on near-term CPI; most forecasts are within 0.2 percentage points of the result in normal times. (Not so much recently – higher inflation is also more variable inflation.) With some extra effort – say by doing your own price surveying on store shelves, or writing some bots to scrape price data from websites – you might be able to get that margin of error down to 0.1 points. But at some point, it starts to look like the drunk who’s searching for his lost keys under the lamppost “because that’s where the light is”. Our understanding of the economy doesn’t hinge on whether inflation is 6.1% rather than 6.0%. When the forecast is good enough, move on.
My broad sense is that economists are pretty good at forecasting the next three months; once you get to around two years ahead, everyone is pretty poor and it’s not from lack of trying. The sweet spot – where you’ll get the most bang for your buck – is at the six- to twelve-month horizon. That’s where some thinking about the bigger picture is required; faster and more frequent data releases won’t be what makes the difference.
Break the problem down
When economists set themselves the task of forecasting ‘the economy’, they don’t get to pick and choose which things they want to forecast – everything has to add up to a consistent story. So there are inevitably some questions that might seem to fall into the too-hard basket, but you have to take a view on them anyway.
When that happens, it can be useful to break the question down into smaller, more answerable questions – Tetlock describes this as Fermi-izing the problem. Some of those questions might remain unanswerable, but in dealing with the more answerable ones, we can at least narrow the range of possibilities. As he puts it:
It is amazing how many arbitrary assumptions underlie pretty darn good forecasts. Our choice is not whether to engage in crude guesswork; it is whether to do it overtly or covertly.
Look for analogies
Tetlock describes this as balancing outside views – ‘how is this event similar to other events’ – with inside views – ‘how is this event different to others’. My interpretation is that a good forecaster needs to have those sorts of analogies in their back pocket to draw out as necessary. That requires both some reading of economic history, and some curiosity as to what goes on in other fields.
A tricky example of this was the Covid-19 pandemic in 2020. It was obvious that there was going to be a sharp drop in GDP initially; the question was what would happen beyond that. Firstly taking the outside view: we had plenty of examples of past recessions, and how slowly the subsequent recoveries played out. However, there were very few past examples of pandemics, and – rather unhelpfully for the narrative – they didn’t lead to recessions. (They tended to rip through the population too quickly to have a meaningful impact on activity.)
Taking the inside view, the obvious difference in this pandemic was in the government’s response. Shutting down large swathes of the economy was simply unheard of – as was paying workers to stay at home. It was clear that this was going to be a different kind of recession, but how it would differ was another matter.
It turns out that the better analogy was not the Spanish Flu, but the Second World War. Resources that were previously devoted toward meeting consumer demand were directed by the government to other purposes – military production in the case of war; and in the case of Covid, to doing nothing. That led to the appearance of a recession in private sector activity. But unlike other recessions, the underlying demand was still there – people were still getting paid after all. Once the crisis period was over, that demand was unleashed, onto a world where supply chains had been massively distorted and would take some time to return to normal.
And where wartime governments rationed scarce goods directly, this time higher prices did the rationing.
In this case, giving more thought to the inside view – ‘what’s different this time’ – might in turn have led us to a different outside view, focusing on past examples of government redirection of the economy rather than disease outbreaks.
Be Bayesian
Tetlock observed that many of the ‘superforecasters’ in his group took what could be called a Bayesian approach to their forecasts. In a formal sense, Bayes’ theorem explains how to update your views when new information arrives:
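P(A|B) = P(B|A) × P(A) / P(B)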
P(A) is called the ‘prior’ – your initial probability that a certain statement or view of the world is correct. When something new happens – call it B – you need to assess the probability of that happening, P(B), but also the probability of it happening if your view of the world were correct, P(B|A). That then gives you your new estimate P(A|B) – the probability that your view of the world is correct given that event B happened.
Of course Tetlock’s superforecasters didn’t do this literally. Indeed, what he found was that while they were all highly numerate, they generally didn’t do a lot of statistical analysis. Rather, it reflects the way that they went about updating their predictions: when new information came along, they wouldn’t just jump to a new conclusion, but would weigh it against the accumulated evidence that had formed their prior view.
In practice, a Bayesian approach boils down to asking yourself two questions:
1. How strong is my existing view?
2. What sort of evidence would change my mind?
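To see how the mechanics work, here is a sketch of a single Bayesian update. The numbers are entirely made up for illustration – they’re not drawn from any actual forecast:

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after seeing the evidence,
    via Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    p_evidence = (p_evidence_if_true * prior
                  + p_evidence_if_false * (1 - prior))
    return p_evidence_if_true * prior / p_evidence

# Hypothetical prior: a 30% chance that inflation is demand-driven
# rather than transitory. Suppose the evidence you see would be very
# likely under the demand story (80%) but unlikely otherwise (10%).
posterior = bayes_update(0.30, 0.80, 0.10)
print(round(posterior, 2))  # -> 0.77
```

The two questions map directly onto the arguments: the strength of your existing view is the prior, and “what would change my mind” is evidence whose likelihood differs sharply depending on whether your view is correct. Evidence that is equally likely either way leaves the prior untouched.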
Here’s my real-life example. By early 2021 the economy was well past the Covid lockdown and was broadly back to pre-lockdown levels of activity. There were plenty of stories about businesses facing cost increases, but not a lot of evidence that they were able to pass these on into their own prices. As late as March 2021 the inflation rate was still 1.5%, in the lower half of the Reserve Bank’s target range. Temporary supply-side shocks, such as the gumming-up of global supply chains, seemed to be the dominant factor.
But by the middle of the year, I was hearing more stories from businesses about having their staff poached by competitors, and the need to offer big pay increases just to retain their best people. I decided in advance that the next QSBO (the NZIER’s Quarterly Survey of Business Opinion) would be critical – and not the cost measures, but whether this ‘churn’ in workers was a widespread thing. That evidence came through in the survey big-time: in the space of a few months, labour turnover had shot up to a record high. At that point I flipped my view. This was an economy that was not just recovering, but was overheating. The inflationary effects would not be ‘transitory’, and they would require a different policy response.
Why put so much weight on one data point? It comes down to those two questions I noted above. On the second one, the evidence that mattered was not the rise in businesses’ costs, because that could be either a demand or a supply story. But the rise in labour market churn could only be consistent with strong demand – businesses were bidding up to attract workers because they felt that they could afford to do so. As for the first question, to be honest my existing view was already wavering. The stories of rising costs weren’t decisive, but they were gradually eroding my confidence in the ‘transitory’ inflation story.
Update frequently and incrementally
This point follows from the previous one. In the IARPA tournaments, the superforecasters would update their predictions frequently – in some cases dozens of times for each question – before the tournament closed. As new information arrived they would shade their views accordingly, but mostly in small increments – say from 60% confidence to 65%. And it would happen in both directions; sometimes they might revise their view down and then up again in the space of a week. But the result was that they spent more time being closer to the truth than those who only made big changes (if at all).
This is the one that has probably required the biggest change in mindset for me. As a forecaster in the public eye, you tend to develop and market a narrative to go with your forecasts. But that storytelling element can end up being a rod for your own back. Any change in the forecasts feels like it needs an explanation of why the narrative has changed, which means writing another report, making sure it’s distributed to all of the right people… and if you do it too often, you’re not only creating more work for yourself, but you might look like a flip-flopper. It’s easier to save things up until you’re ready to announce a big change, and to downplay any news that comes along in the meantime.
What I’ve come to realise, especially during the rapidly-evolving Covid years, is that customers aren’t policing your past statements in that way; they just want to know what the view is now. That means there’s less stigma to frequent forecast changes than you might think. What you need to do is make it easy for people to keep track of those changes, and if they’re not material, no great explanation is required. Most economics writing, I suspect, is aimed at other economists, rather than the people who might actually be using this information to make decisions.
Finally…
In sharing all of this, I should say that I don’t claim to be anything close to a ‘superforecaster’. For the record, I’ve been among the top forecasters in New Zealand for the short-term stuff – next-quarter GDP, CPI, employment – but that’s not where I want to focus my efforts. As I noted earlier, the sweet spot, where I think there’s room for some real progress, is at the six- to twelve-month horizon.
The art of forecasting is something that I will come back to in future posts. There’s a lot more that has been written on this topic, ranging from more statistical to more judgmental approaches, but I have a bit more reading and absorbing to do first.
(One last note: Good Judgment runs prediction tournaments that are open to the public. I briefly signed up for it a long time ago, but the focus is obviously on overseas issues, and as a New Zealand specialist I quickly found myself out of my depth. With a bit more time on my hands now, I may give it another go.)