BIS Working Papers
Why you should use the Hodrick-Prescott filter – at least to generate credit gaps
by Mathias Drehmann and James Yetman
Monetary and Economic Department
JEL classification: E44, G01
Keywords: early warning indicators; credit gaps; HP filter
This publication is available on the BIS website (www.bis.org).
© Bank for International Settlements 2017. All rights reserved. Brief excerpts may be reproduced or translated provided the source is stated.
ISSN 1020-0959 (print)
ISSN 1682-7678 (online)
The credit gap, defined as the deviation of the credit-to-GDP ratio from a Hodrick-Prescott (HP) filtered trend, is a powerful early warning indicator for predicting crises. Basel III therefore suggests that policymakers should use it as part of their countercyclical capital buffer frameworks. Hamilton (2017), however, argues that you should never use an HP filter as it results in spurious dynamics, has end-point problems and its typical implementation is at odds with its statistical foundations. Instead he proposes the use of linear projections. Some have also criticised the normalisation by GDP, since gaps will be negatively correlated with output. We agree with these criticisms. Yet, in the absence of clear theoretical foundations, all proposed gaps are but indicators. It is therefore an empirical question which measure performs best as an early warning indicator for crises – the question we address in this paper. We run a horse race using quarterly data from 1970 to 2017 for 42 economies. We find that no other gap outperforms the baseline credit-to-GDP gap. By contrast, credit gaps based on linear projections in real time perform poorly.
Keywords: early warning indicators; credit gaps; HP filter
JEL classifications: E44, G01
Excessive credit growth has long been recognised as integral to financial booms and busts (Minsky, 1982; Kindleberger, 2000). However, what constitutes "excessive" credit growth remains undefined. Borio and Lowe (2002) propose a credit-to-GDP gap measured by the deviations of the credit-to-GDP ratio from a one-sided Hodrick-Prescott (HP) filter with a large smoothing parameter (400 000 for quarterly data). Borio and Drehmann (2009), Drehmann et al (2010) and Drehmann et al (2012) revisit the gap in light of the crisis and extensively compare its early warning indicator (EWI) properties for systemic banking crises with those of other variables. They identify the credit-to-GDP gap as the best single EWI. This work underpins the choice of the Basel Committee on Banking Supervision to single out the credit-to-GDP gap as a useful guide for setting countercyclical capital buffers (BCBS, 2010).
But the credit-to-GDP gap is only one possible indicator of excessive credit growth. Following the work of Jorda et al (2011), for example, the academic literature has mainly relied on medium-term growth rates in credit-to-GDP. In addition, the gap has been challenged on conceptual grounds. We address two such challenges here.
Most importantly, many have criticised the use of the HP filter to derive the gap. It has long been known that the HP filter has serious problems. These are succinctly summarised by Hamilton (2017). In particular, the HP filter results in spurious dynamics that are not found in the underlying data, results in filtered data with properties that differ between the middle and ends of the sample, and its typical implementation is at odds with its statistical foundations. Hamilton therefore concludes that you should never use the HP filter for any purpose, including for deriving credit-to-GDP gaps. He proposes the use of linear projections as an alternative to derive deviations from trends.
In addition, some authors have criticised the use of GDP to normalise the level of credit in the economy. For instance, Repullo and Saurina (2011) point out that the credit-to-GDP gap will tend to be negatively correlated with GDP, and its use could exacerbate the procyclicality of macroprudential policy. Similar problems were highlighted by the Basel Committee (BCBS, 2010). Real-credit-per-capita has been proposed as an alternative measure to overcome this potential drawback.
From a conceptual perspective we agree with these criticisms. But, in the absence of clear theoretical foundations, any proposed gap measure should be treated as only an indicator. What should matter to policymakers is the relative performance of different possible measures, which can be assessed empirically.
In this paper, we therefore run a horse race between different proxies for excessive credit. Given that excessive credit is unobservable, we assess performance based on how well different credit gaps predict systemic banking crises. To keep the analysis concise, we consider eight gaps: two methods of normalising nominal credit (either by nominal GDP or by calculating real-credit-per-capita), combined with four means of deriving "gaps" (relative to an HP trend; 20-quarter changes; and linear projections using either the full sample or (quasi) real time information).
We find that no other gap measure outperforms the baseline credit-to-GDP gap. In fact, across many forecast horizons and sub-sample specifications, it turns out to have the highest "area under the curve" (AUC).
But while the performance of the baseline gap is robust, despite the criticisms it has received, there is little meaningful difference between it and that of gaps based on 20 quarter growth rates and/or using population to normalise credit, instead of GDP. AUCs across these different gaps are never statistically significantly different.
The real time linear projection variants, however, perform consistently poorly. Real time projection gaps are never the best performing indicator at any horizon. Generally, they do not even have any statistically significant forecasting power. This is not the case if we use the full sample to estimate the linear projection. In this case, performance based on credit-to-GDP often exceeds the other gaps. The divergence in performance is due to the instability of the estimated linear projection coefficients in our real time applications. However, from a policy perspective, the real time gaps are the relevant ones to consider, as policymakers can only use the information they have available at each point in time to predict a crisis. Hence we find that linear projections are ill-suited to generating credit gaps for crisis prediction by policymakers.
In the next section, we outline the two challenges to our baseline credit gap measure that we examine. Section 3 contains our methodology for comparing the different measures in light of the objective, and Section 4 the results. Robustness exercises are discussed in Section 5, before we conclude.
Our baseline credit gap was proposed by Borio and Lowe (2002). They suggested measuring the credit gap as deviations of the credit-to-GDP ratio from a one-sided Hodrick-Prescott (HP) filter with a large smoothing parameter (400 000 for quarterly data). This measure has been subject to a number of criticisms. Here we outline two prominent ones: namely that the normalisation is problematic, and the HP filter has undesirable properties.
In order to turn the nominal level of credit into a magnitude that is comparable both across time and across countries, it must be normalised in some manner. In our baseline measure, the normalisation is to divide nominal credit by nominal GDP. Repullo and Saurina (2011) suggest that this could be problematic, since it would suggest reducing capital requirements when GDP growth is high and increasing them when GDP growth is low, hence exacerbating the pro-cyclicality of regulations related to bank capital. As discussed, this was already identified as a potential problem by the Basel Committee (2010), which identified it as one of the reasons why policymakers' judgement is necessary when setting the countercyclical capital buffer. Jorda et al (2016) and Richter et al (2017) use real-credit-per-capita as their measure of normalised credit instead.
The other key component to measuring a credit gap is the definition of the gap - or, equivalently, defining the trend against which credit will be compared.
Following the original work by Borio and Lowe (2002), the long-term trend of the credit-to-GDP ratio is often calculated by means of a one-sided (ie real time) HP filter. The filter is run recursively, with an expanding sample each period. Thus, a trend calculated for, say, end-1998 only takes account of information up to 1998 even if this calculation is done in 2018. The HP filter also uses a much larger smoothing parameter - 400 000 for quarterly data - than the one employed in the business cycle literature. This choice can be rationalised by the observation that credit cycles are on average about four times longer than standard business cycles and crises tend to occur once every 20-25 years (Drehmann et al, 2010).
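The recursive calculation described above can be sketched as follows. This is a minimal illustration, not the code used for the paper: the function names are hypothetical, the HP trend is obtained by solving the standard penalised least-squares problem directly with NumPy, and the minimum of 40 observations before a gap is reported is an assumption chosen to mirror a ten-year burn-in with quarterly data.

```python
import numpy as np

def hp_trend(y, lam=400_000.0):
    """HP trend on a given sample: minimises sum (y - tau)^2
    + lam * sum (second difference of tau)^2."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 3:
        return y.copy()
    # (n-2) x n second-difference operator
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    # First-order condition: (I + lam * D'D) tau = y
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

def one_sided_hp_gap(y, lam=400_000.0, min_obs=40):
    """Real-time gap: for each period t, filter y[0..t] only and keep
    the deviation of y[t] from the last point of that trend."""
    y = np.asarray(y, dtype=float)
    gap = np.full(y.size, np.nan)
    for t in range(min_obs - 1, y.size):
        gap[t] = y[t] - hp_trend(y[:t + 1], lam)[-1]
    return gap
```

Because each gap uses only data up to its own period, extending the sample later leaves earlier gaps unchanged, which is the real-time property the text emphasises.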
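Hamilton (2017) points out some serious potential shortcomings with the HP filter in general, in particular that:
i) it results in spurious dynamics that are not found in the underlying data;
ii) it results in filtered data with properties that differ between the middle and the ends of the sample; and
iii) its typical implementation is at odds with its statistical foundations.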
To avoid these drawbacks, Hamilton suggests an alternative using a "linear projection" based on estimating the equation:
$$y_{t+h} = \beta_0 + \beta_1 y_t + \beta_2 y_{t-1} + \beta_3 y_{t-2} + \beta_4 y_{t-3} + v_{t+h} \qquad (1)$$
He suggests that a value of h corresponding to five years (ie h=20 with quarterly data, as we examine here) for applications to debt (or credit) cycles may be appropriate. The credit gap by this method is the estimated residual in the above equation, ie:
$$\hat{v}_{t+h} = y_{t+h} - \hat{\beta}_0 - \hat{\beta}_1 y_t - \hat{\beta}_2 y_{t-1} - \hat{\beta}_3 y_{t-2} - \hat{\beta}_4 y_{t-3}$$
In implementing this method, Richter et al (2017) go one step further and normalise the residuals by their standard deviation, $\sigma_v$.
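As a rough illustration of the linear projection, the sketch below regresses the credit variable h quarters ahead on a constant and its four most recent values, using the full sample, and returns the residual as the gap. The function names and the `standardise` switch are hypothetical; dividing by the plain sample standard deviation of the residuals is an assumption about how the normalisation might be done.

```python
import numpy as np

def projection_design(y, h=20, p=4):
    """Target y[t+h] and regressors (1, y[t], ..., y[t-p+1]) for all usable t."""
    y = np.asarray(y, dtype=float)
    first, last = p - 1, y.size - h          # t runs over first .. last-1
    X = np.column_stack([np.ones(last - first)] +
                        [y[first - k:last - k] for k in range(p)])
    target = y[first + h:]
    return target, X

def projection_gap_full_sample(y, h=20, p=4, standardise=False):
    """Full-sample projection gap: OLS residuals, dated at t+h."""
    y = np.asarray(y, dtype=float)
    target, X = projection_design(y, h, p)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    if standardise:
        # Assumption: scale by the sample standard deviation of the residuals
        resid = resid / resid.std()
    gap = np.full(y.size, np.nan)
    gap[p - 1 + h:] = resid                  # first h+p-1 periods have no gap
    return gap
```

With h = 20 and four lags, the first 23 quarters of any series have no gap, since the regression needs both the lags and the 20-quarter lead.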
An alternative approach that we also examine is to detrend by computing growth rates. Taking the 20-quarter change in credit/GDP or real-credit-per-capita provides a filter-free way of extracting a credit gap measure. This approach has been used, for example, in Jorda et al (2011, 2017).
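A growth gap of this kind needs only a differencing step. The sketch below is illustrative: `growth_gap` is a hypothetical name, and the `log_change` option (a log difference, which does not depend on the units of the series) is an assumption about how one might treat a series such as real credit per capita.

```python
import numpy as np

def growth_gap(y, h=20, log_change=False):
    """h-quarter change in y; NaN for the first h periods."""
    y = np.asarray(y, dtype=float)
    gap = np.full(y.size, np.nan)
    if log_change:
        gap[h:] = np.log(y[h:]) - np.log(y[:-h])   # unit-free growth rate
    else:
        gap[h:] = y[h:] - y[:-h]                   # change in the level (eg of a ratio)
    return gap
```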
In the following section, we outline a horse race between different measures of the credit gap to see how they compare.
As discussed in the introduction, all proposed gaps are intended to be indicators of excessive credit growth. In line with a long research tradition, we judge performance by how well the different measures predict systemic banking crises.
We follow the literature and use the area under the ROC curve (AUC) as a statistical measure to judge forecast performance. The ROC curve provides a full mapping between the rate of correctly predicted crises and the rate of false alarms. Statistically, the AUC is a convenient and interpretable summary measure of the signalling quality. A completely uninformative indicator has an AUC of 0.5. Correspondingly, the AUC for the perfect indicator equals 1. The AUC of an informative indicator falls in between and is statistically different from 0.5. Given the AUC for two competing indicators, it is also easy to judge whether indicator I1 outperforms indicator I2 using a Wald test.
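One way to compute the AUC without tracing out the ROC curve is to use its equivalence with the Mann-Whitney statistic: the AUC equals the probability that a randomly chosen pre-crisis observation carries a higher indicator value than a randomly chosen tranquil one, with ties counted as one half. The sketch below uses that equivalence; the function name `auc` and its arguments are illustrative only.

```python
import numpy as np

def auc(signal, crisis):
    """Area under the ROC curve via pairwise comparison.

    signal : indicator values
    crisis : True/1 for observations that should be signalled
    """
    signal = np.asarray(signal, dtype=float)
    crisis = np.asarray(crisis, dtype=bool)
    pos, neg = signal[crisis], signal[~crisis]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one observation in each class")
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)
```

An indicator that is always higher before crises returns 1, and one that carries no information (for instance a constant) returns 0.5, matching the benchmarks in the text.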
For practical policy purposes, in addition to statistical power to predict crises, the right timing and stability of signals are important (Drehmann and Juselius, 2014). EWIs need to signal a crisis early enough so that policy actions can be implemented in time to be effective. Yet, EWIs should not signal crises too early as there are costs to macroprudential policies, and early adoption could undermine the support for necessary policy measures (eg Caruana, 2010). EWIs should also be stable, as policy makers tend to base their decisions on trends rather than reacting to changes immediately (eg Bernanke, 2004). A gradual implementation of policy measures may also allow policy makers to influence market expectations more efficiently, and to deal with uncertainties in the transmission mechanism (CGFS, 2012).
To assess the appropriate timing of an indicator S, we follow Drehmann and Juselius (2014) and compute AUC(Sij) for all horizons j within a three year window before a crisis, ie j runs from -12 to -1 quarters. When we compute AUC(Sij), we ignore signals in all other quarters than j in the window. For example, at horizon -6, the rate of correctly predicted crises is solely determined by signals issued 6 quarters before crises. False alarms, on the other hand, are based on all signals issued outside the three year window before crises occur. We also do not consider signals issued during a crisis, as binary EWIs become biased if the post-crisis period is included in the analysis (Bussiere and Fratzscher, 2006).
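The sample selection behind a horizon-specific AUC can be sketched as below for a single economy. The function name, the argument layout and the use of boolean masks are assumptions made for illustration; in a pooled analysis the selected observations would presumably be stacked across economies before the comparison.

```python
import numpy as np

def horizon_auc(signal, crisis_start, exclude, j, window=12):
    """AUC of signals issued exactly j quarters before a crisis.

    signal       : indicator values (NaN where unavailable)
    crisis_start : True in the quarter a crisis begins
    exclude      : True in crisis and post-crisis quarters (dropped entirely)
    j            : horizon in quarters, 1 <= j <= window
    """
    signal = np.asarray(signal, dtype=float)
    crisis_start = np.asarray(crisis_start, dtype=bool)
    exclude = np.asarray(exclude, dtype=bool)

    pre_window = np.zeros(signal.size, dtype=bool)  # 1..window quarters before a crisis
    at_horizon = np.zeros(signal.size, dtype=bool)  # exactly j quarters before a crisis
    for s in np.flatnonzero(crisis_start):
        pre_window[max(s - window, 0):s] = True
        if s - j >= 0:
            at_horizon[s - j] = True

    usable = ~exclude & ~crisis_start & ~np.isnan(signal)
    pos = signal[at_horizon & usable]               # correct predictions are judged here
    neg = signal[~pre_window & usable]              # false alarms are judged here
    if pos.size == 0 or neg.size == 0:
        return np.nan
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)
```

Quarters inside the three-year window but at a horizon other than j enter neither group, which is how signals "in all other quarters than j in the window" are ignored.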
3.1 The alternative gaps
Initially, we consider six different gaps as summarised by Table 1.
We focus on two different normalisations of credit, namely by GDP (ie the credit-to-GDP ratio, with both credit and GDP measured in nominal terms) and per capita (that is, nominal credit divided by the product of the level of the CPI and the population). These different normalisations are indicated by "GDP" or "capita" respectively.
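For each ratio, we apply three possible gap measures:
i) the HP gap: the deviation from a one-sided HP-filtered trend with a smoothing parameter of 400 000;
ii) the growth gap: the 20-quarter change; and
iii) the projection gap: the residual from the linear projection in equation (1) with h = 20.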
For the linear projection model, we regress our quarterly credit variable, for each country, on lags 20-23 of itself, plus a constant. We do this in real time, adding one observation at a time. With each recursion we take the final residual as a measure of the credit gap in that period. This approach is consistent with the idea that we require a measure that is useful in real time, just as with the use of a one-sided HP filter in our baseline approach.
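The recursion described above might look as follows in code. This is a sketch under stated assumptions rather than the paper's implementation: `projection_gap_real_time` is a hypothetical name, and starting the recursion at 40 observations is an assumption in the spirit of requiring roughly ten years of quarterly data.

```python
import numpy as np

def projection_gap_real_time(y, h=20, p=4, min_obs=40):
    """Recursive projection gap: re-estimate on y[0..end-1] each period
    and keep only the final residual as that period's gap."""
    y = np.asarray(y, dtype=float)
    gap = np.full(y.size, np.nan)
    for end in range(min_obs, y.size + 1):
        sample = y[:end]
        t = np.arange(p - 1, end - h)        # regression dates with lags and lead available
        if t.size <= p + 1:                  # need more rows than parameters
            continue
        X = np.column_stack([np.ones(t.size)] + [sample[t - k] for k in range(p)])
        target = sample[t + h]               # last element is the newest observation
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        gap[end - 1] = target[-1] - X[-1] @ beta
    return gap
```

Each period's coefficients are estimated afresh, so consecutive gaps come from different fitted equations; this is the source of the sample dependence that matters for the real-time results.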
When using real credit per capita, we face a scaling issue. The reason is that real credit per capita is measured in units of local currency, normalised by the CPI and population. National currencies have, however, very different units, as indicated by simple dollar exchange rates ranging from below one to multiples of thousands. While the growth gap method is invariant to scaling, this is not the case for the HP gap used by Basel III or the projection gap.
To overcome the scaling problem for the per capita normalisations, we define the gaps as differences in natural logs rather than levels. Thus, for the projection capita gap, we use $\log(y_{t+h}) - \log(\hat{y}_{t+h})$ in place of $v_{t+h}$ as our measure of the residual, and we measure the HP capita gap as log(real-credit-per-capita) - log(trend real-credit-per-capita).
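A short derivation shows why the log difference removes the dependence on currency units. Suppose, as an illustration, that the series is rescaled by a constant $c > 0$ and that $\hat{y}_{t+h}$ denotes either the fitted value from the linear projection or the HP trend, both computed on levels and assumed positive. Both are linear in the data, so the fitted value is rescaled by the same constant:

$$\tilde{y}_t = c\,y_t \quad\Rightarrow\quad \hat{\tilde{y}}_{t+h} = c\,\hat{y}_{t+h}$$

The gap in levels therefore inherits the units of the series,

$$\tilde{y}_{t+h} - \hat{\tilde{y}}_{t+h} = c\,\bigl(y_{t+h} - \hat{y}_{t+h}\bigr),$$

whereas in the log difference the constant cancels:

$$\log(c\,y_{t+h}) - \log(c\,\hat{y}_{t+h}) = \log(y_{t+h}) - \log(\hat{y}_{t+h}).$$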
Our data cover 42 economies. We use quarterly data with samples from as early as 1970 (depending on data availability) to derive the trend. The sample ends in the third quarter of 2017.
We include gaps once we have more than 10 years of quarterly data, to ensure adequate data for the calculation of trends with the HP filter or regression coefficients in the linear projections. Hence, 1980q1 is the earliest date included in the horse race. This starting point also approximately coincides with when many countries liberalised their financial systems, which in turn affected the dynamics of financial cycles and their relation with financial crises (Borio, 2014).
Our measure of credit is as published in the BIS database of total credit to the private non-financial sector (see Dembiermont et al, 2013), capturing total borrowing from all domestic and foreign sources. Our nominal GDP series used to generate credit-to-GDP are drawn from national sources. To generate the capita gaps, we use CPI from national sources and population numbers from the IMF and the World Bank.
In total we have 34 crises in our sample. For crisis dating, we rely on the new European Systemic Risk Board crisis data set (Lo Duca et al, 2017) for European countries and on Drehmann et al (2010) for the rest. As discussed, we drop post-crisis periods as identified in Lo Duca et al (2017) and Laeven and Valencia (2012) for European and non-European economies, respectively.
Graph 1 presents the main results (Appendix Table A1 shows the underlying data). Panels in the left-hand column are based on credit-to-GDP ratios, and the right-hand column on real-credit-per-capita. The top row shows the HP gaps, the middle row the growth gaps, and the bottom the projection gaps.
For each panel, the solid line represents the AUC at different horizons, up to 12 quarters. Symmetric dotted lines indicate 95% confidence bands around the point estimates. To highlight differences across the specifications, we add red diamonds and blue dots. They are defined as:
The graph summarises the key takeaways from the horse race.
First, no specification consistently outperforms the baseline gap (HP GDP gap), despite its many criticisms. In fact, for horizons 1 to 7, it has the best forecast performance overall. And differences to the highest AUC at longer horizons are small.
Second, from a practical perspective, there is little meaningful difference between the performance of the HP gaps and the growth gaps. AUCs are never statistically significantly different and the difference to the best performing indicators is on average only 0.02. Furthermore, a strict comparison of AUCs implicitly assumes that the ROC curves are not overlapping. If this is the case, the indicator with the higher AUC has a better or equal trade-off between correctly predicted crises and false alarms than the indicator with a lower AUC for all points along the ROC curve. But this is not the case here. To illustrate this, Graph A1 in the Appendix plots the ROC curves for the HP GDP gap and the growth GDP gap for horizon 4. The AUC of the HP GDP gap is higher than that of the growth GDP gap (0.75 versus 0.72). Yet, for a set of specific preferences, the growth GDP gap dominates the HP GDP gap, because for a false alarm rate between 64% and 70%, the prediction rate of the growth GDP gap is higher (94% versus 91%).
Last, linear projection gaps perform consistently poorly. In the case of the projection capita gap, AUCs are never statistically significantly different from an uninformative indicator (ie an AUC of 0.5). The projection GDP gap fares better, and has some significant forecasting power. But it is always significantly worse than the best performing indicator.
In Graph 2, we compare the results for the projection gaps based on real time information (top row; as in the third row of Graph 1) with those using the full sample (middle row), and those using fixed estimates of the parameters in equation (1), obtained from the first 40 observations and applied to all later periods (bottom row). Again, we have credit normalised by GDP on the left, and real-credit-per-capita on the right.
We can now see that extracting residuals from the full sample (as is done in Richter et al (2017)), rather than in real time, results in a dramatic improvement in the performance of the linear projection. Indeed, it generally has the highest AUC, although the differences are small (between 0.00 and 0.04 across the 12 horizons). However, given that our objective is to assess the usefulness of measures of the credit gap to policymakers, the real time results are the relevant ones from our horse race.
The divergent forecast performance between the full-sample and real-time projection gaps is driven by large differences in the estimated gaps for both methods. This is evident, for example, for the United States (Appendix Graph A2). Given the qualitative nature of early warning indicators (ie a crisis is signalled whenever residuals exceed some fixed threshold value), the number and timing of positive signals will vary dramatically between full and real-time samples. On average, the correlation between real-time and full sample projection gaps is 0.66, but for some countries and time periods it is even negative (Appendix Table A2).
Two factors may explain the difference between the real time and full sample results. First, the estimated coefficients of equation (1) in the real time case display sample dependence, so that the residuals may not be comparable over time (for example, see the estimated coefficients for the United States in Appendix Graph A3). Second, with the exception of the last real-time sample, there are more observations available for estimation in the full sample, which should result in more precise estimates of the parameters.
Both factors drive the results. To see this we additionally estimate the AUCs for projection gaps based on parameters from the first 40 observations for each economy (final row of Graph 2). Avoiding sample dependence in this way leads to a considerable improvement in performance relative to the real time cases, especially at shorter horizons. This reflects gains to using fixed parameters over the full sample, even though the underlying parameters in the real time case are estimated based on more information in all except the earliest sample. Clearly we can do even better if we use fixed parameters over the full sample (middle row), as having more observations allows us to estimate the parameters more precisely.
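The fixed-parameter variant can be sketched as follows: the coefficients are estimated once, using only regression rows whose target falls within the first 40 observations, and then applied unchanged to every later period. The function name and the exact treatment of the estimation window are assumptions made for illustration.

```python
import numpy as np

def projection_gap_fixed(y, h=20, p=4, n_est=40):
    """Projection gap with coefficients frozen at their early-sample estimates."""
    y = np.asarray(y, dtype=float)
    t = np.arange(p - 1, y.size - h)
    X = np.column_stack([np.ones(t.size)] + [y[t - k] for k in range(p)])
    target = y[t + h]
    in_est = (t + h) < n_est                 # rows that use only the first n_est observations
    beta, *_ = np.linalg.lstsq(X[in_est], target[in_est], rcond=None)
    gap = np.full(y.size, np.nan)
    gap[t + h] = target - X @ beta           # same beta for every period
    return gap
```

Setting `n_est` to the full length of the series reproduces the full-sample case, so the three variants differ only in which observations determine the coefficients.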
For robustness, we look first at different sub-samples before using 3-year growth rates or projections based on data lagged by three years (instead of five in both cases) to derive alternative gap measures. We also consider the effect of normalised residuals for the projection gap measures, as used in Richter et al (2017).
This robustness check supports our two main findings. First, the projection gaps perform poorly, in both sub-samples, and especially so at shorter horizons. Further, the projection gaps' forecasting power is generally not statistically significantly different from that of an uninformative indicator (ie an AUC of 0.5). In this respect it is always outperformed by other gap measures.
Second, the baseline gap continues to perform well in both samples. It is always either the best performing measure or statistically insignificantly different from it.
Interestingly, the per capita normalisations are useful indicators in the early sample, but noticeably less so in the later one. Up to 2000, one of the capita gap measures is the best performing at nine horizons, and for the other three its AUC is not significantly different from the best performing indicator. Their forecast performance falls dramatically after 2000 when the AUCs for the growth capita gap are generally not significantly different from an uninformative indicator. When we look into the underlying data for this period, we find correlations between real GDP and the population are weaker in the later sub-sample, and negative for six economies.
In the literature, 3-year growth rates have also been used to capture credit developments (eg Mian et al, 2017). As an alternative, we therefore derive the growth gaps for a 12 quarter horizon. In addition, we assess whether the performance of the projection gap changes if we estimate the residual from real time linear projections with h = 12 instead of 20, as in Richter et al (2017).
Graph 5 displays the results and shows that, if anything, these alternatives result in a slight deterioration in the performance of growth rate and projection-based gaps relative to those using the HP filter.
Graph 5: AUCs for different measures of the credit gap when growth gaps are based on 3-year changes (instead of five) and projection gaps are based on h = 12 (instead of h = 20)
Finally, Richter et al (2017) normalise the residuals from their linear projection using their estimated standard error, which may potentially reduce the influence of economies with particularly volatile normalised credit. The results in this case are presented in Graph 6, and are very similar to those for the other cases that we examine.
The credit gap, defined as the deviation of the credit-to-GDP ratio from an HP-filtered trend with a smoothing parameter of 400 000 (for quarterly data), has been suggested as a useful measure for predicting crises. Two criticisms levelled at this measure are that i) the normalisation may be problematic because of the positive correlation between credit and GDP, and ii) the HP filter has undesirable properties.
In this paper, we examine alternative measures of the credit gap that have been advocated by others to address these concerns. We find that none of these alternative gap measures consistently outperforms the baseline credit-to-GDP gap: it either has the highest AUC or the difference between it and the best performing measure is small and statistically insignificant.
But while the performance of the baseline gap is robust, there is little meaningful difference between it and that of gaps based on 20 quarter growth rates and/or using population to normalise credit, instead of GDP: AUCs across these different gaps are never statistically significantly different. In contrast, credit gaps based on linear projections perform consistently poorly when estimated in real time - the relevant case for policymakers.
One caveat to this work is that we focus on crisis prediction based on measures of credit. A burgeoning literature considers multivariate approaches, which improve forecast performance (eg Alessi and Detken, 2018 or Aldasoro et al, 2018). Given the small differences in AUCs of the baseline gap and 20 quarter growth rates, a multivariate analysis or different subsamples may lead to different results regarding the highest AUC for a particular forecast horizon. However, the performance of the baseline credit-to-GDP gap is sufficiently strong that it is likely to remain a useful early warning indicator. First, alternative measures will be hard-pushed to dominate the baseline measure statistically, and second, the relative simplicity of the baseline measure makes it easy to implement. We therefore conclude that you should use the Hodrick-Prescott filter - at least to generate credit gaps for crisis prediction.
Aldasoro, I, C Borio and M Drehmann (2018): "Early warning indicators of banking crises: expanding the family," BIS Quarterly Review 29-45, March.
Alessi, L and C Detken (2018): "Identifying excessive credit growth and leverage," Journal of Financial Stability 35, 215-225.
Basel Committee on Banking Supervision (2010): "Guidance for national authorities operating the countercyclical capital buffer," Bank for International Settlements, December.
Borio, C (2014): "The financial cycle and macroeconomics: what have we learnt?" Journal of Banking & Finance 45, 182-198.
Borio, C and P Lowe (2002): "Asset prices, financial and monetary stability: exploring the nexus," BIS Working Papers no 114.
Drehmann, M (2013): "Total credit as an early warning indicator for systemic banking crises," BIS Quarterly Review 41-45, June.
Drehmann, M, C Borio, L Gambacorta, G Jimenez and C Trucharte (2010): "Countercyclical capital buffers: exploring options," BIS Working Papers no 317.
Drehmann, M, C Borio and K Tsatsaronis (2011): "Anchoring countercyclical capital buffers: the role of credit aggregates," International Journal of Central Banking 7(4), 189-240.
Drehmann, M and M Juselius (2014): "Evaluating early warning indicators of banking crises: satisfying policy requirements," International Journal of Forecasting 30(3), 759-780.
Drehmann, M and K Tsatsaronis (2014): "The credit-to-GDP gap and countercyclical capital buffers: questions and answers," BIS Quarterly Review 55-73, March.
Hamilton, J (2017): "Why you should never use the Hodrick-Prescott filter," NBER Working Paper no 23429 (and Review of Economics and Statistics, forthcoming).
Jorda, O (2011): Discussion of "Anchoring countercyclical capital buffers: the role of credit aggregates," International Journal of Central Banking 7(4), 241-259.
Jorda, O, B Richter, M Schularick and A Taylor (2017): "Bank capital redux: solvency, liquidity, and crisis," NBER Working Paper no 23287.
Jorda, O, M Schularick and A Taylor (2016): "Macrofinancial history and the new business cycle facts," NBER Macroeconomics Annual 31, 213-263.
Jorda, O, M Schularick and A Taylor (2011): "Financial crises, credit booms, and external imbalances: 140 years of lessons," IMF Economic Review 59(2), 340-378.
Mian, A, A Sufi and E Verner (2017): "Household debt and business cycles worldwide," Quarterly Journal of Economics 132(4), 1755-1817.
Ravn, M, and H Uhlig (2002): "On adjusting the Hodrick-Prescott filter for the frequency of observations," Review of Economics and Statistics 84(2), 371-376.
Repullo, R and J Saurina (2011): "The countercyclical capital buffer of Basel III: a critical assessment," CEPR Discussion Paper no 8304.
Richter, B, M Schularick and P Wachtel (2017): "When to lean against the wind," CEPR Discussion Paper no 12188.
Schuler, Yves S (2018): "On the cyclical properties of Hamilton's regression filter," Bundesbank Discussion Paper no 03/2018.