[7] Factor-augmented sparse MIDAS regressions with an application to nowcasting. forthcoming at Journal of Business & Economic Statistics.
with Jad Beyhum —
| arXiv | CRAN package
The common practice for GDP nowcasting in a data-rich environment is to employ either sparse regression with LASSO-type regularization or a dense approach based on factor models or ridge regression; the two differ in how they extract information from high-dimensional datasets. This paper investigates whether sparse plus dense mixed-frequency regression methods can improve nowcasts of US GDP growth. We propose two novel MIDAS regressions and show that these sparse plus dense methods greatly improve the accuracy of nowcasts during the COVID pandemic relative to either purely sparse or purely dense approaches. Using monthly macro and weekly financial series, we further show that the improvement is particularly sharp when the dense component is restricted to macro series, while the sparse signal stems from both macro and financial series.
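In stylized form, the sparse plus dense structure pairs a low-dimensional factor (dense) component with a sparse high-dimensional component; the notation below is illustrative, not the paper's exact specification:

```latex
y_{t+h} \;=\; \underbrace{f_t'\gamma}_{\text{dense: factors}} \;+\; \underbrace{x_t'\beta}_{\text{sparse: few nonzero entries}} \;+\; \varepsilon_{t+h}
```

where $f_t$ collects factors estimated from the high-dimensional predictors and, in the MIDAS setting, higher-frequency regressors enter $x_t$ through frequency-alignment weights.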
[6] Testing for sparse idiosyncratic components in factor-augmented regression models. Journal of Econometrics 244.1 (2024): 105845.
with Jad Beyhum — | pdf | R package
We propose a novel bootstrap test of a dense model, namely factor regression, against a sparse plus dense alternative: a factor regression augmented with sparse idiosyncratic components. The asymptotic properties of the test are established under time series dependence and polynomial tails. We outline a data-driven rule to select the tuning parameter and prove its theoretical validity. In simulation experiments, our procedure exhibits high power against sparse alternatives and low power against dense deviations from the null. Moreover, we apply our test to various datasets in macroeconomics and finance and often reject the null. This suggests the presence of sparsity, on top of a dense component, in commonly studied economic applications. The R package 'FAS' implements our approach.
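Schematically, the test contrasts a purely dense null with a sparse plus dense alternative (symbols here are illustrative):

```latex
H_0:\; y_t = f_t'\gamma + \varepsilon_t
\qquad\text{vs.}\qquad
H_1:\; y_t = f_t'\gamma + x_t'\beta + \varepsilon_t,\quad \beta \text{ sparse and nonzero}
```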
[5] High-dimensional Granger causality tests with an application to VIX and news. Journal of Financial Econometrics 22.3 (2024): 605-635.
with Andrii Babii & Eric Ghysels —
| pdf | CRAN package
We study Granger causality testing for high-dimensional time series using regularized regressions. To perform proper inference, we rely on heteroskedasticity and autocorrelation consistent (HAC) estimation of the asymptotic variance and develop the inferential theory in the high-dimensional setting. To recognize the time series data structures, we focus on the sparse-group LASSO estimator, which includes the LASSO and the group LASSO as special cases. We establish the debiased central limit theorem for low-dimensional groups of regression coefficients and study the HAC estimator of the long-run variance based on the sparse-group LASSO residuals. This leads to valid time series inference for individual regression coefficients as well as groups, including Granger causality tests. The treatment relies on a new Fuk-Nagaev inequality for a class of τ-mixing processes with heavier than Gaussian tails, which is of independent interest. In an empirical application, we study the Granger causal relationship between the VIX and financial news.
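The sparse-group LASSO penalty at the core of this approach interpolates between the LASSO and the group LASSO. A minimal Python sketch of the penalty (the accompanying package is in R; the function name and grouping below are illustrative):

```python
import numpy as np

def sg_lasso_penalty(beta, groups, alpha):
    """Sparse-group LASSO penalty:
        alpha * ||beta||_1 + (1 - alpha) * sum_g ||beta_g||_2.
    alpha = 1 recovers the LASSO, alpha = 0 the group LASSO."""
    l1 = np.sum(np.abs(beta))                              # elementwise sparsity
    l2 = sum(np.linalg.norm(beta[idx]) for idx in groups)  # groupwise sparsity
    return alpha * l1 + (1 - alpha) * l2

beta = np.array([3.0, 4.0, 0.0, 0.0])
groups = [np.array([0, 1]), np.array([2, 3])]
print(sg_lasso_penalty(beta, groups, 1.0))  # LASSO limit: 7.0
print(sg_lasso_penalty(beta, groups, 0.0))  # group LASSO limit: 5.0
```

With mixed-frequency data, a natural grouping puts all lags of one high-frequency covariate into a single group, so entire series can be zeroed out while the surviving groups stay sparse within.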
This paper uses structured machine learning regressions for nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization which can take advantage of the mixed frequency time series panel data structures. Our empirical results show the superior performance of our machine learning panel data regression models over analysts’ predictions, forecast combinations, firm-specific time series regression models, and standard machine learning methods.
[3] Machine learning panel data regressions with heavy-tailed dependent data: Theory and application. Journal of Econometrics 237.2 (2023): 105315.
with Andrii Babii, Ryan Ball & Eric Ghysels —
| pdf | CRAN package
The paper introduces structured machine learning regressions for heavy-tailed dependent panel data potentially sampled at different frequencies. We focus on the sparse-group LASSO regularization, which can take advantage of mixed frequency time series panel data structures and improve the quality of the estimates. We obtain oracle inequalities for the pooled and fixed effects sparse-group LASSO panel data estimators, recognizing that financial and economic data can have fat tails. To that end, we leverage a new Fuk-Nagaev concentration inequality for panel data consisting of heavy-tailed τ-mixing processes. Lastly, we study the HAC estimator of the long-run variance based on the sparse-group LASSO residuals for pooled panel regression. This yields a valid inference method for individual regression coefficients as well as groups, including Granger causality tests, in high-dimensional pooled panel regressions.
Covariates in regressions may be linked to each other on a network. Knowledge of the network structure can be incorporated into regularized regression settings via a network penalty term. However, when it is unknown whether the connection signs in the network are positive (connected covariates reinforce each other) or negative (connected covariates repress each other), the connection signs have to be estimated jointly with the covariate coefficients. This can be done with an algorithm iterating a connection sign estimation step and a covariate coefficient estimation step. We develop such an algorithm, called 3CoSE, and show detailed simulation results and an application forecasting event times. The algorithm performs well in a variety of settings. We also briefly describe the publicly available R-package developed for this purpose.
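The alternation described above can be sketched generically. This toy Python version (not the authors' 3CoSE implementation; the function name, penalty form, and all details are illustrative) iterates a connection-sign estimation step with a quadratic, network-penalized coefficient step:

```python
import numpy as np

def fit_network_penalized(X, y, edges, lam, n_iter=20):
    """Schematic alternation: (1) estimate connection signs from the current
    coefficients, (2) re-estimate coefficients under the signed network
    penalty sum_{(i,j) in edges} (beta_i - s_ij * beta_j)^2."""
    p = X.shape[1]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # unpenalized start
    for _ in range(n_iter):
        # Step 1: connection sign estimation from coefficient agreement
        signs = {(i, j): 1.0 if beta[i] * beta[j] >= 0 else -1.0
                 for (i, j) in edges}
        # Step 2: coefficient estimation; the objective is quadratic given
        # the signs, so it reduces to one linear solve
        P = np.zeros((p, p))
        for (i, j), s in signs.items():
            P[i, i] += 1.0; P[j, j] += 1.0
            P[i, j] -= s;   P[j, i] -= s
        beta = np.linalg.solve(X.T @ X + lam * P, X.T @ y)
    return beta, signs
```

Positive estimated signs pull connected coefficients together, negative ones push them to opposite signs, mirroring the reinforcing/repressing distinction in the abstract.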
[1] Machine learning time series regressions with an application to nowcasting. Journal of Business & Economic Statistics 40.3 (2022): 1094-1106.
with Andrii Babii & Eric Ghysels —
| pdf | CRAN package
This paper introduces structured machine learning regressions for high-dimensional time series data potentially sampled at different frequencies. The sparse-group LASSO estimator can take advantage of such time series data structures and outperforms the unstructured LASSO. We establish oracle inequalities for the sparse-group LASSO estimator within a framework that allows for mixing processes and recognizes that financial and macroeconomic data may have heavier than exponential tails. An empirical application to nowcasting US GDP growth indicates that the estimator performs favorably compared to other alternatives and that text data can be a useful addition to more traditional numerical data.