All my homies hate LASSO

I fell out of love with LASSO. It's not you, it's me, baby :(

🔮how good is LASSO at recovering sparsity?
💡 a lesson from a macro HF PM who has built 150+ forecasting models
🏰 (lack of) invariance of regularisation under transformation/change of basis

(this post is meant to be free, so if you RT my tweet about it, I'll be thankful and give you access 😄)

The intuition behind regression regularisation is super simple.

  • Standard OLS coefficients might be too big (large variance)
  • We introduce a penalty term that shrinks them (adding bias)
  • We hope the drop in variance outweighs the added bias, giving a lower mean squared error
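Here is a minimal numpy sketch of the first two bullets. It assumes a synthetic Gaussian design, and the data, the penalty `lam=10.0` and the helper names `ols`/`ridge` are illustrative rather than taken from any library. Ridge adds `lam * ||b||^2` to the least-squares loss. That loss has the closed form `(X'X + lam*I)^{-1} X'y`, and any `lam > 0` shrinks the coefficient vector towards zero. Whether this buys a lower out-of-sample MSE depends on the data, and the snippet makes no claim about that.

```python
import numpy as np


def ols(X, y):
    # Plain least squares: minimise ||y - X b||^2
    return np.linalg.lstsq(X, y, rcond=None)[0]


def ridge(X, y, lam):
    # Penalised least squares: minimise ||y - X b||^2 + lam * ||b||_2^2
    # Closed form: b = (X'X + lam * I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


rng = np.random.default_rng(0)
n, p = 50, 20  # few observations per predictor (illustrative sizes)
X = rng.standard_normal((n, p))
beta_true = 0.5 * rng.standard_normal(p)
y = X @ beta_true + 2.0 * rng.standard_normal(n)  # noisy target

b_ols = ols(X, y)
b_ridge = ridge(X, y, lam=10.0)
print("||b_ols||   =", np.linalg.norm(b_ols))
print("||b_ridge|| =", np.linalg.norm(b_ridge))  # smaller for any lam > 0
```

Setting `lam=0` recovers plain OLS, so the penalty strength is the single knob that trades variance for bias.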

But we need to go deeper. LASSO is considered a sparse estimation method (it sets some coefficients exactly to zero), while Ridge is considered a dense one (it shrinks every coefficient but keeps them all non-zero). That sounds obvious, but I had rarely thought about it in that framework before. I rarely ask myself:

Is this target variable more likely to be represented by a sparse or a dense model, given my set of predictors?
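To make "sparse vs dense" concrete, here is a hedged sketch. It assumes a synthetic sparse DGP (3 of 20 predictors matter) and hand-picked penalties. The LASSO is solved with plain cyclic coordinate descent and soft-thresholding on the objective `(1/2n)||y - Xb||^2 + alpha*||b||_1`, the same scaling scikit-learn's `Lasso` uses. Ridge uses its closed form. `soft_threshold`, `lasso_cd` and `ridge` are hypothetical helpers written for this example.

```python
import numpy as np


def soft_threshold(x, a):
    # S(x, a) = sign(x) * max(|x| - a, 0): shrink towards 0, snap to exactly 0 inside [-a, a]
    return np.sign(x) * np.maximum(np.abs(x) - a, 0.0)


def lasso_cd(X, y, alpha, n_iter=500):
    # Cyclic coordinate descent for (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)  # residual y - X b, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]  # partial residual: take feature j out of the fit
            rho = X[:, j] @ resid / n
            b[j] = soft_threshold(rho, alpha) / col_sq[j]
            resid -= X[:, j] * b[j]  # put feature j back with its updated weight
    return b


def ridge(X, y, lam):
    # Closed form for ||y - X b||^2 + lam * ||b||_2^2
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


rng = np.random.default_rng(42)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]  # hypothetical sparse DGP: 3 of 20 predictors matter
y = X @ beta_true + rng.standard_normal(n)

b_lasso = lasso_cd(X, y, alpha=0.2)
b_ridge = ridge(X, y, lam=10.0)
print("LASSO exact zeros:", int(np.sum(b_lasso == 0)), "of", p)
print("Ridge exact zeros:", int(np.sum(b_ridge == 0)), "of", p)
```

The kink of the L1 penalty at zero is what produces exact zeros. Soft-thresholding snaps any coordinate whose correlation with the partial residual is below `alpha` to 0. Ridge's smooth penalty only scales coefficients down.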

Check out this exchange between @choffstein and Vivek about using LASSO/Ridge/PCR/PLS to estimate stock returns:

Vivek: ...And so we tried various methods, we tried ordinary least squares that overfit. And remember, we were predicting out of sample, so we got pretty bad results. We tried OLS, partial least squares (PLS) and principal component regression (PCR), and they performed poorly.

Now, let’s talk about why those perform poorly, because they all try to collapse the feature space. As a general rule, trying to collapse the feature space underperforms methods that use the full range of features. Linear Ridge, on the other hand, will share loadings across collinear signals. If you have collinear signals, you don’t want to collapse them into one, which PCR and PLS effectively do, or to push one out, which LASSO effectively does. Linear Ridge shares the loadings between collinear features. And that helps when you’re predicting noisy variables, and of course, stock returns are noisy.
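Vivek's point about collinear signals shows up clearly in an extreme toy case, assuming two exact copies of one signal (an illustrative setup, not his data). With duplicated columns the LASSO optimum is not unique. Any same-sign split with the same total fits equally well and costs the same penalty, and coordinate descent started at zero happens to land on the corner that drops the second copy. Ridge's strictly convex penalty instead has a unique minimiser that splits the loading evenly. `lasso_cd` (coordinate descent with soft-thresholding) and `ridge` (closed form) are hypothetical helpers written for this example.

```python
import numpy as np


def soft_threshold(x, a):
    # Proximal operator of the L1 penalty
    return np.sign(x) * np.maximum(np.abs(x) - a, 0.0)


def lasso_cd(X, y, alpha, n_iter=500):
    # Cyclic coordinate descent for (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)  # residual y - X b, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]  # partial residual without feature j
            b[j] = soft_threshold(X[:, j] @ resid / n, alpha) / col_sq[j]
            resid -= X[:, j] * b[j]
    return b


def ridge(X, y, lam):
    # Closed form for ||y - X b||^2 + lam * ||b||_2^2
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


rng = np.random.default_rng(7)
n = 200
signal = rng.standard_normal(n)
X = np.column_stack([signal, signal])  # two perfectly collinear copies of one signal
y = signal + signal + rng.standard_normal(n)  # both copies "carry" the signal

b_ridge = ridge(X, y, lam=5.0)
b_lasso = lasso_cd(X, y, alpha=0.1)
print("Ridge:", b_ridge)  # loading split evenly, roughly 1 on each copy
print("LASSO:", b_lasso)  # close to 2 on the first copy, (numerically) 0 on the second
```

For a fixed total `b0 + b1`, Ridge's penalty `b0^2 + b1^2` is smallest when the two loadings are equal. That is exactly the "sharing" Vivek describes.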

Effectively, what Vivek was arguing is that stock returns are more likely to be represented by a dense model than by a sparse one. If this is true, collapsing the feature space or dropping variables is likely to be very detrimental.
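One way to probe this yourself is a small simulation harness, sketched below under loud assumptions. The predictors are Gaussian. A hypothetical sparse DGP (3 strong coefficients out of 50) is compared with a hypothetical dense one (all 50 coefficients equal to 0.2), and the penalties are fixed and untuned. The harness reports out-of-sample MSE for LASSO and Ridge under each DGP. No particular ranking is claimed here, since the outcome depends on the seed, the sample size, the noise level and, above all, on how the penalties are tuned. `simulate`, `lasso_cd` and `ridge` are hypothetical helpers.

```python
import numpy as np


def soft_threshold(x, a):
    # Proximal operator of the L1 penalty
    return np.sign(x) * np.maximum(np.abs(x) - a, 0.0)


def lasso_cd(X, y, alpha, n_iter=300):
    # Cyclic coordinate descent for (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]  # partial residual without feature j
            b[j] = soft_threshold(X[:, j] @ resid / n, alpha) / col_sq[j]
            resid -= X[:, j] * b[j]
    return b


def ridge(X, y, lam):
    # Closed form for ||y - X b||^2 + lam * ||b||_2^2
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def simulate(beta, n_train=100, n_test=2000, noise=1.0, alpha=0.05, lam=5.0, seed=0):
    # Fit on a small training set, score mean squared error on a large test set
    rng = np.random.default_rng(seed)
    p = beta.size
    X_tr = rng.standard_normal((n_train, p))
    X_te = rng.standard_normal((n_test, p))
    y_tr = X_tr @ beta + noise * rng.standard_normal(n_train)
    y_te = X_te @ beta + noise * rng.standard_normal(n_test)
    fits = {"lasso": lasso_cd(X_tr, y_tr, alpha), "ridge": ridge(X_tr, y_tr, lam)}
    return {name: float(np.mean((y_te - X_te @ b) ** 2)) for name, b in fits.items()}


p = 50
sparse_beta = np.zeros(p)
sparse_beta[:3] = [2.0, -1.5, 1.0]  # hypothetical sparse DGP
dense_beta = np.full(p, 0.2)  # hypothetical dense DGP: every predictor matters a little

sparse_mse = simulate(sparse_beta)
dense_mse = simulate(dense_beta)
print("sparse DGP:", sparse_mse)
print("dense DGP: ", dense_mse)
```

Before reading anything into the numbers, the first change to make is swapping the fixed `alpha`/`lam` for cross-validated values.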

Do I have an opinion on this issue? Not really. To be honest, I have barely done any work on stock return forecasting, so what do I know.

Regardless, in this article we will try to answer these questions:

❓ Should you use LASSO as a feature selection method?
❓ When is LASSO effective and when does it fail?
❓ Is Ridge good enough to be your L̶o̶r̶d̶ and Saviour?

Most of the time, my instinct is to go for whichever regularisation method has the better error profile. If there's one thing I took away from writing this article, it's not to do that. As we will see later in this article, not thinking deeply about the data generating process (DGP) is definitely going to hurt our estimation task.


subscribe to quantymacro

sometimes I write cool stuffs