If I tell you all the important features, should you include all of them?

why statsmodels >>> sklearn

In Quantopolis, a young statistician named quantymacro struggled to perfect his G10 rates regression model. Despite using conventional data—rates momentum, rates carry, rates value—his predictions lacked precision.

One day, he heard about the legendary Oracle of Variables, a mystical entity residing in an ancient tome on probability, hidden in the oldest part of the city library. Driven by curiosity, quantymacro ventured into the forbidden section and discovered the tome. Opening it, a luminous figure emerged.

“I am the Oracle of Variables. Ask your question,” the figure intoned.

quantymacro asked, "What are the exact variables that drive rates expected returns?"

The Oracle revealed both expected and obscure variables. quantymacro eagerly adjusted his model, removing irrelevant variables and adding the newfound ones. To his dismay, the mean squared error increased. Frustrated, he returned to the Oracle.

"Why has my model failed despite only using the important variables?" he asked.

The Oracle replied, "That's for you to figure out. Subscribe to my newsletter if you want to know more."

This article is a fun one IMO. We will talk about when exactly we should include a feature in our model, which is intimately related to the bias-variance tradeoff. At the end we will be greeted by a very familiar face as a pleasant surprise; it has been them all along. If you want to sharpen your knowledge of linear algebra and regressions, hop on.
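Before we dive in, here is a minimal simulation sketch of the Oracle's riddle: even a genuinely "true" variable can hurt out-of-sample MSE if its coefficient is weak, because estimating it from a small sample adds more variance than the bias it removes. The setup below (sample sizes, coefficient values, number of simulations) is my own illustrative choice, not anything from the article itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def sim_once(n_train=30, n_test=1000, beta2=0.1):
    """One simulation: x2 truly drives y, but only weakly (beta2 small)."""
    def draw(n):
        X = rng.standard_normal((n, 2))
        y = 1.0 * X[:, 0] + beta2 * X[:, 1] + rng.standard_normal(n)
        return X, y

    Xtr, ytr = draw(n_train)
    Xte, yte = draw(n_test)

    def ols_test_mse(cols):
        # Fit OLS on the chosen columns, evaluate out-of-sample MSE.
        coef, *_ = np.linalg.lstsq(Xtr[:, cols], ytr, rcond=None)
        resid = yte - Xte[:, cols] @ coef
        return np.mean(resid ** 2)

    # Omit the weak-but-true x2 vs include it.
    return ols_test_mse([0]), ols_test_mse([0, 1])

results = np.array([sim_once() for _ in range(2000)])
print("avg test MSE, omitting the weak true variable:  ",
      results[:, 0].mean())
print("avg test MSE, including the weak true variable: ",
      results[:, 1].mean())
```

Roughly: omitting x2 costs bias of about beta2² = 0.01 in MSE, while estimating its coefficient on 30 points costs variance of about sigma²/n ≈ 0.033, so the smaller (misspecified!) model tends to predict better, exactly the tradeoff quantymacro ran into.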

bonus points if you can relate the questions to the content of this article
