Regularization with Categorical Covariates: Generalizations and Extensions
Final Report Abstract
In contrast to so-called metric variables such as ‘body weight’, categorical variables are measured only on discrete levels. These levels may be ordered or not: in the first case, the variable is called ‘ordinal’, otherwise ‘nominal’. For instance, opinions are often measured on ordinal scales ranging from ‘no agreement at all’ to ‘total agreement’. Although categorical variables are frequently used to explain or predict other quantities, the literature on methods for categorical explanatory variables is limited. One aim of the project was to generalize approaches that had been proposed for categorical explanatory variables within a rather restrictive statistical framework, the classical linear model. The new methods developed in the project also work for more general models and, compared with the standard approaches typically used to estimate unknown model parameters, lead to both higher accuracy and better interpretability. A typical application is classification, for example in biomedicine, where instances have to be classified as ‘benign’ or ‘malignant’ using categorical predictors. Both ordinal and nominal explanatory variables were considered in the project, but the focus eventually shifted toward variables with ordered levels. Besides generalizing existing methods as described above, the project developed new statistical tests for ordinal predictors that are highly relevant in the social and behavioral sciences. In many situations, these new tests clearly outperform the standard procedures used so far. Furthermore, building on the methods developed for ordinal predictors, new methods for analyzing panel data in econometrics were proposed.
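The abstract does not spell out which penalty the project used, so the following is only an illustrative sketch of one common way to regularize an ordinal predictor in the classical linear model: dummy-code the levels (level 1 as reference, so its effect is fixed at 0) and add a quadratic penalty on the differences between effects of adjacent levels. The function names `dummy_code`, `difference_matrix` and `fit_ordinal_ridge` are hypothetical, and the code is not claimed to be the project's exact method. With `lam = 0` it reproduces the unpenalized level means; as `lam` grows, adjacent level effects are pulled together until the predictor has no effect at all.

```python
import numpy as np


def dummy_code(x, n_levels):
    """Dummy-code ordinal levels 1..n_levels; level 1 is the reference category."""
    x = np.asarray(x)
    # One indicator column per non-reference level j = 2..n_levels
    return np.column_stack([(x == j).astype(float) for j in range(2, n_levels + 1)])


def difference_matrix(n_levels):
    """First-order differences of (beta_1 = 0, beta_2, ..., beta_k).

    Acts on the free coefficients (beta_2, ..., beta_k) only:
    row 0 gives beta_2 - beta_1 = beta_2, row i gives beta_{i+2} - beta_{i+1}.
    """
    k = n_levels - 1
    D = np.eye(k)
    D[np.arange(1, k), np.arange(0, k - 1)] = -1.0
    return D


def fit_ordinal_ridge(x, y, n_levels, lam):
    """Penalized least squares with a difference penalty (illustrative sketch).

    Minimizes ||y - alpha - X beta||^2 + lam * ||D beta||^2, where the
    intercept alpha is left unpenalized. Assumes every level is observed.
    Returns (alpha, beta) with beta[0] = 0 for the reference level.
    """
    y = np.asarray(y, dtype=float)
    X = dummy_code(x, n_levels)
    Z = np.column_stack([np.ones(len(y)), X])   # intercept + dummies
    D = difference_matrix(n_levels)
    P = np.zeros((n_levels, n_levels))          # block-diagonal penalty
    P[1:, 1:] = D.T @ D                         # no penalty on the intercept
    theta = np.linalg.solve(Z.T @ Z + lam * P, Z.T @ y)
    return theta[0], np.concatenate([[0.0], theta[1:]])


if __name__ == "__main__":
    # Toy data: four ordered levels with two observations each
    levels = [1, 1, 2, 2, 3, 3, 4, 4]
    response = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
    for lam in (0.0, 1.0, 100.0):
        alpha, beta = fit_ordinal_ridge(levels, response, n_levels=4, lam=lam)
        print(lam, round(alpha, 3), np.round(beta, 3))
```

For a binary outcome such as ‘benign’ versus ‘malignant’, the same penalty would typically be added to the log-likelihood of a logistic model and fitted iteratively (for example by penalized iteratively reweighted least squares); that generalization is not shown in this sketch.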
Publications
- Gertheiss, J. & Oehrlein, F. (2011): Testing linearity and relevance of ordinal predictors. Electronic Journal of Statistics 5, 1935–1959