Project Details
More Than Means Can Say: Regressing Entire Distribution Functions in Epidemiology and Biostatistics
Applicant
Professor Dr. Torsten Hothorn
Subject Area
Epidemiology and Medical Biometry/Statistics
Term
from 2012 to 2018
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 225384399
The ultimate goal of regression analysis is to obtain information about the conditional distribution of a response given a set of explanatory variables. This goal, however, is seldom achieved, because most established regression models estimate only the conditional mean as a function of the explanatory variables and assume that higher moments are not affected by the regressors. For example, determinants of mean childhood nutrition status cannot be interpreted as risk factors for undernutrition. Undernutrition is defined by lower quantiles of the nutrition distribution, and these quantiles cannot be expected to shift in parallel with the mean. The underlying reason for the restriction to conditional means is the assumption that signal and noise are additive. We plan to relax this common assumption in the framework of transformation models. The novel class of semiparametric regression models proposed herein allows the transformation functions to depend on explanatory variables. We will investigate the estimation of the underlying transformation functions by regularised optimisation of scoring rules for probabilistic forecasts, e.g. the continuous ranked probability score.
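As an illustration of the scoring rule mentioned above, the continuous ranked probability score of a predictive distribution function F at an observation y is CRPS(F, y) = ∫ (F(z) − 1{z ≥ y})² dz. Lower values are better. The abstract does not specify the estimation algorithm, so the following is only a minimal sketch. It assumes a Gaussian predictive distribution, chosen because a closed form exists to check against. It computes the score by numerical integration and in closed form, and averages it over a sample. That average is the kind of empirical risk a regularised optimiser would minimise. All function names are illustrative and are not taken from the project.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def crps_numeric(cdf, y, lower, upper, n=20000):
    """Approximate CRPS(F, y) = integral of (F(z) - 1{z >= y})^2 dz
    by the midpoint rule on [lower, upper]; tails outside this
    interval are assumed to contribute negligibly."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n):
        z = lower + (i + 0.5) * h
        step = 1.0 if z >= y else 0.0  # empirical CDF of the single observation y
        total += (cdf(z) - step) ** 2
    return total * h


def crps_normal(y, mu, sigma):
    """Closed-form CRPS of a N(mu, sigma^2) forecast at observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return sigma * (z * (2.0 * normal_cdf(z) - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def mean_crps(ys, mu, sigma):
    """Average CRPS over a sample (the empirical risk to be minimised)."""
    return sum(crps_normal(y, mu, sigma) for y in ys) / len(ys)


if __name__ == "__main__":
    mu, sigma, y = 0.2, 1.5, 0.7
    approx = crps_numeric(lambda t: normal_cdf(t, mu, sigma), y,
                          mu - 12 * sigma, mu + 12 * sigma)
    print(approx, crps_normal(y, mu, sigma))
```

Because the CRPS compares the whole predictive distribution function with the observation, a model that gets the mean right but the spread wrong is still penalised. This property makes it a natural loss for regression models that target entire distribution functions.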
In a certain sense, these models can be viewed as "inverse quantile regression", because we aim at estimating the distribution function instead of the quantile function.
The resulting models promise to be valuable in epidemiology and biostatistics, especially for describing possible heteroscedasticity, comparing spatially varying distributions, identifying extreme events, deriving prediction intervals and selecting variables beyond mean regression effects.
The three main objectives of this grant proposal are (i) the evaluation of different algorithms for model estimation with respect to their finite-sample performance, (ii) the extension of the basic modelling framework to survival analysis, and thus to models where higher moments of a censored response may be described as functions of explanatory variables, and, finally, (iii) the application of these novel techniques to the estimation of conditional distribution functions for childhood nutrition and birth weights. In these two applications, mean regression is a severe oversimplification, since distributional properties, such as the assessment of undernutrition by lower quantiles of the nutrition status or prediction intervals for birth weights, are the actual targets of interest.
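The "inverse quantile regression" view means that quantiles and prediction intervals are not modelled directly. They are recovered afterwards by inverting an estimated conditional distribution function. The following minimal sketch assumes a heteroscedastic normal model F(y | x) = Φ((y − μ(x)) / σ(x)). This is one of the simplest cases where the transformation (y − μ(x)) / σ(x) depends on x. The coefficients in `BETA` and `GAMMA` are hypothetical and not estimates from the project's data. The sketch inverts F(· | x) by bisection to obtain a 90% prediction interval, whose width changes with x. A mean-only model with constant variance cannot express this.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Hypothetical coefficients, chosen only for illustration.
BETA = (3.0, 0.5)    # location:  mu(x) = 3.0 + 0.5 * x
GAMMA = (-1.0, 0.4)  # log-scale: log sigma(x) = -1.0 + 0.4 * x


def conditional_cdf(y, x):
    """F(y | x) = Phi((y - mu(x)) / sigma(x)); the spread depends on x."""
    mu = BETA[0] + BETA[1] * x
    sigma = math.exp(GAMMA[0] + GAMMA[1] * x)
    return normal_cdf((y - mu) / sigma)


def conditional_quantile(p, x, lo=-100.0, hi=100.0, tol=1e-10):
    """Invert the increasing map y -> F(y | x) by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if conditional_cdf(mid, x) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def prediction_interval(x, level=0.9):
    """Central prediction interval read off the estimated distribution function."""
    alpha = 1.0 - level
    return conditional_quantile(alpha / 2, x), conditional_quantile(1 - alpha / 2, x)


if __name__ == "__main__":
    for x in (0.0, 2.0):
        print(x, prediction_interval(x))
```

Bisection is used here only because it works for any monotone distribution function, including ones without a closed-form inverse. For this normal special case the interval could also be written directly as μ(x) ± 1.645 σ(x).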
DFG Programme
Research Grants
International Connection
Switzerland
Participating Persons
Professor Dr. Peter Bühlmann; Professor Dr. Thomas Kneib; Professor Dr. Matthias Schmid