Project Details
Computationally tractable bootstrap for high-dimensional data
Subject Area
Mathematics
Term
since 2021
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 460867398
A computationally tractable but fully nonparametric bootstrap for massive data is developed and studied in the high-dimensional regime. Based on $n$ independent, identically distributed $p$-dimensional observations, where both $n$ and $p$ may be large, we combine subsampling with a suitable dimension reduction of the subsampled observations. This data reduction approach is motivated by the experience that, in many situations, a suitably selected "representative subpopulation" of each datum already contains the essential statistical information for the problem under consideration. For statistics characterized by the spectrum of the population covariance matrix, we rigorously introduce the so-called representative subpopulation condition and investigate its validity in commonly used statistical models. The novel approach is amenable to distributed computation with subsequent averaging, even in the high-dimensional regime, and yields a new data-reduction-based bootstrap that is computationally tractable for massive data sets.
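To make the idea of subsampling combined with dimension reduction and subsequent averaging more concrete, the following is a minimal illustrative sketch in Python. It is not the project's actual procedure. The function names `largest_eigenvalue` and `reduced_subsample_bootstrap`, the parameters `m`, `q`, `n_blocks`, and `n_boot`, and in particular the choice of a uniformly random coordinate subset as a stand-in for the "representative subpopulation" are all assumptions made for illustration. The rescaling that would be needed to transfer results from the reduced problem back to the full $n \times p$ problem is deliberately omitted.

```python
import numpy as np


def largest_eigenvalue(sample):
    """Largest eigenvalue of the sample covariance matrix (rows = observations)."""
    cov = np.atleast_2d(np.cov(sample, rowvar=False))
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(cov)[-1])


def reduced_subsample_bootstrap(X, statistic, m, q, n_blocks=10, n_boot=50, seed=0):
    """Average of per-block bootstrap standard errors on reduced subsamples.

    Hypothetical sketch: each block uses m of the n observations and q of the
    p coordinates (chosen uniformly at random, an illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    block_errors = []
    for _ in range(n_blocks):
        # Data reduction: subsample observations and coordinates.
        rows = rng.choice(n, size=m, replace=False)
        cols = rng.choice(p, size=q, replace=False)
        block = X[np.ix_(rows, cols)]

        # Nonparametric bootstrap inside the reduced block.
        replicates = []
        for _ in range(n_boot):
            idx = rng.integers(0, m, size=m)  # resample rows with replacement
            replicates.append(statistic(block[idx]))
        block_errors.append(np.std(replicates, ddof=1))

    # Blocks are independent of each other, so they could be processed on
    # separate machines; only the final averaging needs the block results.
    return float(np.mean(block_errors))


if __name__ == "__main__":
    data = np.random.default_rng(1).standard_normal((2000, 200))
    print(reduced_subsample_bootstrap(data, largest_eigenvalue, m=200, q=20))
```

Each block works on an $m \times q$ array instead of the full $n \times p$ data, which is where the computational saving in this sketch comes from. Whether such a reduction preserves the relevant statistical information is exactly the kind of question the representative subpopulation condition is meant to address.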
DFG Programme
Research Units