Project Details
Modular Questionnaire Designs for Social Surveys: Statistical Modelling of Missingness in Real Social Survey Data
Applicants
Dr. Christian Bruch; Professor Dr. Christof Wolf
Subject Area
Empirical Social Research
Term
since 2018
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 407454818
This project investigates how imputation procedures can be applied to large-scale social surveys collected with so-called modular questionnaire designs so that an acceptable quality of the imputations is ensured. Modular questionnaire designs reduce questionnaire length by randomly assigning respondents to different fractions of the full questionnaire (modules). This helps to save costs and to maintain acceptable response rates and response quality. However, modular questionnaire designs also produce large amounts of planned missing data. One way to deal with the missing values is to impute them. Imputing these data is especially challenging because of the typical features of large-scale social survey data: predominantly low correlations between variables, large numbers of variables combined with relatively small samples, and often many categorical variables in the dataset.
In the first phase of this project, we identified promising strategies for dealing with this situation. However, to meet the requirements of social-survey applications of modular questionnaire designs, the imputation procedures need to be further evaluated and adjusted. First, the procedures must be able to handle multinomial variables, which are widespread in social surveys. Second, the procedures must be tested and, if necessary, adjusted to better fit a context with several hundred variables, especially with regard to adequate ways of dealing with potential overfitting.
To evaluate the real-life usability of the imputation procedures for data from modular questionnaire designs, more research must also address the perspective of the researchers who conduct their analyses on the imputed data. This applies especially to complex multivariate models (such as multiple regressions), which are often used in the social sciences. It must be ensured that such models can be reliably estimated with the (imputed) data from modular questionnaire designs. Special attention will also be paid to analyses of subgroups. The evaluation also covers strategies that either impute the entire dataset for general purposes or focus the imputation on certain variables, particularly those of the analysis models. The latter would require researchers to impute the missing values themselves. We therefore propose continuing our research in a second phase to close the gap between previous basic research and future practical applications of imputation to such surveys.
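The planned missingness that a modular questionnaire design produces can be illustrated with a minimal sketch. It is not the project's procedure: the item names, the three modules, the core block asked of everyone, and the rule of assigning two modules per respondent are all hypothetical assumptions chosen for illustration, and the answers are simulated.

```python
import random

# Hypothetical questionnaire (assumption): a core block asked of everyone
# plus three modules of two items each.
CORE = ["age", "sex"]
MODULES = {
    "A": ["trust_1", "trust_2"],
    "B": ["media_1", "media_2"],
    "C": ["health_1", "health_2"],
}


def apply_modular_design(full_data, modules_per_respondent=2, seed=42):
    """Mask a complete dataset the way a modular design would.

    Each respondent keeps the core items and the items of a randomly
    drawn subset of modules; all other items are set to None, which
    marks them as planned missing values.
    """
    rng = random.Random(seed)
    masked = []
    for row in full_data:
        assigned = rng.sample(sorted(MODULES), modules_per_respondent)
        asked = set(CORE)
        for name in assigned:
            asked.update(MODULES[name])
        masked.append(
            {item: (value if item in asked else None) for item, value in row.items()}
        )
    return masked


def missing_share(data, item):
    """Fraction of respondents for whom `item` was not asked."""
    return sum(row[item] is None for row in data) / len(data)


# Simulated complete answers on a 1-5 scale (purely illustrative data).
_rng = random.Random(0)
ITEMS = CORE + [item for block in MODULES.values() for item in block]
full = [{item: _rng.randint(1, 5) for item in ITEMS} for _ in range(600)]

observed = apply_modular_design(full)

for item in ITEMS:
    print(f"{item}: {missing_share(observed, item):.2f} missing")
```

Under these assumptions, core items are never missing, while each module item is missing for roughly one third of respondents, and items of the same module are always missing together. An imputation procedure would then have to fill these blocks from the items that were observed.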
DFG Programme
Research Grants