Generalisation of the Fractional Polynomial procedure for semi-continuous variables in epidemiology and clinical research
Final Report Abstract
A common goal in the analysis of epidemiological or clinical data is the estimation of a dose-response relationship for semicontinuous risk factors, which consist of positive continuous values and zeros. Typical examples in cancer or cardiovascular disease epidemiology are occupational exposures (e.g. asbestos exposure) or alcohol and tobacco consumption, where a proportion of individuals may be completely unexposed and the exposure of those who have been exposed follows a continuous distribution. Hormone receptor levels are typical examples in clinical research. This situation raises both statistical problems and problems of interpretation. The spike at zero (SAZ) situation has been considered using an extended fractional polynomial (FP) approach. The correct model under some specific assumptions on univariate continuous distributions was derived and extended by investigating the correct dose-response curve for SAZ situations with a univariate normal, log-normal or gamma distribution of the positive part of X. Theoretical results show that even the presumably simple case of two bivariate normally distributed covariates, both with a SAZ, poses some methodological challenges. An important part of modelling SAZ variables is the frequency of zeros and its relation to other covariates. One particular problem is that the four-cell distribution (4CD) of two SAZ variables has effects on two levels: the correlation between the positive values of the continuous variables and the odds ratio (OR) between the binary indicators. Another issue is the correct way to combine variables for investigating an interaction. Depending on the 4CD, it is necessary to include up to three binary indicators in the model, representing zero observations in one, the other, and both SAZ variables. An analytical derivation for this situation is presented. For more than two SAZ variables, the situation becomes even more complicated.
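To make the four-cell distribution of two SAZ variables concrete, the following minimal Python sketch tabulates the four cells, computes the OR between the two binary (zero versus positive) indicators, and codes the three zero-pattern indicators mentioned above. The function names and the toy data are hypothetical and serve only as illustration; they are not taken from the project's software.

```python
def four_cell_distribution(x1, x2):
    """Count observations in the four cells defined by (x1 > 0, x2 > 0)."""
    cells = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for a, b in zip(x1, x2):
        cells[(int(a > 0), int(b > 0))] += 1
    return cells


def indicator_odds_ratio(cells):
    """Odds ratio between the two binary indicators (assumes no empty cell)."""
    return (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])


def zero_pattern_indicators(a, b):
    """Three dummies: zero only in x1, zero only in x2, zero in both.

    The reference pattern (all dummies 0) is 'both variables positive'.
    This coding is an assumption made for the sketch.
    """
    return (int(a == 0 and b > 0), int(a > 0 and b == 0), int(a == 0 and b == 0))


# Made-up toy data for two SAZ variables (not from any study).
x1 = [0, 0, 0, 2.5, 1.0, 0, 4.0, 3.2, 0, 1.5]
x2 = [0, 0, 12.0, 8.0, 0, 0, 20.0, 15.0, 5.0, 10.0]

cells = four_cell_distribution(x1, x2)
odds_ratio = indicator_odds_ratio(cells)
print(cells, odds_ratio)
```

A cell with few or no observations means the corresponding indicator cannot be estimated reliably, which is one way to see why the number of indicators needed depends on the 4CD.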
A strategy is outlined, and its first part, based on log-linear modelling, is illustrated in an example. In order to give practical recommendations for realistic scenarios, two simulation studies for one SAZ variable were performed, one in the logistic and one in the Cox regression model. Both simulation studies showed that, with an increasing proportion of zeros, the standard FP method, which ignores the SAZ, yielded unsatisfactory dose-response estimates, whereas FP-spike gave a better estimate of the true functional relationship by additionally modelling the binary indicator. With a relatively small effect size of the binary indicator, as defined in one of the investigated scenarios in the Cox model, both methods gave similar results. Overall, standard FP selected a more complex model than FP-spike. Non-linear relationships were detected by both methods, since both use the same functional class to model the relationship for the positive continuous values. As expected, both methods yielded less precise estimates if the true functional relationship was not within the FP class. In such cases FP-spike performed slightly better than standard FP in terms of the magnitude of error and the confidence intervals (CIs) of the estimates. For the situation of two covariates with a SAZ, four strategies for the analysis were proposed. Each method has advantages and disadvantages depending on the specific distributional situation of the variables. The usefulness of the different approaches depends strongly on the bivariate distribution of the zero and non-zero values and their correlation. A dataset on laryngeal cancer was used to illustrate the four approaches. A simulation study was started to assess the properties of these methods. Its key aims are to assess the influence of the effect size on the relevance of including binary indicators in a linear and a non-linear functional relationship, and to investigate the influence of the number of observations in the zero subsets.
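The idea of modelling a binary indicator in addition to an FP function of the positive values can be sketched as follows. This is a minimal illustration for a first-degree FP term only, using the conventional FP power set with power 0 denoting the logarithm. The coding of the indicator (1 for exposed, 0 for unexposed), the function names and the coefficient values are assumptions made here; the model selection steps of the FP-spike procedure are not shown.

```python
from math import log

# Conventional set of FP powers; power 0 stands for log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, power):
    """First-degree FP transformation of a positive value."""
    if x <= 0:
        raise ValueError("FP transformation is defined for positive x only")
    return log(x) if power == 0 else x ** power


def fp_spike_columns(x, power):
    """Return (indicator, fp_value) for one observation of a SAZ variable.

    Assumed coding: indicator = 1 if x > 0, else 0; for x == 0 the FP
    column is set to 0 so that only the intercept describes the unexposed.
    """
    if x < 0:
        raise ValueError("a SAZ variable is assumed to be non-negative")
    if x == 0:
        return (0, 0.0)
    return (1, fp_term(x, power))


def linear_predictor(x, power, beta0, beta_z, beta_fp):
    """Linear predictor beta0 + beta_z * indicator + beta_fp * FP(x)."""
    z, f = fp_spike_columns(x, power)
    return beta0 + beta_z * z + beta_fp * f


# Hypothetical coefficients, chosen only to show the jump at zero.
for dose in (0.0, 1.0, 4.0, 9.0):
    print(dose, linear_predictor(dose, 0.5, beta0=-1.0, beta_z=0.8, beta_fp=0.3))
```

In a logistic or Cox model the two columns would enter the design matrix as ordinary covariates. A standard FP analysis, by contrast, has no separate indicator column and must describe the unexposed through the continuous function alone.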
For more than two SAZ variables, we propose an approach which starts with log-linear modelling of the binary versions (0, >0) of the SAZ variables. With this first step we try to assess the interrelationship between the SAZ variables. Provided that there are no three- or higher-dimensional relationships, the bivariate approaches can be used to derive a final model giving dose-response functions for all SAZ variables. However, further issues (e.g. the order in which variables are investigated) need to be considered in practice. To illustrate the methods, we obtained data from large epidemiological studies and compared five procedures to model exposure variables with a SAZ: categorical analysis (as used in the original publications), standard FP (ignoring the spike), orig-FP-spike, FP-spike, and linear with spike. While these datasets do not cover all possible practical data situations, they provide substantial insight into some strengths and weaknesses of the methods. We recommend FP-spike as the preferred procedure.
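The first step for three SAZ variables can be sketched in Python as follows: each variable is dichotomised (0, >0), and a log-linear model with all two-way terms but no three-way term is fitted to the resulting 2x2x2 table. The report does not state how the log-linear models were fitted; iterative proportional fitting (IPF) is used here as a simple stand-in, and the table counts are made up.

```python
from itertools import product
from math import log

CELLS = list(product((0, 1), repeat=3))


def fit_no_three_way(table, iterations=200):
    """Fit the no-three-way-interaction log-linear model by IPF.

    table maps (i, j, k) in {0, 1}^3 to an observed count. All observed
    two-way margins are assumed to be positive.
    """
    fitted = {c: 1.0 for c in CELLS}
    for _ in range(iterations):
        for dropped in (2, 1, 0):  # axis summed out in the current step
            kept = [axis for axis in range(3) if axis != dropped]
            for margin in product((0, 1), repeat=2):
                group = [c for c in CELLS
                         if (c[kept[0]], c[kept[1]]) == margin]
                observed = sum(table[c] for c in group)
                current = sum(fitted[c] for c in group)
                for c in group:
                    fitted[c] *= observed / current
    return fitted


def deviance(table, fitted):
    """Likelihood-ratio statistic G^2 comparing observed and fitted counts."""
    return 2 * sum(n * log(n / fitted[c]) for c, n in table.items() if n > 0)


# Made-up counts for three dichotomised SAZ variables (0 = zero, 1 = positive).
toy = {(0, 0, 0): 40, (0, 0, 1): 10, (0, 1, 0): 25, (0, 1, 1): 12,
       (1, 0, 0): 30, (1, 0, 1): 15, (1, 1, 0): 35, (1, 1, 1): 33}

fitted = fit_no_three_way(toy)
g2 = deviance(toy, fitted)
print(round(g2, 3))
```

For a 2x2x2 table this model leaves one degree of freedom, so G^2 can be compared with a chi-square distribution with one degree of freedom (5% critical value about 3.84). A small value would be compatible with the absence of a three-dimensional relationship.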
Publications
- (2012) Analysing covariates with spike at zero: a modified FP procedure and conceptual issues, Biom J 54:686-700
Becher H, Lorenz E, Royston P, Sauerbrei W
(See online at https://doi.org/10.1002/bimj.201100263)
- (2015) Dose-response modelling for bivariate covariates with and without a spike at zero: Theory and application to binary outcomes, Stat Neerl 69:374-398
Lorenz E, Jenkner C, Sauerbrei W, Becher H
(See online at https://doi.org/10.1111/stan.12064)
- (2015) Dose-response modelling for semicontinuous variables in epidemiology and clinical research. Doctoral thesis (Dr. sc. hum.), University of Heidelberg
Lorenz E
(See online at https://doi.org/10.11588/heidok.00019981)
- (2016) Modeling continuous covariates with a “spike” at zero: Bivariate approaches, Biom J 58(4):783-796
Jenkner C, Lorenz E, Becher H, Sauerbrei W
(See online at https://doi.org/10.1002/bimj.201400112)
- (2017) Modeling Variables With a Spike at Zero: Examples and Practical Recommendations, American Journal of Epidemiology 185(8):650-660
Lorenz E, Jenkner C, Sauerbrei W, Becher H
(See online at https://doi.org/10.1093/aje/kww122)
- (2018) Multivariable modeling of continuous covariates with a spike at zero. Doctoral thesis (Dr. rer. nat.), University of Freiburg, Fakultät für Mathematik und Physik, XII, 140 pp., March 2018
Jenkner C