Review article
Regression Analyses and Their Particularities in Observational Studies
Part 32 of a Series on Evaluation of Scientific Publications
Background: Regression analysis is a standard method in medical research. It is often not clear, however, how the individual components of regression models are to be understood and interpreted. In this article, we provide an overview of this type of analysis and discuss its special features when used in observational studies.
Methods: Based on a selective literature review, the individual components of a regression model for differently scaled outcome variables (metric: linear regression; binary: logistic regression; time to event: Cox regression; count variable: Poisson or negative binomial regression) are explained, and their interpretation is illustrated with respect to a study on multiple sclerosis. The prerequisites for the use of each of these models, their applications, and their limitations are described in detail.
Results: Regression analyses are used to quantify the relation between several variables and the outcome variable. In randomized clinical trials, the models used with this flexible statistical method are usually lean and prespecified. In observational studies, where there is a need to control for potential confounders, researchers with knowledge of the topic in question must collaborate with experts in statistical modeling to ensure high model quality and avoid errors. Causal diagrams are an increasingly important basis for evaluation. They should be constructed collaboratively and should differentiate between confounders, mediators, and colliders.
Conclusion: Researchers need a basic understanding of regression models so that these models will be well defined and their findings will be fully reported and correctly interpreted.


Regression analyses form an essential part of medical research (1, 2). The basic idea is that an outcome variable is explained by predictor variables. In a cross-sectional study, the aim is generally to show the association between various variables. Because such a study does not capture a temporal sequence, no statement about causality can be made without further consideration. In a prospective/longitudinal study, the effect of various variables on an outcome variable can be evaluated to allow predictions to be made or to explain associations.
In randomized controlled trials (RCTs), structural equality is created in principle by random allocation to the intervention groups. Accordingly, the models in RCTs are very lean and should primarily include the intervention group, the baseline value (if possible) and key factors that were used to form groups upon randomization (stratification variables) (3). In observational studies, by contrast, a large number of independent variables can and at times must be included in the model.
In this series of articles on the evaluation of scientific publications, an article on linear regression analysis was already published (1). In addition, regression analyses have been discussed in other articles (4, 5, 6).
The aim of our article is to provide, in the first part, an overview of the components of regression analyses and, in the second part, to address the particularities of observational studies. The prerequisites, possibilities, and limitations of specific regression models are explained and discussed in the appendix. Table 1 provides a glossary of the technical terms used.
Methods
Based on a selective search of the literature in the PubMed database and with Perplexity.ai, the individual components of a regression model are explained for outcome variables of different scales. Their interpretation is illustrated using a study on multiple sclerosis as an example. The requirements for the use of each of these models, their applications, and their limitations are described in detail in the eMethods.
Components of regression analyses
A regression model can be compared to a kit with various possible components. In this section, we will use a fictitious study on multiple sclerosis to illustrate these components.
Fictitious example study
Multiple sclerosis (MS) is a chronic inflammatory neurological disease which can take a relapsing-remitting or a progressive course. It is characterized by symptoms such as paralysis and sensory disturbances caused by demyelination and degradation of nerve fibers. The lesions in the brain can be visualized using magnetic resonance imaging (MRI). For this fictitious study, we assume that an experimental intervention (E) is to be compared with the standard of care (S). Because the time since diagnosis is a major prognostic factor, randomization is stratified by time since diagnosis (<5 versus ≥5 years). This means that patients with less than 5 years and those with at least 5 years since diagnosis are randomized separately to the two treatment groups. The aim of this approach is to ensure that both intervention groups are equally represented within the two strata.
Outcome variable
The outcome variable is the variable of interest. The scale on which the outcome variable is measured determines the class of regression model to be used. In MS, disease progression can be defined in various ways. For our fictitious study, we use the following outcome variables:
- The time a patient takes to walk 25 feet (just under eight meters), known as Timed 25-Foot Walk (T25FW) (7). Alternatively, this time can be converted to a speed in feet per second; this is what we will work with in the following. In addition, the change from the baseline value (after minus before the intervention) will be used as an outcome variable (8). As this is a metric variable, a linear regression is used.
- Relapse within 6 months (yes versus no): As this is a dichotomous variable, a logistic regression is used (9).
- Time to relapse: As this is a time-to-event variable, a Cox regression is used (10).
- Number of new lesions in the period of 2 years (determined using MRI): As this is a count variable, a Poisson regression is used (11).
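The mapping above, from the scale of the outcome variable to the class of regression model, can be written down as a small lookup; this is a sketch summarizing the article's scheme, not part of any statistical library.

```python
# Sketch: the article's mapping from outcome scale to regression model class.
MODEL_BY_SCALE = {
    "metric": "linear regression",
    "binary": "logistic regression",
    "time-to-event": "Cox regression",
    "count": "Poisson (or negative binomial) regression",
}

def choose_model(scale: str) -> str:
    """Return the regression model class for a given outcome scale."""
    return MODEL_BY_SCALE[scale]

print(choose_model("binary"))  # logistic regression
```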
In the eMethods, the various regression models (linear, logistic, Cox, and Poisson regression) are described in detail together with their assumptions and applied to the example study, with comprehensive interpretation of the results.
In most cases, one outcome variable is modeled; the resulting regression model is referred to as univariate. If several outcome variables are to be modeled jointly (for example, the T25FW score in the morning and in the evening), the regression model is referred to as multivariate. Multivariate regression is not discussed in more detail in this article; interested readers are referred to Zelterman (2015), among others (12). It should be noted that in the literature the term “multivariate regression” is often used incorrectly to mean “multiple regression” (13).
Predictor variable
The predictor variable is the variable whose effect is to be investigated. In epidemiological studies, the term exposure is generally used. The scale level of predictor variables can take any form. In the case of a single predictor variable, the term univariable regression or simple regression is used; in the case of two or more independent variables, the term multivariable or multiple regression is used. The primary result of a regression analysis relates to the predictor variables. In general, the regression coefficient, the corresponding (typically two-sided 95%) confidence interval, and the corresponding p-value for the test that the coefficient equals 0 are reported for each independent variable. Useful details on the procedure and reporting of regression models can be found in the literature (14, 15).
In the eMethods, concrete models are created, regression coefficients, confidence intervals and p-values are interpreted, and the necessary requirements are explained, all using the fictitious example study. Table 2 provides a summary of the results and interpretations of the fictitious study, separately for the various outcome variables. For the sake of completeness, the various intercept results are also given (for explanation, see eMethods and the section “Common mistakes”); however, the intercept is generally not taken into account in the interpretation and is therefore not included in Table 2.
Figure 1 shows, for the example research question, a scatter plot and the resulting regression line of a linear simple regression of the effect of disease duration (independent variable x, in years) on the change in T25FW score (dependent variable y, ΔT25FW in feet per second): ΔT25FW = 0.137 − 0.01 × years since diagnosis.
The intercept is the value at which the regression line intersects the y-axis at x = 0 (here 0.137), and the slope parameter is the value by which the regression line rises (or falls) when the value on the x-axis increases by one unit (here −0.01).
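The fitted line can be turned directly into a small prediction function. This is a sketch using the two coefficients reported for Figure 1 (intercept 0.137, slope −0.01); it is not a general fitting routine, and the function name is ours.

```python
# Sketch: predict the change in T25FW speed (feet per second) from years
# since diagnosis, using the coefficients of the fitted regression line.
INTERCEPT = 0.137   # predicted change at 0 years since diagnosis
SLOPE = -0.01       # change in the prediction per additional year

def predicted_delta_t25fw(years_since_diagnosis: float) -> float:
    return INTERCEPT + SLOPE * years_since_diagnosis

# A patient 5 years after diagnosis: 0.137 - 0.05 = 0.087 feet per second
print(predicted_delta_t25fw(5))
```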
Interaction term
An important requirement for regression models is the additivity of the effects of the individual independent variables. In other words, the various independent variables do not influence each other's effect on the outcome variable. If, on the other hand, a so-called interaction is thought to exist, a suitable interaction term can be included in the model. However, in order to facilitate a better interpretation of the results, it is generally advisable to consider only interactions that are plausible from a content perspective (16). In the example study, it could be assumed that the experimental intervention is effective in patients with a relapsing-remitting disease course but without benefit in patients with progressive multiple sclerosis. In this case, it would be advisable to stratify the randomization by the factor “type of disease course”. In any case, however, this independent variable should be included in the model, and the interaction term “intervention group × type of course” should also be added. From a statistical perspective, the evidence for an interaction is stronger the smaller the corresponding p-value. If it is concluded that there is a relevant interaction, the analysis should be performed in a stratified manner; in such a case, the intervention effect should be interpreted separately for progressive and relapsing-remitting disease courses.
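With an interaction term, the treatment effect is no longer a single number but depends on the interacting variable. The following sketch uses a linear predictor with invented coefficients (they are not estimates from the example study) to show how the effect of the experimental intervention differs by disease course:

```python
# Sketch of a linear predictor with an interaction term:
#   y = b0 + b1*group + b2*progressive + b3*(group * progressive)
# group = 1 for the experimental intervention (E), 0 for standard of care (S);
# progressive = 1 for a progressive disease course, 0 for relapsing-remitting.
# All coefficient values are invented purely for illustration.
b0, b1, b2, b3 = 0.10, 0.25, -0.15, -0.25

def linear_predictor(group: int, progressive: int) -> float:
    return b0 + b1 * group + b2 * progressive + b3 * group * progressive

# Treatment effect (E minus S) within each disease course:
effect_relapsing = linear_predictor(1, 0) - linear_predictor(0, 0)    # = b1
effect_progressive = linear_predictor(1, 1) - linear_predictor(0, 1)  # = b1 + b3
```

With these invented numbers the effect is 0.25 in the relapsing-remitting stratum and 0 in the progressive stratum, mirroring the scenario described in the text.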
Dependent measurements
It is common for clinical studies to have several observations per patient: either per point in time (for example, several lesions in the MRI) or over time (multiple study visits). However, there may also be other factors that lead to dependencies in the data. For example, in a study in which practices are assigned to intervention groups (so-called cluster randomization), the data of patients within one practice would be dependent. In order to take such dependencies into account, random effects are introduced into the regression model. This means that individual intercepts (random intercepts) or slope parameters (random slopes) are estimated for the individual units (e.g. patient or center) with several measurements. In the case of multiple measurements, the patient would then be the random effect; in the case of multicenter trials, it would be the center. More details can be found in the literature (17, 18). Examples of alternative approaches to the evaluation of data with dependent measurements include generalized linear models (GLMs) and generalized estimating equations (GEEs) (19, 20).
Particularities of observational studies
Even though randomized controlled trials (RCTs) offer the highest level of evidence, it is not always possible to conduct an RCT. For example, randomization may not be possible or ethically justifiable for certain reasons (e.g. smoking as a risk factor for multiple sclerosis). In our example, it would be conceivable that patients who do not wish to be randomized are included in a registry and observed over the course of their disease. If regression analyses are then performed on the resulting observational data, there are particularities that should be taken into account.
Evaluation of model quality
While in clinical trials models are generally prespecified and not selected on the basis of the results, in observational studies the model is at times modified to achieve the best possible fit to the data. As part of a linear regression, the square of the correlation coefficient, the so-called coefficient of determination (R2), is often estimated. It describes the proportion of the variation in the values of the outcome variable that can be explained by the regression model. Hence, a value close to 1 is considered very good, while a value close to 0 is considered poor. If in our illustrative model a coefficient of determination of 0.63 is estimated, this means that 63% of the variation in the ΔT25FW score can be explained by the model. Examples of other measures of model quality include the adjusted coefficient of determination and Akaike's information criterion; further measures are described by Moons et al. (14), among others.
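The coefficient of determination can be computed as R² = 1 − SS_res/SS_tot, the residual sum of squares relative to the total sum of squares. A minimal sketch with invented data:

```python
# Sketch: R^2 = 1 - SS_res / SS_tot for observed values y and model
# predictions y_hat (both invented for illustration).
def r_squared(y, y_hat):
    y_bar = sum(y) / len(y)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained part
    return 1 - ss_res / ss_tot

# Perfect predictions give R^2 = 1; always predicting the mean gives R^2 = 0.
y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, y))          # 1.0
print(r_squared(y, [2.5] * 4))  # 0.0
```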
Various measures have been proposed in the literature to estimate the quality of other regression models (e.g. Hosmer and Lemeshow [21] and Moons et al. [14]); some of these measures are similar to those used for the linear regression model (e.g. the pseudo coefficient of determination and a prognostic index), while others are derived from measures of diagnostic accuracy (e.g. sensitivity and specificity). For these measures, too, values closer to 1 indicate better diagnostic accuracy.
Good and poor adjustment variables: confounders, colliders, mediators
The key difference between randomized controlled trials and epidemiological observational studies, such as cohort studies, is that statements on causality are only possible to a limited extent, due to potential confounding variables (confounders). While in randomized clinical trials the receipt of an intervention is determined by chance, in observational studies confounders frequently influence both the intervention/exposure variable and the outcome variable. Here, a distinction must be made between independent variables in general and confounders.
While independent variables are simply any variables included as predictors in a regression model, confounders are characterized by a directed dependency structure: they are assumed to have a causal influence on both the exposure/treatment and the outcome variable (common cause). Under this assumption, confounding can also be eliminated in observational studies by adjusting for all relevant confounders within a regression model; in this way, causal interpretations of regression coefficients are theoretically possible, too (22). However, the prerequisite for this is that all relevant confounders are actually taken into account, a fact that cannot be proven. For this reason, the German Institute for Quality and Efficiency in Health Care (IQWiG, Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen), for example, uses statistical significance tests with null hypotheses above a minimum difference (shifted null hypothesis) for benefit assessments based on non-randomized studies in order to take this residual uncertainty into account (23). Further methods of dealing with confounding have already been covered in this series of articles (stratification [24] and propensity score matching [4]). However, all of these methods rest on the unverifiable assumption that confounding can be completely eliminated. For further discussion of whether and under what assumptions causal interpretations from observational studies are possible, please refer to the literature (25, 26). Even in observational studies, however, it is generally not advisable simply to include all available independent variables in the regression model when a causal interpretation of the regression coefficients is intended.
In this series of articles, one type of bias that can arise through the inclusion of variables in multivariable models has already been discussed: collider bias (27). A further bias in the estimation of the total effect of an independent variable can arise if adjustments are made for variables that lie on the causal path between exposure and outcome; such variables are called mediators. These three basic structures can be illustrated in causal diagrams (directed acyclic graphs, DAGs). For our MS research example, our DAG is based on a review by Koch-Henriksen et al. 2021 (28). Apart from the type of treatment, it is important to establish whether the variables “number of relapses during the 12 months before study start”, “adherence” (measured based on the number of tablets taken), and “frequency of follow-up visits” should be included in a multivariable regression model (Figure 2).
Confounding structure
It is necessary to adjust for confounders. In our example, the number of relapses in the 12 months prior to study inclusion can influence the decision on the type of treatment and at the same time be a factor influencing MS progression; therefore, we should adjust for this variable in a regression model.
Mediator structure
To estimate the overall effect of an exposure/treatment on an outcome variable, it is important not to adjust for mediators. To investigate the mechanisms of an effect, a mediation analysis is required, separating the overall effect into a direct and an indirect effect (29). In our example, the type of treatment can have an effect on adherence which in turn can have an effect on MS progression. To estimate the overall treatment effect, we should not adjust for the variable adherence as a mediator in a regression model.
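In the simplest linear setting, a mediation analysis decomposes the total effect into an indirect part (exposure → mediator → outcome, the product of the two path coefficients) and a direct part. The following sketch uses the textbook product-of-coefficients decomposition with invented numbers; it is not taken from the mediation article cited above (29).

```python
# Sketch of the classic linear mediation decomposition (all numbers invented):
#   mediator = a * treatment + ...            (treatment -> adherence)
#   outcome  = c_prime * treatment + b * mediator + ...
a = 0.4         # assumed effect of treatment on adherence
b = -0.5        # assumed effect of adherence on MS progression
c_prime = -0.3  # assumed direct effect of treatment on progression

indirect_effect = a * b                  # via the mediator: -0.2
total_effect = c_prime + indirect_effect # direct plus indirect: -0.5
print(total_effect)
```

Adjusting for the mediator in an ordinary regression would estimate only the direct effect c′, not the total effect, which is why the text advises against adjusting for adherence here.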
Collider structure
It is important not to adjust for colliders. In our example, the type of treatment and the progression of MS can influence how often patients present for follow-up visits. Thus, the frequency of follow-up visits is a collider for which no adjustment should be made in a regression model.
Summary
In our example, therefore, the only adjustment to be made is for the variable “number of relapses during the 12 months before study inclusion”, as this is the only confounding variable. In a real-world setting, it can be assumed that the DAG is considerably more complex than ours.
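A small simulation can illustrate why adjusting for a confounder matters. This sketch invents a binary confounder (many prior relapses) that influences both treatment choice and outcome; for simplicity, adjustment is done here by stratification rather than by a regression model, and all numbers are made up.

```python
# Purely illustrative simulation of confounding: the confounder c raises the
# outcome and makes treatment more likely, so the crude comparison is biased.
import random

random.seed(1)
TRUE_EFFECT = -0.5  # assumed true treatment effect on the outcome
rows = []
for _ in range(100_000):
    c = random.random() < 0.4                    # many prior relapses?
    t = random.random() < (0.7 if c else 0.3)    # confounder drives treatment
    y = 1.0 * c + TRUE_EFFECT * t + random.gauss(0, 1)
    rows.append((c, t, y))

def mean(xs):
    return sum(xs) / len(xs)

# Crude (unadjusted) comparison of treated versus untreated:
crude = mean([y for c, t, y in rows if t]) - mean([y for c, t, y in rows if not t])

# Adjusted estimate: stratify by the confounder, average the stratum effects
# weighted by stratum size.
effects, weights = [], []
for level in (True, False):
    treated = [y for c, t, y in rows if c == level and t]
    control = [y for c, t, y in rows if c == level and not t]
    effects.append(mean(treated) - mean(control))
    weights.append(sum(1 for c, _, _ in rows if c == level))
adjusted = sum(e * w for e, w in zip(effects, weights)) / sum(weights)

print(round(crude, 2), round(adjusted, 2))  # crude is biased toward 0 here
```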
Ratio of number of cases to number of variables
The sample size has a major impact on the estimation of a regression model.
The more variables are to be included in a multivariable regression model, the larger the sample should be. Rules of thumb can serve as a simple heuristic. The information in the literature on this point is inconsistent, but as a rough guide the number of cases should be 10 to 20 times as large as the number of model parameters to be estimated (1, 30, 31). For logistic regression and Cox regression, what matters is the number of observations in the rarer category or the number of events, respectively. In the literature, a value of at least 10 events per variable is given (32, 33). If, for example, an event has occurred in 20 of 200 persons during a follow-up period, the maximum number of independent (predictor) variables in a logistic or Cox regression would be two. However, simulation studies have shown that such simple rules of thumb do not accurately reflect complex data structures, in which the correlations of the predictors and the magnitude of the regression coefficients should also be taken into account. For further discussion of this topic, please refer to van Smeden et al. and Courvoisier et al. (34, 35), and specifically for prediction models with continuous, dichotomous, and time-to-event outcome variables to Riley et al. (36, 37).
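The events-per-variable heuristic from the paragraph above can be written down directly; the threshold of 10 is the commonly cited value, and, as the text stresses, this is a rough guide rather than a guarantee.

```python
# Rough events-per-variable (EPV) heuristic for logistic/Cox regression:
# at most one candidate predictor per ~10 observations in the rarer
# category (events or non-events). A heuristic only, not a hard rule.
def max_predictors(n_events: int, n_total: int, epv: int = 10) -> int:
    rarer = min(n_events, n_total - n_events)
    return rarer // epv

print(max_predictors(20, 200))  # 2, as in the example in the text
```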
Common mistakes
There are typical pitfalls in the estimation and subsequent reporting of a regression model.
First, it should be noted that the estimation of effect sizes and the quality of a regression model rest on various assumptions, which are outlined in conjunction with the individual models in the eMethods. In practice, these assumptions must always be checked in order to support the validity of the findings from a regression model.
Furthermore, the purpose of a regression model may vary. If, for example, the primary goal is to interpret the relationship of individual predictor variables with the outcome variable, adjusted for other predictor variables, the p-value in particular is often reported. It should be noted that any statement about the significance of a predictor variable represents an individual statistical test. The interpretation of several predictor variables therefore poses the problem of multiple testing (38), which requires an adjustment of the significance level or another solution. As a rule, effect estimates and confidence intervals are reported for the predictor variables. However, if prediction is the primary purpose of the model, the parameters of the individual predictor variables are less important than the overall quality of the model.
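The simplest adjustment of the significance level for several predictor tests is the Bonferroni correction, which divides the level by the number of tests. A sketch (more powerful procedures, such as Holm's method, exist and may be preferable):

```python
# Sketch: Bonferroni correction for testing k predictor coefficients.
def bonferroni_alpha(alpha: float, k: int) -> float:
    return alpha / k

def significant(p_values, alpha=0.05):
    """Flag which p-values remain significant after Bonferroni correction."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]

# Three predictors, threshold 0.05/3 ~ 0.0167: only the first test survives.
print(significant([0.001, 0.02, 0.04]))  # [True, False, False]
```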
Finally, the fact that a regression model is just a model adjusted to the data in the best possible way must be borne in mind both during development and interpretation. It is not possible to establish causality from the estimated statistical parameters; this can only be achieved with a suitable study design. If a data-driven model is chosen, i.e. if predictor variables are selected in a way that optimum model quality is achieved, there is a risk of overfitting. This means that both model quality and the association of individual predictor variables are overestimated. Especially in observational studies, such an approach is common practice; thus, it is necessary to validate the resulting models both internally and in independent data (14). Ideally, however, the model is specified a priori, using a hypothesis-led approach.
Conclusion
The aim of this article was to address basic components of regression analyses in observational studies in an introductory manner, using a study on multiple sclerosis as an example. In medical research, regression analyses play an important role as a statistical evaluation method because of the flexibility of their use. They have therefore been the subject of guidance articles in various journals, including Nature Methods and the British Medical Journal's Statistics Notes (39, 40). The fact that our article focuses on regression analyses in observational studies highlights the importance of collaboration between scientists with domain knowledge and statistical modelers. Causal diagrams form an increasingly important basis for the evaluation of observational studies; they should be prepared jointly and should distinguish between confounders, mediators, and colliders. Beyond the topics covered in our article, please refer to the literature for further discussion of advanced methods, such as the modeling of the functional form of continuous predictor variables or the advantages and disadvantages of various variable selection options (31, 41). In conclusion, it can be stated that basic knowledge of regression models is necessary for the understanding of many scientific papers.
Conflict of interest statement
The authors declare that no conflict of interest exists.
Manuscript received on 28 September 2023, revised version accepted on 18 December 2023.
Translated from the original German by Ralf Thoene, M.D.
Corresponding author
Prof. Dr. rer. nat. Antonia Zapf
Institut für Medizinische Biometrie und Epidemiologie
Universitätsmedizin Hamburg-Eppendorf
Martinistraße 52, 20246 Hamburg, Germany
a.zapf@uke.de
Cite this as:
Zapf A, Wiessner C, König IR: Regression analyses and their particularities in observational studies—part 32 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2024; 121: 128–34. DOI: 10.3238/arztebl.m2023.0278
Institute of Medical Biometry and Statistics, University of Lübeck, Lübeck, Germany: Prof. Dr. rer. biol. hum. Inke Regina König
1. | Schneider A, Hommel G, Blettner M: Linear regression analysis: part 14 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2010; 107: 776–82 VOLLTEXT |
2. | Katz MH: Multivariable analysis: a practical guide for clinicians and public health researchers. Cambridge, UK: Cambridge University Press 2011 CrossRef |
3. | EMA: Guideline on adjustment for baseline covariates in clinical trials. EMA/CHMP/295050/2013. 2015. www.ema.europa.eu/en/documents/scientific-guideline/guideline-adjustment-baseline-covariates-clinical-trials_en.pdf (last accessed on 22 September 2023). |
4. | Kuss O, Blettner M, Börgermann J: Propensity score: an alternative method of analyzing treatment effects. Dtsch Arztebl Int 2016; 113: 597–603 CrossRef |
5. | Ressing M, Blettner M, Klug SJ: [Data analysis of epidemiological studies – part 11 of a series on evaluation of scientific publications.] DZZ 2011; 66: 456–62. |
6. | Zwiener I, Blettner M, Hommel G: Survival analysis: part 15 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2011; 108: 163–9 VOLLTEXT |
7. | Kalinowski A, Cutter G, Bozinov N, et al.: The timed 25-foot walk in a large cohort of multiple sclerosis patients. Mult Scler 2022; 28: 289–99 CrossRef MEDLINE PubMed Central |
8. | Goodman AD, Brown TR, Krupp LB, et al.: Sustained-release oral fampridine in multiple sclerosis: a randomised, double-blind, controlled trial. Lancet 2009; 373: 732–8 CrossRef MEDLINE |
9. | University of California, San Francisco MS-EPIC Team, Cree BAC, Hollenbach JA, et al.: Silent progression in disease activity-free relapsing multiple sclerosis. Ann Neurol 2019; 85: 653–66 CrossRef MEDLINE PubMed Central |
10. | Healy BC, Glanz BI, Stankiewicz J, Buckle G, Weiner H, Chitnis T: A method for evaluating treatment switching criteria in multiple sclerosis. Mult Scler 2010; 16: 1483–9 CrossRef MEDLINE |
11. | Pongratz V, Bussas M, Schmidt P, et al.: Lesion location across diagnostic regions in multiple sclerosis. Neuroimage Clin 2023; 37: 103311 CrossRef MEDLINE PubMed Central |
12. | Zelterman D: Applied multivariate statistics with R. Cham, Switzerland: Springer 2015 CrossRef |
13. | Hidalgo B, Goodman M: Multivariate or multivariable regression? Am J Public Health 2013; 103: 39–40 CrossRef MEDLINE PubMed Central |
14. | Moons KGM, Altman DG, Reitsma JB, et al.: Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015; 162: W1–73 CrossRef MEDLINE |
15. | Vandenbroucke JP, von Elm E, Altman DG, et al.: Strengthening the reporting of observational studies in epidemiology (STROBE): explanation and elaboration. Int J Surg 2014; 12: 1500–24 CrossRef MEDLINE |
16. | Harrell FE, Lee KL, Mark DB: Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996; 15: 361–87 CrossRef |
17. | Murray DM, Varnell SP, Blitstein JL: Design and analysis of group-randomized trials: a review of recent methodological developments. Am J Public Health 2004; 94: 423–32 CrossRef MEDLINE PubMed Central |
18. | Detry MA, Ma Y: Analyzing repeated measurements using mixed models. JAMA 2016; 315: 407–8 CrossRef MEDLINE |
19. | Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika 1986; 73: 13–22 CrossRef |
20. | Bender R: Introduction to the use of regression models in epidemiology. Methods Mol Biol 2009; 471: 179–95 CrossRef MEDLINE |
21. | Hosmer Jr. DW, Lemeshow S, Sturdivant RX: Applied logistic regression. John Wiley & Sons 2013 CrossRef |
22. | Hernán MA, Robins JM: Causal inference: what if. Boca Raton, FL, USA: Chapman & Hall/CRC 2020. |
23. | IQWiG: Konzepte zur Generierung versorgungsnaher Daten und deren Auswertung zum Zwecke der Nutzenbewertung von Arzneimitteln nach § 35a SGB V; Rapid Report A19–43; 2020. |
24. | Röhrig B, du Prel J-B, Wachtlin D, Blettner M: Types of study in medical research: part 3 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2009; 106: 262–8 CrossRef |
25. | Pearl J: Causal inference in statistics: an overview. Statist Surv 2009; 3: 96–146 CrossRef |
26. | Hernán MA: A definition of causal effect for epidemiological research. J Epidemiol Community Health 2004; 58: 265–71 CrossRef MEDLINE PubMed Central |
27. | Tönnies T, Kahl S, Kuss O: Collider bias in observational studies. Dtsch Arztebl Int 2022; 119: 107–22 VOLLTEXT |
28. | Koch-Henriksen N, Sørensen PS, Magyari M: Relapses add to permanent disability in relapsing multiple sclerosis patients. Mult Scler Relat Disord 2021; 53: 103029 CrossRef MEDLINE |
29. | Tönnies T, Schlesinger S, Lang A, Kuss O: Mediation analysis in medical research. Dtsch Arztebl Int 2023; 120: 681–7 VOLLTEXT |
30. | Bender R, Ziegler A, Lange S: Multiple Regression. Dtsch Med Wochenschr 2007; 132: e30–2 CrossRef MEDLINE |
31. | Harrell FE: Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. New York, NY: Springer 2015. (Springer Series in Statistics) CrossRef |
32. | Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR: A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 1996; 49: 1373–9 CrossRef MEDLINE |
33. | Peduzzi P, Concato J, Feinstein AR, Holford TR: Importance of events per independent variable in proportional hazards regression analysis. II. Accuracy and precision of regression estimates. J Clin Epidemiol 1995; 48: 1503–10 CrossRef MEDLINE |
34. | van Smeden M, de Groot JAH, Moons KGM, et al.: No rationale for 1 variable per 10 events criterion for binary logistic regression analysis. BMC Med Res Methodol 2016; 16: 163 CrossRef MEDLINE PubMed Central |
35. | Courvoisier DS, Combescure C, Agoritsas T, Gayet-Ageron A, Perneger TV: Performance of logistic regression modeling: beyond the number of events per variable, the role of data structure. J Clin Epidemiol 2011; 64: 993–1000 CrossRef MEDLINE |
36. | Riley RD, Snell KIE, Ensor J, et al.: Minimum sample size for developing a multivariable prediction model: part I—continuous outcomes. Stat Med 2019; 38: 1262–75 CrossRef CrossRef CrossRef PubMed Central |
37. | Riley RD, Snell KI, Ensor J, et al.: Minimum sample size for developing a multivariable prediction model: PART II—binary and time-to-event outcomes. Stat Med 2019; 38: 1276–96 CrossRef CrossRef CrossRef PubMed Central |
38. | Bender R, Lange S, Ziegler A: Multiples Testen—Artikel Nr. 12 der Statistik-Serie in der DMW. Dtsch med Wochenschr 2002; 127: T 4–7 CrossRef |
39. | Krzywinski M, Altman N: Multiple linear regression. Nat Methods 2015; 12: 1103–4 CrossRef MEDLINE |
40. | Bland JM, Altman DG: Correlation, regression, and repeated data. BMJ 1994; 308: 896 CrossRef MEDLINE PubMed Central |
41. | Heinze G, Dunkler D: Five myths about variable selection. Transpl Int 2017; 30: 6–10 CrossRef MEDLINE |
e1. | Lange S, Bender R: (Lineare) Regression/Korrelation. Dtsch Med Wochenschr 2001; 126: T 33–5 CrossRef |
e2. | Bender R, Ziegler A, Lange S: Logistische Regression—Artikel Nr. 14 der Statistik-Serie in der DMW. Dtsch med Wochenschr 2002; 127: T 11–3 CrossRef |
e3. | Kleinbaum DG, Klein M: Logistic regression: a self-learning text. 3rd edition. New York, USA: Springer 2010 CrossRef |
e4. | Ziegler A, Lange S, Bender R: Überlebenszeitanalyse: Eigenschaften und Kaplan-Meier Methode—Artikel Nr. 15 der Statistik-Serie in der DMW. Dtsch med Wochenschr 2002; 127: T 14–6 CrossRef |
e5. | Kleinbaum DG, Klein M: Survival analysis a self-learning text. New York: Springer 2011 CrossRef |
e6. | Ziegler A, Lange S, Bender R: Überlebenszeitanalyse: Die Cox-Regression. Dtsch Med Wochenschr 2004; 129: T1–3 CrossRef |
e7. | Coxe S, West SG, Aiken LS: The analysis of count data: a gentle introduction to poisson regression and its alternatives. J Pers Assess 2009; 91: 121–36 CrossRef MEDLINE |