DÄ international, Archive 3/2026

Review article

Missing Values in Empirical Research: Theory and Practice

Part 39 of a Series on the Evaluation of Scientific Publications

Dtsch Arztebl Int 2026; 123: 71–6. DOI: 10.3238/arztebl.m2025.0213

Schaefer, E; Lang, A; Piedboeuf-Potyka, K; Kuss, O

Background: Values that are missing because of dropouts and other reasons are a major challenge in clinical and epidemiological studies. If not dealt with appropriately, missing values can impair the validity of study findings, distort effect estimates, and reduce statistical power. In this article, we present statistical methods for dealing with missing values, explain what to look for when assessing scientific publications, and compare the methods' suitability for minimizing distortion and improving the precision of estimates.

Methods: A variety of methods of dealing with missing values are presented and discussed on the basis of publications retrieved by a selective search, as well as examples from the authors’ personal experience.

Results: When reading a scientific article, one should ascertain how missing values are dealt with, what assumptions are made, and what methods are applied. The underlying mechanism (missing completely at random [MCAR], missing at random [MAR], or missing not at random [MNAR]) determines the choice of suitable analytic methods. Unless the mechanism is MCAR, excluding incomplete observations can cause distortion. Simple imputation methods, such as mean or regression imputation, generally lead to an underestimate of variance because they ignore the uncertainty of the imputed values. In contrast, under an MAR mechanism, multiple imputation yields reliable results, as it replaces each missing value several times and thereby properly accounts for the estimation uncertainty.

Conclusion: Multiple imputation is an effective method to minimize distortion caused by missing values, but it requires a meticulous examination of the underlying assumptions and of the results.

Cite this as: Schaefer E, Lang A, Piedboeuf-Potyka K, Kuss O: Missing values in empirical research: Theory and practice. Part 39 of a series on the evaluation of scientific publications. Dtsch Arztebl Int 2026; 123: 71–6. DOI: 10.3238/arztebl.m2025.0213


Although missing values are a common and unavoidable problem in clinical and epidemiological studies (1), their potential to compromise the validity of research findings is often underestimated. In practice, it is common to use only simple imputation methods or to exclude persons with missing data from the analyses entirely (complete-case analysis) (2). Missing values may be attributable to incorrect or incomplete information provided by participants, flawed survey instruments, technical problems, or the dropout of participants over the course of the study. While careful study planning and structured data management can help to reduce the proportion of missing values, it is usually impossible to avoid them completely. Numerous empirical studies (3, 4, 5) have demonstrated that inadequate handling of missing values can compromise the accuracy of estimates and give rise to biased conclusions (6). The extent to which missing values influence study findings depends primarily on the mechanism underlying their occurrence. In a randomized trial, for example, missing values can mean that the comparable distribution of confounding factors between the groups, which randomization is meant to ensure, no longer holds. In scientific publications, it is therefore important to describe the occurrence of missing values in detail and to give plausible reasons for the methods chosen to handle them. A number of statistical methods are available for this purpose. As explained in this article, multiple imputation is in most cases the statistical method of choice, given that it properly accounts for the uncertainty caused by missing values, reduces bias, and increases the precision of the estimates obtained (7, 8). However, a recently published review revealed that in many studies, the handling of missing values continues to be inadequately documented or methodologically flawed (9). Of the 334 included cohort studies with missing values, only 49% used formal analytical methods to address them, 37% simply excluded observations with missing values, and 14% provided no information at all about how missing values were handled. This highlights the gap that exists in the methodological handling of missing values and underscores the need for greater awareness of this aspect of statistical analysis.

Methods

A variety of methods of dealing with missing values are presented and discussed on the basis of publications retrieved by a selective search of the literature, as well as examples from the authors’ personal experience. The aim of this article is to educate readers about what to look out for regarding missing values when evaluating scientific publications.

Results

First, the question arises at what point missing values in a study actually become a problem. The literature suggests various thresholds above which multiple imputation is recommended; typically, these lie in the range of 5% to 10% (10, 11). However, a simulation study showed that multiple imputation can improve the efficiency of effect estimates regardless of the proportion of missing data (between 1% and 90%) (12). Moreover, the impact of missing data on bias and efficiency depends both on the underlying mechanism (13) and on which variables are affected (outcome, exposure, or confounder variables) (14). When reading scientific articles, special attention should therefore be paid to whether the reasons for and mechanisms of the missing values have been reported and taken into account. When selecting appropriate methods for handling missing values, this aspect is more important than the proportion of missing values (13).

Mechanisms for missing values

Three key mechanisms for missing values are distinguished: “missing completely at random” (MCAR), “missing at random” (MAR), and “missing not at random” (MNAR) (15, 16). Under MCAR, the probability that a value is missing is independent of both the observed and the unobserved variables. Under MAR, the probability that a value is missing depends only on other, observed variables and not on the missing values themselves. Under MNAR, the probability that a value is missing depends on the unobserved values themselves, even after the observed variables have been taken into account. Figure 1 illustrates these mechanisms using a simple, hypothetical example.

Figure 1
Practical illustration of the mechanisms for missing values for blood pressure readings at two different measurement points

Various approaches can be used to identify the mechanism underlying missing data; here, plausibility considerations and subject-matter knowledge play a key role. For MCAR, initial insights can be gained by comparing the characteristics of persons with and without missing values, for example with the test proposed by Little (17). The absence of systematic differences is consistent with MCAR but does not prove it. MAR cannot be determined directly from the data. It is possible, however, to model the probability of a value being missing as a function of observed variables, e.g., using a logistic regression model with the missingness indicator as the dependent variable. In this way, it can be assessed whether specific observed characteristics are systematically associated with missing values. The MAR assumption must nevertheless be further substantiated by subject-matter arguments. The assumption of MNAR cannot be verified on the basis of the data. In this case, it is necessary to scrutinize whether there are comprehensible reasons why certain values are missing. Under MNAR, the missing-data mechanism has to be modeled in addition to the analysis model (18, 19), e.g., using selection models or pattern-mixture models (20).
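The logistic-regression check described above can be sketched as follows. This is a minimal illustration under stated assumptions: a hypothetical study in which age is always observed and height is sometimes missing, with a small hand-written Newton-Raphson fit so that no external library is needed. In practice, one would use standard statistical software and include all relevant observed covariates. A clearly non-zero coefficient argues against MCAR; a coefficient near zero is compatible with MCAR but cannot rule out MNAR.

```python
import math
import random


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x; returns (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p                  # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w                      # entries of the Fisher information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: beta += I^-1 * score
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1, math.sqrt(h00 / det)     # SE of b1 from the inverse information


def simulate_missingness(mechanism, n=5000, seed=3):
    """Hypothetical study: age always observed, height missing with some probability."""
    rng = random.Random(seed)
    age_decades, missing = [], []
    for _ in range(n):
        z = (rng.gauss(50, 10) - 50) / 10        # age in decades, centred at 50 years
        if mechanism == "MCAR":
            p = 0.3
        else:                                    # "MAR": older participants miss more often
            p = 1.0 / (1.0 + math.exp(-(-1.0 + 0.8 * z)))
        age_decades.append(z)
        missing.append(1 if rng.random() < p else 0)
    return age_decades, missing


slopes = {}
for mech in ("MCAR", "MAR"):
    x, r = simulate_missingness(mech)
    _, b1, se = fit_logistic(x, r)
    slopes[mech] = (b1, se)
    print(f"{mech}: log-odds of missing height per decade of age = {b1:.2f} (SE {se:.2f})")
```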

These aspects are crucial for handling missing data adequately. Current publication guidelines for randomized trials and observational studies stress that authors should describe in detail their handling of missing values, including assumptions about the mechanism of missing data and the methods used and their rationale (21, 22). Box 1 summarizes in key points the aspects that readers of scientific publications should pay attention to in order to assess whether appropriate methods for handling missing data have been used.

Box 1
What aspects require special attention when reading scientific publications?

Handling of missing values

Several statistical methods for handling missing values under the MCAR or MAR mechanism have been proposed in the literature (6). One option is to exclude all persons with missing values from the analysis (complete-case analysis). Alternatively, missing values can be replaced with plausible values using imputation methods. Here, a fundamental distinction is made between single imputation, in which each missing value is replaced by one value (e.g., mean imputation or regression imputation), and multiple imputation, in which missing values are replaced differently in each of several copies of the data set.

In the following, we present various methods for handling missing data based on an example study and discuss their impact on the estimates obtained from the statistical model. The Table lists the respective correlation coefficients r and regression coefficients β with the corresponding 95% confidence intervals (CI) for each method. In our example, we look at the relationship between height and body weight of 500 persons in a simulated data set (eSupplement Tables 1 and 2). Figure 2a presents a scatter plot illustrating the relationship between height (cm) and body weight (kg) in the study population. The complete data set without missing values yields a correlation coefficient of r = 0.52 (95% CI: [0.45; 0.58]) and a regression coefficient of β = 0.76 kg per cm [0.65; 0.87].

Figure 2
Relationship between height (x) and body weight (y) based on the example study
Table
Overview of the effect estimates for the relationship between height (x) and body weight (y), based on the data of the example study

Suppose it was not possible to measure the height of 153 participants in the study (31%), for example due to staff shortages. The reason for the missing values would then be unrelated to any characteristic of the participants, so the MCAR assumption would be met. One option would be to exclude observations with missing values and thus perform the analysis on complete cases only (complete-case analysis). Under these conditions, r is 0.52 [0.43; 0.59] and β is 0.76 [0.63; 0.89] (Figure 2b); that is, the point estimates match those from the full data set. However, the results are now based on a smaller sample. Consequently, statistical power is lost and the confidence intervals are wider. Furthermore, excluding observations with missing values is generally guaranteed to yield unbiased estimates only if the MCAR assumption is met.

Simple imputation methods

With mean imputation, all missing values of a variable are replaced by the mean of its observed values. This method is simple and easy to apply, and the mean of the data remains unchanged after the imputation (23). However, mean imputation underestimates the variance of the data, because all imputed values are identical and therefore do not reflect the dispersion of the actual values. After imputation of the mean height (169.7 cm) for all 153 missing values, r is 0.43 [0.36; 0.50] and β is 0.76 [0.62; 0.90] (Figure 2c). Thus, while the point estimate of β is unchanged, underestimating the variance in height through mean imputation leads to an underestimation of the correlation between height and body weight.

The regression imputation method replaces missing values with values predicted from a regression model fitted to the observed values. In contrast to mean imputation, regression imputation makes use of the relationship between variables, thereby providing more plausible estimates of the missing values. However, this method also underestimates the variance of the data, because the imputed values lie exactly on the regression line and lack the random scatter of real observations. In addition, it assumes a linear relationship between the variables; in the case of non-linear relationships, this can lead to incorrect estimates (23). Because the imputed values are perfectly linearly related to the variable used to predict them, the correlation is substantially overestimated. Accordingly, we find that r is 0.69 [0.65; 0.74] and β is 0.76 [0.69; 0.83] (Figure 2d).

As shown above, simple imputation methods are problematic from a statistical point of view. Since their uncritical application can introduce bias, they should not be considered an optimal method for handling missing values (24, 25). The methods “last observation carried forward” (LOCF) and “baseline observation carried forward” (BOCF) are also simple imputation methods. They can only be applied to longitudinal data sets with multiple measurement points per person for the same variable. A value that has already been observed, e.g., the last observed measurement, is used to replace a measurement that is missing at a later point in time. However, this also produces a fictitious data point whose uncertainty is not taken into account, so the variance is underestimated (26, 27). As illustrated in Box 2, even the LOCF method, which is often described as conservative or as a worst-case scenario approach, can introduce bias into the results. For this reason, the European Medicines Agency (EMA) recommends that LOCF and BOCF should not be used as the primary method for handling missing values in studies (28). Multiple imputation, on the other hand, is a more flexible method, since it yields valid results under a wider range of missing-data mechanisms and at the same time appropriately takes the variability of estimates into account.
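The following sketch illustrates why LOCF is not automatically conservative. It uses hypothetical trial data in which blood pressure falls steadily over three visits and 30% of participants drop out completely at random after the second visit; all parameters are assumptions chosen for illustration and are not the data of Box 2 (Python 3.10 or later, standard library only).

```python
import random
import statistics


def simulate_trial(n=2000, dropout=0.3, seed=7):
    """Hypothetical trial: systolic BP falls ~8 mmHg per visit (visits 1, 2, 3).
    A random 30% drop out after visit 2, so their visit-3 value is missing (MCAR)."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        level = rng.gauss(150, 12)
        visits = [level - 8 * t + rng.gauss(0, 5) for t in range(3)]
        full.append(visits)
        observed.append(visits[:2] + [None] if rng.random() < dropout else list(visits))
    return full, observed


def locf(series):
    """Last observation carried forward within one participant's series."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


full, observed = simulate_trial()
true_final = statistics.fmean([v[2] for v in full])
locf_final = statistics.fmean([locf(v)[2] for v in observed])
cc_final = statistics.fmean([v[2] for v in observed if v[2] is not None])
print(f"mean BP at final visit: true {true_final:.1f}, LOCF {locf_final:.1f}, "
      f"completers {cc_final:.1f} mmHg")
```

Even though dropout here is MCAR, LOCF overstates the final mean because it freezes the values of dropouts at an earlier, higher level, whereas the completers' mean is close to the truth. In a comparison of two arms, the direction of such a bias depends on the trajectories and dropout rates in each arm.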

Box 2
Potential distortions caused by the last-observation-carried-forward method (LOCF method) for missing data based on a practical example (29)

Multiple imputation and its application in practice

When applying the multiple imputation method, the statistical software generates several versions of the original data set. In this process, missing values are predicted from an imputation model that includes other variables in addition to the affected variables. To reflect the associated uncertainty, random variation is added to these predictions and the process is repeated several times, so that each completed data set differs slightly from the others. This approach reflects the uncertainty about the “true” value more realistically than would be possible with a single imputed value (7). The method is easy to apply and consists of three steps: I. creation of multiple (m) imputed data sets with plausible values; II. analysis of each of the m imputed data sets; III. pooling of the parameter estimates from the m imputed data sets according to Rubin's rules (7) (eSupplement Table 3 and Figure). The corresponding SAS and R code is described in detail in eSupplement Tables 4 and 5.

The number of imputations m affects the precision of the pooled results. With few imputations, the variance of the pooled estimate is itself estimated less accurately, so the results (in particular standard errors and p values) can vary noticeably if the imputation is repeated. Increasing m improves this stability, but the gain diminishes quickly, so beyond a moderate number of imputations there are generally no relevant differences in the results. Typically, a low double-digit number of imputations is sufficient.
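For reference, Rubin's pooling rules and the approximate relative efficiency of m imputations compared with an infinite number (7) can be written as follows. Here, the fraction of missing information γ is a quantity that has to be estimated from the imputed data sets; the value γ = 0.3 in the worked numbers is purely illustrative.

```latex
\bar{Q} = \frac{1}{m}\sum_{j=1}^{m}\hat{Q}_j, \qquad
\bar{U} = \frac{1}{m}\sum_{j=1}^{m}U_j, \qquad
B = \frac{1}{m-1}\sum_{j=1}^{m}\bigl(\hat{Q}_j-\bar{Q}\bigr)^2, \qquad
T = \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B

\mathrm{RE}(m) = \Bigl(1+\frac{\gamma}{m}\Bigr)^{-1}, \qquad
\gamma = 0.3:\quad
\mathrm{RE}(5) = \frac{1}{1.06} \approx 0.943, \qquad
\mathrm{RE}(20) = \frac{1}{1.015} \approx 0.985
```

The gain in efficiency of the point estimate from 5 to 20 imputations is small; a larger m mainly stabilizes the estimate of the between-imputation variance B, which has only m − 1 degrees of freedom, and hence the standard errors and p values.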

Discussion

In clinical and epidemiological studies, missing values are a widespread and unavoidable problem that can affect the results. A variety of statistical methods are available for handling missing values, including complete-case analysis, simple imputation methods, and multiple imputation. The first two options are generally not recommended. Excluding persons with missing values can result in a non-representative sample, less precise results, and a loss of statistical power. Simple imputation methods, such as mean imputation or regression imputation, can distort the variance and consequently lead to an underestimation of statistical uncertainty. For this reason, multiple imputation is recommended for handling missing values. Provided that the MCAR or MAR assumption applies, this method creates multiple data sets in order to take the uncertainty in estimating the missing values into account, thereby reducing potential bias.

By providing examples, we have illustrated how imputation can be applied in practice and what needs to be taken into account. Applying the multiple imputation method offers numerous advantages. A key aspect is that multiple imputation does not require the restrictive MCAR assumption, but already provides valid results under the less restrictive MAR assumption (15). Imputed data sets can be analyzed in a flexible way using standard software and reused for a variety of analyses (30, 31). Estimation is simple and usually requires only a few imputations. Furthermore, the multiple imputation method has been validated in empirical studies and yielded results similar to those found with data sets without missing values (32, 33).

When reading scientific publications, however, it is important to be aware of the potential problems associated with multiple imputation analyses. These include the handling of categorical variables, the plausibility of the MAR assumption, and the correct inclusion of the outcome variables in the imputation models (34, 35). Several factors influence how strongly missing values affect bias and precision. These include the proportion of complete observations in the sample, the underlying mechanism of the missing values, and possible systematic differences between complete and incomplete observations (35). If the MAR assumption does not apply, the bias in analyses based on multiple imputation can be just as large as, or even larger than, that in analyses based exclusively on complete cases. In particular, if the mechanism is MNAR, the imputation rests on a false assumption, which results in systematic bias. This can distort effect estimates and lead to wrong conclusions. It is therefore particularly important to critically examine the validity of the MAR assumption. Where background knowledge or uncertainty about the mechanisms of missing values casts doubt on the validity of the MAR assumption, sensitivity analyses should be carried out to assess the robustness of the results (36). Ultimately, there is no clear rule as to how many missing values in a data set are acceptable for the multiple imputation method to be applied. Nevertheless, multiple imputation can be advantageous even when the proportion of missing values is small, i.e., below the thresholds of 5–10% often proposed in the literature (10, 11). In general, complete data sets are the best basis for valid analyses. Consequently, researchers should strive to keep the proportion of missing values as low as possible.

In summary, multiple imputation is an effective method for minimizing the effects of missing values in clinical and epidemiological studies. When interpreting scientific publications, however, it is imperative to critically question and carefully examine the underlying assumptions.

Conflict of interest
The authors declare that no conflict of interest exists.

Manuscript received on 25 April 2025, revised version accepted on 14 November 2025

Translated from the original German by Ralf Thoene, M.D.

Corresponding author
Dr. PH Alexander Lang

alexander.lang@ddz.de

1. Powney M, Williamson P, Kirkham J, Kolamunnage-Dona R: A review of the handling of missing longitudinal outcome data in clinical trials. Trials 2014; 15: 237.
2. Bell ML, Fiero M, Horton NJ, Hsu CH: Handling missing data in RCTs; A review of the top medical journals. BMC Med Res Methodol 2014; 14: 118.
3. Spiess M, Goebel J: On the effect of item nonresponse on the estimation of a two-panel-waves wage equation. Allgemeines Statistisches Archiv 2005; 89: 63–74.
4. Blankers M, Koeter MWJ, Schippers GM: Missing data approaches in eHealth research: Simulation study and a tutorial for nonmathematically inclined researchers. J Med Internet Res 2010; 12: e54.
5. White IR, Carlin JB: Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values. Stat Med 2010; 29: 2920–31.
6. Schafer JL, Graham JW: Missing data: Our view of the state of the art. Psychol Methods 2002; 7: 147–77.
7. Rubin DB: Multiple Imputation for Nonresponse in Surveys. New York: Wiley 1987.
8. Royston P: Multiple imputation of missing values. Stata Journal 2004; 4: 227–41.
9. Wu TT, Smith LH, Vernooij LM, Patel E, Devlin JW: Data missingness reporting and use of methods to address it in critical care cohort studies. Crit Care Explor 2023; 5: e1005.
10. Schafer JL: Multiple imputation: A primer. Stat Methods Med Res 1999; 8: 3–15.
11. Bennett DA: How can I deal with missing data in my study? Aust N Z J Public Health 2001; 25: 464–9.
12. Madley-Dowd P, Hughes R, Tilling K, Heron J: The proportion of missing data should not be used to guide decisions on multiple imputation. J Clin Epidemiol 2019; 110: 63–73.
13. Lee JH, Huber JC Jr.: Evaluation of multiple imputation with large proportions of missing data: How much is too much? Iran J Public Health 2021; 50: 1372–80.
14. Lee KJ, Carlin JB: Recovery of information from multiple imputation: A simulation study. Emerg Themes Epidemiol 2012; 9: 3.
15. Rubin DB: Inference and missing data. Biometrika 1976; 63: 581–92.
16. Little RJ, Rubin DB: Statistical analysis with missing data. Hoboken, New Jersey: John Wiley & Sons, 2019.
17. Little RJ: A test of missing completely at random for multivariate data with missing values. J Am Stat Assoc 1988; 83: 1198–202.
18. Holman R, Glas CA: Modelling non-ignorable missing-data mechanisms with item response theory models. Br J Math Stat Psychol 2005; 58 (Pt 1): 1–17.
19. Glas CAW, Pimentel JL: Modeling nonignorable missing data in speeded tests. Educational and Psychological Measurement 2008; 68: 907–22.
20. Glas CAW: Missing Data. In: Peterson P, Baker E, McGaw B (eds.): International Encyclopedia of Education. Oxford: Elsevier, 2010: 283–8.
21. Hopewell S, Chan AW, Collins GS, et al.: CONSORT 2025 statement: Updated guideline for reporting randomised trials. BMJ 2025; 389: e081123.
22. Cuschieri S: The STROBE guidelines. Saudi J Anaesth 2019; 13 (Suppl 1): S31–S4.
23. Haukoos JS, Newgard CD: Advanced statistics: Missing data in clinical research—part 1: An introduction and conceptual framework. Acad Emerg Med 2007; 14: 662–8.
24. Vach W, Blettner M: Biased estimation of the odds ratio in case-control studies due to the use of ad hoc methods of correcting for missing values for confounding variables. Am J Epidemiol 1991; 134: 895–907.
25. Carpenter J, Kenward M: Missing data in randomised controlled trials—A practical guide. Health Technology Assessment Methodology Programme, Birmingham, p. 199. https://researchonline.lshtm.ac.uk/id/eprint/4018500 (last accessed on 3 December 2025).
26. Siddiqui O, Ali MW: A comparison of the random-effects pattern mixture model with last-observation-carried-forward (LOCF) analysis in longitudinal clinical trials with dropouts. J Biopharm Stat 1998; 8: 545–63.
27. Kenward MG, Molenberghs G: Last observation carried forward: A crystal ball? J Biopharm Stat 2009; 19: 872–88.
28. European Medicines Agency: Missing data in confirmatory clinical trials—Scientific guideline. 2010. www.ema.europa.eu/en/missing-data-confirmatory-clinical-trials-scientific-guideline (last accessed on 3 December 2025).
29. Lachin JM: Fallacies of last observation carried forward analyses. Clin Trials 2016; 13: 161–8.
30. Royston P: Multiple imputation of missing values: Update of ice. Stata Journal 2005; 5: 527–36.
31. SAS Institute Inc.: SAS/STAT® 14.1 User’s Guide. Cary, NC: SAS Institute Inc., 2015.
32. Kolaja CA, Porter B, Powell TM, Rull RP, Millennium Cohort Study Team: Multiple imputation validation study: Addressing unmeasured survey data in a longitudinal design. BMC Med Res Methodol 2021; 21: 5.
33. Wahl S, Boulesteix AL, Zierer A, Thorand B, van de Wiel MA: Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation. BMC Med Res Methodol 2016; 16: 144.
34. Sterne JAC, White IR, Carlin JB, et al.: Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. BMJ 2009; 338: b2393.
35. Newgard CD, Haukoos JS: Advanced statistics: Missing data in clinical research—part 2: Multiple imputation. Acad Emerg Med 2007; 14: 669–78.
36. Kenward MG, Carpenter J: Multiple imputation: Current perspectives. Stat Methods Med Res 2007; 16: 199–218.
*These two authors are co-first authors.
Institute of Biometrics and Epidemiology, German Diabetes Center (DDZ), Leibniz Center for Diabetes Research at the Heinrich Heine University Düsseldorf, Düsseldorf, Germany: Edyta Schaefer, M.Sc.; Dr. PH Alexander Lang; Katharina Piedboeuf-Potyka, M.Sc.; Prof. Dr. sc. hum. Oliver Kuss
German Center for Diabetes Research (DZD), Partner Düsseldorf, München-Neuherberg, Germany: Edyta Schaefer, M.Sc.; Dr. PH Alexander Lang; Prof. Dr. sc. hum. Oliver Kuss
Centre for Health and Society, Medical Faculty and University Hospital of Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany: Prof. Dr. sc. hum. Oliver Kuss