
Review article

Interrupted Time Series for Assessing the Causality of Intervention Effects

Part 35 of a series on evaluating scientific publications

Dtsch Arztebl Int 2024; 121: 800-4. DOI: 10.3238/arztebl.m2024.0150

Mathes, T; Röding, D; Stegbauer, C; Laxy, M; Pieper, D

Background: The gold standard for evaluating interventions in medicine and health care is the randomized controlled trial (RCT). In practice, however, RCTs may be difficult to conduct because of high costs, ethical aspects, or practical considerations. This is particularly true of studies on the population level, e.g., for the evaluation of health policy measures.

Methods: We present a type of study design called the interrupted time series (ITS) and its critical interpretation, with several illustrative examples. This discussion is based on selected methodological publications.

Results: ITS are suitable for the assessment of interventions with a clear point of intervention (interruption). They are analyzed with the statistical methods of time-series analysis. One strength of ITS is that they can be used to estimate an immediate effect as well as a gradually developing effect. Under certain assumptions, the findings of an ITS analysis can be interpreted causally. The main assumption underlying an ITS is that the trend after the intervention would have been exactly the same as the trend before the intervention if the intervention had not taken place and all other conditions had remained unchanged. A further assumption is that there should be no differences in the pre- versus postintervention phases in the subjects or other entities being tested (e.g., hospitals) that might affect the measured endpoints (e.g., differences in mean age affecting measured mortality). Moreover, the intervention periods must be properly distinct from one another in order to prevent biased effect estimates. The robustness of the assumptions should also be checked with sensitivity analyses.

Conclusion: As long as all sources of bias have been avoided and the findings are both plausible and robust, the effects revealed by ITS can be interpreted as causal. ITS may serve as an alternative method for evaluating intervention effects when an RCT cannot be performed.


The gold standard for evaluating interventions in medicine and health care is the randomized controlled trial (RCT). In terms of internal validity, this study design is superior to all others. Good and successful randomization between groups (intervention arms) tends to create structural equality in terms of known and unknown as well as unobserved factors. The prerequisite of causal statements is structural equality of the groups being compared. In practice, however, RCTs may be difficult to conduct due to high costs, ethical aspects, or practical considerations (1, 2).

A rough distinction can be made between interventions on the individual level and those on the population level. On the individual level, individuals are allocated either to the intervention or to the control group. This can be achieved by randomized (RCT) or non-randomized allocation (non-randomized controlled intervention study) by the study director or through self-allocation of individuals (e.g., diet, medication use) to one of the groups. If interventions such as health policy measures are implemented on a large scale, i.e., nationally or regionally, it is generally not possible for the principal investigator to allocate subjects, nor can subjects allocate themselves at the individual level. Whether a person receives the intervention or not is determined here according to which group or population they belong to (for example, individuals who live in a particular district, federal state, or country). This is the case, for example, when the remuneration system in hospitals is changed or in the case of public awareness campaigns in the mass media. In certain cases, a stepped-wedge cluster RCT may not be possible (for example, when evaluating changes due to new legislation) or may be very difficult (for example, for cost reasons), meaning that alternative controlled non-randomized study designs, which also include interrupted time series (ITS) studies, are needed (2, 3). Initial considerations on this study design were presented in an earlier article in this series (4). It is from this point that we start.

Methods

The article was written in an iterative process based on a selective literature search. First, we explain the basic assumptions and methods of ITS and outline the most important requirements for an estimation of unbiased results. Finally, an example taken from the field of smoking prevention is used to illustrate the most significant aspects.

Basic assumptions of ITS design and comparison with other study designs

With ITS, the intervention effect is estimated on the basis of longitudinal data (3). ITS are well-suited to the evaluation of interventions (for example, the implementation of a disease management program) that occur at a clearly defined point in time (interruption). ITS can also be used to assess several chronologically consecutive interventions with the same objective, that is to say, several interruptions (for example, measures to improve health care). The analysis takes place at the population level or a similar cluster level, such as regions or administrative units (for example, a hospital association). The main characteristic of ITS is that they use data at multiple time points, both before and after the intervention is implemented. The data from before the intervention make it possible to assess a time trend in relation to an outcome (for example, mortality, quality of care). Using the time trend before the implementation of the intervention, it is possible to extrapolate how the outcome would develop over time without the intervention (trend extrapolation). The forecast is then compared with the trend actually observed. If the data and methods of analysis are adequate (see section on “Reliability of the results”), effect estimates obtained from ITS are deemed to be reliable for the estimation of intervention effects (5).

A distinction needs to be made between ITS and “usual” before-and-after studies (BA). The main difference between an ITS and a BA is that an ITS always includes at least three measurements, both before and after the intervention, in the analysis, whereas a BA considers a maximum of two measurements or a value aggregated over several measurements (for example, the mean of three measurements of quality of life) (6). This also gives rise to the greatest weakness of a BA compared to an ITS analysis, namely, that it is not possible to adequately assess the underlying trend, hence the intervention effect cannot be distinguished from this trend. In addition, “natural” fluctuations, including regression towards the mean (an extreme value is usually followed by a value that is closer to the mean), are not sufficiently taken into consideration.

One advantage of ITS over a comparison of individuals with and without an intervention (control group) in a cohort study is that no parallel control group is needed and no matching or adjustment processes are required to obtain an unbiased estimate of causal effect. This is due to the fact that, thanks to the design, both observed and unobserved confounders (i.e., variables that can influence the selection of the intervention and may have an effect on the endpoint) are controlled for. However, a prerequisite of this is that the main assumptions are met (see below).

Analysis methods

Time series analysis methods are used to analyze ITS. The most frequently used analysis methods for ITS include ARIMA (autoregressive integrated moving average) models and segmented regression analyses (7). One reason for this is that they make it possible to estimate both an immediate effect and a change in trend. An immediate change to the endpoint (for example, an immediate reduction in the mortality rate) corresponds to a change in level on the y-axis directly after implementation of the intervention. A change in the underlying time trend (for example, an annual change in the mortality rate) can be determined by the difference in slope of the regression line before and after implementation of the intervention. Figure 1 illustrates the basic principle of an ITS analysis (based on [8]). The ability to separate the immediate effect of the intervention from its gradually developing effect can be considered one of the strengths of ITS design, since the time at which the effect occurs may be important for decision-making. For example, in some situations, the immediate effect may be of particular importance (for example, infection control measures during a pandemic), while in others, the change in trend may be more important (for example, a drop in the infection rate as a result of a vaccination campaign).

Figure 1: Fictitious interrupted time series study

An example of segmented regression analysis is presented below as an option for an ITS analysis. The model can be mathematically formulated as follows (compare Figures):

Y(t) = B0 + B1 × pre-intervention time T + B2 × postintervention time (T − Ti) + B3 × intervention X(t) + e(t)

Here, Y(t) is the outcome for the endpoint in month t. B0 describes the level at baseline (for example, the baseline rate). In this model, B1 estimates the slope of the regression line in the period prior to the intervention, and B2 the slope of the regression line in the period after the intervention. The change in trend is the difference in slope, B2 minus B1. B3 estimates the change in level on the y-axis (immediate effect) as the difference between the extrapolated first point following the intervention and the first point actually measured following the intervention. Confidence intervals can be constructed for these parameters and statistical tests used for comparison. The analysis should take autocorrelation (the correlation of measurement points with previous measurements) into consideration, since observations made in close succession usually resemble each other more closely than observations made further apart, and observations may also fluctuate periodically (for example, in the case of infections that occur more frequently during certain seasons). Therefore, it is assumed that the error term e(t) is autoregressive.

This model can be used for virtually any type of regression (for example, linear regression, logistic regression). Since regression analysis covers a wide spectrum of methods, ITS are highly flexible with regard to the type of endpoints (for example, continuous data, count data) and various other aspects (for example, taking non-linear relationships into consideration). The choice of method for the regression analysis can have a significant effect on the results (9); therefore, this should always be specified in the study protocol.

Reliability of the results

The main assumption underlying an ITS is that the trend after the intervention would have been exactly the same as the trend before the intervention if the intervention had not taken place and all other conditions had remained unchanged. This assumption cannot be empirically tested. To avoid bias, it is essential to have precise knowledge of the context of the intervention. In this way, potential simultaneous interventions or changes in the boundary conditions that could affect the endpoints can be excluded or controlled for in the statistical models. In order to estimate the trend correctly, one needs to take into account not only regular fluctuations (seasons, quarters, etc.) but also potential exogenous events (for example, accompanying health policy measures) or factors that vary over time (for example, hospital closures) and could affect the trend of the endpoint under investigation. The required information is often not included in the data used for the analysis and must instead be obtained from other sources (e.g., quality reports, publications by the legislature). It is particularly important to take these confounders into consideration when they occur in close temporal proximity to the intervention being investigated.

Since this is not a repeated measurement of the same individuals, one must also ensure that the subjects or other entities being tested (for example, hospitals) in the pre-intervention phase do not systematically differ from the subjects in the postintervention phase in terms of characteristics that might affect the measured endpoints (for example, age and mortality). Furthermore, the intervention should not affect (for example, as a result of altered documentation requirements) the quality of data (for example, data completeness).

In order to avoid biased effect estimates, the intervention periods must be properly distinct from one another. If it takes a certain amount of time to implement the intervention, for example, for logistical reasons, a transition phase should be taken into consideration in addition to the pre- and postintervention phase. The same applies to interventions that could have an effect even before their implementation (for example, the announcement of laws).
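
How a transition phase might enter the analysis can be sketched as follows. The window boundaries, the function name `assign_phase`, and the decision to leave the transition points out of the trend estimation are illustrative assumptions; modelling the transition as a segment of its own would be another option.

```python
def assign_phase(t, start, end):
    """Label a time point relative to an implementation window [start, end)."""
    if t < start:
        return "pre"
    if t < end:
        return "transition"
    return "post"

# Hypothetical quarterly time points; implementation assumed to take quarters 6 and 7.
quarters = list(range(1, 13))
phases = [assign_phase(q, start=6, end=8) for q in quarters]

# Time points that would enter the estimation of the pre and post trends.
analysis_points = [q for q, p in zip(quarters, phases) if p != "transition"]
```

For an intervention with anticipatory effects, such as an announced law, `start` would be set to the announcement rather than to the formal implementation date.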

At least three measurements from before and three measurements from after an intervention should be available, since for technical statistical reasons, this is the minimum number needed to estimate a trend. This requirement is also supported by the Cochrane Collaboration (10). However, the appropriate number of measurements always depends on the research question as well as the course of, and fluctuations in, the underlying trend. For example, in the case of seasonal fluctuations (e.g., infections in winter), at least three measurements should be available for each season. Furthermore, variability and intervals between the tested periods can affect the data. Therefore, depending on the case, the overall number of measurements can be very high. Thus, studies indicate that more than three measurements are usually required, since otherwise the results may be anti-conservative or the statistical significance very low due to incorrectly estimating the trend, or to be more precise, excessively weighting short-term effects (9).
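
The minimum requirement described above can be checked mechanically before an analysis is planned. The following sketch uses invented quarterly data; the function name `check_counts` and the labels are assumptions made for illustration, and passing this check says nothing about whether the number of measurements is sufficient for a given research question.

```python
from collections import Counter

MIN_POINTS = 3  # minimum per phase (and per season) named in the text

def check_counts(observations, min_points=MIN_POINTS):
    """Return the (phase, season) cells with fewer than min_points measurements.

    observations: iterable of (phase, season) labels, one per measurement.
    """
    counts = Counter(observations)
    phases = {p for p, _ in counts}
    seasons = {s for _, s in counts}
    return sorted((p, s) for p in phases for s in seasons
                  if counts[(p, s)] < min_points)

# Hypothetical design: three years of quarterly data before, two years after.
seasons = ["Q1", "Q2", "Q3", "Q4"]
observations = [("pre", s) for _ in range(3) for s in seasons]
observations += [("post", s) for _ in range(2) for s in seasons]

too_few = check_counts(observations)
```

In this invented design the postintervention phase has eight measurements in total, which exceeds three, yet every quarter is observed only twice after the intervention, so all four postintervention cells are flagged.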

Due to the main assumptions and sources of bias mentioned above, a causal interpretation of the treatment effect should be underpinned by plausibility checks and sensitivity analyses. Although this is not strictly necessary, it increases the reliability of the results of the primary analysis if these are confirmed. There are two options in particular to achieve this. By using what are referred to as negative endpoints, it is possible, for example, to investigate whether there is a change in endpoints that should not be affected by the intervention (11). If an effect can be observed here, too, this indicates that there could be unknown trends or co-interventions, in which case the main assumption would be violated. An example of this would be that, following the introduction of a tax on alcopops, not only the demand for alcopops but also the demand for beer falls because, for example, a prevention campaign was conducted at the same time. When using negative controls, on the other hand, one investigates whether the primary endpoint changes in subpopulations that are not affected by the intervention and in which no effect should therefore be seen. If, for instance, positive effects are seen in age groups that are not expected to benefit from the introduction of an age-specific screening program and cannot be plausibly explained by spillover effects, one can assume that the effect estimate is biased due to the violation of the main assumption.

The Box summarizes the most important points for the critical interpretation of results from an ITS analysis.

Box: Checklist for interrupted time series studies

Use-case example

Figure 2 shows an example of an ITS analysis. An investigation was conducted to determine whether the increased cigarette tax in Pennsylvania and the associated price increase had an effect on the number of smokers among 18- to 39-year-olds (12). The increase occurred in the first quarter of 2004, and data collection spanned from the first quarter of 1998 to the fourth quarter of 2010. Thus, 23 measurements from before the intervention and 29 measurements from after the intervention are available. The dataset used for the analysis can be found in the eSupplement. An ARIMA model was used for the analysis.

Figure 2: Use-case example of an ITS

The blue line shows the actual trend. The red line describes the trend estimated by the model on the assumption that the tax was not increased. It runs virtually horizontal, corresponding to a consistent percentage of smokers over time. Both the y-level and slope (regression line) of the observed values clearly deviate from the estimated trend. The change in level (our own calculation) is −0.20% (95% confidence interval [−1.86%; 1.46%]) and the change in slope is −0.27% [−0.36%; −0.18%] per quarter. If one were to evaluate the same data with a t-test as a BA comparison, the result would be −4.33% [−5.90%; −2.75%], thus tending towards a slight underestimation of the effect (−0.27% per quarter, compared to an average of −4.33% across all quarters).

The change in level can be interpreted as an immediate effect of the tax increase on the number of smokers. However, the effect is relatively small (−0.20%) and imprecise [−1.86%; 1.46%], which is due in particular to the relatively high fluctuation in the number of smokers between the individual survey time points. Irrespective of the cause of this fluctuation (for example, unsystematic measurement error), it means that the apparently abrupt fall in the rate of smoking cannot be unequivocally attributed to the intervention. Under certain circumstances, a “normal” statistical test for the before-and-after comparison could have produced a different result here (that is to say, a greater effect and/or statistical significance) and, as such, might be misinterpreted as a direct effect.

The change in the trend (slope of the regression line) from a virtually constant progression before the 3rd quarter of 2003 to a slight fall thereafter can be interpreted as a gradually developing effect of the intervention. Here, the 95% confidence interval does not include the zero effect.
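
Under a linear segmented model, the level change and the change in slope combine into an implied deviation from the extrapolated trend that grows with time since the intervention. The following sketch works this out for the point estimates reported above. It is an illustration only: it assumes that the first postintervention quarter counts as k = 0, it uses the point estimates without their confidence intervals, and the function name `implied_effect` is hypothetical.

```python
LEVEL_CHANGE = -0.20   # point estimate of the change in level reported above
SLOPE_CHANGE = -0.27   # point estimate of the change in slope per quarter

def implied_effect(k):
    """Deviation from the extrapolated trend k quarters after the interruption."""
    return LEVEL_CHANGE + SLOPE_CHANGE * k

# Implied deviation at selected numbers of quarters after the intervention.
effects = {k: round(implied_effect(k), 2) for k in (0, 4, 12, 28)}
```

Under these assumptions the implied deviation is −0.20 at k = 0, −1.28 after 4 quarters, −3.44 after 12 quarters, and −7.76 after 28 quarters, which shows why a single averaged before-and-after difference does not describe a gradually developing effect well.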

To the extent possible, the authors of this study ensured that no other measures were carried out simultaneously that could have contributed to this effect (for example, awareness campaigns). Since no evidence of this was found, one can, with a relatively high degree of certainty, attribute the investigated effect to the tax increase. However, sensitivity analyses, for example, using negative endpoints (such as alcohol consumption), that could have further increased the quality of the evidence were not conducted.

Conclusion

ITS are a non-randomized controlled study design that can be used to evaluate measures at the population level if sufficient measurements from before and after the intervention are available. Compared to a simple before-and-after comparison, this design has the advantage that the potential intervention effect can be separated from a possible underlying secular trend. However, for an ITS analysis to have a high level of internal validity, the data used must meet relatively high requirements. It is not possible to verify the main assumption that the pre-intervention trend would have continued in the postintervention phase without the intervention. Therefore, one needs to have good knowledge of the intervention context, especially regarding other measures that may be taking place simultaneously, in order to consider further data in the ITS model if necessary. Various robustness tests and sensitivity analyses can also contribute to increasing the reliability of the results of an ITS.

As long as all sources of bias have been avoided and the findings are both plausible and robust, the effects revealed by ITS can be interpreted as causal. Thus, ITS may serve as an alternative method for evaluating intervention effects when a randomized controlled trial cannot be performed.

Conflict of interest statement
CS is the spokesperson for the Department of Public Health of the EbM Network.

The remaining authors declare that no conflict of interest exists.

Manuscript submitted on 21 March 2024, revised version accepted on 11 July 2024.

Translated from the original German by Christine Rye.

Corresponding author
Prof. Dr. rer. medic. Tim Mathes

Institut für Medizinische Statistik, Universitätsmedizin Göttingen

Humboldtallee 32, 37073 Göttingen, Germany

tim.mathes@med.uni-goettingen.de

Cite this as
Mathes T, Röding D, Stegbauer C, Laxy M, Pieper D: Interrupted time series for assessing the causality of intervention effects. Part 35 of a series on evaluating scientific publications. Dtsch Arztebl Int 2024; 121: 800–4. DOI: 10.3238/arztebl.m2024.0150

1. Grimshaw J, Campbell M, Eccles M, Steen N: Experimental and quasi-experimental designs for evaluating guideline implementation strategies. Fam Pract 2000; 17 Suppl 1: 11–6.
2. Black N: Why we need observational studies to evaluate the effectiveness of health care. BMJ 1996; 312: 1215–8.
3. Kontopantelis E, Doran T, Springate DA, Buchan I, Reeves D: Regression based quasi-experimental approach when randomisation is not an option: interrupted time series analysis. BMJ 2015; 350: h2750.
4. Gianicolo EAL, Eichler M, Muensterer O, Strauch K, Blettner M: Methods for evaluating causality in observational studies. Dtsch Arztebl Int 2020; 117: 101–7.
5. Fretheim A, Zhang F, Ross-Degnan D, et al.: A reanalysis of cluster randomized trials showed interrupted time-series studies were valuable in health system evaluation. J Clin Epidemiol 2015; 68: 324–33.
6. Mathes T, Pieper D: Study design classification of registry-based studies in systematic reviews. J Clin Epidemiol 2018; 93: 84–7.
7. Turner SL, Karahalios A, Forbes AB, et al.: Design characteristics and statistical methods used in interrupted time series studies evaluating public health interventions: a review. J Clin Epidemiol 2020; 122: 1–11.
8. Hategeka C, Ruton H, Karamouzian M, Lynd LD, Law MR: Use of interrupted time series methods in the evaluation of health system quality improvement interventions: a methodological systematic review. BMJ Glob Health 2020; 5: e003567.
9. Korevaar E, Turner SL, Forbes AB, Karahalios A, Taljaard M, McKenzie JE: Evaluation of statistical methods used to meta-analyse results from interrupted time series studies: a simulation study. Res Synth Methods 2023; 14: 882–902.
10. Reeves BC, Deeks JJ, Higgins JP, et al.: Including non randomized studies on intervention effects. Updated August 2023. In: Cochrane handbook for systematic reviews of interventions [Internet], 595–620. www.training.cochrane.org/handbook (last accessed on 25 July 2024).
11. Lipsitch M, Tchetgen Tchetgen E, Cohen T: Negative controls: a tool for detecting confounding and bias in observational studies. Epidemiology 2010; 21: 383–8.
12. Ma ZQ, Kuller LH, Fisher MA, Ostroff SM: Use of interrupted time-series method to evaluate the impact of cigarette excise tax increases in Pennsylvania, 2000–2009. Prev Chronic Dis 2013; 10: E169.
Department for Medical Statistics, University Medical Center Goettingen, Goettingen, Germany: Prof. Dr. rer. medic. Tim Mathes
Hannover Medical School, Institute for Epidemiology, Social Medicine and Health System Research. Germany: Dr. rer. pol. Dominik Röding
aQua-Institute for Applied Quality Improvement and Research in Health Care, Goettingen, Germany: Constance Stegbauer, M.Sc.
Professorship of Public Health and Prevention, TUM School of Medicine and Health, Technical University of Munich, Germany; Department of Global Health, Rollins School of Public Health, Emory University: Prof. Dr. Michael Laxy
Faculty of Health Sciences Brandenburg, Brandenburg Medical School (Theodor Fontane), Institute for Health Services and Health System Research, Germany; Center for Health Services Research, Brandenburg Medical School (Theodor Fontane), Germany: Prof. Dr. rer. medic. Dawid Pieper
