Review article
Age Standardization of Epidemiological Frequency Measures
Part 37 of a Series on the Evaluation of Scientific Publications
Background: When raw incidence rates are compared across populations, age can be a major confounding factor if the populations differ in age structure. In this paper, we elucidate the principles of age standardization of rates.
Methods: Crude and age-standardized rates are derived, and methodological aspects explained, in the light of pertinent publications from a selective literature search and by means of a particular example: a comparison of the incidence of stomach cancer in Cali, Colombia, and in the German federal state of North Rhine–Westphalia (NRW) (2013–2017, men).
Results: The crude incidence rates were 21.5 per 100 000 person-years in Cali and 22.9 per 100 000 person-years in NRW, but the corresponding age-standardized incidence rates (old European standard) were 30.0 and 15.7 per 100 000 person-years, respectively. Because of the markedly different age structures of the two populations, the crude incidence misleadingly suggested a practically identical incidence of stomach cancer in Cali and in North Rhine–Westphalia. Age standardization revealed a markedly higher incidence in Cali.
Conclusion: The numerical value of a standardized rate is an artificial rate that can only be interpreted in the light of the standard used. Standardization only makes sense if rates are to be compared across populations, where a difference in a particular factor (e.g., age) might distort the comparison. Standardization is not needed to describe the epidemiological situation in a single population.
Cite this as: Stang A, Gianicolo E: Age standardization of epidemiological frequency measures: Part 37 of a series on the evaluation of scientific publications. Dtsch Arztebl Int 2025; 122: 387–92. DOI: 10.3238/arztebl.m2025.0072
According to the German Federal Statistical Office, a total of 41 026 519 men and 42 128 512 women were living in Germany in 2020. For purposes of epidemiological research, this number of individuals in Germany represents the total person-years (PYs) contributed by the population in 2020. In 2020, the estimated number of new cases of cancer (excluding non-melanoma skin cancer) in Germany was 261 850 for men and 231 400 for women (1).
This corresponds to a cancer incidence rate for the whole population of 638 per 100 000 person-years (100 000 × 261 850/41 026 519 PY) for the male population and 549 per 100 000 person-years (100 000 × 231 400/42 128 512 PY) for the female population. These are referred to as crude rates and are important for assessing disease risk and for population health care planning.
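The arithmetic behind these crude rates can be sketched in a few lines of Python. The function name `crude_rate` and its parameters are illustrative choices of ours; the case counts and person-years are the figures quoted above.

```python
def crude_rate(cases: int, person_years: int, per: int = 100_000) -> float:
    """Crude rate: new cases divided by person-years, scaled to `per` person-years."""
    return per * cases / person_years

# Germany 2020, all cancers excluding non-melanoma skin cancer (figures from the text)
men = crude_rate(261_850, 41_026_519)
women = crude_rate(231_400, 42_128_512)

print(f"men:   {men:.0f} per 100 000 PY")
print(f"women: {women:.0f} per 100 000 PY")
```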
When crude disease rates are compared across populations with different age distributions, it remains uncertain whether an observed difference in the rates is due to the different age distributions, to different age-specific rates, or to a combination of the two. The same applies to the comparison of crude disease rates within a single population whose age distribution has changed over time. Why is that the case? The incidence of most cancers is heavily age-dependent: the higher the age, the higher the incidence of cancer. For example, in 2020 the cancer incidence in 0 to 4-year-olds throughout Germany was 22.8/100 000 PY (males) and 21.0/100 000 PY (females). In contrast, it was 2712.4/100 000 PY (men) and 1666.3/100 000 PY (women) in those aged 85 years and older. This means that the cancer incidence rate of those aged 85 years and older is 119 times (men) and 79 times (women) the cancer incidence rate for 0 to 4-year-olds (2).
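The size of this age gradient can be checked with a short calculation. The sketch assumes that all four rates are expressed per 100 000 person-years, which is the only reading consistent with the stated 119-fold and 79-fold ratios; the variable names are ours.

```python
# Age-specific cancer incidence, Germany 2020, per 100 000 person-years (from the text)
rate_0_to_4 = {"men": 22.8, "women": 21.0}
rate_85_plus = {"men": 2712.4, "women": 1666.3}

# Rate ratio: oldest age group relative to the youngest
rate_ratio = {sex: rate_85_plus[sex] / rate_0_to_4[sex] for sex in rate_0_to_4}

for sex, ratio in rate_ratio.items():
    print(f"{sex}: {ratio:.0f}-fold")
```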
This steep age gradient in the rates of new cases of cancer illustrates that a difference in the age structure of the populations being compared can very quickly result in different crude cancer incidence rates – even though the cancer incidence of the populations is identical in each age group. For example, if the proportion of older individuals in population A is larger than that in population B, then a higher crude cancer incidence rate will be observed in population A despite identical age-specific cancer incidence rates in both populations. This does not reflect different cancer risks in the two populations, however, but merely their different age structures.
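A small numerical illustration may help here. All numbers in the following sketch are hypothetical and chosen only to make the effect visible: both populations share exactly the same age-specific rates, and only the share of older person-years differs.

```python
# Hypothetical age-specific rates per 100 000 person-years, identical in both populations
rates = {"under 60": 20.0, "60 and over": 2000.0}

# Hypothetical age structures (share of person-years in each age group)
structure_a = {"under 60": 0.70, "60 and over": 0.30}  # older population A
structure_b = {"under 60": 0.90, "60 and over": 0.10}  # younger population B

def weighted_crude_rate(rates, structure):
    """Crude rate as the average of age-specific rates weighted by age structure."""
    return sum(rates[age] * structure[age] for age in rates)

crude_a = weighted_crude_rate(rates, structure_a)
crude_b = weighted_crude_rate(rates, structure_b)
print(f"A: {crude_a:.0f}, B: {crude_b:.0f} per 100 000 PY")
```

In this constructed case the crude rate of population A is almost three times that of population B, although the risk within each age group is the same.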
So, when populations with different age structures are compared and there is a steep age gradient in cancer incidence, the comparison of cancer incidence rates is only fair if the age-specific incidence rates within the same age groups are compared across the populations. With 18 age groups (0–4, 5–9, …, 80–84, ≥ 85 years), 18 comparisons would have to be made, which would prove laborious in the long run.
This problem was recognized in London as far back as the 18th century. Apparently, William Dale was the first to publish the principles of age standardization (observed cases versus expected cases) in 1772 (3, 4, 5). In the 19th century, age standardization became part of regular reporting of mortality data. The Registrar General of England and Wales reported age-standardized mortality rates as early as 1883. Direct age standardization was conducted in 1883 in England and Wales using the age structure of 1881, the year of the census (6). Using age standardization, it was possible to compare mortality rates of various time periods or populations without age differences between the populations confounding the results. Historically speaking, age standardization is the oldest method to correct for confounding (7). Modern techniques to correct for confounding include, among other things, restriction, matching, stratified analysis, and regression adjustment.
Using the particularly illustrative example of a comparison of stomach cancer incidence in Cali, Colombia (2013–2017), referred to here as “Cali”, and North Rhine-Westphalia (2013–2017), referred to as “NRW”, the aim of the present article is to demonstrate the problems that arise when comparing crude incidence rates of stomach cancer and the effect that age standardization of the rates has.
Methods
The principles of direct and indirect age standardization of frequency measures (Box 1), in this case stomach cancer incidence rates, and their interpretations are elucidated in the light of pertinent publications from a selective literature search with the aid of a particular example. For this, we extracted the incidence details from the cancer registries Cali (2013–2017) and North Rhine-Westphalia (2013–2017) (8). Only incidence rates for men were used to keep the volume of figures to a minimum.
Results
Table 1 presents incidence rates of stomach cancer in men in Cali and NRW by study region and age group. The crude incidence rates were 21.5 per 100 000 PY (100 000 × 1216 cases/5 664 701 PY) for Cali and 22.9 per 100 000 PY (100 000 × 9955 cases/43 443 610 PY) for NRW. The difference in crude rates is 21.5 − 22.9 = −1.4 per 100 000 PY, and the ratio of the crude rates is 21.5/22.9 = 0.94. This finding is surprising since one would expect that countries of the global south would have a higher incidence of stomach cancer than countries of the global north, given their higher prevalence of risk factors for stomach cancer (prevalence of Helicobacter pylori and processing of cured meat [9]). An examination of the age-specific rates, however, shows that the age-specific incidence rates from the age of 20 onward are considerably higher in Cali than in NRW. This difference (Cali minus NRW) becomes greater with increasing age.
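The crude comparison can be reproduced from the case counts and person-years quoted above. The following is a minimal sketch; the helper `crude_rate` is our own illustrative function. The text computes the difference and ratio from the rounded rates, and the unrounded calculation gives the same values at the stated precision.

```python
def crude_rate(cases, person_years, per=100_000):
    """New cases per `per` person-years."""
    return per * cases / person_years

# Stomach cancer in men, 2013-2017 (cases and person-years from the text)
cali = crude_rate(1_216, 5_664_701)
nrw = crude_rate(9_955, 43_443_610)

rate_difference = cali - nrw   # Cali minus NRW
rate_ratio = cali / nrw        # Cali relative to NRW

print(f"Cali {cali:.1f}, NRW {nrw:.1f} per 100 000 PY")
print(f"difference {rate_difference:.1f}, ratio {rate_ratio:.2f}")
```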
Table 1 also presents the natural (synonym: latent) weights (PYi) of the population. They are also expressed per 100 000 person-years (gi) to further illustrate this issue and for comparison with standard populations (eTable 1). Each crude rate (cR) may be understood as a weighted rate, in which the age-specific rates Ri have been weighted using the natural weights PYi (Box 2).
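Box 2 is not reproduced in this section. As a sketch, the weighted form described here can be written as follows, with $R_i$ the age-specific rate and $PY_i$ the person-years in age group $i$, and $g_i$ the natural weight rescaled to a total of 100 000. The exact notation of Box 2 may differ.

```latex
\[
  cR \;=\; \frac{\sum_i PY_i \, R_i}{\sum_i PY_i}
     \;=\; \frac{\sum_i g_i \, R_i}{100\,000},
  \qquad
  g_i \;=\; 100\,000 \times \frac{PY_i}{\sum_j PY_j}.
\]
```

Because each age-specific rate is the number of cases in that age group divided by its person-years, the numerator of the first fraction is simply the total number of cases, so this expression agrees with the usual definition of the crude rate.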
Natural weights result from the proportion of person-years within the respective age group (10). It is interesting to note that the natural weights gi per 100 000 PYi found in the younger age groups in Cali are considerably greater than in NRW. On the other hand, the natural weights per 100 000 PYi for the older population in Cali are considerably smaller than in NRW, reflecting the different demographic age structure of the population. For example, the proportion of 0 to 19-year-olds and of those over 60 in NRW is 19.9% and 24.2%, respectively, whereas these proportions are 33.2% and 10.5% in Cali.
Direct standardization
Using direct age standardization, the natural weights of the age-specific rates are replaced by new weights of a standard population. The standard population can be either internal or external. The internal standard population in the comparison between Cali and NRW could be, for example, the total number of person-years of the respective age strata (Cali + NRW) or the total number of person-years of one of the populations (Cali or NRW). The disadvantage of this approach is that rates standardized in this way cannot be compared with rates published elsewhere which were standardized with a different standard population. For an external standard population, an internationally known standard population is generally used, e.g. the European or Segi world standard (eTable 1). The product of the weight from the standard population and the age-specific rate is obtained per age group to calculate the age-standardized rate. The product terms are then added and divided by the sum of the weights of the standard population (Box 2).
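The calculation described in the last two sentences can be sketched as follows. The function name is ours, and the three age groups, rates, and weights are hypothetical placeholders; they are not the values of Table 1 or eTable 1.

```python
def directly_standardized_rate(age_specific_rates, standard_weights):
    """Sum of (standard weight x age-specific rate), divided by the sum of the weights."""
    if len(age_specific_rates) != len(standard_weights):
        raise ValueError("need exactly one standard weight per age-specific rate")
    weighted_sum = sum(w * r for w, r in zip(standard_weights, age_specific_rates))
    return weighted_sum / sum(standard_weights)

# Hypothetical values for three broad age groups
rates_per_100k = [1.0, 20.0, 150.0]          # age-specific rates per 100 000 PY
standard_weights = [40_000, 40_000, 20_000]  # hypothetical standard population

asr = directly_standardized_rate(rates_per_100k, standard_weights)
print(f"age-standardized rate: {asr:.1f} per 100 000 PY")
```

Applying the same `standard_weights` to the age-specific rates of a second population yields a rate that can be compared with `asr` without distortion by age structure.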
If the Old European Standard is used as the standard population (11), then the age-standardized rates are 30.0 per 100 000 PY for Cali and 15.7 per 100 000 PY for NRW, and the difference of the standardized rates is 30.0 − 15.7 = 14.3 per 100 000 PY. If the World Standard is used instead of the Old European Standard, then the age-standardized rates would be 20.9 per 100 000 PY and 11.0 per 100 000 PY and the difference of the standardized rates would be 20.9 − 11.0 = 9.9 per 100 000 PY. The standardized rate ratio is 1.91 (Old European Standard) and 1.90 (World Standard). Depending on the chosen standard population, this then produces a 1.91-fold or 1.90-fold rate in Cali in comparison with NRW (Table 2).
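The differences and ratios quoted in this paragraph follow directly from the four standardized rates. A short check, using only the rates reported in the text (the dictionary layout is our own):

```python
# Age-standardized rates per 100 000 PY as reported in the text: (Cali, NRW)
standardized = {
    "Old European Standard": (30.0, 15.7),
    "World Standard": (20.9, 11.0),
}

comparison = {}
for standard, (cali, nrw) in standardized.items():
    comparison[standard] = {
        "difference": round(cali - nrw, 1),  # absolute difference, Cali minus NRW
        "ratio": round(cali / nrw, 2),       # standardized rate ratio, Cali / NRW
    }
    print(standard, comparison[standard])
```

The absolute difference depends noticeably on the standard chosen, whereas the rate ratio is almost the same under both standards in this example.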
Thus, after age standardization, it becomes apparent that the incidence rate of stomach cancer in Cali is considerably higher than in NRW, as one would expect from an epidemiological point of view. The conclusions drawn from Table 2 are as follows:
- the crude incidence rate of stomach cancer is slightly higher in NRW than in Cali
- however, from the age of 20, the age-specific incidence rate of stomach cancer in Cali is considerably higher than in NRW, and
- the age-standardized incidence of stomach cancer is considerably higher in Cali than in NRW.
Age-standardized rates are hypothetical rates. They are the rates which would have been seen if the age structure of the populations had corresponded to the age structure of standard populations (10). Only when the natural weights match the weights of the standard population does the age-standardized rate match the crude rate. For this reason, it is important that the chosen standard is stated in reports of standardized rates.
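The statement that the age-standardized rate coincides with the crude rate when the weights match can be made explicit. Writing $w_i$ for the weights of the standard population, $R_i$ for the age-specific rates, and $PY_i$ for the natural weights (our notation, which may differ from that of Box 2):

```latex
\[
  \mathrm{ASR} \;=\; \frac{\sum_i w_i R_i}{\sum_i w_i},
  \qquad
  w_i = PY_i \;\Longrightarrow\;
  \mathrm{ASR} \;=\; \frac{\sum_i PY_i R_i}{\sum_i PY_i} \;=\; cR .
\]
```

The same holds when the standard weights are merely proportional to the natural weights, since a common factor cancels between numerator and denominator.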
The numerical value of the age-standardized rate depends on the choice of standard population. It is noteworthy that, when the World Standard was used, the age-standardized rates were considerably lower than the age-standardized rates when the Old European Standard was employed. This is not surprising because the World Standard has particularly large weights for young individuals and particularly small weights for older individuals. So, when standardizing, the low gastric cancer incidence rates at younger ages have a greater weight than the high gastric cancer incidence rates at older ages.
It is important to note that the numerical value of the age-standardized rate is a hypothetical rate. Standardized rates merely serve to facilitate comparisons of rates (12, 13). For the international readership it is helpful to use an internationally accepted standard population that has also been used in other publications on the same subject. This is because only age-standardized incidence rates which have been standardized with the same age-standard allow a fair comparison. Thus, for example, the Old and New European Standards, the World Standard, and the 2000 U.S. Standard Population are very often used in international cancer epidemiology (eTable 1).
The examination of temporal trends using age-standardized rates is useful if the age-specific rates change in the same direction over time, i.e., increase or decrease, and to the same extent. If age-specific rates behave differently over time, for example, a decrease in incidence rates at younger ages and an increase in incidence rates at older ages, then the age-specific trends should definitely be examined separately, in addition to the trend of the age-standardized incidence rates. The switch from one age standard to another can alter temporal trend patterns of age-standardized rates (14).
Indirect standardization
Another method of age standardization is referred to as indirect standardization. This method takes the opposite approach to direct age standardization: the number of expected cases, which is obtained by applying the rates of the reference population to the natural weights of a population of interest, is compared with the number of cases actually observed in the population of interest. The ratio of observed to expected cases is generally referred to as the standardized morbidity ratio (SMR).
In our example, the indirect standardization results in a standardized incidence ratio (SIR), in which the number of new cases actually observed in Cali is contrasted with the number of new cases expected in Cali. The expected number is the number of cases that would have been observed if the age-specific incidence rates observed in NRW had also applied in Cali. An SIR of 1.0 shows that there is no difference in incidence between the regions, whereas values above one indicate that the incidence in Cali is higher and values below one show that the incidence in Cali is lower. The SIR calculation produces a value of 1216/635.25 = 1.91. That means that the incidence in Cali is 1.91 times as high as in NRW.
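The steps of indirect standardization can be sketched as follows. Both function names are ours, and the two-stratum input is hypothetical, included only to show how expected cases are accumulated over age groups. The final lines use the totals reported in the text; the age-specific values from Table 1 that lead to 635.25 expected cases are not reproduced here.

```python
def expected_cases(person_years, reference_rates_per_100k):
    """Expected cases: study person-years times reference rates, summed over age groups."""
    return sum(py * r / 100_000 for py, r in zip(person_years, reference_rates_per_100k))

def standardized_incidence_ratio(observed, expected):
    """SIR: observed cases divided by expected cases."""
    return observed / expected

# Hypothetical two-stratum illustration: 10 + 40 = 50 expected cases
hypothetical_expected = expected_cases([200_000, 50_000], [5.0, 80.0])

# Totals reported in the text: 1216 observed cases in Cali, 635.25 expected under NRW rates
sir = standardized_incidence_ratio(1216, 635.25)
print(f"SIR = {sir:.2f}")
```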
This SIR can be derived as follows: If the natural weights of Cali were used to standardize the incidence rate in NRW, then this would produce a direct age-standardized rate of 11.2 per 100 000 person-years. If one compares the crude rate from Cali with this direct age-standardized rate in NRW, then the resulting rate ratio is 1.91 (21.5/11.2) (Table 3).
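This equivalence can be verified numerically with the figures already given. The sketch relies on the fact that standardizing the NRW rates to Cali's natural weights amounts to dividing the expected cases in Cali by Cali's total person-years; the variable names are ours.

```python
observed_cali = 1216           # observed cases in Cali (from the text)
expected_cali = 635.25         # expected cases in Cali under NRW age-specific rates
person_years_cali = 5_664_701  # total person-years in Cali (from the text)

# NRW rate directly standardized to Cali's natural weights:
# sum(PY_i * R_i_NRW) / sum(PY_i) = expected cases / total person-years
nrw_rate_cali_weights = 100_000 * expected_cali / person_years_cali
crude_cali = 100_000 * observed_cali / person_years_cali

# The person-years cancel, so this ratio equals observed / expected, i.e. the SIR
ratio = crude_cali / nrw_rate_cali_weights

print(f"NRW standardized to Cali weights: {nrw_rate_cali_weights:.1f} per 100 000 PY")
print(f"rate ratio: {ratio:.2f}")
```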
Depending on whether morbidity is measured by the incidence rate (or the cumulative risk), by mortality, or by prevalence, the corresponding metric is referred to as the standardized incidence ratio (SIR), the standardized mortality ratio (SMR), or the standardized prevalence ratio (SPR), respectively. Formulae for calculating confidence intervals for crude, age-specific, and age-standardized rates, for the SMR, and for ratios or differences between two age-standardized rates may be found in eBoxes 1–4.
Which of the two standardization methods is preferred and when?
Direct age standardization presupposes that the age-specific rates in the population being studied are known. Since only the total number of cases and the age-specific person-years of the population of interest need to be known for indirect standardization, indirect standardization is the only practicable method when no age-specific rates of the population of interest are available.
One advantage of the direct method over the indirect method is that the age-standardized rates may be ranked; the ratio of age-standardized rates may also be calculated. If a ratio is required, then the directly standardized rates may be compared with one another. With indirect age standardization, on the other hand, SIRs from different cohort studies cannot be readily compared with each other because the weights of the study populations are usually different.
Indirect age standardization is preferred when the case numbers in certain age groups are small (as is typical with small populations), because weighting of very imprecise age-specific rates results in an unnecessarily large standard error of the age-standardized rate. With indirect standardization, rates from a large population may be used, thus minimizing the standard error (sampling error). SMRs are commonly used in cohort studies in which, for example, only occupationally exposed persons are included and the expected number of sick persons in this exposed cohort is identified using registry data from the general population. Indirect standardization is not appropriate for time trend analyses because the age-specific rates in the reference population change over time. Both standardization methods may also be used for variables other than age. For example, standardization may also be applied to gender, social class, and other factors (eTable 2).
Conclusions
Standardization of epidemiological frequency measures is a suitable approach to eliminate the confounding effect of a factor, such as age, when comparing frequency measures. A standardized rate is a hypothetical rate and can only be interpreted with knowledge of the standard used. Standardization only makes sense when a comparison of rates is to be made in which a factor could confound the comparison. Standardization is not required for describing the epidemiological situation in one’s own population.
Conflict of interest statement
The authors declare that there are no conflicts of interest.
Manuscript received on 30 January 2025, revised version accepted on 14 April 2025
Translated from the original German by Dr. Grahame Larkin
Contact address:
Prof. Dr. med. Andreas Stang
imibe.dir@uk-essen.de
North Rhine-Westphalia State Cancer Registry, Bochum: Prof. Dr. med. Andreas Stang, MPH
School of Public Health, Department of Epidemiology, Boston University, USA: Prof. Dr. med. Andreas Stang, MPH
Institute of Medical Biometrics, Epidemiology and Informatics, (IMBEI), Mainz: Dr. rer. physiol. Emilio Gianicolo
Institute of Clinical Physiology, National Research Council, Lecce, Italy: Dr. rer. physiol. Emilio Gianicolo
| 1. | Robert Koch-Institut & Gesellschaft der epidemiologischen Krebsregister in Deutschland: Krebs in Deutschland für 2019/2020. Berlin 2023. www.krebsdaten.de/Krebs/DE/Content/Publikationen/Krebs_in_Deutschland/krebs_in_deutschland_node.html (last accessed on 28 December 2024). |
| 2. | Robert Koch-Institut, Zentrum für Krebsregisterdaten, www.krebsdaten.de/Krebs/DE/Home/homepage_node.html. (last accessed on 28 December 2024). |
| 3. | Dale W: Calculations deduced from first principles, in the most familiar manner, by plain arithmetic, for the use of the societies instituted for the benefit of old age. London: J. Ridley, 1772. |
| 4. | Dale W: A supplement to calculations of the value of annuities, published for the use of societies instituted for benefit of age. London: J. Ridley, 1777. |
| 5. | Keiding N: The method of expected number of deaths, 1786–1886–1986. Int Stat Rev 1987; 55: 1–20. |
| 6. | Curtin LR: Chapter 2: A short history of standardization for vital events. In: Feinleib M, Zarate AO (eds.): Reconsidering age adjustment procedures: Workshop proceedings. Hyattsville, Maryland 1992; 11–6. |
| 7. | Hammer GP, du Prel JB, Blettner M: Avoiding bias in observational studies: Part 8 in a series of articles on evaluation of scientific publications. Dtsch Arztebl Int 2009; 106: 664–8. |
| 8. | Bray F, Colombet M, Aitken JF, et al.: Cancer incidence in five continents volume XII. Lyon, France: International Agency for Research on Cancer; 2024. |
| 9. | De Martel C, Parsonnet J: Chapter 31 stomach cancer. In: Thun MJ, Linet MS, Cerhan JR, Haiman CA, Schottenfeld D (eds.): Schottenfeld and Fraumeni cancer epidemiology and prevention. New York: Oxford University Press 2018; 593–610. |
| 10. | Rothman KJ, Huybrechts KF, Murray EJ: Epidemiology. An introduction. New York: Oxford University Press; 2024. |
| 11. | Doll R, Cook P: Summarizing indices for comparison of cancer incidence data. Int J Cancer 1967; 2: 269–79. |
| 12. | Wolfenden HH: On the method of comparing mortalities of two or more communities, and the standardization of death rates. J R Stat Soc 1923; 86: 399–411. |
| 13. | Kitagawa EM: Components of a difference between two rates. J Am Stat Assoc 1955; 50: 1168–94. |
| 14. | Choi BC, de Guia NA, Walsh P: Look before you leap: Stratify before you standardize. Am J Epidemiol 1999; 149: 1087–96. |
| 15. | Segi M: Cancer mortality for selected sites in 24 countries (1950–57). Sendai, Japan: Tohoku University of Medicine; 1960. |
| 16. | Boyle P, Parkin DM: Statistical methods for registries. In: Jensen OM, Parkin DM, MacLennan R (eds.): Cancer registration: principles and methods. Lyon 1991; 136–9. |
| 17. | Eurostat: https://ec.europa.eu/eurostat/de/web/products-manuals-andguidelines/-/KS-RA-13-028 (last accessed on 28 December 2024). |
| 18. | National Cancer Institute USA: https://seer.cancer.gov/stdpopulations/stdpop.19ages.html (last accessed on 29 December 2024). |
