
Review article

The Quality and Utility of Artificial Intelligence in Patient Care

Dtsch Arztebl Int 2023; 120: 463-9. DOI: 10.3238/arztebl.m2023.0124

Wehkamp, K; Krawczak, M; Schreiber, S

Background: Artificial intelligence (AI) is increasingly being used in patient care. In the future, physicians will need to understand not only the basic functioning of AI applications, but also their quality, utility, and risks.

Methods: This article is based on a selective review of the literature on the principles, quality, limitations, and benefits of AI applications in patient care, along with examples of individual applications.

Results: The number of AI applications in patient care is rising, with more than 500 approvals in the United States to date. Their quality and utility are based on a number of interdependent factors, including the real-life setting, the type and amount of data collected, the choice of variables used by the application, the algorithms used, and the goal and implementation of each application. Bias (which may be hidden) and errors can arise at all these levels. Any evaluation of the quality and utility of an AI application must, therefore, be conducted according to the scientific principles of evidence-based medicine—a requirement that is often hampered by a lack of transparency.

Conclusion: AI has the potential to improve patient care while meeting the challenge of dealing with an ever-increasing surfeit of information and data in medicine with limited human resources. The limitations and risks of AI applications require critical and responsible consideration. This can best be achieved through a combination of scientific


Human intelligence is one of the most remarkable results of evolution. Of crucial importance for the intellectual performance of our brain is its ability to build models that are able to provide a detailed representation of complex reality with the goal of making predictions in order to successfully interact with our environment (1). Artificial intelligence (AI), in contrast, is a collective term for processes that enable computers to fulfill tasks that normally require human intelligence. As such, even the algorithms in a simple chess computer constitute AI. One form of AI is what is known as machine learning (ML), in which patterns are derived from data in order to either better interpret underlying data or make certain predictions based on these data.

Significant advances have been made in the field of ML in the last 10 years, not least through the development of multilayer (deep) artificial neural networks (deep neural networks, DNN) (2). However, it will likely take years or decades (if ever) before ML or AI is able to fully match the broad spectrum of human intelligence (3). Nevertheless, AI in the form of ML is already achieving results that exceed human performance in some areas of medicine. However, the future development of methods of this kind must be accompanied by critical and independent expertise in order that medicine can continue to do justice to the maxim of providing the best-possible patient care. The medical profession carries special responsibility here.

This article provides an overview of the important aspects of assessing the quality, utility, and limitations of AI applications in patient care, not least to contribute to the responsible use of this technology.

Methods

Based on a selective literature search in PubMed, this article presents selected aspects of the evaluation of the quality and benefits of AI applications (in particular, ML-based ones) in patient care. This presentation of the status quo is supplemented with examples of current applications taken from the relevant specialist media and scientific studies.

Results

Data as the basis of machine learning

Machine learning (ML) is based on data that provide an example representation of a particular learning world. Patterns or abstract rules need to be recognized in the learning data and then applied to new data in order to recognize characteristics, make predictions, or generate statements. Thus, conceptually, ML exhibits considerable similarity to human learning from examples and the recognition of similarities and differences.

The more unstructured the data used and the greater the need to combine different data modalities, the greater the challenges posed to an AI application (4). For example, there are AI-based techniques that are able to detect breast cancer on mammograms with a sensitivity and specificity comparable to that of an averagely experienced radiologist (but hitherto not that of an expert) (5). However, an AI-based approach to gaining knowledge from a variety of unstructured data types, such as DNA sequences, histopathological images, and laboratory findings, still fails in practice (6). Moreover, the use of extensive medical datasets always harbors the risk of violating individual personal rights, which could lead to restrictions in line with data protection laws (7). At present, the limited quality and availability of complex, heterogeneous data remain a challenge that has yet to be satisfactorily solved in broad areas of medical AI application.

Concepts of machine learning in medicine

ML approaches can be divided primarily into three groups (Figure 1):

Figure 1: Principal concepts of machine learning
  • Unsupervised learning attempts, without concrete specifications, to identify associations, structures, or anomalies in data. This approach is used, for example, to identify subgroups in multiomics datasets (8). In patient care, methods of unsupervised learning are still at an experimental stage, but their use is conceivable in the future, for example, in syndromic surveillance—potentially as part of outbreak monitoring for infectious diseases (9).
  • Reinforcement learning trains a model by rewarding desired outcomes. In medicine, this approach has also only been investigated in studies so far, but may be suitable in the future, for example, to adapt insulin administration to the individual patient in a closed-loop approach (10).
  • Approaches in supervised learning are often aimed at classifying data or predicting future events. The respective algorithms are trained with training data in which the learning objective is specified (for example, X-rays with marked masses and images for comparison that contain no masses). The patterns recognized are then validated in terms of their quality on test datasets. The majority of AI applications that already have marketing authorization are based on supervised learning from uniform, unimodal data (for example, solely analyzing images of possible skin lesions to identify malignant lesions) (11, 12).
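As a minimal illustration of the supervised-learning workflow just described (training on labeled examples, then validation on held-out test data), the following Python sketch implements a simple nearest-centroid classifier. All feature vectors and labels are invented for illustration only and do not correspond to any approved application.

```python
# Minimal sketch of supervised learning: a nearest-centroid classifier
# trained on labeled examples, then checked on a held-out test case.

def train(examples):
    """Compute one centroid (mean feature vector) per class label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the class whose centroid is closest (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical training data: (feature vector, label), e.g. lesion size
# and an irregularity score with a "benign"/"malignant" annotation.
training = [([1.0, 0.2], "benign"), ([1.2, 0.1], "benign"),
            ([3.0, 0.9], "malignant"), ([2.8, 1.1], "malignant")]
model = train(training)

# Validation on an unseen case mirrors the test-dataset step above.
print(predict(model, [2.9, 1.0]))  # -> malignant
```

Real applications replace the centroid rule with far more capable models (for example, deep neural networks), but the train/validate separation shown here is the same.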

Risks and limitations of AI applications

In order to be able to assess the risks and limitations in the validity of ML applications, it is important to be aware of the ML lifecycle that always underlies them and which is based on multiple, strongly interdependent stages (Figure 2). The first stage focuses on the real-life conditions which are mapped as representatively as possible in the form of digital data. The related variables need to be selected and prepared (referred to as feature selection and engineering, which are partially dispensed with in DNNs), in order to then be analyzed by the ML algorithm. The results are utilized by users (namely, physicians) and in turn have an effect on the real world (namely, the treatment of patients) (13).
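The stages of this lifecycle can be sketched schematically as a data pipeline. All function names, fields, and variables below are hypothetical placeholders; the comments mark points at which bias and error can enter.

```python
# Schematic sketch of the ML lifecycle stages described above.
# Every name here is a placeholder for illustration only.

def collect_data(real_world_cases):
    """Map real-life conditions onto digital records.
    Risk: sampling bias, non-representative data sources."""
    return [case for case in real_world_cases if case.get("documented")]

def select_features(records, variables):
    """Feature selection and engineering.
    Risk: dropping variables that carry clinically relevant signal."""
    return [{v: r[v] for v in variables if v in r} for r in records]

def train_model(features):
    """Algorithm design and training.
    Risk: unsuitable cut-off values, unclear pattern-recognition targets."""
    ...

def support_decision(model, patient):
    """Application in care.
    Risk: automation bias, mismatch of learning and application worlds."""
    ...
```

Each stage feeds the next, which is why an error introduced early (for example, in data collection) propagates through all downstream results.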

Figure 2: Selected limitations and risks in terms of the quality of artificial intelligence (AI) applications at stages of the learning and application lifecycle of machine learning (ML)

At each stage of the ML lifecycle, a multitude of partially redundant influencing factors are at work that can significantly distort the results of an AI application and limit its validity. These limitations are mainly responsible for the fact that the practical application of AI in patient care still falls short of expectations and hopes in many areas. Therefore, a critical reflection on the individual stages of the ML lifecycle is essential for a realistic evaluation of the potentials and qualities of ML applications (14, 15).

Real-life world

People live in real-life worlds. These are, as a rule, characterized by socioeconomic, biological, and other inhomogeneities that may be associated with a threat or disadvantage to certain individuals or population groups. When collecting the data that underlie an AI application, potential biases of this kind must be taken into account and, where necessary, compensated for (14, 16, 17).

Digital data

In principle, data can represent the real-life world only incompletely and in partial aspects. In order nevertheless to achieve an adequately good picture of the real-life world, the data collection itself must be as objective, precise, and accurate as possible. When selecting a data source, it is also important to ensure an appropriate level of representativeness (18). However, a lot of medical information, in particular individual-specific information, can only be recorded in text form with the full complexity of natural language, that is to say, as unstructured data that must first be preprocessed by means of natural language processing (19). Finally, information that cannot be digitally documented generally cannot be made usable for AI applications (for example, an assessment of a patient’s overall clinical picture based on experience and intuition) (15, 20).

Selection and preparation of variables

In order to obtain the most valid models possible of the real world from AI applications, the variables included therein need to be suitably selected and prepared (for example, restricted to X-ray findings and specific clinical parameters in oncology diagnostics). This selection, as well as the subsequent standardization and normalization of data, can reduce their representativeness and limit the validity of the results of an AI application (15).
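One common preparation step of this kind is the standardization (z-transformation) of a variable. The following minimal sketch uses invented values; note that the rescaled values discard the original measurement scale, which is one way such preprocessing can reduce representativeness.

```python
# Minimal sketch of z-score standardization of one variable.
# The raw values are invented for illustration.
from statistics import mean, stdev

def standardize(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Example: laboratory values measured on an arbitrary scale.
raw = [4.0, 5.0, 6.0, 7.0, 8.0]
z = standardize(raw)
print([round(v, 2) for v in z])  # -> [-1.26, -0.63, 0.0, 0.63, 1.26]
```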

Algorithm design

The design of an ML algorithm includes programming the software code and integrating the previously selected variables. Likewise at this stage, errors and biases may occur, for example as a result of inadequate consideration of special features of the data to be used, unclear definitions of targets for pattern recognition, or embedding unsuitable cut-off values (13, 21). To ensure sufficient acceptance and critical reflection on design by the user, the algorithm should also be able to provide explanations of the obtained results in each case (referred to as explainability) (22, 23).

Application in the real world

At the stage of practical application, namely, patient care, errors and biases at all the aforementioned stages of the ML lifecycle can have a negative effect. Particular risks arise as a result of unconsidered differences between the learning world and the application world and through a lack of orientation of AI applications to their subsequent practical deployment (24). Imprecise results, technical hurdles, insufficient content transparency, and mistrust quickly result in failure to fully exploit the potential of AI applications in patient care, for example, when programs for the AI-based analysis of histopathological findings cannot be integrated into existing workflows or they fail to save time (18, 25, 26). On the other hand, the noncritical, overly confident use of AI applications can lead to, for example, important differential diagnostic considerations being ignored in practice. In principle, the unreflected pursuit of AI-based treatment concepts carries the risk that medicine will be robbed of important human factors through over-technification. For example, the quasi-objective calculation of outcome probabilities poses a major challenge in terms of the differentiated communication between physicians and patients (27, 28). Ethical dilemmas can also escalate if the results of AI applications are used, without reflection, as the basis for decisions regarding allocation and prioritization (29).

Quality and utility of clinical AI applications

Evidence base for the assessment of ML applications

A scientific basis is one of the fundamental quality requirements placed on modern medicine. Accordingly, it should also be possible to transparently assess the objectivity (freedom from uncontrolled influencing factors), reliability, and validity of AI-based medical applications. In order to represent the quality of AI algorithms for decision-making, the statistical test variables sensitivity, specificity, and precision (positive predictive value) are mostly used. These should be complemented by a critical assessment of bias and risk in the respective ML lifecycle (Figure 2). Furthermore, an evidence-based evaluation of utility includes an investigation of the method in a real-world setting and in comparison to alternative procedures, similar to the approach used in a clinical trial (for example, a prospective intervention study that compares an AI-based with a classic diagnostic procedure) (30).
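The three test statistics named above follow directly from the counts of a 2×2 confusion matrix. The following sketch uses invented counts for illustration; no real study data are implied.

```python
# Sensitivity, specificity, and precision (positive predictive value)
# computed from 2x2 confusion-matrix counts. Counts are invented.

def sensitivity(tp, fn):
    """True-positive rate: share of diseased cases correctly flagged."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True-negative rate: share of healthy cases correctly cleared."""
    return tn / (tn + fp)

def precision(tp, fp):
    """Positive predictive value: share of flagged cases truly diseased."""
    return tp / (tp + fp)

# Hypothetical evaluation: 80 true positives, 20 false negatives,
# 90 true negatives, 10 false positives.
tp, fn, tn, fp = 80, 20, 90, 10
print(sensitivity(tp, fn))  # -> 0.8
print(specificity(tn, fp))  # -> 0.9
print(precision(tp, fp))    # 80/90, approx. 0.89
```

Note that precision, unlike sensitivity and specificity, depends on disease prevalence in the evaluated population, which is one reason all three should be reported together.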

Depending on the application, appropriate patient-specific endpoints such as quality of life, survival, disease progression, and symptom reduction should be assessed in addition to accuracy. Ideally, the improvement of these kinds of patient-specific endpoints should be borne in mind as early on as at the training stage of an AI application. Only in this way can the application have the prospect of becoming better than an evaluation of diagnosis and treatment data undertaken by humans (24). However, to date, only a handful of prospective studies have examined AI applications compared to the status quo of medical care or have already been able to demonstrate a utility in this regard (31, 32). A comprehensive evaluation of the additional benefit of an AI application includes not only an assessment of potential risks to patient safety but also that of cost-effectiveness (including potential savings in terms of time and resources) and ethical and sociocultural consequences (33, 34, 35, 36).

Practical implementation of ML-based applications in patient care

The US Food and Drug Administration (FDA) currently lists 521 authorized medical AI applications (37). Official data are lacking in Germany, but one can assume a few dozen authorizations to date.

Judging by the number of AI-related publications, the real relevance of applications approved in Germany is comparatively modest. The reasons for this include, in particular, the aforementioned limitations with regard to the data basis, the limited transferability of applications from the learning world to the application world, and the challenges faced in terms of their practicable and economically beneficial integration in existing care processes (25). As a rule, AI applications for patient care are medical devices that can only be marketed or used after a conformity assessment has been carried out for the respective risk class. Technically, their approval usually relates only to decision support and requires that responsibility remain with the physicians using them. Thus, the unconsidered use of these methods (for example, in the sense of an automation bias) may involve risks (18). Moreover, it has not been mandatory to date to publish utility-oriented application studies for medical device approval. Rather, the basis for approval is often limited to functionality testing without any published scientific research, or the associated studies took place in artificial settings. Thus, the quality and utility of many of the ML-based applications already in use in Germany are presented in an accordingly non-transparent manner.

The Table presents a synopsis of some of the systems approved (or in the process of being approved) in Germany, together with examples of the available scientific evidence in each case. The vast majority of applications are based on unimodal, uniform data. Overall, the publication basis with regard to approved AI applications provides a disparate and at times non-transparent picture. For some techniques, one can infer utility from published application studies, for example, ML-facilitated colonoscopic detection of colorectal polyps or photo-based detection of malignant skin lesions. These applications were shown to perform at a level comparable to standard techniques in both cases (12, 38). For other authorized applications, either no data on clinical utility or only selected statistical parameters (for example, sensitivity, specificity) have been published. For other applications, utility is not demonstrated by an additional clinical benefit, but rather by more efficient processes or a lowering of treatment barriers. Examples of this can be taken from areas in which specific specialist knowledge is lacking on site (for example, identifying rare electrocardiographic findings) or in which high throughput is required (for example, mammography screening). Lastly, cost savings and the facilitation of access to certain treatments in underserved regions can significantly contribute to the practical utility of AI applications, such as in the diagnosis of diabetic retinopathy and malignant melanoma (e1, e2, e3, e4, e5, e6, e7, e8, e9, e10, e11).

Table: Examples of AI-based applications in patient care that have been approved or are in the approval process

Summary

A rapidly growing knowledge and information base as well as the diagnostic and therapeutic opportunities resulting therefrom pose the challenge to medicine of either compressing this information in such a way that it remains manageable or using it in its entirety and in the best possible way for the good of patients and society.

Physicians today are forced to expend an ever greater amount of effort to stay abreast of the current state of science and technology and, at the same time, cope with the economic boundary conditions and meet the demands of humane medicine. In so doing, they sometimes reach the limits of their capacities. Machine learning, as currently the most powerful development in artificial intelligence, mimics human learning and, depending on the quality of data and the available computing power, is able to provide ever better medical predictions and classifications.

The current evidence shows that preventive, diagnostic, and therapeutic patient care can increasingly benefit from AI support. However, any technology that has indirect effects on medical practice, and thus potentially on people’s lives, diseases, and deaths, must be especially carefully scrutinized in terms of utility and risks. In the learning and application lifecycle of ML, risks can arise at various stages as a result of possible biases, negative reinforcements, and errors. Therefore, AI applications represent a potential risk to patients and, as such, still need to be subjected to the critical scrutiny of human judgment.

A broad array of skills is required for the quality assessment of AI applications, ranging from the original medical expertise, the design of care processes, and data science to computer science, ethics, and law. Precisely since not all of these facets belong to the narrower domain of medicine, the medical profession must gain a comprehensive understanding of AI in order to be able to fulfill its social responsibility by implementing AI in patient care in a critically considered manner (for example, via the course: www.ki-campus.org/courses/drmedki_basics_cme) (eBox) (39, 40).

eBox: Explanations of selected terms and constructs

Due to the complexity of AI applications and their tendency to be opaque, there is also a need for regulatory safeguards that place a strong emphasis on the practical benefits, the application-related and sometimes considerable risks, and on ensuring a high level of content transparency. Used responsibly, AI can promote evidence-based and cost-efficient patient care in the future, while at the same time supporting the human essence (and human intelligence) of medicine.

Conflict of interest statement
The authors declare that no conflict of interest exists.

Manuscript received on 30 November 2022, revised version accepted on 8 May 2023.

Translated from the original German by Christine Rye.

Corresponding author
Prof. Dr. med. Kai Wehkamp, MPH
Klinik für Innere Medizin I
Universitätsklinikum Schleswig-Holstein
Arnold-Heller-Straße 3 (Haus 6), 24105 Kiel, Germany
Kai.Wehkamp@uksh.de

Cite this as:
Wehkamp K, Krawczak M, Schreiber S: The quality and utility of artificial intelligence in patient care. Dtsch Arztebl Int 2023; 120: 463–9. DOI: 10.3238/arztebl.m2023.0124

Supplementary material

eReferences, eBox:
www.aerzteblatt-international.de/m2023.0124

1. Hawkins J, Lewis M, Klukas M, Purdy S, Ahmad S: A framework for intelligence and cortical function based on grid cells in the neocortex. Front Neural Circuits 2019; 12: 121.
2. Topol EJ: High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019; 25: 44–56.
3. Katritsis DG: Artificial intelligence, superintelligence and intelligence. Arrhythm Electrophysiol Rev 2021; 10: 223–4.
4. Zhang D, Yin C, Zeng J, Yuan X, Zhang P: Combining structured and unstructured data for predictive models: a deep learning approach. BMC Med Inform Decis Mak 2020; 20: 280.
5. Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al.: Stand-alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. J Natl Cancer Inst 2019; 111: 916–22.
6. Acosta JN, Falcone GJ, Rajpurkar P, Topol EJ: Multimodal biomedical AI. Nat Med 2022; 28: 1773–84.
7. Vidalis T: Artificial intelligence in biomedicine: a legal insight. BioTech (Basel) 2021; 10: 15.
8. Eicher T, Kinnebrew G, Patt A, et al.: Metabolomics and multi-omics integration: a survey of computational methods and resources. Metabolites 2020; 10: 202.
9. Wen A, Wang L, He H, et al.: An aberration detection-based approach for sentinel syndromic surveillance of COVID-19 and other novel influenza-like illnesses. J Biomed Inform 2021; 113: 103660.
10. Tejedor M, Woldaregay AZ, Godtliebsen F: Reinforcement learning application in diabetes blood glucose control: a systematic review. Artif Intell Med 2020; 104: 101836.
11. Rajpurkar P, Chen E, Banerjee O, Topol EJ: AI in health and medicine. Nat Med 2022; 28: 31–8.
12. Haenssle HA, Fink C, Toberer F, et al.: Man against machine reloaded: performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions. Ann Oncol 2020; 31: 137–43.
13. Kocak B, Kus EA, Kilickesmez O: How to read and review papers on machine learning and artificial intelligence in radiology: a survival guide to key methodological concepts. Eur Radiol 2021; 31: 1819–30.
14. Leslie D, Mazumder A, Peppin A, Wolters MK, Hagerty A: Does “AI” stand for augmenting inequality in the era of covid-19 healthcare? BMJ 2021; 372: n304.
15. Suresh H, Guttag J: A framework for understanding sources of harm throughout the machine learning life cycle. In: ACM International Conference Proceeding Series 2021. www.doi.org/10.1145/3465416.3483305 (last accessed on 16 March 2022).
16. Celi LA, Cellini J, Charpignon ML, et al.: Sources of bias in artificial intelligence that perpetuate healthcare disparities—a global review. PLOS Digit Health 2022; 1: e0000022.
17. Pierson E, Cutler DM, Leskovec J, Mullainathan S, Obermeyer Z: An algorithmic approach to reducing unexplained pain disparities in underserved populations. Nat Med 2021; 27: 136–40.
18. Challen R, Denny J, Pitt M, Gompels L, Edwards T, Tsaneva-Atanasova K: Artificial intelligence, bias and clinical safety. BMJ Qual Saf 2019; 28: 231–7.
19. Goh KH, Wang L, Yeow AYK, et al.: Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare. Nat Commun 2021; 12: 711.
20. van der Niet AG, Bleakley A: Where medical education meets artificial intelligence: ‘Does technology care?’ Med Educ 2021; 55: 30–6.
21. Barboi C, Tzavelis A, Muhammad LN: Comparison of severity of illness scores and artificial intelligence models that are predictive of intensive care unit mortality: meta-analysis and review of the literature. JMIR Med Inform 2022; 10: e35293.
22. Loftus TJ, Tighe PJ, Ozrazgat-Baslanti T, et al.: Ideal algorithms in healthcare: explainable, dynamic, precise, autonomous, fair, and reproducible. PLOS Digit Health 2022; 1: e0000006.
23. Amann J, Vetter D, Blomberg SN, et al.: To explain or not to explain?—Artificial intelligence explainability in clinical decision support systems. PLOS Digit Health 2022; 1: e0000016.
24. Obermeyer Z, Topol EJ: Artificial intelligence, bias, and patients’ perspectives. Lancet 2021; 397 (10289): 2038.
25. Cabitza F, Campagner A, Balsano C: Bridging the “last mile” gap between AI implementation and operation: “data awareness” that matters. Ann Transl Med 2020; 8: 501.
26. Gaube S, Suresh H, Raue M, et al.: Do as AI say: susceptibility in deployment of clinical decision-aids. NPJ Digit Med 2021; 4: 31.
27. Nagy M, Sisk B: How will artificial intelligence affect patient-clinician relationships? AMA J Ethics 2020; 22: E395–400.
28. Lu SC, Xu C, Nguyen CH, Geng Y, Pfob A, Sidey-Gibbons C: Machine learning-based short-term mortality prediction models for patients with cancer using electronic health record data: systematic review and critical appraisal. JMIR Med Inform 2022; 10: e33182.
29. Wingfield LR, Ceresa C, Thorogood S, Fleuriot J, Knight S: Using artificial intelligence for predicting survival of individual grafts in liver transplantation: a systematic review. Liver Transpl 2020; 26: 922–34.
30. Caliebe A, Leverkus F, Antes G, Krawczak M: Does big data require a methodological change in medical research? BMC Med Res Methodol 2019; 19: 125.
31. Nagendran M, Chen Y, Lovejoy CA, et al.: Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 2020; 368: m689.
32. Zhou Q, Chen ZH, Cao YH, Peng S: Clinical impact and quality of randomized controlled trials involving interventions evaluating artificial intelligence prediction tools: a systematic review. NPJ Digit Med 2021; 4: 154.
33. Ryan M: In AI we trust: ethics, artificial intelligence, and reliability. Sci Eng Ethics 2020; 26: 2749–67.
34. Collins GS, Dhiman P, Andaur Navarro CL, et al.: Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open 2021; 11: e048008.
35. Wiens J, Saria S, Sendak M, et al.: Do no harm: a roadmap for responsible machine learning for health care. Nat Med 2019; 25: 1337–40.
36. He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K: The practical implementation of artificial intelligence technologies in medicine. Nat Med 2019; 25: 30–6.
37. US Food and Drug Administration (FDA): Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices. www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices (last accessed on 5 May 2023).
38. Repici A, Spadaccini M, Antonelli G, et al.: Artificial intelligence and colonoscopy experience: lessons from two randomised trials. Gut 2022; 71: 757–65.
39. Keane PA, Topol EJ: AI-facilitated health care requires education of clinicians. Lancet 2021; 397 (10281): 1254.
40. Young AT, Amara D, Bhattacharya A, Wei ML: Patient and general public attitudes towards clinical artificial intelligence: a mixed methods systematic review. Lancet Digit Health 2021; 3: e599–e611.
e1. Haenssle HA, Fink C, Toberer F, et al.: Man against machine reloaded: performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions. Ann Oncol 2020; 31: 137–43.
e2. Blaha J, Barteczko-Grajek B, Berezowicz P, et al.: Space GlucoseControl system for blood glucose control in intensive care patients—a European multicentre observational study. BMC Anesthesiol 2016; 16: 8.
e3. Repici A, Badalamenti M, Maselli R, et al.: Efficacy of real-time computer-aided detection of colorectal neoplasia in a randomized trial. Gastroenterology 2020; 159: 512–20.e7.
e4. Romero-Martín S, Elías-Cabot E, Raya-Povedano JL, Gubern-Mérida A, Rodríguez-Ruiz A, Álvarez-Benito M: Stand-alone use of artificial intelligence for digital mammography and digital breast tomosynthesis screening: a retrospective evaluation. Radiology 2022; 302: 535–42.
e5. Meyer A, Zverinski D, Pfahringer B, et al.: Machine learning for real-time prediction of complications in critical care: a retrospective study. Lancet Respir Med 2018; 6: 905–14.
e6. Braun T, Spiliopoulos S, Veltman C, et al.: Detection of myocardial ischemia due to clinically asymptomatic coronary artery stenosis at rest using supervised artificial intelligence-enabled vectorcardiography—a five-fold cross validation of accuracy. J Electrocardiol 2020; 59: 100–5.
e7. Ipp E, Liljenquist D, Bode B, et al.: Pivotal evaluation of an artificial intelligence system for autonomous detection of referrable and vision-threatening diabetic retinopathy. JAMA Netw Open 2021; 4: e2134254.
e8. Franzke AW, Kristoffersen MB, Bongers RM, et al.: Users’ and therapists’ perceptions of myoelectric multi-function upper limb prostheses with conventional and pattern recognition control. PLoS One 2019; 14: e0220899.
e9. Daifalla K, Günther S: Eigener Report 2022: Mindpeak Breast HER2 RoI Clinical Performance Evaluation Summary. https://rb.gy/h25wv (last accessed on 20 November 2022).
e10. Sun H, Depraetere K, Meesseman L, et al.: Machine learning-based prediction models for different clinical risks in different hospitals: evaluation of live performance. J Med Internet Res 2022; 24: e34295.
e11. Chamberlin J, Kocher MR, Waltz J, et al.: Automated detection of lung nodules and coronary artery calcium using artificial intelligence on low-dose CT scans for lung cancer screening: accuracy and prognostic value. BMC Med 2021; 19: 55.
Department of Internal Medicine I, University Medical Center Schleswig-Holstein, Campus Lübeck, Kiel, Germany: Prof. Dr. med. Kai Wehkamp, Prof. Dr. med. Dr. h.c. Stefan Schreiber
Department for Medical Management, MSH Medical School Hamburg, Hamburg, Germany: Prof. Dr. med. Kai Wehkamp
Institute of Medical Informatics and Statistics, Christian-Albrechts-University of Kiel, University Medical Center Schleswig-Holstein, Campus Kiel, Germany: Prof. Dr. med. Michael Krawczak
Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, University Medical Center Schleswig-Holstein, Campus Kiel, Germany: Prof. Dr. med. Dr. h.c. Stefan Schreiber
Figure 1: Principal concepts of machine learning
Figure 2: Selected limitations and risks in terms of the quality of artificial intelligence (AI) applications at stages of the learning and application lifecycle of machine learning (ML)
Table: Examples of AI-based applications in patient care that have been approved or are in the approval process
eBox: Explanations of selected terms and constructs
1.Hawkins J, Lewis M, Klukas M, Purdy S, Ahmad S: A framework for intelligence and cortical function based on grid cells in the neocortex. Front Neural Circuits 2019; 12: 121 CrossRef MEDLINE PubMed Central
2.Topol EJ: High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019; 25: 44–56 CrossRef MEDLINE
3.Katritsis DG: Artificial intelligence, superintelligence and intelligence. Arrhythm Electrophysiol Rev 2021; 10: 223–4 CrossRef MEDLINE PubMed Central
4.Zhang D, Yin C, Zeng J, Yuan X, Zhang P: Combining structured and unstructured data for predictive models: a deep learning approach. BMC Med Inform Decis Mak 2020; 20: 280 CrossRef MEDLINE PubMed Central
5.Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al.: Stand-alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. J Natl Cancer Inst 2019; 111: 916–22 CrossRef MEDLINE PubMed Central
6.Acosta JN, Falcone GJ, Rajpurkar P, Topol EJ: Multimodal biomedical AI. Nat Med 2022; 28: 1773–84 CrossRef MEDLINE
7.Vidalis T: Artificial intelligence in biomedicine: a legal insight. BioTech (Basel) 2021; 10: 15 CrossRef MEDLINE PubMed Central
8.Eicher T, Kinnebrew G, Patt A, et al.: Metabolomics and multi-omics integration: a survey of computational methods and resources. Metabolites 2020; 10: 202 CrossRef MEDLINE PubMed Central
9.Wen A, Wang L, He H, et al.: An aberration detection-based approach for sentinel syndromic surveillance of COVID-19 and other novel influenza-like illnesses. J Biomed Inform 2021; 113: 103660 CrossRef MEDLINE PubMed Central
10.Tejedor M, Woldaregay AZ, Godtliebsen F: Reinforcement learning application in diabetes blood glucose control: a systematic review. Artif Intell Med 2020; 104: 101836 CrossRef MEDLINE
11.Rajpurkar P, Chen E, Banerjee O, Topol EJ: AI in health and medicine. Nat Med 2022; 28: 31–8 CrossRef MEDLINE
12.Haenssle HA, Fink C, Toberer F, et al.: Man against machine reloaded: performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions. Ann Oncol 2020; 31: 137–43 CrossRef MEDLINE
13.Kocak B, Kus EA, Kilickesmez O: How to read and review papers on machine learning and artificial intelligence in radiology: a survival guide to key methodological concepts. Eur Radiol 2021; 31: 1819–30 CrossRef MEDLINE
14.Leslie D, Mazumder A, Peppin A, Wolters MK, Hagerty A: Does “AI“ stand for augmenting inequality in the era of covid-19 healthcare? BMJ 2021; 372: n304 CrossRef MEDLINE PubMed Central
15.Suresh H, Guttag J: A framework for understanding sources of harm throughout the machine learning life cycle. In: ACM International Conference Proceeding Series 2021. www.doi.org/10.1145/3465416.3483305 (last accessed on 16 March 2022) CrossRef
16.Celi LA, Cellini J, Charpignon ML, et al.: Sources of bias in artificial intelligence that perpetuate healthcare disparities—a global review. PLOS Digit Health 2022; 1: e0000022 CrossRef MEDLINE PubMed Central
17.Pierson E, Cutler DM, Leskovec J, Mullainathan S, Obermeyer Z: An algorithmic approach to reducing unexplained pain disparities in underserved populations. Nat Med 2021; 27: 136–40 CrossRef MEDLINE
18.Challen R, Denny J, Pitt M, Gompels L, Edwards T, Tsaneva-Atanasova K: Artificial intelligence, bias and clinical safety. BMJ Qual Saf 2019; 28: 231–7 CrossRef MEDLINE PubMed Central
19.Goh KH, Wang L, Yeow AYK, et al.: Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare. Nat Commun 2021; 12: 711 CrossRef MEDLINE PubMed Central
20.van der Niet AG, Bleakley A: Where medical education meets artificial intelligence: ‘Does technology care?’ Med Educ 2021; 55: 30–6.
21.Barboi C, Tzavelis A, Muhammad LN: Comparison of severity of illness scores and artificial intelligence models that are predictive of intensive care unit mortality: meta-analysis and review of the literature. JMIR Med Inform 2022; 10: e35293 CrossRef MEDLINE PubMed Central
22.Loftus TJ, Tighe PJ, Ozrazgat-Baslanti T, et al.: Ideal algorithms in healthcare: explainable, dynamic, precise, autonomous, fair, and reproducible. PLOS Digit Health 2022; 1: e0000006 CrossRef MEDLINE PubMed Central
23.Amann J, Vetter D, Blomberg SN, et al.: To explain or not to explain?—Artificial intelligence explainability in clinical decision support systems. PLOS Digital Health 2022; 1: e0000016 CrossRef MEDLINE PubMed Central
24.Obermeyer Z, Topol EJ: Artificial intelligence, bias, and patients’ perspectives. Lancet 2021; 397 (10289): 2038 CrossRef MEDLINE
25.Cabitza F, Campagner A, Balsano C: Bridging the “last mile” gap between AI implementation and operation: “data awareness” that matters. Ann Transl Med 2020; 8: 501 CrossRef MEDLINE PubMed Central
26.Gaube S, Suresh H, Raue M, et al.: Do as AI say: susceptibility in deployment of clinical decision-aids. NPJ Digit Med 2021; 4: 31 CrossRef MEDLINE PubMed Central
27.Nagy M, Sisk B: How will artificial intelligence affect patient-clinician relationships? AMA J Ethics 2020; 22: E395–400 CrossRef MEDLINE
28.Lu SC, Xu C, Nguyen CH, Geng Y, Pfob A, Sidey-Gibbons C: Machine learning-based short-term mortality prediction models for patients with cancer using electronic health record data: systematic review and critical appraisal. JMIR Med Inform 2022; 10: e33182 CrossRef MEDLINE PubMed Central
29.Wingfield LR, Ceresa C, Thorogood S, Fleuriot J, Knight S: Using artificial intelligence for predicting survival of individual grafts in liver transplantation: a systematic review. Liver Transpl 2020; 26: 922–34 CrossRef MEDLINE
30.Caliebe A, Leverkus F, Antes G, Krawczak M: Does big data require a methodological change in medical research? BMC Med Res Methodol 2019; 19: 125 CrossRef MEDLINE PubMed Central
31.Nagendran M, Chen Y, Lovejoy CA, et al.: Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 2020; 368: m689 CrossRef MEDLINE PubMed Central
32.Zhou Q, Chen ZH, Cao YH, Peng S: Clinical impact and quality of randomized controlled trials involving interventions evaluating artificial intelligence prediction tools: a systematic review. NPJ Digit Med 2021; 4: 154 CrossRef MEDLINE PubMed Central
33.Ryan M: In AI we trust: ethics, artificial intelligence, and reliability. Sci Eng Ethics 2020; 26: 2749–67 CrossRef MEDLINE PubMed Central
34.Collins GS, Dhiman P, Andaur Navarro CL, et al.: Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open 2021; 11: e048008 CrossRef MEDLINE PubMed Central
35.Wiens J, Saria S, Sendak M, et al.: Do no harm: a roadmap for responsible machine learning for health care. Nat Med 2019; 25: 1337–40 CrossRef
36.He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K: The practical implementation of artificial intelligence technologies in medicine. Nat Med 2019; 25: 30–6 CrossRef MEDLINE PubMed Central
37.FDA US: Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices. www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices (last accessed on 5 May 2023).
38.Repici A, Spadaccini M, Antonelli G, et al.: Artificial intelligence and colonoscopy experience: lessons from two randomised trials. Gut 2022; 71: 757–65 CrossRef MEDLINE
39.Keane PA, Topol EJ: AI-facilitated health care requires education of clinicians. Lancet 2021; 397 (10281): 1254 CrossRef MEDLINE
40.Young AT, Amara D, Bhattacharya A, Wei ML: Patient and general public attitudes towards clinical artificial intelligence: a mixed methods systematic review. Lancet Digital Health 2021; 3: e599–e611 CrossRef MEDLINE
e1.Haenssle HA, Fink C, Toberer F, et al.: Man against machine reloaded: performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions. Ann Oncol 2020; 31: 137–43 CrossRef MEDLINE
e2.Blaha J, Barteczko-Grajek B, Berezowicz P, et al.: Space GlucoseControl system for blood glucose control in intensive care patients—a European multicentre observational study. BMC Anesthesiol 2016; 16: 8 CrossRef MEDLINE PubMed Central
e3.Repici A, Badalamenti M, Maselli R, et al.: Efficacy of real-time computer-aided detection of colorectal neoplasia in a randomized trial. Gastroenterology 2020; 159: 512–20.e7 CrossRef MEDLINE
e4.Romero-Martín S, Elías-Cabot E, Raya-Povedano JL, Gubern-Mérida A, Rodríguez-Ruiz A, Álvarez-Benito M: Stand-alone use of artificial intelligence for digital mammography and digital breast tomosynthesis screening: a retrospective evaluation. Radiology 2022; 302: 535–42 CrossRef MEDLINE
e5.Meyer A, Zverinski D, Pfahringer B, et al.: Machine learning for real-time prediction of complications in critical care: a retrospective study. Lancet Respir Med 2018; 6: 905–14 CrossRef MEDLINE
e6.Braun T, Spiliopoulos S, Veltman C, et al.: Detection of myocardial ischemia due to clinically asymptomatic coronary artery stenosis at rest using supervised artificial intelligence-enabled vectorcardiography—a five-fold cross validation of accuracy. J Electrocardiol 2020; 59: 100–5 CrossRef MEDLINE
e7.Ipp E, Liljenquist D, Bode B, et al.: Pivotal evaluation of an artificial intelligence system for autonomous detection of referrable and vision-threatening diabetic retinopathy. JAMA Netw Open 2021; 4: e2134254 CrossRef MEDLINE PubMed Central
e8.Franzke AW, Kristoffersen MB, Bongers RM, et al.: Users’ and therapists’ perceptions of myoelectric multi-function upper limb prostheses with conventional and pattern recognition control. PLoS One 2019; 14: e0220899 CrossRef MEDLINE PubMed Central
e9.Daifalla K, Günther S: In-house report 2022: Mindpeak Breast HER2 RoI Clinical Performance Evaluation Summary. https://rb.gy/h25wv (last accessed on 20 November 2022).
e10.Sun H, Depraetere K, Meesseman L, et al.: Machine learning-based prediction models for different clinical risks in different hospitals: evaluation of live performance. J Med Internet Res 2022; 24: e34295 CrossRef MEDLINE PubMed Central
e11.Chamberlin J, Kocher MR, Waltz J, et al.: Automated detection of lung nodules and coronary artery calcium using artificial intelligence on low-dose CT scans for lung cancer screening: accuracy and prognostic value. BMC Med 2021; 19: 55 CrossRef MEDLINE PubMed Central