DÄ international, Archive 15/2024

Research letter

ChatGPT as a Source of Information on Pancreatic Cancer

Dtsch Arztebl Int 2024; 121: 505-6. DOI: 10.3238/arztebl.m2024.0081

Kneifel, F; Becker, F; Knipping, A; Katou, S; Andreou, A; Juratli, M; Houben, P; Morgul, H; Pascher, A; Strücker, B


Online sources are a popular source of information for patients and physicians, but they often lack accuracy and clarity, especially for laypersons. Patients facing a highly malignant disease such as pancreatic ductal adenocarcinoma (PDAC) require precise, guideline-aligned information to enable informed decision-making about treatment options and prognosis (1).

Large language models (LLMs) such as ChatGPT simulate human language processing using extensive text data. Their integration into healthcare has sparked both enthusiasm and concern due to potential risks (2). Since ChatGPT's public release on 30 November 2022, its medical application has been contentious, particularly regarding the generation of convincing yet potentially inaccurate statements, especially for people who lack the expertise to assess content accuracy (2, 3).

This pilot study aimed to evaluate the quality and suitability of ChatGPT's responses regarding the surgical therapy of PDAC on the basis of the current German guideline (4).

Methods

Key questions from the German PDAC guideline (4) were posed to ChatGPT (https://chat.openai.com, version GPT-3.5) in July 2023. The responses, along with citations and guideline recommendations, were recorded and transferred to an online survey tool (Survio). Eight experienced visceral surgeons specializing in oncologic surgery at the Department of General, Visceral and Transplant Surgery of University Hospital Münster individually assessed each response using a 5-point Likert scale.

Additionally, the references provided by ChatGPT were checked for plausibility using the PubMed database.

Results

ChatGPT provided detailed and layperson-friendly information. However, the expert assessments varied widely: 25% of the questions received ratings ranging from “excellent” to “inadequate.” The interrater reliability among the experts was calculated to be 0.16. The most frequently assigned grade was “inadequate” (32%), while only 10% of responses received an “excellent” rating. Half of the answers were rated as “satisfactory” or “inadequate” by the majority of experts (Table 1).
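The letter does not state which interrater-reliability statistic was used; as an illustration only, a value this close to zero can be interpreted via Fleiss' kappa, a common chance-corrected agreement measure for a fixed panel of raters and categorical (here, Likert-grade) ratings. The sketch below is a minimal, self-contained implementation under that assumption; the function name and the toy rating matrix are hypothetical, not data from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for agreement among a fixed number of raters.

    counts[i][j] = number of raters who assigned item i to category j;
    every row must sum to the same number of raters m.
    """
    n = len(counts)       # number of items (here: guideline questions)
    m = sum(counts[0])    # raters per item (here: 8 surgeons)
    k = len(counts[0])    # number of categories (here: 5 Likert grades)
    # mean per-item agreement P_bar
    p_bar = sum((sum(c * c for c in row) - m) / (m * (m - 1))
                for row in counts) / n
    # chance agreement P_e from the marginal category proportions
    p_e = sum((sum(row[j] for row in counts) / (n * m)) ** 2
              for j in range(k))
    return (p_bar - p_e) / (1 - p_e)
```

A kappa of 1 indicates perfect agreement, 0 agreement no better than chance; a value such as 0.16 therefore signals only slight agreement among raters.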

Table 1
Expert assessment of ChatGPT’s responses regarding the surgical therapy of pancreatic ductal adenocarcinoma

ChatGPT excelled in accuracy regarding the indications for and techniques of surgical therapy, but stated the cutoff for the minimal resection margin incorrectly (Table 1). The sources provided by ChatGPT were heterogeneous: only 15 of the 27 references were cited correctly, one was attributed to the wrong author, and 11 did not exist. Only three of the 27 provided links led to the specified publications. The cited publications spanned the period from 1985 to 2020, and 73% of them were outdated.

Discussion

While ChatGPT provided complex and partly accurate answers to onco-surgical questions, some responses were egregiously incorrect. Interestingly, all answers sounded accurate and professional. Although the sources appeared genuinely reliable, a significant proportion of the cited references were either non-existent or outdated, highlighting the potential risks of misinformation, bias, and prejudice (2, 3).

Our pilot study revealed a wide range of assessments, with an interrater reliability of 0.16. The significant divergences among experts of similar qualifications and expertise could be a contributing factor to the inaccurate responses of ChatGPT, as such divergences may also be present in the LLM's training data.

The “black box” processing of LLMs makes interpretation challenging: they rely on their training data to predict probable words or phrases. LLMs can produce useful and coherent text but lack the cognitive abilities for meaningful interpretation or moral control over the generated content (2). This might explain the invented citations, which in a scientific context constitute a serious error (i.e., scientific misconduct) and, if originating from a human, would be interpreted as deliberate deception (3).

This contrasts with the findings of Ayers et al., who demonstrated higher ratings of quality and empathy for chatbot responses than for physician responses (5). Unlike Ayers et al. (5), however, we deliberately selected questions with accurate guideline answers, optimizing the evaluation of the capability of artificial intelligence.

Caution and education are crucial, particularly for laypersons, in discerning the accuracy of ChatGPT's responses, as its sources can be erroneous or fabricated.

Limitations

This pilot study has several limitations. First, ChatGPT is currently a research version and not intended for medical use; we used version GPT-3.5, and future versions may yield different results. Second, the subjective quality grading of ChatGPT's responses makes the ratings susceptible to bias, given that the experts knew the responses were generated by ChatGPT. Lastly, each question was entered into ChatGPT only once, so the reproducibility of the responses was not assessed.

Conclusion

For now, ChatGPT should not be relied on for important medical questions because of the risk of misinformation. Users should exercise care and be aware of the potential risks and limitations. However, future versions of LLMs could be of significant benefit to patients and healthcare professionals.

Acknowledgments

We acknowledge support from the Open Access Publication Fund of the University of Muenster.

Felicia Kneifel, Felix Becker, Alina Knipping, Shadi Katou,
Andreas Andreou, Mazen Juratli, Philipp Houben, Haluk Morgul,
Andreas Pascher, Benjamin Strücker
Klinik für Allgemein-, Viszeral- und Transplantationschirurgie
Universitätsklinikum Münster, felicia.kneifel@ukmuenster.de

Conflict of interest statement
The authors declare that no conflict of interest exists.

Manuscript received on 9 December 2023, revised version accepted on 17 April 2024

Cite this as:
Kneifel F, Becker F, Knipping A, Katou S, Andreou A,
Juratli M, Houben P, Morgul H, Pascher A, Strücker B: ChatGPT as a source of information on pancreatic cancer. Dtsch Arztebl Int 2024; 121: 505–6. DOI: 10.3238/arztebl.m2024.0081

1. De Groot L, Harris I, Regehr G, Tekian A, Ingledew PA: Quality of online resources for pancreatic cancer patients. J Cancer Educ 2019; 34: 223–8.
2. Iannantuono GM, Bracken-Clarke D, Floudas CS, Roselli M, Gulley JL, Karzai F: Applications of large language models in cancer care: current evidence and future perspectives. Front Oncol 2023; 13: 1268915.
3. Kothari AN: ChatGPT, large language models, and generative AI as future augments of surgical cancer care. Ann Surg Oncol 2023; 30: 3174–6.
4. S3-Leitlinie zum exokrinen Pankreaskarzinom. https://www.leitlinienprogramm-onkologie.de/fileadmin/user_upload/Downloads/Leitlinien/Pankreaskarzinom/Version_2/LL_Pankreaskarzinom_Langversion_2.0.pdf (last accessed on 4 July 2024).
5. Ayers JW, Poliak A, Dredze M, et al.: Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med 2023; 183: 589–96.