EXPLORING THE ROLE OF CHATBOTS IN HEALTHCARE: IS AI SAFETY READY?

Just to be absolutely frank, this blog is not being prepared by a chatbot, as I'm not entirely sure A) what a chatbot is and B) where I might be able to locate one. There certainly seems to be a lot of noise at the moment about the use of such tools by students to prepare dissertations and the like, leading many to suggest that a return to more traditional ways of assessing a student's knowledge would be advantageous. Many years ago, whilst I was lecturing at a medical school, we were asked by some of our hospital colleagues to return to written essay-type assessments rather than the then-fashionable multiple-choice examinations. The reason for this request was that these senior and frequently eminent doctors felt that their newly arrived house physicians couldn't write coherent English. Given the generally held opinion that most doctors' handwriting is so appalling that one wouldn't know how bad their language abilities were, this caused some amusement! Certainly, one doctor I worked with for many years had to be forced to write the diagnoses and treatments in his notes in capital letters, in the face of mutiny by the colleagues who had to decipher his scrawl.

These musings were prompted by reading a recent study in the American Journal of Gastroenterology on using ChatGPT to answer the American College of Gastroenterology (ACG) Self-Assessment Test. (As a brief aside, does anyone else think that a chatbot sounds like a talkative proctologist?) The results made interesting reading. Two versions of ChatGPT (GPT-3.5 and GPT-4) were used to answer the Self-Assessment Tests for the last two years. Both failed to reach the minimum pass level of 70%, scoring 65% and 62% respectively. The authors suggested that a practising gastroenterologist should be scoring in the high 90s on these assessments. Questions requiring image recognition were, of course, excluded, but for all others the questions and answer choices were entered directly into ChatGPT. The authors concluded: "Based on our research, ChatGPT should not be used for medical education in gastroenterology at this time, and it has a way to go before it should be implemented into the healthcare field."

Whilst there is a superficial attractiveness in using these tools, there is also growing concern amongst many about their potential for misuse, which is driving a call from some academics for a return to other assessment tools, as happened in my department all those years ago. The viva voce examination has the obvious attraction that it can rapidly identify lacunae in a candidate's knowledge, but it is probably impractical on a very large scale, and as one of the many who have sat outside the examination room with a dry mouth and sweaty palms waiting to be called for inquisition, I wouldn't want to go through it again. I am sure that advances in the technology will lead to greater and more accurate use of chatbot tools but, as the authors of the paper concluded, I don't think we're there yet in medicine.