Researchers warn against relying on AI chatbots for drug safety information

On Oct 15, 2024

Patients shouldn’t rely on AI-powered search engines and chatbots to always give them accurate and safe information on drugs, conclude researchers in the journal BMJ Quality & Safety, after finding a considerable number of answers were wrong or potentially harmful.

What’s more, the complexity of the answers provided might make it difficult for patients to fully understand them without a degree-level education, add the researchers.

In February 2023, search engines underwent a significant shift thanks to the introduction of AI-powered chatbots, offering the promise of enhanced search results, comprehensive answers, and a new type of interactive experience, explain the researchers.

While these chatbots can be trained on extensive datasets from the entire internet, enabling them to converse on any topic, including healthcare-related queries, they are also capable of generating disinformation and nonsensical or harmful content, they add.

Previous studies looking at the implications of these chatbots have primarily focused on the perspective of healthcare professionals rather than that of patients. To address this, the researchers explored the readability, completeness, and accuracy of chatbot answers for queries on the top 50 most frequently prescribed drugs in the US in 2020, using Bing Copilot, a search engine with AI-powered chatbot features.

To simulate patients consulting chatbots for drug information, the researchers reviewed research databases and consulted with a clinical pharmacist and doctors with expertise in pharmacology to identify the medication questions that patients most frequently ask their healthcare professionals.

The chatbot was asked 10 questions for each of the 50 drugs, generating 500 answers in total. The questions covered what the drug was used for, how it worked, instructions for use, common side effects, and contraindications.

Readability of the answers provided by the chatbot was assessed by calculating the Flesch Reading Ease Score, which estimates the educational level required to understand a particular text.

Text that scores between 0 and 30 is considered very difficult to read, necessitating degree-level education. At the other end of the scale, a score of 91–100 means the text is very easy to read and appropriate for 11-year-olds.

To assess the completeness and accuracy of chatbot answers, responses were compared with the drug information provided by a peer-reviewed and up-to-date drug information website for both healthcare professionals and patients (drugs.com).
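For readers who want to see how such a score is derived, the following is a minimal sketch of the standard Flesch Reading Ease formula: 206.835 minus 1.015 times the average number of words per sentence, minus 84.6 times the average number of syllables per word. The function names and the vowel-group syllable counter are illustrative assumptions, not the tooling used in the study; dedicated readability tools count syllables more carefully, so their scores may differ somewhat.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent ("take"), except in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    simple = "The cat sat on the mat."
    dense = "Pharmacokinetic interactions necessitate individualized therapeutic monitoring."
    print(round(flesch_reading_ease(simple), 1))
    print(round(flesch_reading_ease(dense), 1))
```

Short sentences built from short words push the score up, while long sentences and polysyllabic medical terms pull it down. Very simple text can score above 100 and very dense text below 0, because the formula itself is not capped.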

Seven experts in medication safety assessed whether answers reflected current scientific consensus, as well as the likelihood and extent of possible harm if the patient followed the chatbot’s recommendations. They reviewed a subset of 20 chatbot answers displaying low accuracy or completeness, or a potential risk to patient safety.

The Agency for Healthcare Research and Quality (AHRQ) harm scales were used to rate patient safety events and the likelihood of possible harm was estimated by the experts in accordance with a validated framework.

The overall average Flesch Reading Ease Score was just over 37, indicating that degree-level education would be required of the reader. Even the highest readability of chatbot answers still required an educational level of high (secondary) school.

Average completeness of chatbot answers was 77% overall, and reached 100% for the best-answered questions. Five of the 10 questions were answered with the highest completeness, while question 3 (What do I have to consider when taking the drug?) was answered with the lowest average completeness of only 23%.

Chatbot statements didn’t match the reference data in 126 of 484 (26%) answers, and were fully inconsistent in 16 of 484 (just over 3%).
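As a quick check on the arithmetic, the reported counts can be converted to percentages with the short sketch below. The helper name `percent` is an illustrative assumption. The counts are taken as reported; the article does not say why the denominator is 484 rather than the 500 answers generated.

```python
def percent(part: int, total: int) -> float:
    """Return part as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * part / total


# Counts as reported in the article.
mismatched = percent(126, 484)
fully_inconsistent = percent(16, 484)

print(f"Did not match reference data: {mismatched:.1f}%")
print(f"Fully inconsistent: {fully_inconsistent:.1f}%")
```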

Evaluation of the subset of 20 answers revealed that only 54% were rated as aligning with scientific consensus. And 39% contradicted the scientific consensus, while there was no established scientific consensus for the remaining 6%.

Possible harm resulting from a patient following the chatbot’s advice was rated as highly likely in 3% and moderately likely in 29% of these answers. And a third (34%) were judged as either unlikely or not at all likely to result in harm, if followed.

But irrespective of the likelihood of possible harm, 42% of these chatbot answers were considered to lead to moderate or mild harm, and 22% to death or severe harm. Around a third (36%) were considered to lead to no harm.

The researchers acknowledge that their study didn’t draw on real patient experiences and that prompts in different languages or from different countries may affect the quality of chatbot answers.

“In this cross-sectional study, we observed that search engines with an AI-powered chatbot produced overall complete and accurate answers to patient questions,” they write.

“However, chatbot answers were largely difficult to read and answers repeatedly lacked information or showed inaccuracies, possibly threatening patient and medication safety,” they add.

A major drawback was the chatbot’s inability to understand the underlying intent of a patient question, they suggest.

“Despite their potential, it is still crucial for patients to consult their healthcare professionals, as chatbots may not always generate error-free information. Caution is advised in recommending AI-powered search engines until citation engines with higher accuracy rates are available,” they conclude.

Journal reference:

Andrikyan, W., et al. (2024). Artificial intelligence-powered chatbots in search engines: a cross-sectional study on the quality and risks of drug information for patients. BMJ Quality & Safety. doi.org/10.1136/bmjqs-2024-017476.

Source link : News-Medica