AI in medicine: Revolutionary tools, uncertain results

Last updated Dec 5, 2024

Can AI truly revolutionize healthcare? A systematic review reveals the hidden gaps in patient benefits and the roadblocks to meaningful clinical integration.

Study: Benefits and harms associated with the use of AI-related algorithmic decision-making systems by healthcare professionals: a systematic review.

In a recent study published in The Lancet Regional Health—Europe, a group of researchers evaluated the benefits and harms of artificial intelligence (AI)-related algorithmic decision-making (ADM) systems used by healthcare professionals compared to standard care, focusing on patient-relevant outcomes.

Background

Advances in AI have enabled systems to outperform medical experts in tasks like diagnosis, personalized medicine, patient monitoring, and drug development. Despite these advancements, it remains unclear whether improved diagnostic accuracy and performance metrics translate into tangible patient benefits, such as reduced mortality or morbidity.

Current research often prioritizes analytical performance over clinical outcomes, and many AI-based medical devices are approved without proper evidence from randomized controlled trials (RCTs).

Moreover, the lack of transparency and standardized assessments of harms associated with these technologies raises ethical and practical concerns. This highlights a critical gap in AI research and development and calls for further evaluations focused on patient-relevant outcomes to ensure meaningful and safe integration into healthcare.

About the Study

This systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to ensure methodological rigor. Searches were conducted in the Medical Literature Analysis and Retrieval System Online (MEDLINE), Excerpta Medica Database (EMBASE), Public/Publisher MEDLINE (PubMed), and Institute of Electrical and Electronics Engineers (IEEE) Xplore. They covered the 10 years up to March 27, 2024, the period in which AI-related ADM systems became relevant in healthcare studies. The search included terms related to AI, machine learning (ML), decision-making algorithms, healthcare professionals, and patient outcomes.

Eligible studies included interventional or observational designs involving AI decision support systems developed with or utilizing ML. Studies had to report patient-relevant outcomes, such as mortality, morbidity, hospital length of stay, readmission, or health-related quality of life. Studies were excluded if they were not preregistered, lacked a standard-of-care control, or focused on robotics or other systems unrelated to AI-based decision-making. The protocol for this review was preregistered on the International Prospective Register of Systematic Reviews (PROSPERO), with any amendments documented.
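To make the screening logic concrete, the inclusion and exclusion criteria described above can be read as a simple filter. The following is only an illustrative sketch, not the reviewers' actual screening procedure: the `StudyRecord` fields and the `is_eligible` function are hypothetical names, and the outcome list contains only the examples named in the article, which may not be exhaustive.

```python
from dataclasses import dataclass

# Example outcomes named in the article; the review's full list may be longer.
PATIENT_RELEVANT_OUTCOMES = {
    "mortality",
    "morbidity",
    "hospital length of stay",
    "readmission",
    "health-related quality of life",
}


@dataclass
class StudyRecord:
    """Hypothetical record for one screened study (field names are assumptions)."""
    design: str                         # e.g. "interventional" or "observational"
    uses_ml_decision_support: bool      # AI decision support developed with or using ML
    outcomes: set                       # outcomes the study reports
    preregistered: bool
    has_standard_of_care_control: bool
    robotics_or_unrelated_system: bool  # robotics or systems unrelated to AI-based ADM


def is_eligible(study: StudyRecord) -> bool:
    """Apply the inclusion criteria first, then the exclusion criteria."""
    if study.design not in {"interventional", "observational"}:
        return False
    if not study.uses_ml_decision_support:
        return False
    # At least one reported outcome must be patient-relevant.
    if not (study.outcomes & PATIENT_RELEVANT_OUTCOMES):
        return False
    if not study.preregistered:
        return False
    if not study.has_standard_of_care_control:
        return False
    if study.robotics_or_unrelated_system:
        return False
    return True
```

Under these assumptions, a preregistered trial with a standard-of-care control that reports mortality would pass, while one reporting only diagnostic accuracy would not.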

Reviewers screened titles, abstracts, and full texts using predefined criteria. Data extraction and quality assessment were conducted independently using standardized forms. The risk of bias was evaluated with Cochrane’s Risk of Bias 2 (RoB 2) tool and the Risk of Bias in Non-Randomized Studies of Interventions (ROBINS-I) tool to address potential confounding factors, while reporting transparency was assessed using the Consolidated Standards of Reporting Trials–Artificial Intelligence (CONSORT-AI) extension and the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis–Artificial Intelligence (TRIPOD-AI) framework.

Data extracted included study settings, design, intervention and comparator details, patient and professional demographics, algorithm characteristics, and outcome measures. Studies were also classified by AI system type, clinical area, prediction goals, and regulatory and funding information. The analysis also examined whether the unique contributions of AI systems to outcomes were isolated and validated.

Study Results

The systematic review included 19 studies, comprising 18 RCTs and one prospective cohort study, selected after screening 3,000 records. These studies were conducted across various regions, with nine in the United States, four in Europe, three in China, and others distributed globally. Settings included 14 hospital-based studies, three in outpatient clinics, one in a nursing home, and one in a mixed environment.

The studies covered a range of medical specialties, including oncology (4 studies), psychiatry (3 studies), internal hospital medicine, neurology, and anesthesiology (2 studies each), and single studies in diabetology, pulmonology, critical care, and other specialties.

The median number of participants across studies was 243, with a median age of 59.3 years. Female representation averaged 50.5%, and racial or ethnic composition was reported in 10 studies, with a median of 71.4% White participants. Twelve studies described the intended medical professional users, such as charge nurses or primary care providers, and nine detailed training protocols, ranging from brief platform introductions to multi-day supervised sessions.

AI systems varied in type and function, with seven studies utilizing surveillance systems for real-time monitoring and predictive alerts, six employing treatment personalization systems, and four integrating multiple functionalities. Examples included algorithms for glycemic control in diabetes, personalized psychiatric care, and monitoring venous thromboembolism. Development data sources ranged from large in-house datasets to pooled multi-institutional data, with diverse ML models applied, such as gradient boosting, neural networks, Bayesian classifiers, and regression-based models. Despite these developments, external validation of algorithms was limited in most studies, raising concerns about their generalizability to broader patient populations.

The risk of bias was assessed as low in four RCTs, moderate in seven, and high in another seven, while the cohort study demonstrated a serious risk of bias. Compliance with CONSORT-AI and TRIPOD-AI guidelines was variable, with three studies achieving full adherence, while others ranged from high to low compliance. Most studies conducted before the introduction of these guidelines showed moderate adherence, though explicit references to the guidelines were rare.
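For readers who prefer proportions, the risk-of-bias counts for the RCTs can be converted to shares with a few lines of arithmetic. This is a minimal sketch that uses only the counts reported above; the function name `risk_of_bias_shares` is chosen here for illustration.

```python
def risk_of_bias_shares(counts: dict) -> dict:
    """Return each risk-of-bias level as a percentage of all rated studies."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


# Counts for the 18 RCTs as reported in the article.
rct_counts = {"low": 4, "moderate": 7, "high": 7}
rct_shares = risk_of_bias_shares(rct_counts)
print(rct_shares)  # {'low': 22.2, 'moderate': 38.9, 'high': 38.9}
```

Put this way, fewer than one in four of the included RCTs was rated at low risk of bias.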

Outcomes showed a mix of benefits and harms. Twelve studies reported patient-relevant benefits, including reductions in mortality, improved depression and pain management, and enhanced quality of life. However, only eight studies included standardized harm assessments, and most failed to document adverse events comprehensively. Although six AI systems had received regulatory approval, associations between regulatory status, study quality, and patient outcomes remained inconclusive.

Conclusions

This systematic review underscores the scarcity of high-quality studies evaluating patient-relevant outcomes of AI-related ADM systems in healthcare. While psychiatry consistently showed benefits, other fields yielded mixed results, with limited evidence on mortality, anxiety, and hospital stay improvements. Most studies lacked balanced harm-benefit assessments and failed to isolate AI’s unique contributions.

The findings highlight the urgent need for transparent reporting, robust validation practices, and standardized frameworks to guide the safe and effective integration of AI in clinical settings.
