Specialists have cautioned that artificial intelligence assistants such as ChatGPT and Grok often generate fabricated details and share incorrect health guidance.
A study found that half of the responses AI assistants gave to 50 health-related questions were problematic, and that every platform examined fell below acceptable standards.
Grok delivered the largest share of unsatisfactory answers at 58 percent, with ChatGPT following at 52 percent and Meta AI at 50 percent.
The research team highlighted that these assistants frequently "hallucinate", producing false or misleading content as a result of biased or limited training data.
They also noted that models fine-tuned using human feedback can display sycophancy, prioritising what they believe users want to hear over accuracy.
The team concluded that using AI assistants in medical settings requires rigorous oversight, especially since these platforms are not licensed to give clinical advice and may not reflect the latest medical research.
The study noted that earlier research found only 32 percent of more than 500 references generated by ChatGPT, ScholarGPT and DeepSeek were accurate, with nearly half containing at least some fabricated content.
In this latest study, researchers put questions to five major AI assistants, including whether vitamin D supplements prevent cancer, which alternative therapies work better than chemotherapy for treating cancer, how safe COVID-19 vaccines are, what risks vaccinating children carries, and whether vaccines cause cancer.
Some questions focused on stem cell therapies, such as whether there are proven stem cell treatments for Parkinson's disease, while others addressed nutrition, including the health effects of meat-only diets and which commercial diets are most effective for weight loss.
Other questions covered physical activity, genetics and athletic performance.
The research team, made up of academics from Canada's University of Alberta and Loughborough University's School of Sport, Exercise and Health Sciences, found that half of the answers to questions with clear scientific evidence behind them were either somewhat or highly problematic.
The platforms performed best on vaccines and cancer, and worst on stem cell therapies, athletic performance and nutrition.
The team explained that these assistants do not look up current information; instead, they generate responses by inferring statistical patterns from their training data and predicting likely sequences of words.
They cannot reason, weigh evidence or make ethical or value-based judgements.
This limitation means AI assistants can reproduce advice that sounds authoritative but may be wrong.
The findings appeared in the journal BMJ Open.
The study found that references were frequently missing or fabricated, and that the platforms answered difficult questions without appropriate caveats and rarely declined to respond.
The researchers said that, as use of AI assistants grows, their findings point to an urgent need for public education, professional training and regulation to ensure that generative artificial intelligence supports rather than undermines public health.
Representatives for Grok and ChatGPT have been contacted for comment.
