2024/03/02 [Advance Online Publication]

Can the AI Chatbot Bing Chat Accurately Answer PICO Clinical Questions? The Example of Preoperative Oral Carbohydrates

Objectives: The recent surge of enthusiasm for artificial intelligence has centered on GPT (Generative Pre-trained Transformer) technology, which relies on large language models (LLMs) to generate text across diverse topics and contexts. However, its performance in answering medical questions has not been comprehensively evaluated. This study evaluated the performance of a chatbot in answering a complex clinical question. Methods: On June 12, 2023, Bing Chat (developed by Microsoft), which is available in the Edge browser and the Bing search engine, was used as a representative of GPT technology. We randomly selected a clinical question from the website of Cochrane Taiwan, namely whether preoperative oral carbohydrate intake reduces postoperative discomfort, and asked Bing Chat to answer it using the PICO (Population, Intervention, Comparison, Outcome) framework and to provide relevant references and an evidence summary. An English prompt was entered into Bing Chat's conversation box, and its answer was recorded. The completeness and accuracy of the chatbot-provided evidence were then assessed through comparison with the actual literature in PubMed. Results: Bing Chat rapidly identified the elements of the PICO framework and summarized a concise answer based on several references. However, it offered a limited reference list (seven articles), some of which had incorrect journal names and authors (the phenomenon known as GPT hallucination), and its evidence summaries were insufficiently detailed, covering neither research methods nor outcome data. After repeated prompting, Bing Chat provided more specific key data. Conclusions: Bing Chat can answer medical questions to some extent, but the accuracy and completeness of its data collection and processing require improvement; health-care providers should carefully verify the information it provides and remain aware of its limitations.
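The abstract does not reproduce the exact wording sent to Bing Chat, only that an English prompt framed the question with the PICO elements and requested references and an evidence summary. The following is a minimal sketch of how such a prompt might be assembled; the field values and phrasing are illustrative assumptions, not the study's actual prompt.

```python
def build_pico_prompt(population, intervention, comparison, outcome):
    """Assemble an English PICO-framed clinical question for a chatbot.

    The instruction sentence and element wording are hypothetical; the
    abstract only states that a PICO-structured English prompt was used.
    """
    return (
        "Answer the following clinical question using the PICO framework, "
        "and provide relevant references and an evidence summary.\n"
        f"Population: {population}\n"
        f"Intervention: {intervention}\n"
        f"Comparison: {comparison}\n"
        f"Outcome: {outcome}"
    )

# Illustrative values for the study's question on preoperative
# oral carbohydrates and postoperative discomfort.
prompt = build_pico_prompt(
    population="adult patients undergoing elective surgery",
    intervention="preoperative oral carbohydrate loading",
    comparison="fasting or placebo drink",
    outcome="postoperative discomfort",
)
print(prompt)
```

Keeping the four PICO elements as separate labeled lines makes it easy to vary one element at a time when re-prompting the chatbot, which matches the study's observation that repeated prompts yielded more specific answers.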

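The Methods describe checking each chatbot-cited reference against the actual PubMed record, which is how the incorrect journal names and authors (hallucinations) were detected. A minimal sketch of that comparison step is below; the two records are invented placeholders, not references from the study, and in practice the PubMed record would be retrieved via a PubMed search rather than hard-coded.

```python
def check_citation(cited, pubmed_record):
    """Return the fields where a chatbot citation disagrees with PubMed.

    An empty list means the checked fields match; a non-empty list flags
    a possible hallucination in those fields.
    """
    mismatches = []
    for field in ("journal", "authors"):
        if cited.get(field) != pubmed_record.get(field):
            mismatches.append(field)
    return mismatches

# Placeholder example: the chatbot names the wrong journal for a real paper.
cited = {"journal": "Annals of Surgery", "authors": "Smith J"}
record = {"journal": "British Journal of Surgery", "authors": "Smith J"}
print(check_citation(cited, record))  # → ['journal']
```

A field-by-field report is preferable to a single true/false verdict because the study found partial errors: some references pointed to real work but carried an incorrect journal name or author list.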
  • Scheduled issue: Taiwan Journal of Public Health 2024;43(1)
  • Section: Practice
  • Hsiu-Ling Wang (汪秀玲)
  • Keywords: chatbot, artificial intelligence (AI), GPT (Generative Pre-trained Transformer), large language model (LLM), evidence-based medicine
  • Pages: 82-92
  • http://bit.ly/3r4HS9R