Using SpaCy NER to Label Chest Radiography Reports: Comparison with CheXpert Labeler

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/94786

Title:	Using SpaCy NER to Label Chest Radiography Reports: Comparison with CheXpert Labeler
Authors:	張維辰;CHANG, WEI-CHEN
Contributors:	Institute of Biomedical Engineering
Keywords:	Natural Language Processing (NLP);Chest X-Ray Reports (CXR);Health Informatics;Named Entity Recognition;Transfer Learning;Transformer
Date:	2024-07-26
Issue Date:	2024-10-09 15:30:09 (UTC+8)
Publisher:	National Central University
Abstract:	Chest X-ray is one of the most commonly used examinations in medical care. X-ray imaging is simple, non-invasive, and low in radiation dose, and it quickly summarizes the status of the chest, lung tissue, blood vessels, heart, and other intrathoracic organs. Because a chest X-ray carries information on many clinical indications and can reveal many conditions and diseases, clinicians and radiologists often spend considerable time and effort on interpretation while trying not to miss chest lesions. Many artificial intelligence (AI)-assisted chest X-ray interpretation systems are currently under development. The biggest challenge for data scientists training such AI models is that producing high-quality labeled X-ray images is time-consuming: it requires personnel with radiology expertise who fully understand the content of the images, which is extremely challenging for people outside the medical imaging field.
Since radiologists generally record a free-text report for each X-ray examination, natural language processing (NLP) has been proposed to capture the diagnostic results in the original text reports. The extracted information can then be converted automatically into X-ray image labels that serve as ground truth. This saves considerable manual effort and allows more images to be labeled quickly to train AI models and improve diagnostic accuracy. Named entity recognition (NER) is a popular NLP technique that helps extract the keywords used in X-ray examination reports. By training an NER model to recognize disease-specific terms and to interpret their positive/negative indications, free-text chest X-ray reports can be quickly and automatically converted into high-quality chest X-ray image labels for training AI classification models. In this study we implemented a Python NER program that recognizes the common keywords used in chest X-ray reports. These keywords are drawn from the 14-category lexicon of the CheXpert (Chest eXpert) Labeler, developed at Stanford University in the United States for labeling chest X-ray images. NER was implemented using SpaCy, and the positive/negative indications were determined using an sBERT (sentence BERT) model fine-tuned by our group. We used the U.S. National Library of Medicine (MeSH) dataset (3,955 chest X-ray reports labeled by radiologists) as the benchmark dataset for evaluation and compared the labels produced by the newly developed software with those of the CheXpert Labeler. Our software achieved NER accuracy comparable to the CheXpert Labeler, delivered a sixfold improvement in execution speed, and visualized the NER labels within the free-text reports to highlight the detected keywords and their disease categories.
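The report-to-label conversion described in the abstract can be illustrated with a minimal sketch. It is not the thesis software: it swaps SpaCy NER for plain dictionary matching and swaps the fine-tuned sBERT polarity model for a crude rule that looks for a negation cue earlier in the same sentence. The function name `label_report`, the three-category mini-lexicon, and the negation cue list are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mini-lexicon (assumption): a few phrases for three categories.
# The category names are CheXpert observation names; the phrase lists are
# illustrative and are not the lexicon used in the thesis.
LEXICON = {
    "Cardiomegaly": ["cardiomegaly", "enlarged heart"],
    "Pleural Effusion": ["pleural effusion", "pleural effusions"],
    "Pneumothorax": ["pneumothorax"],
}

# Assumed negation cues; a rule-based stand-in for a learned polarity classifier.
NEGATION_CUES = re.compile(r"\b(no|without|negative for|free of)\b", re.IGNORECASE)


def label_report(report):
    """Return {category: "positive" | "negative"} for categories mentioned in the report."""
    labels = {}
    # Naive sentence split on periods, semicolons, and newlines.
    for sentence in re.split(r"[.;\n]+", report):
        for category, phrases in LEXICON.items():
            for phrase in phrases:
                pattern = r"\b" + re.escape(phrase) + r"\b"
                match = re.search(pattern, sentence, re.IGNORECASE)
                if match is None:
                    continue
                # Treat the mention as negated if a cue precedes it in the sentence.
                negated = NEGATION_CUES.search(sentence[:match.start()]) is not None
                # A positive mention anywhere in the report is never overwritten.
                if labels.get(category) != "positive":
                    labels[category] = "negative" if negated else "positive"
                break  # one matching phrase per category per sentence is enough
    return labels


report = "Heart size is enlarged with cardiomegaly. No pneumothorax or pleural effusion."
print(label_report(report))
# {'Cardiomegaly': 'positive', 'Pleural Effusion': 'negative', 'Pneumothorax': 'negative'}
```

A rule like this misses uncertain or distant negations, which is the kind of case a sentence-level classifier is better placed to handle.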
Appears in Collections:	[Institute of Biomedical Engineering] Electronic Thesis & Dissertation

Files in This Item:

File	Description	Size	Format
index.html		0Kb	HTML
