Enhancing Chinese Medical Question-Answering Performance with Sentence Embedding Reranker

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/90052

Title:	運用句嵌入向量重排序器增進中文醫療問答系統效能;Enhancing Chinese Medical Question-Answering Performance with Sentence Embedding Reranker
Authors:	曾昱翔;Zeng, Yu-Xiang
Contributors:	Department of Electrical Engineering
Keywords:	Medical question-answering; information retrieval; pre-trained language models; semantic search
Date:	2022-08-25
Issue Date:	2022-10-04 12:09:21 (UTC+8)
Publisher:	National Central University
Abstract:	In the digital era, users often search and browse web content for healthcare-related information before making a doctor's appointment for diagnosis and treatment, and an automatic question-answering (QA) system can meet this need in real time. Our main research objective is to design and implement a Chinese medical QA system: a user issues a question as a query, and the system matches it against a medical QA dataset and returns relevant doctors' answers as a ranked list. Unlike traditional lexical matching, deep learning-based semantic matching models can effectively learn the semantic features of text and use them to retrieve similar texts, and many studies have shown that semantic matching outperforms traditional methods. We therefore propose a Sentence Embedding Reranker (SER) model to enhance question-answering performance. The Chinese QA data come from the 醫聯網 website (https://med-net.com/) and comprise 26,816 medical question-answer pairs. We built the test collection with the pooling method: 120 of the 26,816 questions were selected as test questions, each was submitted to two retrieval systems (BM25 and Sentence-BERT), each system returned 100 answers, the returned answers were manually annotated for correctness, and the union of the two systems' results formed the test collection. Experimental results on these manually annotated question-answer pairs show that the proposed SER reranking model achieved the best MAP and NDCG scores, effectively enhancing the retrieval performance of the Chinese medical question-answering system.
Appears in Collections:	[Graduate Institute of Electrical Engineering] Electronic Thesis & Dissertation
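
The pooling and evaluation steps named in the abstract (taking the union of the results from two retrieval systems, then scoring a ranking with MAP and NDCG) can be illustrated with a minimal sketch. This is not the thesis code. The function names, the toy answer ids, binary relevance labels, and the NDCG cutoff `k` are all assumptions made for illustration; the record does not state how relevance was graded or which cutoff was used.

```python
import math

def pool(run_a, run_b, depth=100):
    """Union of the top-`depth` answer ids from two ranked lists (first-seen order)."""
    seen, pooled = set(), []
    for answer_id in run_a[:depth] + run_b[:depth]:
        if answer_id not in seen:
            seen.add(answer_id)
            pooled.append(answer_id)
    return pooled

def average_precision(ranking, relevant):
    """AP for one question: precision@k summed over ranks k that hold a relevant
    answer, divided by the number of relevant answers (binary labels assumed)."""
    hits, total = 0, 0.0
    for k, answer_id in enumerate(ranking, start=1):
        if answer_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, qrels):
    """MAP: mean of AP over all test questions (keys of `rankings`)."""
    return sum(average_precision(rankings[q], qrels[q]) for q in rankings) / len(rankings)

def ndcg(ranking, relevant, k=10):
    """Binary-gain NDCG@k: DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, answer_id in enumerate(ranking[:k], start=1)
              if answer_id in relevant)
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

# Toy data (hypothetical ids): two retrieval runs for one question.
bm25_run = ["a1", "a2", "a3"]
sbert_run = ["a3", "a4", "a1"]
pooled = pool(bm25_run, sbert_run, depth=3)   # candidates sent to annotators
relevant = {"a1", "a4"}                        # answers judged correct
reranked = ["a4", "a1", "a3", "a2"]            # a hypothetical reranked order
```

On this toy data, `average_precision(bm25_run, relevant)` is 0.5 because one of the two relevant answers was never retrieved, while the hypothetical reranked list places both relevant answers first and scores 1.0.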
