Depression is a mental health disorder that affects patients, their families, and society. Higher depression severity is often associated with higher suicide risk, which underscores the importance of accurate diagnosis and treatment. In clinical practice, however, evaluation is time-consuming, and the attitudes of both doctors and patients can influence the outcome. In research, most mental health studies rely on English-language or public datasets, and, compared with other audio classification fields, the machine learning (ML) techniques they use still have room for improvement. To broaden the applicability of ML models, this study establishes a multimodal multitask learning (MTL) framework that classifies depression severity and suicide risk simultaneously. Three experiments were designed to identify the most suitable pretrained embeddings for the two tasks and to examine how MTL performs with different embeddings. The dataset comprises Chinese audio recordings and clinical questionnaire scores collected from 100 non-depressed individuals who had never visited a psychosomatic clinic and 100 depressed patients from a hospital in southern Taiwan. After preprocessing, the audio and text data were converted into pretrained embeddings and fed into the models. Multimodal fusion was implemented by concatenation, and the MTL architecture by hard parameter sharing. 
For depression severity classification, the MTL model using wav2vec 2.0 and eHealth embeddings performed best (AUC = 0.887); for suicide risk classification, the MTL model using HuBERT and eHealth embeddings performed best (AUC = 0.883). These results show that multimodal embeddings effectively improve model performance on both tasks. MTL can improve performance further, but it must be applied with care to avoid negative transfer. In the future, the proposed model could be integrated into software to help physicians reach accurate diagnoses quickly and to serve as a self-assessment tool for the general public.
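As a rough illustration of the two design choices named above, concatenation for multimodal fusion and hard parameter sharing for MTL, the following minimal sketch runs a forward pass through one shared layer and two task-specific heads. It is not the study's implementation: the layer sizes, the two-class outputs, the equal loss weighting, and all names (`MultimodalMTL`, `joint_loss`, `risk_weight`) are assumptions made for illustration, and random vectors stand in for the real audio (wav2vec 2.0 or HuBERT) and text (eHealth) embeddings.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    peak = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultimodalMTL:
    """Concatenation fusion followed by a shared layer and two task heads."""

    def __init__(self, audio_dim, text_dim, hidden_dim, n_severity, n_risk, seed=0):
        rng = random.Random(seed)
        # Hard parameter sharing: this layer is used by both tasks.
        self.shared = init_layer(rng, audio_dim + text_dim, hidden_dim)
        # Task-specific output heads.
        self.severity_head = init_layer(rng, hidden_dim, n_severity)
        self.risk_head = init_layer(rng, hidden_dim, n_risk)

    def forward(self, audio_emb, text_emb):
        fused = list(audio_emb) + list(text_emb)  # multimodal fusion by concatenation
        hidden = relu(linear(fused, *self.shared))
        severity = softmax(linear(hidden, *self.severity_head))
        risk = softmax(linear(hidden, *self.risk_head))
        return severity, risk


def joint_loss(severity_probs, risk_probs, severity_label, risk_label, risk_weight=1.0):
    """Sum of the two cross-entropy terms; the weighting is an assumption."""
    return (-math.log(severity_probs[severity_label])
            - risk_weight * math.log(risk_probs[risk_label]))


# Toy usage: random vectors replace the pretrained embeddings.
rng = random.Random(1)
audio = [rng.gauss(0.0, 1.0) for _ in range(8)]
text = [rng.gauss(0.0, 1.0) for _ in range(6)]
model = MultimodalMTL(audio_dim=8, text_dim=6, hidden_dim=4, n_severity=2, n_risk=2)
severity, risk = model.forward(audio, text)
loss = joint_loss(severity, risk, severity_label=1, risk_label=0)
```

Because both heads read the same hidden representation, gradients from either task would update the shared layer during training. That coupling is what allows one task to help the other, and it is also the route by which negative transfer can occur.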