| Item Type |
Conference Paper |
| Publication Date |
2025-06-11 |
| Title |
Title |
Generating Distributable Surrogate Corpus for Medical Multi-label Classification |
| Language |
Language |
eng |
| Keyword |
Subject Scheme |
Other |
Subject |
Text Generation |
| Keyword |
Subject Scheme |
Other |
Subject |
Language Model |
| Keyword |
Subject Scheme |
Other |
Subject |
Privacy Protection |
| Keyword |
Subject Scheme |
Other |
Subject |
Social Media |
| Resource Type |
Resource Type |
conference paper |
| Access Rights |
Access Rights |
open access |
| Authors |
Shimizu, Seiji
Yada, Shuntaro
Wakamiya, Shoko
Aramaki, Eiji
|
| Abstract |
Description Type |
Abstract |
Description |
In medical and social media domains, annotated corpora are often hard to distribute due to copyright and privacy issues. To overcome this situation, we propose a new method to generate a surrogate corpus for a downstream task by using a text generation model. We chose a medical multi-label classification task, MedWeb, in which patient-generated short messages express multiple symptoms. We first fine-tuned text generation models with different prompting designs on the original corpus to obtain synthetic versions of that corpus. To assess the viability of the generated corpora for the downstream task, we compared the performance of multi-label classification models trained on either the original or the surrogate corpora. The results and the error analysis showed the difficulty of generating a surrogate corpus in multi-label settings, suggesting that text generation under complex conditions is not trivial. On the other hand, our experiment demonstrates that the corpus generated with sentinel-based prompting is comparatively viable in a single-label (multiclass) classification setting. |
| Bibliographic Information |
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024
pp. 153-162
Issue Date: 2024-05-20
|
| Conference Information |
Conference Name |
LREC-COLING 2024 |
Organizer |
ELRA Language Resources Association (ELRA), International Committee on Computational Linguistics (ICCL) |
Start Year |
2024 |
Start Month |
05 |
Start Day |
20 |
End Year |
2024 |
End Month |
05 |
End Day |
25 |
Conference Period |
2024-05-20 - 2024-05-25 |
Venue |
Lingotto Conference Centre |
Location |
Torino, Italy |
Country |
ITA |
| Publisher |
Publisher |
ELRA and ICCL |
| Publisher's Version URI |
Relation Type |
isReplacedBy |
Identifier Type |
URI |
Related Identifier |
https://aclanthology.org/2024.cl4health-1.19/ |
| Rights |
Rights Information Resource |
https://creativecommons.org/licenses/by-nc/4.0/ |
Rights Information |
Copyright ELRA Language Resources Association (ELRA), 2024. These proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). |
| Author Version Flag |
Publication Type |
NA |