| Item type |
Conference Paper(1) |
| Release date |
2025-10-09 |
| Title |
|
|
Title |
RecordTwin: Towards Creating Safe Synthetic Clinical Corpora |
| Language |
|
|
Language |
eng |
| Resource type |
|
|
Resource type |
conference paper |
| Access rights |
|
|
Access rights |
open access |
| Authors |
Shimizu, Seiji
Baroud, Ibrahim
Raithel, Lisa
矢田, 竣太郎
若宮, 翔子
荒牧, 英治
|
| Abstract |
|
|
Description type |
Abstract |
|
Description |
The scarcity of publicly available clinical corpora hinders developing and applying NLP tools in clinical research. While existing work tackles this issue by utilizing generative models to create high-quality synthetic corpora, these methods require learning from the original in-hospital clinical documents, which makes them infeasible in practice. To address this problem, we introduce RecordTwin, a novel synthetic corpus creation method designed to generate synthetic documents from anonymized clinical entities. In this method, we first extract and anonymize entities from in-hospital documents to ensure the information contained in the synthetic corpus is restricted. Then, we use a large language model to fill the context between anonymized entities. To do so, we use a small, privacy-preserving subset of the original documents to mimic their formatting and writing style. This approach only requires anonymized entities and a small subset of original documents in the generation process, making it more feasible in practice. To evaluate the synthetic corpus created with our method, we conduct a proof-of-concept study using a publicly available clinical database. Our results demonstrate that the synthetic corpus has a utility comparable to the original data and a safety advantage over baselines, highlighting the potential of RecordTwin for privacy-preserving synthetic corpus creation. |
| Bibliographic information |
en : Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics
p. 14714-14726,
Number of pages 13,
Issue date 2025-07
|
| Conference information |
|
|
|
Conference name |
The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) |
|
|
Start year |
2025 |
|
|
Start month |
07 |
|
|
Start day |
27 |
|
|
End year |
2025 |
|
|
End month |
08 |
|
|
End day |
01 |
|
|
Conference dates |
2025-07-27 - 2025-08-01 |
|
|
Venue |
Vienna, Austria |
|
Country |
AUT |
| Publisher |
|
|
Publisher |
Association for Computational Linguistics |
| Publisher version DOI |
|
|
Relation type |
isReplacedBy |
|
|
Identifier type |
DOI |
|
|
Related identifier |
https://doi.org/10.18653/v1/2025.findings-acl.759 |
| Publisher version URI |
|
|
Relation type |
isReplacedBy |
|
|
Identifier type |
URI |
|
|
Related identifier |
https://aclanthology.org/2025.findings-acl.759/ |
| Rights |
|
|
Rights information resource |
https://creativecommons.org/licenses/by/4.0/ |
|
Rights information |
ACL materials are Copyright © 1963–2025 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License. |
| Author version flag |
|
|
Publication type |
NA |
| Funding information |
|
|
|
Funder name |
National Center for Global Health and Medicine (NCGM) |
|
|
Award number |
JPJ012425 |
|
|
Award title |
Cross-ministerial Strategic Innovation Promotion Program (SIP) on “Integrated Health Care System” |
| Funding information |
|
|
|
Funder name |
Japan Science and Technology Agency (JST) |
|
|
Award number |
JPMJCR22N1 |
|
|
Award number URI |
https://projectdb.jst.go.jp/grant/JST-PROJECT-22717060/ |
|
|
Award title |
リアルワールドテキスト処理の深化によるデータ駆動型探薬 (Data-driven drug discovery through deeper real-world text processing)
| Funding information |
|
|
|
Funder name |
German Federal Ministry of Education and Research (BMBF) |
|
|
Award number |
16KISA006 |
|
|
Award title |
project Medinym |
| Funding information |
|
|
|
Funder name |
German Federal Ministry of Education and Research (BMBF) |
|
|
Award number |
16KISA007 |
|
|
Award title |
project Medinym |
| Funding information |
|
|
|
Funder name |
German Federal Ministry of Education and Research (BMBF) |
|
|
Award number |
BIFOLD24B |