| Item type |
Journal Article(1) |
| Release date |
2025-06-30 |
| Title |
|
|
Title |
Cross-lingual Natural Language Processing on Limited Annotated Case/Radiology Reports in English and Japanese: Insights from the Real-MedNLP Workshop |
| Language |
|
|
Language |
eng |
| Keyword |
|
|
Subject scheme |
Other |
|
Subject |
natural language processing |
| Keyword |
|
|
Subject scheme |
Other |
|
Subject |
machine learning |
| Keyword |
|
|
Subject scheme |
Other |
|
Subject |
adverse drug events |
| Resource type |
|
|
Resource type |
journal article |
| Access rights |
|
|
Access rights |
open access |
| Authors |
矢田, 竣太郎
Nakamura, Yuta
若宮, 翔子
荒牧, 英治
|
| Abstract |
|
|
Description type |
Abstract |
|
Description |
Background: Textual datasets (corpora) are crucial for the application of natural language processing (NLP) models. However, corpus creation in the medical field is challenging, primarily because of privacy issues with raw clinical data such as health records. Thus, the existing clinical corpora are generally small and scarce. Medical NLP (MedNLP) methodologies perform well with limited data availability. Objectives: We present the outcomes of the Real-MedNLP workshop, which was conducted using limited and parallel medical corpora. Real-MedNLP exhibits three distinct characteristics: (1) limited annotated documents: the training data comprise only a small set (~100) of case reports (CRs) and radiology reports (RRs) that have been annotated. (2) Bilingually parallel: the constructed corpora are parallel in Japanese and English. (3) Practical tasks: the workshop addresses fundamental tasks, such as named entity recognition (NER) and applied practical tasks. Methods: We propose three tasks: NER of ~100 available documents (Task 1), NER based only on annotation guidelines for humans (Task 2), and clinical applications (Task 3) consisting of adverse drug effect (ADE) detection for CRs and identical case identification (CI) for RRs. Results: Nine teams participated in this study. The best systems achieved 0.65 and 0.89 F1-scores for CRs and RRs in Task 1, whereas the top scores in Task 2 decreased by 50 to 70%. In Task 3, ADE reports were detected by up to 0.64 F1-score, and CI scored up to 0.96 binary accuracy. Conclusion: Most systems adopt medical-domain–specific pretrained language models using data augmentation methods. Despite the challenge of limited corpus size in Tasks 1 and 2, recent approaches are promising because the partial match scores reached ~0.8–0.9 F1-scores. Task 3 applications revealed that the different availabilities of external language resources affected the performance per language. |
| Bibliographic information |
en : Methods of Information in Medicine
Vol. 63,
No. 5–6,
p. 145-163,
Date of issue 2024-10-29
|
| Publisher |
|
|
Publisher |
Georg Thieme Verlag KG |
| ISSN |
|
|
Source identifier type |
EISSN |
|
Source identifier |
2511-705X |
| Publisher version DOI |
|
|
Relation type |
isReplacedBy |
|
|
Identifier type |
DOI |
|
|
Related identifier |
https://doi.org/10.1055/a-2405-2489 |
| Publisher version URI |
|
|
Relation type |
isReplacedBy |
|
|
Identifier type |
URI |
|
|
Related identifier |
https://www.thieme-connect.com/products/ejournals/abstract/10.1055/a-2405-2489 |
| Rights |
|
|
Rights information resource |
https://creativecommons.org/licenses/by-nc-nd/4.0/ |
|
Rights information |
© 2024. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/) |
| Author version flag |
|
|
Publication type |
NA |
| Funding information |
|
|
|
Funder name |
Japan Science and Technology Agency (JST) |
|
|
Award number |
JPMJCR20G9 |
|
|
Award title |
Knowledge-enhanced information extraction across languages for pharmacovigilance |
| Funding information |
|
|
|
Funder name |
Ministry of Health, Labour and Welfare |
|
|
Award number |
JPMH21AC500111 |
|
|
Award title |
MHLW Program |
| Funding information |
|
|
|
Funder name |
Japan Science and Technology Agency (JST) |
|
|
Award number |
JPMJCR18Y1 |
|
|
Award title |
Construction of a disease knowledge base and knowledge processing of medical texts |