WEKO3
Item
mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans
http://hdl.handle.net/10061/0002001117
| Item type | Conference Paper(1) |
|---|---|
| Release date | 2025-08-08 |
| Title | |
| Title | mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans |
| Language | |
| Language | eng |
| Resource type | |
| Resource type | conference paper |
| Access rights | |
| Access rights | open access |
| Authors | Sakai, Yusuke; Kamigaito, Hidetaka (上垣外, 英剛); Watanabe, Taro (渡辺, 太郎) |
| Abstract | |
| Description type | Abstract |
| Description | It is very challenging to curate a dataset for language-specific knowledge and common sense in order to evaluate natural language understanding capabilities of language models. Due to the limitation in the availability of annotators, most current multilingual datasets are created through translation, which cannot evaluate such language-specific aspects. Therefore, we propose Multilingual CommonsenseQA (mCSQA) based on the construction process of CSQA but leveraging language models for a more efficient construction, e.g., by asking LM to generate questions/answers, refine answers and verify QAs followed by reduced human efforts for verification. Constructed dataset is a benchmark for cross-lingual language-transfer capabilities of multilingual LMs, and experimental results showed high language-transfer capabilities for questions that LMs could easily solve, but lower transfer capabilities for questions requiring deep knowledge or commonsense. This highlights the necessity of language-specific datasets for evaluation and training. Finally, our method demonstrated that multilingual LMs could create QA including language-specific knowledge, significantly reducing the dataset creation cost compared to manual creation. The datasets are available at https://huggingface.co/datasets/yusuke1997/mCSQA. |
| Bibliographic information | en: Findings of the Association for Computational Linguistics: ACL 2024, p. 14182-14214, date of issue 2024-08-11 |
| Conference information | |
| Conference name | The 62nd Annual Meeting of the Association for Computational Linguistics |
| Start year | 2024 |
| Start month | 08 |
| Start day | 11 |
| End year | 2024 |
| End month | 08 |
| End day | 16 |
| Conference dates | 2024-08-11 - 2024-08-16 |
| Venue | Bangkok, Thailand |
| Country | THA |
| Publisher | |
| Publisher | Association for Computational Linguistics |
| Publisher version DOI | |
| Relation type | isReplacedBy |
| Identifier type | DOI |
| Related identifier | https://doi.org/10.18653/v1/2024.findings-acl.844 |
| Publisher version URI | |
| Relation type | isReplacedBy |
| Identifier type | URI |
| Related identifier | https://aclanthology.org/2024.findings-acl.844/ |
| Rights | |
| Rights information resource | https://creativecommons.org/licenses/by/4.0/ |
| Rights information | ©2024 Association for Computational Linguistics. ACL materials are Copyright © 1963–2025 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License. |
| Author version flag | |
| Publication type | NA |