mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans

http://hdl.handle.net/10061/0002001117
608c5e1a-ecf5-47fb-b5aa-e2e6b628ef14
Item type Conference Paper
Publication date 2025-08-08
Title mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans
Language eng
Resource type conference paper
Access rights open access
Authors Sakai, Yusuke
Kamigaito, Hidetaka (上垣外, 英剛)
Watanabe, Taro (渡辺, 太郎)
Abstract
It is very challenging to curate a dataset for language-specific knowledge and common sense in order to evaluate natural language understanding capabilities of language models. Due to the limitation in the availability of annotators, most current multilingual datasets are created through translation, which cannot evaluate such language-specific aspects. Therefore, we propose Multilingual CommonsenseQA (mCSQA) based on the construction process of CSQA but leveraging language models for a more efficient construction, e.g., by asking LMs to generate questions/answers, refine answers and verify QAs followed by reduced human efforts for verification. The constructed dataset is a benchmark for cross-lingual language-transfer capabilities of multilingual LMs, and experimental results showed high language-transfer capabilities for questions that LMs could easily solve, but lower transfer capabilities for questions requiring deep knowledge or commonsense. This highlights the necessity of language-specific datasets for evaluation and training. Finally, our method demonstrated that multilingual LMs could create QA including language-specific knowledge, significantly reducing the dataset creation cost compared to manual creation. The datasets are available at https://huggingface.co/datasets/yusuke1997/mCSQA.
Bibliographic information Findings of the Association for Computational Linguistics: ACL 2024, p. 14182-14214, issue date 2024-08-11
Conference information
Conference name The 62nd Annual Meeting of the Association for Computational Linguistics
Dates 2024-08-11 - 2024-08-16
Venue Bangkok, Thailand
Country THA
Publisher Association for Computational Linguistics
Publisher version DOI
Relation type isReplacedBy
Identifier type DOI
Related identifier https://doi.org/10.18653/v1/2024.findings-acl.844
Publisher version URI
Relation type isReplacedBy
Identifier type URI
Related identifier https://aclanthology.org/2024.findings-acl.844/
Rights
Rights resource https://creativecommons.org/licenses/by/4.0/
Rights statement © 2024 Association for Computational Linguistics. ACL materials are Copyright © 1963–2025 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License.
Author version flag
Publication type NA
Versions

Ver.1 2025-08-08 05:42:23.950978