| Item Type | Journal Article |
| Date Released | 2025-12-26 |
| Title | End-to-end Simultaneous Speech Translation with Style Tags using Human Simultaneous Interpretation Data |
| Language | eng |
| Keyword | Simultaneous Speech Translation (Subject Scheme: Other) |
| Keyword | Simultaneous Interpretation (Subject Scheme: Other) |
| Keyword | Domain Adaptation (Subject Scheme: Other) |
| Resource Type | journal article |
| Access Rights | open access |
| Authors | Ko, Yuka; Fukuda, Ryo; Nishikawa, Yuta; Kano, Yasumasa; Sudoh, Katsuhito; Sakti, Sakriani; Nakamura, Satoshi |
| Abstract | Simultaneous speech translation (SimulST) translates speech incrementally, requiring a monotonic input-output correspondence to reduce latency. This is particularly challenging for distant language pairs, such as English and Japanese, as most SimulST models are trained using offline speech translation (ST) data, where the entire speech input is observed during translation. In simultaneous interpretation (SI), a simultaneous interpreter translates source language speech into target language speech without waiting for the speaker to finish speaking. Therefore, the SimulST model can learn SI-style translations using SI data. However, owing to the limited availability of SI data, fine-tuning an offline ST model using SI data may result in overfitting. To address this problem, we propose an efficient training method for the speech-to-text SimulST model using a combination of small SI and relatively large offline ST data. We trained a single model with mixed data by incorporating style tags to instruct the model to generate either SI or offline-style outputs. This approach, called mixed fine-tuning with style tags, can be extended further using the multistage self-training approach. In this case, we use the trained model to generate pseudo-SI data. Our experimental results for several test sets demonstrated that our models trained using mixed fine-tuning and multistage self-training outperformed baselines across various latency ranges. |
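The core recipe in the abstract, prepending a style tag to each target so that a single model can produce either SI-style or offline-style output, lends itself to a short illustration. Below is a minimal Python sketch of the data preparation and one self-training stage; the `<si>`/`<off>` tag strings, the record layout, and the `model.translate` call are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of mixed fine-tuning with style tags, under assumed
# tag strings ("<si>", "<off>") and an assumed per-example record layout.
import random

def tag_target(example: dict, style_tag: str) -> dict:
    """Prepend a style tag to the target text so one model can be asked
    for either SI-style or offline-style output at inference time."""
    return {
        "source_speech": example["source_speech"],  # input audio (path or features)
        "target_text": f"{style_tag} {example['target_text']}",
    }

def build_mixed_finetuning_set(si_data: list[dict],
                               offline_st_data: list[dict],
                               si_tag: str = "<si>",
                               off_tag: str = "<off>") -> list[dict]:
    """Mix the small SI corpus with the larger offline ST corpus,
    marking each example with its style tag, then shuffle."""
    mixed = [tag_target(ex, si_tag) for ex in si_data]
    mixed += [tag_target(ex, off_tag) for ex in offline_st_data]
    random.shuffle(mixed)
    return mixed

def generate_pseudo_si(model, offline_examples: list[dict],
                       si_tag: str = "<si>") -> list[dict]:
    """One multistage self-training step: decode offline sources with the
    SI style tag to obtain pseudo-SI targets (model.translate is a
    hypothetical decoding call standing in for the real inference API)."""
    return [{"source_speech": ex["source_speech"],
             "target_text": model.translate(ex["source_speech"],
                                            forced_prefix=si_tag)}
            for ex in offline_examples]
```

In the multistage variant described above, the output of `generate_pseudo_si` would be fed back into `build_mixed_finetuning_set` as additional SI-style data for the next fine-tuning round.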
| Bibliographic Information | Journal of Natural Language Processing, Vol. 32, No. 2, pp. 404-437 (34 pages), issued 2025-06-15 |
| Publisher | The Association for Natural Language Processing |
| ISSN | 2185-8314 (EISSN) |
| Publisher's Version DOI | https://doi.org/10.5715/jnlp.32.404 (Relation Type: isReplacedBy) |
| Publisher's Version URI | https://www.jstage.jst.go.jp/article/jnlp/32/2/32_404/_article/-char/ja/ (Relation Type: isReplacedBy) |
| Rights | https://creativecommons.org/licenses/by/4.0/ |
| Rights Information | Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Katsuhito Sudoh, Sakriani Sakti, and Satoshi Nakamura (2025). End-to-end Simultaneous Speech Translation with Style Tags using Human Simultaneous Interpretation Data. Journal of Natural Language Processing, 32(2), 404-437. © The Association for Natural Language Processing. Licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). |
| Publication Type | NA |
| Funding Information | Funder: Japan Society for the Promotion of Science (JSPS); Award Number: JP21H05054 (https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-21H05054/); Award Title: Research on multi-source automatic interpretation systems and evaluation methods, and their applied development |
| Funding Information | Funder: Japan Society for the Promotion of Science (JSPS); Award Number: JP23K21681 (https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-23K21681/); Award Title: Construction of low-resource multilingual Machine Speech Chain technology to overcome language barriers |
| Funding Information | Funder: Japan Society for the Promotion of Science (JSPS); Award Number: JP24KJ1695 (https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-24KJ1695/); Award Title: Research on a simultaneous interpretation system that shortens translation output time through omission and paraphrasing |