| Item type |
Journal Article(1) |
| Publication date |
2025-12-24 |
| Title |
|
|
Title |
Toward fast meeting transcription: NAIST system for CHiME-8 NOTSOFAR-1 task and its analysis |
| Language |
|
|
Language |
eng |
| Keyword |
|
|
Subject scheme |
Other |
|
Subject |
CHiME-8 |
| Keyword |
|
|
Subject scheme |
Other |
|
Subject |
Meeting transcription |
| Keyword |
|
|
Subject scheme |
Other |
|
Subject |
Multi-talker speech recognition |
| Resource type |
|
|
Resource type |
journal article |
| Access rights |
|
|
Access rights |
open access |
| Authors |
Hirano, Yuta
Nguyen, Mau
Azuma, Kakeru
Saragih, Jan Meyer
Sakti, Sakriani
|
| Abstract |
|
|
Description type |
Abstract |
|
Description |
This paper reports on the NAIST system submitted to the CHiME-8 challenge’s NOTSOFAR-1 (Natural Office Talkers in Settings of Far-field Audio Recordings) task, including results and analyses from several additional experiments. While fast processing is crucial for real-world applications, the CHiME-7 challenge focused solely on reducing error rate, neglecting practical aspects of system performance such as inference speed. Therefore, this research aims to develop a practical system by improving recognition accuracy while simultaneously reducing inference time. To address this challenge, we propose enhancing the baseline module architecture by modifying both the CSS and ASR modules. Specifically, the ASR module was built on a WavLM large feature extractor and a Zipformer transducer. Furthermore, we employed reverberation removal using block-wise weighted prediction error (WPE) as preprocessing for the speech separation module. The proposed system achieved a relative reduction in tcpWER of 11.6% for single-channel tracks and 18.7% for multi-channel tracks compared to the baseline system. Moreover, the proposed system operates up to six times faster than the baseline system while achieving superior tcpWER results. Based on findings from our system development, we also report on how system performance changes with the amount of training data for the ASR model, as well as the impact of the maximum word-length setting in the transducer-based ASR module on the subsequent diarization system. |
| Bibliographic information |
en : Computer Speech & Language
Vol. 95,
p. 1-13,
Number of pages 13,
Issue date 2025-07-16
|
| Publisher |
|
|
Publisher |
Elsevier |
| ISSN |
|
|
Source identifier type |
EISSN |
|
Source identifier |
0885-2308 |
| Publisher version DOI |
|
|
Relation type |
isReplacedBy |
|
|
Identifier type |
DOI |
|
|
Related identifier |
https://doi.org/10.1016/j.csl.2025.101836 |
| Publisher version URI |
|
|
Relation type |
isReplacedBy |
|
|
Identifier type |
URI |
|
|
Related identifier |
https://www.sciencedirect.com/science/article/pii/S0885230825000610 |
| Rights |
|
|
Rights information resource |
https://creativecommons.org/licenses/by-nc/4.0/ |
|
Rights information |
© 2025 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/). |
| Author version flag |
|
|
Publication type |
NA |
| Funding information |
|
|
|
Funder name |
Japan Society for the Promotion of Science (JSPS) |
|
|
Research project number |
JP21H05054 |
|
|
Research project number URI |
https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-21H05054/ |
|
|
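Research project title |
Research on multi-way automatic interpretation systems and evaluation methods, and their applied development |
| Funding information |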
|
|
|
Funder name |
Japan Society for the Promotion of Science (JSPS) |
|
|
Research project number |
JP23K21681 |
|
|
Research project number URI |
https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-23K21681/ |
|
|
Research project title |
Building low-resource multilingual Machine Speech Chain technology that overcomes language barriers