Improving Speech Translation Accuracy and Time Efficiency With Fine-Tuned wav2vec 2.0-Based Speech Segmentation

http://hdl.handle.net/10061/0002000125
File paper_202312_TASLP_ryo-fu.paper_20240130_182746_mE.pdf, full text (2.4 MB)
Item type Journal Article
Date published 2024-02-13
Title Improving Speech Translation Accuracy and Time Efficiency With Fine-Tuned wav2vec 2.0-Based Speech Segmentation
Language eng
Resource type journal article
Access rights open access
Authors
Fukuda, Ryo
Sudoh, Katsuhito (ja: 須藤, 克仁; ja-Kana: スドウ, カツヒト; WEKO 174; e-Rad_Researcher 00396152)
Nakamura, Satoshi (ja: 中村, 哲; ja-Kana: ナカムラ, サトシ; WEKO 171)
Abstract
Speech translation (ST) automatically converts utterances in a source language into text in another language. Splitting continuous speech into shorter segments, known as speech segmentation, plays an important role in ST. Recent segmentation methods trained to mimic the segmentation of ST corpora have surpassed traditional approaches. Tsiamas et al. [1] proposed a segmentation frame classifier (SFC) based on a pre-trained speech encoder called wav2vec 2.0. Their method, named SHAS, retains 95–98% of the BLEU score for ST corpus segmentation. However, the segments generated by SHAS are very different from ST corpus segmentation and tend to be longer with multiple combined utterances. This is due to SHAS's reliance on length heuristics, i.e., it splits speech into segments of easily translatable length without fully considering the potential for ST improvement by splitting them into even shorter segments. Longer segments often degrade translation quality and ST's time efficiency. In this study, we extended SHAS to improve ST translation accuracy and efficiency by splitting speech into shorter segments that correspond to sentences. We introduced a simple segmentation algorithm using the moving average of SFC predictions without relying on length heuristics and explored wav2vec 2.0 fine-tuning for improved speech segmentation prediction. Our experimental results reveal that our speech segmentation method significantly improved the quality and the time efficiency of speech translation compared to SHAS.
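The abstract mentions segmenting speech with a moving average of frame-level classifier predictions, but this record does not give the algorithm's details. The following is only a minimal illustrative sketch of the general idea, not the authors' method: it smooths per-frame "inside a segment" probabilities with a centered moving average and cuts wherever the smoothed value falls below a threshold. The function names, the window size, the 0.5 threshold, and the 20 ms frame duration are all assumptions made for illustration.

```python
from typing import List, Tuple


def moving_average(probs: List[float], window: int) -> List[float]:
    """Centered moving average; the window is truncated at both edges."""
    half = window // 2
    smoothed = []
    for i in range(len(probs)):
        lo = max(0, i - half)
        hi = min(len(probs), i + half + 1)
        smoothed.append(sum(probs[lo:hi]) / (hi - lo))
    return smoothed


def segment_frames(
    probs: List[float], window: int = 3, threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index pairs, end exclusive.

    Hypothetical rule: a frame belongs to a segment when its smoothed
    probability is at or above the threshold (an assumed value).
    """
    smoothed = moving_average(probs, window)
    segments = []
    start = None
    for i, p in enumerate(smoothed):
        if p >= threshold and start is None:
            start = i  # a segment opens here
        elif p < threshold and start is not None:
            segments.append((start, i))  # the segment closes before frame i
            start = None
    if start is not None:
        segments.append((start, len(smoothed)))  # close a trailing segment
    return segments


def to_seconds(
    segments: List[Tuple[int, int]], frame_sec: float = 0.02
) -> List[Tuple[float, float]]:
    """Convert frame indices to seconds; 20 ms per frame is an assumption."""
    return [(s * frame_sec, e * frame_sec) for s, e in segments]


# A clear pause splits the audio into two segments.
pause = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9]
print(segment_frames(pause))

# A single low-probability frame is smoothed away and does not split.
spike = [0.9, 0.9, 0.2, 0.9, 0.9]
print(segment_frames(spike))
```

In this sketch, smoothing is what keeps an isolated low prediction from producing a spurious cut, while a sustained run of low predictions still ends the segment. No maximum or minimum segment length is enforced.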
Bibliographic information IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 906-916, issued 2023-12-15
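Publisher IEEE
ISSN
Source identifier type EISSN
Source identifier 2329-9304
Publisher version DOI
Relation type isVersionOf
Identifier type DOI
Related identifier https://doi.org/10.1109/TASLP.2023.3343614
Publisher version URI
Relation type isVersionOf
Identifier type URI
Related identifier https://ieeexplore.ieee.org/document/10361556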
Rights
Rights information © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Under the publisher's licensing terms, the full text will be made available on or after December 15, 2025.
Author version flag
Version type AM (accepted manuscript)
Versions

Ver.1 2024-02-13 02:16:24.559996