Zero-Shot Cross-Lingual Text-to-Speech With Style-Enhanced Normalization and Auditory Feedback Training Mechanism

http://hdl.handle.net/10061/0002001296
c17fb058-8422-454d-9504-8c23d0e0d0a8
Item type Journal Article(1)
Release date 2025-12-16
Title
Title Zero-Shot Cross-Lingual Text-to-Speech With Style-Enhanced Normalization and Auditory Feedback Training Mechanism
Language
Language eng
Keywords (subject scheme: Other)
  • Adaptation models
  • Data models
  • Training
  • Multilingual
  • Diffusion models
  • Decoding
  • Data mining
  • Vectors
  • Speech enhancement
  • Text to speech
  • Zero-shot adaptive TTS
  • cross-lingual TTS
  • diffusion model
  • high-resource languages
  • low-resource languages
Resource type
Resource type journal article
Access rights
Access rights open access
Authors
  • Tran, Chung
  • Luong, Chi Mai
  • Sakti, Sakriani
Abstract
Description type Abstract
Description In an increasingly globalized and interconnected world, the ability to communicate in more than one language is a vital skill that can reduce language barriers and promote cultural interaction. However, mastering multiple languages requires a significant investment of time and effort. Here, zero-shot cross-lingual text-to-speech synthesis (TTS) offers benefits to augment human communication by producing high-quality speech in multiple languages while preserving the original speaker's vocal characteristics. However, building such a system presents several challenges, including ensuring high-quality synthesis and achieving similarity between the synthesized speaker and the reference speaker, especially when training a model for low-resource languages. In this study, we propose a novel technique known as Style-Enhanced Normalization TTS (STEN-TTS) to achieve two objectives: preserving synthesis quality while simultaneously enhancing the ability of zero-shot adaptation with just a few seconds of reference for the purpose of cross-lingual synthesis. The model itself can also be trained with low-resource data, but using data of only 10 or 20 minutes is a major challenge. To improve the quality of synthesized audio in low-resource languages, we propose a combination of STEN-TTS with different training methods, including unsupervised text encoding, knowledge distillation, and an auditory feedback mechanism. An experimental evaluation was conducted in five languages (English, Chinese, Indonesian, Japanese, and Vietnamese), considering high- and low-resource training data as well as seen and unseen speakers. The proposed approach has shown its effectiveness in a high-resource setting, achieving a remarkable similarity (SMOS) of 3.44±0.17 for cross-lingual conversion as well as verification scores of 93.4% and 80.5% for seen and unseen speakers, respectively. The results in a low-resource setting, measured by phoneme error rates, also indicate a substantial improvement, with enhancements of approximately 3-4%. In this case, the quality of speaker verification remains consistently high, achieving scores of 90.0% and 78.0% for seen and unseen speakers, respectively.
Bibliographic information IEEE Transactions on Audio, Speech and Language Processing

Volume 33, p. 1479-1492, 14 pages, issue date 2025-03-05
Publisher
Publisher IEEE
ISSN
Source identifier type EISSN
Source identifier 2998-4173
Publisher version DOI
Relation type isReplacedBy
Identifier type DOI
Related identifier https://doi.org/10.1109/TASLPRO.2025.3548429
Publisher version URI
Relation type isReplacedBy
Identifier type URI
Related identifier https://ieeexplore.ieee.org/abstract/document/10910244
Rights
Rights information resource https://creativecommons.org/licenses/by/4.0/
Rights information © 2025 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Author version flag
Version type NA
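Funding information
Funder name Japan Society for the Promotion of Science (JSPS)
Award number JP21H05054
Award number URI https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-21H05054/
Award title Research on multi-dimensional automatic interpretation systems and their evaluation methods, and development of their applications
Funding information
Funder name Japan Society for the Promotion of Science (JSPS)
Award number JP23K21681
Award number URI https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-23K21681/
Award title Building low-resource multilingual Machine Speech Chain technology that overcomes language barriers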
Powered by WEKO3

