
Japanese Neural Incremental Text-to-Speech Synthesis Framework With an Accent Phrase Input

http://hdl.handle.net/10061/0002000110
b5828399-ab97-4aaa-a1c7-7255959db30f
Item type: Journal Article(1)
Release date: 2024-01-26
Language: eng
Keywords (subject scheme: Other): Incremental speech synthesis; end-to-end; Japanese language; accent phrase unit
Resource type: journal article
Access rights: open access
Authors:
Yanagita, Tomoya
Sakti, Sakriani
Nakamura, Satoshi (中村, 哲; ナカムラ, サトシ; WEKO 171)
Abstract
Recent work on neural incremental text-to-speech (iTTS), which is attracting increasing attention, has pursued low-latency processing by generating speech on the fly, before complete sentences have been read. Most current state-of-the-art iTTS systems use a prefix-to-prefix neural iTTS framework with a look-ahead of one or two unit segments (i.e., phonemes or words). However, because Japanese is organized around accent phrase units, which are longer than words, using a prefix-to-prefix neural iTTS with look-ahead increases latency. Here, we propose an alternative end-to-end neural iTTS architecture that does not use look-ahead input when synthesizing speech chunks. We further propose a method that uses information from the previous time step by passing the synthesized vector and the model's internal state on to the current time step. We experimentally investigated the latency of various iTTS systems with different modeling and synthesis chunks. The results show that, for Japanese, the proposed iTTS synthesizes higher-quality speech than the conventional prefix-to-prefix neural iTTS baseline with word units, within a similar latency range. Moreover, the proposed approach improved prosodic naturalness across synthesized units in Japanese. Subjective evaluations also revealed that the proposed approach with an incremental unit of two accent phrases achieved the best scores among the Japanese iTTS systems.
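The abstract contrasts look-ahead synthesis with synthesis that uses no look-ahead, and describes carrying the previous step's synthesized vector and internal state forward. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' model or code. All names (`start_times`, `extra_wait`, `StepState`, `incremental_synthesis`, `toy_step`) are hypothetical, the arrival times (one word every 0.25 s, each accent phrase spanning two words) are invented for illustration, and `toy_step` is a placeholder for a neural network step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


def start_times(arrivals: List[float], lookahead: int) -> List[float]:
    """Earliest time synthesis of each input unit can begin.

    Unit i needs units i..i+lookahead to have arrived; near the end of the
    sentence no more units will come, so the last arrival is the bound.
    """
    n = len(arrivals)
    return [arrivals[min(i + lookahead, n - 1)] for i in range(n)]


def extra_wait(arrivals: List[float], lookahead: int) -> List[float]:
    """Time each unit waits beyond its own arrival because of look-ahead."""
    return [s - a for s, a in zip(start_times(arrivals, lookahead), arrivals)]


@dataclass
class StepState:
    """Information handed from one step to the next (hypothetical layout)."""
    prev_output: Optional[List[float]] = None  # previously synthesized vector
    hidden: Optional[List[float]] = None       # model's internal state


# A model step maps (chunk, carried state) -> (synthesized vector, new hidden state).
ModelStep = Callable[[str, StepState], Tuple[List[float], List[float]]]


def incremental_synthesis(chunks: List[str], step: ModelStep) -> List[List[float]]:
    """Synthesize chunk by chunk with no look-ahead, carrying state forward."""
    state = StepState()
    outputs: List[List[float]] = []
    for chunk in chunks:
        vec, hidden = step(chunk, state)
        outputs.append(vec)
        # Pass both the synthesized vector and the internal state to the next step.
        state = StepState(prev_output=vec, hidden=hidden)
    return outputs


def toy_step(chunk: str, state: StepState) -> Tuple[List[float], List[float]]:
    """Placeholder model: the hidden state counts characters seen so far."""
    seen = (state.hidden[0] if state.hidden else 0.0) + len(chunk)
    return [float(len(chunk)), seen], [seen]


words = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5]  # assumed word arrival times (s)
phrases = [0.5, 1.0, 1.5]                  # assumed accent phrases, 2 words each
print(extra_wait(words, 1))    # [0.25, 0.25, 0.25, 0.25, 0.25, 0.0]
print(extra_wait(phrases, 1))  # [0.5, 0.5, 0.0]
print(extra_wait(phrases, 0))  # [0.0, 0.0, 0.0]
print(incremental_synthesis(["AP1", "AP2", "AP3"], toy_step))
# [[3.0, 3.0], [3.0, 6.0], [3.0, 9.0]]
```

With these assumed timings, a one-word look-ahead delays each unit by 0.25 s, a one-accent-phrase look-ahead by 0.5 s, and no look-ahead adds no wait, which is the intuition behind dropping look-ahead for longer accent-phrase units. The sketch ignores the model's own computation time, which also contributes to real latency.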
Bibliographic information: IEEE Access, vol. 11, pp. 22355-22363, issued 2023-03-02
Publisher: Institute of Electrical and Electronics Engineers
ISSN (EISSN): 2169-3536
Publisher version DOI (relation type: isReplacedBy): https://doi.org/10.1109/ACCESS.2023.3251657
Publisher version URI (relation type: isReplacedBy): https://ieeexplore.ieee.org/document/10057419
Rights: https://creativecommons.org/licenses/by-nc-nd/4.0/
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
Author version flag: publication type NA
Versions: Ver.1 2024-01-26 07:38:44.518129

Powered by WEKO3

