Utterance-based Selective Training for Cost-Effective Task-Adaptation of Acoustic Models
http://hdl.handle.net/10061/8260
Name / File | License | Action |
---|---|---|
fulltext (191.5 kB) | | |
Item type | Conference Paper(1) | |||||
---|---|---|---|---|---|---|
Publication date | 2012-08-22 | |||||
Title | ||||||
Title | Utterance-based Selective Training for Cost-Effective Task-Adaptation of Acoustic Models | |||||
Language | ||||||
Language | eng | |||||
Resource type | ||||||
Resource type | conference paper | |||||
Access rights | ||||||
Access rights | open access | |||||
Authors | Cincarek, Tobias; Toda, Tomoki; Saruwatari, Hiroshi; Kiyohiro, Shikano | |||||
Abstract | ||||||
Description type | Abstract | |||||
Description | The construction of acoustic models for speech recognition systems is a very costly and time-consuming process, since their robust training requires large amounts of transcribed speech data. This paper describes an approach for cost-effective construction of task-adapted acoustic models. Existing speech data(bases) are employed to set up a large training data pool. Apart from that, only a small amount of task-specific speech data is required. Based on an algorithm for utterance-based selective training of acoustic models, training utterances are selected from the training data pool so that the likelihood of the acoustic model given the task-specific speech data is maximized. The proposed method is evaluated for acoustic models with context-independent and context-dependent phonetic units. Results are reported for building an infant (preschool children) acoustic model with speech from elementary school children and an elderly acoustic model with adult speech. The proposed approach is already effective if there are only 20 task-specific utterances available. A relative improvement in word accuracy of up to 10% is achieved over conventional acoustic model construction and up to 2.8% over MAP and MLLR adaptation with task-specific data. The gap in performance to a high-cost acoustic model can be reduced by up to 76%. | |||||
Bibliographic information | p. 71-76, issue date 2006-05 | |||||
Conference information | ||||||
Conference name | SRIV 2006: ITRW on Speech Recognition and Intrinsic Variation | |||||
Conference dates | May 20, 2006 | |||||
Venue | Toulouse | |||||
Country | FRA | |||||
Author version flag | ||||||
Publication type | VoR |