LATTE: Lattice ATTentive Encoding for Character-based Word Segmentation
http://hdl.handle.net/10061/0002000593
| Item type | Journal Article(1) |
|---|---|
| Release date | 2024-10-17 |
| Title | LATTE: Lattice ATTentive Encoding for Character-based Word Segmentation |
| Language | eng |
| Keyword (subject scheme: Other) | Word Segmentation |
| Keyword (subject scheme: Other) | Representation Learning |
| Resource type | journal article |
| Access rights | open access |
| Authors | Chay-intr, Thodsaporn; Kamigaito, Hidetaka (上垣外, 英剛); Funakoshi, Kotaro; Okumura, Manabu |
| Abstract | A character sequence comprises at least one or more segmentation alternatives. This can be considered segmentation ambiguity and may weaken segmentation performance in word segmentation. Proper handling of such ambiguity lessens ambiguous decisions on word boundaries. Previous works have achieved remarkable segmentation performance and alleviated the ambiguity problem by incorporating the lattice, owing to its ability to capture segmentation alternatives, along with graph-based and pre-trained models. However, multiple granularity information, including character and word, in a lattice that encodes with such models may not be attentively exploited. To strengthen multi-granularity representations in a lattice, we propose the Lattice ATTentive Encoding (LATTE) method for character-based word segmentation. Our model employs the lattice structure to handle segmentation alternatives and utilizes graph neural networks along with an attention mechanism to attentively extract multi-granularity representation from the lattice for complementing character representations. Our experimental results demonstrated improvements in segmentation performance on the BCCWJ, CTB6, and BEST2010 datasets in three languages, particularly Japanese, Chinese, and Thai. |
| Bibliographic information | 自然言語処理 (Journal of Natural Language Processing), vol. 30, no. 2, pp. 456-488, issue date 2023-06-15 |
| Publisher | 一般社団法人言語処理学会 (The Association for Natural Language Processing) |
| ISSN (identifier type: EISSN) | 2185-8314 |
| Publisher version DOI (relation type: isReplacedBy) | https://doi.org/10.5715/jnlp.30.456 |
| Publisher version URI (relation type: isReplacedBy) | https://www.jstage.jst.go.jp/article/jnlp/30/2/30_456/_article/-char/ja/ |
| Rights resource | https://creativecommons.org/licenses/by/4.0/ |
| Rights | © 2023 The Association for Natural Language Processing. Licensed under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/ |
| Author version flag (publication type) | NA |