Character Embedding System
A Character Embedding System is a subword embedding system that implements a character embedding algorithm to solve a character embedding task (i.e., to map text characters into vector representations).
- Context:
- It can be part of a Character Encoding System.
- It can range from being a Position-based Character Embedding System, to being a Distributional-based Character Embedding System, to being a Cluster-based Character Embedding System.
- …
- Example(s):
- an n-Gram Character Embedding System (Ling et al., 2015),
- a Character-Enhanced Word Embedding (CWE) System,
- a Flair Word Embedding System (Akbik et al., 2018),
- a Neural Character Embedding System,
- a Radical-Enhanced Chinese Character Embedding System,
- a Random Walk based Character Embedding System,
- a Santos-Zadrozny Character Embedding System (Santos & Zadrozny, 2014).
- …
- Counter-Example(s):
- a Word Embedding System.
- See: Character-Level Seq2Seq Training System, Sentiment Analysis System, Feature Learning System, Natural Language Processing System, Vector, Real Numbers, Knowledge Representation, Vector Space, Neural Net Language Model, Dimensionality Reduction, co-Occurrence Matrix, Syntactic Parsing System.
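The sketch below illustrates the core mapping named in the definition above: each character is assigned a row in a trainable lookup table, so a string becomes a matrix of dense vectors. The vocabulary, embedding dimension, class name, and random initialization are illustrative assumptions rather than part of any particular system.

```python
# Minimal sketch of a character embedding lookup table (illustrative only;
# the vocabulary, dimension, and initialization are assumptions, not a standard).
import numpy as np

class CharEmbedding:
    def __init__(self, alphabet, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.char2id = {c: i for i, c in enumerate(sorted(set(alphabet)))}
        # One row per character; rows are the (trainable) embedding vectors.
        self.table = rng.normal(scale=0.1, size=(len(self.char2id), dim))

    def embed(self, text):
        """Map a string to a (len(text), dim) matrix of character vectors."""
        ids = [self.char2id[c] for c in text if c in self.char2id]
        return self.table[ids]

emb = CharEmbedding("abcdefghijklmnopqrstuvwxyz ")
vectors = emb.embed("character embedding")
print(vectors.shape)   # (19, 8): one 8-dimensional vector per character
```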
References
2020
- (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Word_embedding Retrieved:2020-3-6.
- Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers. Conceptually it involves a mathematical embedding from a space with many dimensions per word to a continuous vector space with a much lower dimension.
Methods to generate this mapping include neural networks, dimensionality reduction on the word co-occurrence matrix, probabilistic models, explainable knowledge base method, and explicit representation in terms of the context in which words appear. Word and phrase embeddings, when used as the underlying input representation, have been shown to boost the performance in NLP tasks such as syntactic parsing and sentiment analysis.
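At the character level, the co-occurrence-plus-dimensionality-reduction route mentioned above can be sketched as follows: count how often characters co-occur within a small window, then apply a truncated SVD to obtain low-dimensional character vectors. The toy corpus, window size, and target dimension are illustrative assumptions.

```python
# Sketch of the "dimensionality reduction on a co-occurrence matrix" route,
# applied at the character level (corpus, window size, and dimension are
# illustrative assumptions).
import numpy as np

corpus = "character embeddings map characters to vectors"
chars = sorted(set(corpus))
idx = {c: i for i, c in enumerate(chars)}

# Count co-occurrences of characters within a +/-2 window.
window = 2
cooc = np.zeros((len(chars), len(chars)))
for i, c in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            cooc[idx[c], idx[corpus[j]]] += 1

# Truncated SVD: keep the top-k singular directions as dense character vectors.
k = 4
U, S, _ = np.linalg.svd(cooc, full_matrices=False)
char_vectors = U[:, :k] * S[:k]
print(char_vectors.shape)  # (len(chars), 4)
```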
2019a
- (Jiang et al., 2019) ⇒ Zhuoren Jiang, Zhe Gao, Guoxiu He, Yangyang Kang, Changlong Sun, Qiong Zhang, Luo Si, and Xiaozhong Liu (2019). "Detect Camouflaged Spam Content via StoneSkipping: Graph and Text Joint Embedding for Chinese Character Variation Representation". In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). DOI:10.18653/v1/D19-1640.
- QUOTE: Figure 2 depicts the proposed SS model. There are three core modules in SS: a Chinese character variation graph to host the heterogeneous variation information; a variation family-enhanced graph embedding for Chinese character variation knowledge extraction and graph representation learning; an enhanced bidirectional language model for joint representation learning.
2019b
- (Cho et al., 2019) ⇒ Won-Ik Cho, Seok Min Kim, and Nam Soo Kim (2019). "Investigating an Effective Character-level Embedding in Korean Sentence Classification". ArXiv:1905.13656.
- QUOTE: For such cases where the conjuncts consist of the components representing consonant(s) and vowel, various character encoding schemes can be adopted beyond merely making up a one-hot vector. However, there has been little work done on intra-language comparison regarding performances using each representation. In this study, utilizing the Korean language which is character-rich and agglutinative, we investigate an encoding scheme that is the most effective among Jamo-level one-hot, character-level one-hot, character-level dense, and character-level multi-hot.
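A minimal sketch contrasting three of the encoding schemes compared by Cho et al. (2019): character-level one-hot, character-level dense, and a character-level multi-hot built from the syllable's Jamo via the standard Unicode decomposition of precomposed Hangul. The tiny vocabulary and the dimensions are illustrative assumptions.

```python
# Sketch of three character encoding schemes for Korean: character-level
# one-hot, character-level dense, and character-level multi-hot built from
# the syllable's Jamo (vocabulary and dimensions are illustrative assumptions).
import numpy as np

def decompose_hangul(ch):
    """Split a precomposed Hangul syllable into (lead, vowel, tail) indices."""
    code = ord(ch) - 0xAC00
    return code // 588, (code % 588) // 28, code % 28

def multi_hot(ch):
    """68-dim multi-hot: 19 lead consonants + 21 vowels + 28 tail slots."""
    lead, vowel, tail = decompose_hangul(ch)
    v = np.zeros(19 + 21 + 28)
    v[lead] = v[19 + vowel] = v[19 + 21 + tail] = 1.0
    return v

vocab = ["한", "국", "어"]                      # tiny illustrative vocabulary
one_hot = np.eye(len(vocab))                    # character-level one-hot
dense = np.random.default_rng(0).normal(size=(len(vocab), 16))  # dense lookup
print(one_hot[0].shape, dense[0].shape, multi_hot("한").shape)
# (3,) (16,) (68,)
```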
2018
- (Liu et al., 2018) ⇒ Jiaming Liu, Chengquan Zhang, Yipeng Sun, Junyu Han, and Errui Ding (2018, December). "Detecting Text in the Wild with Deep Character Embedding Network". In: Proceedings of the 14th Asian Conference on Computer Vision (ACCV 2018).
- QUOTE: The character embedding subnet takes the residual convolution unit (RCU) as the basic blocks which is simplified residual block without batch normalization(...)
During inference, we extract confidence map, offset maps and embedding maps from the two heads of the model. After thresholding on the score map and performing NMS on character proposals, the embedding vectors are extracted by 1×1 RoI pooling on embedding map. In the end, we output character candidates with the format of {score, coordinates $(x, y)$ of character center, width, height, 128D embedding vector}. Characters are finally clustered into text blocks as the last post-processing step. The overall structure of the model and pipeline are shown in Fig. 2.
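A minimal sketch of the final clustering step described in the quote: character candidates whose 128-D embedding vectors are close are grouped into the same text block. The greedy union-find grouping, the distance threshold, and the synthetic candidates are assumptions for illustration; the paper's actual post-processing may differ.

```python
# Sketch: group character candidates into text blocks when their embedding
# vectors are close (threshold and grouping scheme are assumptions).
import numpy as np

def cluster_characters(embeddings, threshold=0.5):
    """Greedy clustering: candidates whose embeddings are within `threshold`
    (Euclidean distance) end up in the same text block."""
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(embeddings[i] - embeddings[j]) < threshold:
                parent[find(j)] = find(i)

    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), []).append(i)
    return list(blocks.values())

rng = np.random.default_rng(0)
# Two synthetic "text blocks": candidates drawn around two distant centers.
embs = np.vstack([rng.normal(0.0, 0.02, (3, 128)), rng.normal(2.0, 0.02, (3, 128))])
print(cluster_characters(embs))  # [[0, 1, 2], [3, 4, 5]]
```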
2016
- (Lu et al., 2016) ⇒ Yanan Lu, Yue Zhang, and Donghong Ji (2016). "Multi-prototype Chinese Character Embedding". In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).
- QUOTE: Chinese sentences are written as sequences of characters, which are elementary units of syntax and semantics. Characters are highly polysemous in forming words. We present a position-sensitive skip-gram model to learn multi-prototype Chinese character embeddings, and explore the usefulness of such character embeddings to Chinese NLP tasks.
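The position-sensitive idea can be sketched by keying each character's vector on its position within a word, so a polysemous character keeps several prototype vectors. The position tags, dimensions, and random initialization below are illustrative assumptions; Lu et al. learn such prototypes with a skip-gram objective rather than leaving them random.

```python
# Sketch of position-sensitive character prototypes: the same character gets
# separate vectors depending on where it occurs inside a word (begin / middle /
# end / single). Keying scheme and dimensions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
prototypes = {}  # (char, position) -> vector

def position_tag(i, word_len):
    if word_len == 1:
        return "single"
    if i == 0:
        return "begin"
    return "end" if i == word_len - 1 else "middle"

def prototype(ch, pos):
    key = (ch, pos)
    if key not in prototypes:
        prototypes[key] = rng.normal(scale=0.1, size=DIM)
    return prototypes[key]

# The same character "木" gets different vectors in "木头" (word-initial)
# and in "树木" (word-final).
for word in ["木头", "树木"]:
    for i, ch in enumerate(word):
        vec = prototype(ch, position_tag(i, len(word)))
        print(word, ch, position_tag(i, len(word)), vec[:3])
```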
2015
- (Chen et al., 2015) ⇒ Xinxiong Chen, Lei Xu, Zhiyuan Liu, Maosong Sun, and Huanbo Luan (2015, June). "Joint Learning Of Character And Word Embeddings". In: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015).
- QUOTE: We take advantages of both internal characters and external contexts, and propose a new model for joint learning of character and word embeddings, named as character-enhanced word embedding model (CWE). In CWE, we learn and maintain both word and character embeddings together. CWE can be easily integrated in word embedding models and one of the frameworks of CWE based on CBOW is shown in Fig. 1(B), where the word embeddings (blue boxes in figure) and character embeddings (green boxes) are composed together to get new embeddings (yellow boxes). The new embeddings perform the same role as the word embeddings in CBOW.
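A minimal sketch of the composition described in the quote: a word's embedding is combined with the average of its characters' embeddings, and the result plays the word-embedding role in CBOW. The simple averaging used here is an illustrative assumption; see the CWE paper for the exact composition function.

```python
# Sketch: compose word and character embeddings into a single vector that is
# used like a word embedding in CBOW (averaging scheme is an assumption).
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
word_emb = {"智能": rng.normal(size=DIM)}
char_emb = {"智": rng.normal(size=DIM), "能": rng.normal(size=DIM)}

def cwe_vector(word):
    """Combine the word embedding with the mean of its character embeddings."""
    char_avg = np.mean([char_emb[c] for c in word], axis=0)
    return 0.5 * (word_emb[word] + char_avg)

print(cwe_vector("智能").shape)  # (8,): plays the word-embedding role in CBOW
```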
2014
- (Sun et al., 2014) ⇒ Yaming Sun, Lei Lin, Nan Yang, Zhenzhou Ji, and Xiaolong Wang (2014). "Radical-Enhanced Chinese Character Embedding". In: International Conference on Neural Information Processing (ICONIP 2014). DOI:10.1007/978-3-319-12640-1_34
- QUOTE: We present a method to leverage radical for learning Chinese character embedding. Radical is a semantic and phonetic component of Chinese character. It plays an important role as characters with the same radical usually have similar semantic meaning and grammatical usage (...)
Based on C&W model (Collobert et al., 2011), we present a radical-enhanced model, which utilizes both radical and context information of characters.
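A minimal sketch of the radical-enhancement idea: each character's representation combines its own embedding with the embedding of its radical, so characters sharing a radical (here 河 and 湖, both with the water radical 氵) start from related vectors. The tiny radical table, the concatenation, and the dimensions are illustrative assumptions, not the C&W-based model itself.

```python
# Sketch: build a character representation from both the character embedding
# and its radical's embedding (radical table and composition are assumptions).
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
radical_of = {"河": "氵", "湖": "氵", "树": "木"}
char_emb = {c: rng.normal(size=DIM) for c in radical_of}
radical_emb = {r: rng.normal(size=DIM) for r in set(radical_of.values())}

def radical_enhanced(ch):
    """Concatenate the character embedding with its radical's embedding."""
    return np.concatenate([char_emb[ch], radical_emb[radical_of[ch]]])

print(radical_enhanced("河").shape)  # (16,)
# 河 and 湖 share the radical half of their vectors:
print(np.allclose(radical_enhanced("河")[DIM:], radical_enhanced("湖")[DIM:]))  # True
```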