BERT-based Language Model
A BERT-based Language Model is an encoder-only transformer-based language model that follows a BERT architecture and is produced by a BERT training system.
- Context:
- It can (often) be used by a BERT LM Inference System (that implements a BERT LM algorithm).
- It can range from being a base pre-trained BERT model intended for generic tasks to being a fine-tuned BERT model specialized for a specific domain or task.
- It can range from being a Unilingual BERT Model to being a Multilingual BERT Model.
- …
- Example(s):
- an English BERT LM, a Chinese BERT LM, an Arabic BERT LM, ...
- a BERT-Base, Uncased LM with:
"attention_probs_dropout_prob": 0.1, "hidden_act": "gelu", "hidden_dropout_prob": 0.1,
"hidden_size": 768, "initializer_range": 0.02,
"intermediate_size": 3072, "max_position_embeddings": 512,
"num_attention_heads": 12, "num_hidden_layers": 12,
"type_vocab_size": 2, "vocab_size": 30522 [1].
- a BERT-Large, Cased LM (24-layer, 1024-hidden, 16-heads, 340M parameters),
- a domain-specific BERT LM, such as: BioBERT or PubMedBERT.
- …
- Counter-Example(s):
- a DistilBERT LM (Sanh et al., 2019).
- an Autoregressive LLM, such as: GPT-2, GPT-3.
- a Turing-NLG LM.
- See: BERT-based SQuAD, BERT-based MultiNLI, BERT-based MRPC, BERTScore, DistilBERT.
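The BERT-Base, Uncased configuration listed in the examples can be related to the roughly 110M parameters commonly quoted for that model. The following is a minimal sketch, not the reference implementation: the function name `bert_param_count` is hypothetical, and the layer layout it assumes (token, position, and segment embeddings with a LayerNorm; query, key, value, and output projections with biases; a two-layer feed-forward block; a pooler; no pre-training heads) is an assumption about the standard BERT encoder rather than something stated in the configuration itself.

```python
# Configuration values copied from the BERT-Base, Uncased example above.
BERT_BASE_UNCASED = {
    "hidden_size": 768,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "type_vocab_size": 2,
    "vocab_size": 30522,
}


def bert_param_count(cfg):
    """Estimate trainable parameters of a BERT encoder (hypothetical helper).

    Assumes the standard encoder layout; pre-training heads are excluded.
    """
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]

    # Token, position, and segment embedding tables, plus one LayerNorm
    # (scale and shift vectors of size h each).
    embeddings = (
        cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    ) * h + 2 * h

    # Self-attention: four h-by-h projections (Q, K, V, output) with biases,
    # followed by a LayerNorm. The head count does not change this total.
    attention = 4 * (h * h + h) + 2 * h

    # Feed-forward block: h -> i and i -> h with biases, followed by a LayerNorm.
    feed_forward = (h * i + i) + (i * h + h) + 2 * h

    # Pooler: one dense layer applied to the first token's hidden state.
    pooler = h * h + h

    return embeddings + cfg["num_hidden_layers"] * (attention + feed_forward) + pooler


print(bert_param_count(BERT_BASE_UNCASED))  # 109482240, i.e. about 110M
```

Under these assumptions the embedding tables account for about 23.8M parameters and each of the 12 encoder layers for about 7.1M, which is why the vocabulary size matters noticeably for the base model's total.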
References
2023
- https://www.marktechpost.com/2023/01/31/microsoft-research-proposes-biogpt-a-domain-specific-generative-transformer-language-model-pre-trained-on-large-scale-biomedical-literature/
- QUOTE: ... BioBERT and PubMedBERT are two of the most well-known pre-trained language models in the biomedical industry that have achieved superior performance compared to other general pre-trained models on biomedical text.
However, the majority of current research makes use of BERT models, which are more suitable for comprehension tasks as compared to generation tasks. While GPT models have proven adept at generating tasks, their performance in the biomedical area has yet to be fully scrutinized. ...
2022
- (Luo et al., 2022) ⇒ Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, and Tie-Yan Liu. (2022). “BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining.” In: Briefings in Bioinformatics, 23(6). doi:10.1093/bib/bbac409
- ABSTRACT: Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e. BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT.
2018
- https://github.com/google-research/bert#pre-trained-models
- The links to the models are here (right-click, 'Save link as...' on the name):
- BERT-Large, Uncased (Whole Word Masking): 24-layer, 1024-hidden, 16-heads, 340M parameters
- BERT-Large, Cased (Whole Word Masking): 24-layer, 1024-hidden, 16-heads, 340M parameters
- BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters
- BERT-Large, Uncased: 24-layer, 1024-hidden, 16-heads, 340M parameters
- BERT-Base, Cased: 12-layer, 768-hidden, 12-heads, 110M parameters
- BERT-Large, Cased: 24-layer, 1024-hidden, 16-heads, 340M parameters
- BERT-Base, Multilingual Cased (New, recommended): 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
- BERT-Base, Multilingual Uncased (Orig, not recommended) (Not recommended, use Multilingual Cased instead): 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
- BERT-Base, Chinese: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters