BERT-based Language Model

Context:
- It can (often) be used by a BERT LM Inference System (that implements a BERT LM algorithm).
- It can range from being a base pre-trained BERT model for generic tasks to being a fine-tuned BERT model that is specialized for a specific domain or task.
- It can range from being a Unilingual BERT Model to being a Multilingual BERT Model.
- …
Example(s):
- an English BERT LM, a Chinese BERT LM, an Arabic BERT LM, ...
- a BERT-Base, Uncased LM with:
  "attention_probs_dropout_prob": 0.1, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "max_position_embeddings": 512, "num_attention_heads": 12, "num_hidden_layers": 12, "type_vocab_size": 2, "vocab_size": 30522 [1].
- a BERT-Large, Cased LM (24-layer, 1024-hidden, 16-heads, 340M parameters).
- a Domain-Specific BERT LM, such as: BioBERT or PubMedBERT.
- …
Counter-Example(s):
- a DistilBERT LM (Sanh et al., 2019).
- an Autoregressive LLM, such as: GPT-2, GPT-3.
- a Turing-NLG LM.
See: BERT-based SQuAD, BERT-based MultiNLI, BERT-based MRPC, BERTScore, DistilBERT.
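The BERT-Base, Uncased configuration listed in the examples can be related to its parameter count with a small calculation. The following is a minimal sketch, not an official tool: it assumes the standard BERT encoder layout (token, position, and segment embeddings with a LayerNorm; per-layer self-attention and feed-forward blocks, each followed by a LayerNorm; and a final pooler layer) and excludes any pre-training or task-specific heads. The function name `bert_encoder_param_count` is hypothetical and chosen only for illustration.

```python
# Configuration values copied from the BERT-Base, Uncased example above.
BERT_BASE_UNCASED = {
    "hidden_size": 768,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "type_vocab_size": 2,
    "vocab_size": 30522,
}


def bert_encoder_param_count(cfg: dict) -> int:
    """Count trainable parameters of a BERT encoder (hypothetical helper).

    Assumes the standard layout: embeddings + N transformer layers + pooler.
    The number of attention heads does not change the count, because the
    heads only partition the hidden dimension.
    """
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]

    # Token, position, and segment embedding tables, plus LayerNorm (scale + bias).
    embeddings = (
        cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    ) * h + 2 * h

    # Query, key, value, and output projections (weights + biases), plus LayerNorm.
    attention = 4 * (h * h + h) + 2 * h

    # Two feed-forward projections (weights + biases), plus LayerNorm.
    feed_forward = (h * i + i) + (i * h + h) + 2 * h

    # Dense pooler layer applied to the first token's hidden state.
    pooler = h * h + h

    return embeddings + cfg["num_hidden_layers"] * (attention + feed_forward) + pooler


if __name__ == "__main__":
    print(f"{bert_encoder_param_count(BERT_BASE_UNCASED):,}")  # 109,482,240
```

Under these assumptions the result is about 109.5M parameters, which is consistent with the roughly 110M usually quoted for BERT-Base. Most of the total comes from the 12 transformer layers (about 85M), with the vocabulary embedding table contributing about 23M.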

References

(Luo et al., 2022) ⇒ Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, and Tie-Yan Liu. (2022). “BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining.” In: Briefings in Bioinformatics, 23(6). doi:10.1093/bib/bbac409
- ABSTRACT: Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e. BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT.