Transformer-based Vision Model

From GM-RKB
Latest revision as of 12:30, 16 October 2023

A Transformer-based Vision Model is a vision model that is a transformer-based model.

  • Counter-Example(s):
    • Transformer-based Language Model.
  • See: Pre-Trained Visual Encoder, Language-Vision Multimodal Model.



References

2023

  • GBard
    • A Vision Transformer-based Model is a deep learning model built on the Transformer architecture, which was originally designed for natural language processing (NLP) tasks.
    • Vision Transformers (ViTs) represent an image as a sequence of fixed-size patches, much as NLP models represent text as a sequence of tokens. Because self-attention relates every patch to every other patch, ViTs can learn long-range dependencies between distant parts of an image, which matters for computer vision tasks such as image classification and object detection.
    • ViTs have achieved state-of-the-art results on a variety of computer vision benchmarks and have become a widely used architecture for many vision tasks.
    • Here are some examples of Vision Transformer-based Models:
      • DeiT (Data-efficient Image Transformer)
      • ViLT (Vision and Language Transformer)
      • Swin Transformer
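      • ConvMixer (patch-based like a ViT, but built only from convolutions with no self-attention, so it is a ViT-inspired architecture rather than a transformer proper)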
      • Pyramid Vision Transformer
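
The patch-sequence idea described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not how any of the listed models is implemented: the helper names `patchify` and `self_attention` are hypothetical, the image is a tiny single-channel grid, and the attention step omits the learned query/key/value projections, position embeddings, and multiple heads that real ViTs use.

```python
import math


def patchify(image, patch_size):
    """Split an H x W image (a list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order, each as a flat list of
    patch_size * patch_size pixel values.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[top + dy][left + dx]
                for dy in range(patch_size)
                for dx in range(patch_size)
            ]
            patches.append(patch)
    return patches


def self_attention(tokens):
    """Single-head scaled dot-product self-attention without learned weights.

    Assumption for illustration: queries, keys, and values are the tokens
    themselves. Every token attends to every other token, which is what lets
    distant patches influence each other.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(score - peak) for score in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return outputs


# A 4 x 4 "image" whose pixel values are 0..15, split into four 2 x 2 patches.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = patchify(image, 2)
mixed = self_attention(patches)
```

Here `patches` is the token sequence a ViT would consume after a learned linear embedding, and `mixed` holds one output vector per patch, each a weighted average over all patches in the image.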