Nvidia Ampere Microarchitecture

Context:
- It is the successor to both the Volta and Turing architectures.
- ...
See: Hopper (Microarchitecture), Nvidia, TSMC, Samsung Electronics, 7 nm Process, 10 nm Process, GeForce 30 Series, DirectX#DirectX 12 Ultimate, Direct3D#Direct3D 12, High-Level Shader Language, OpenCL#OpenCL 3.0, OpenGL#OpenGL 4.6.

References

(Wikipedia, 2023) ⇒ https://en.wikipedia.org/wiki/Ampere_(microarchitecture) Retrieved:2023-5-8.
- Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures. It was officially announced on May 14, 2020 and is named after French mathematician and physicist André-Marie Ampère. Nvidia announced the Ampere architecture GeForce 30 series consumer GPUs at a GeForce Special Event on September 1, 2020. Nvidia announced the A100 80GB GPU at SC20 on November 16, 2020. Mobile RTX graphics cards and the RTX 3060 based on the Ampere architecture were revealed on January 12, 2021.
  Nvidia announced "Ampere Next Next" for a 2024 release at GPU Technology Conference 2021, and announced Ampere's successor, Hopper, at GTC 2022.

Comparison of Compute Capability: GP100 vs GV100 vs GA100^[2]

GPU features	NVIDIA Tesla P100	NVIDIA Tesla V100	NVIDIA A100
GPU codename	GP100	GV100	GA100
GPU architecture	NVIDIA Pascal	NVIDIA Volta	NVIDIA Ampere
Compute capability	6.0	7.0	8.0
Threads / warp	32	32	32
Max warps / SM	64	64	64
Max threads / SM	2048	2048	2048
Max thread blocks / SM	32	32	32
Max 32-bit registers / SM	65536	65536	65536
Max registers / block	65536	65536	65536
Max registers / thread	255	255	255
Max thread block size	1024	1024	1024
FP32 cores / SM	64	64	64
Ratio of SM registers to FP32 cores	1024	1024	1024
Shared Memory Size / SM	64 KB	Configurable up to 96 KB	Configurable up to 164 KB
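
The per-SM limits in the A100 column bound how many thread blocks can be resident on one SM at a time. The following is a minimal sketch of that arithmetic, not NVIDIA's occupancy calculator: the helper names `resident_blocks_per_sm` and `occupancy` are hypothetical, the limits are copied from the table, and the sketch assumes at least one register per thread and ignores register and shared-memory allocation granularity and any per-block shared-memory reservation.

```python
# Per-SM limits copied from the NVIDIA A100 (GA100) column of the table.
A100_LIMITS = {
    "threads_per_warp": 32,
    "max_warps_per_sm": 64,
    "max_blocks_per_sm": 32,
    "max_registers_per_sm": 65536,
    "max_shared_mem_per_sm": 164 * 1024,  # bytes; upper end of the configurable range
}


def resident_blocks_per_sm(threads_per_block, regs_per_thread,
                           shared_mem_per_block, limits=A100_LIMITS):
    """Estimate how many thread blocks fit on one SM (hypothetical helper).

    Simplified: takes the minimum over the warp, register, shared-memory,
    and block-count limits, with no allocation-granularity rounding.
    """
    warp = limits["threads_per_warp"]
    warps_per_block = -(-threads_per_block // warp)  # ceiling division
    by_warps = limits["max_warps_per_sm"] // warps_per_block
    regs_per_block = regs_per_thread * warps_per_block * warp
    by_regs = limits["max_registers_per_sm"] // regs_per_block
    if shared_mem_per_block > 0:
        by_smem = limits["max_shared_mem_per_sm"] // shared_mem_per_block
    else:
        by_smem = limits["max_blocks_per_sm"]  # shared memory is not a constraint
    return min(limits["max_blocks_per_sm"], by_warps, by_regs, by_smem)


def occupancy(threads_per_block, regs_per_thread,
              shared_mem_per_block, limits=A100_LIMITS):
    """Fraction of the SM's warp slots filled by resident blocks."""
    blocks = resident_blocks_per_sm(threads_per_block, regs_per_thread,
                                    shared_mem_per_block, limits)
    warps_per_block = -(-threads_per_block // limits["threads_per_warp"])
    return blocks * warps_per_block / limits["max_warps_per_sm"]
```

Under these assumptions, a 256-thread block using 32 registers per thread fills all 64 warp slots (8 resident blocks), while doubling to 64 registers per thread halves the resident blocks to 4, since the 65536-register file becomes the binding limit.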

Comparison of Precision Support Matrix^[3]^[4]

	Supported CUDA Core Precisions								Supported Tensor Core Precisions
	FP16	FP32	FP64	INT1	INT4	INT8	TF32	BF16	FP16	FP32	FP64	INT1	INT4	INT8	TF32	BF16
NVIDIA Tesla P4	No	Yes	Yes	No	No	Yes	No	No	No	No	No	No	No	No	No	No
NVIDIA P100	Yes	Yes	Yes	No	No	No	No	No	No	No	No	No	No	No	No	No
NVIDIA Volta	Yes	Yes	Yes	No	No	Yes	No	No	Yes	No	No	No	No	No	No	No
NVIDIA Turing	Yes	Yes	Yes	No	No	Yes	No	No	Yes	No	No	Yes	Yes	Yes	No	No
NVIDIA A100	Yes	Yes	Yes	No	No	Yes	No	Yes	Yes	No	Yes	Yes	Yes	Yes	Yes	Yes
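
TF32 and BF16, the two formats in the table that only the A100 row supports, both keep FP32's 8-bit exponent but shorten its 23-bit mantissa (to 10 bits for TF32 and 7 bits for BF16). The sketch below illustrates the resulting precision loss by masking off the low mantissa bits of an FP32 value. It is only an approximation for illustration: the helper names are hypothetical, and plain truncation (round toward zero) is an assumption made for simplicity, whereas real hardware conversions may round differently.

```python
import struct


def _float32_bits(x):
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _bits_to_float32(bits):
    """Interpret a 32-bit unsigned int as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def truncate_mantissa(x, kept_bits):
    """Keep only the top `kept_bits` of FP32's 23 mantissa bits.

    Assumption: truncation toward zero, chosen for simplicity.
    """
    dropped = 23 - kept_bits
    mask = 0xFFFFFFFF & ~((1 << dropped) - 1)
    return _bits_to_float32(_float32_bits(x) & mask)


def to_tf32(x):
    """Approximate TF32 precision: 8-bit exponent, 10-bit mantissa."""
    return truncate_mantissa(x, 10)


def to_bf16(x):
    """Approximate BF16 precision: 8-bit exponent, 7-bit mantissa."""
    return truncate_mantissa(x, 7)


if __name__ == "__main__":
    for value in (1.0, 0.1):
        print(value, to_tf32(value), to_bf16(value))
```

Values such as 1.0 that need few mantissa bits pass through unchanged, while 0.1 loses more precision under BF16 than under TF32, which is the trade-off between the two formats' mantissa widths.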

Comparison of Decode Performance