Ampere (microarchitecture)

Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures. It was officially announced on May 14, 2020 and is named after French mathematician and physicist André-Marie Ampère.[1][2]

Ampere
  • Launched: May 14, 2020
  • Designed by: Nvidia
  • Manufactured by: TSMC, Samsung
  • Fabrication process: TSMC N7 (professional); Samsung 8N (consumer)
  • Codename(s): GA10x
Product Series
  • Desktop: GeForce 30 series
  • Professional/workstation: RTX A series
  • Server/datacenter: A100
Specifications
  • L1 cache: 192 KB per SM (professional); 128 KB per SM (consumer)
  • L2 cache: 2 MB to 6 MB
  • Memory support: GDDR6, GDDR6X (consumer); HBM2, HBM2e (datacenter)
  • PCIe support: PCIe 4.0
Supported Graphics APIs
  • DirectX: DirectX 12 Ultimate (Feature Level 12_2)
  • Direct3D: Direct3D 12.0
  • Shader Model: Shader Model 6.8
  • OpenCL: OpenCL 3.0
  • OpenGL: OpenGL 4.6
  • CUDA: Compute Capability 8.6
  • Vulkan: Vulkan 1.3
Media Engine
  • Encoder(s) supported: NVENC
  • Color bit-depth: 8-bit, 10-bit
History
  • Predecessor: Turing (consumer), Volta (professional)
  • Successor: Ada Lovelace (consumer), Hopper (datacenter)
Support status
  • Supported

Nvidia announced the Ampere architecture GeForce 30 series consumer GPUs at a GeForce Special Event on September 1, 2020.[3][4] Nvidia announced the A100 80 GB GPU at SC20 on November 16, 2020.[5] Mobile RTX graphics cards and the RTX 3060 based on the Ampere architecture were revealed on January 12, 2021.[6]

Nvidia announced Ampere's successor, Hopper, at GTC 2022, having already teased "Ampere Next Next" (which became Blackwell) for a 2024 release at GTC 2021.

Details

Architectural improvements of the Ampere architecture include the following:

  • CUDA Compute Capability 8.0 for A100 and 8.6 for the GeForce 30 series[7] (a runtime query sketch follows this list)
  • TSMC's 7 nm FinFET process for A100
  • Custom version of Samsung's 8 nm process (8N) for the GeForce 30 series[8]
  • Third-generation Tensor Cores with FP16, bfloat16, TensorFloat-32 (TF32) and FP64 support and sparsity acceleration.[9] With 256 FP16 FMA operations per clock, each Tensor Core delivers four times the throughput of the previous generation (GA100 only; two times on GA10x), while the Tensor Core count is reduced to one per SM partition (four per SM).
  • Second-generation ray tracing cores; concurrent ray tracing, shading, and compute for the GeForce 30 series
  • High Bandwidth Memory 2 (HBM2) on A100 40 GB & A100 80 GB
  • GDDR6X memory for GeForce RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070 Ti
  • Double FP32 cores per SM on GA10x GPUs
  • NVLink 3.0 with a 50 Gbit/s per pair throughput[9]
  • PCI Express 4.0 with SR-IOV support (SR-IOV is reserved for the A100)
  • Multi-instance GPU (MIG) virtualization and GPU partitioning feature in A100 supporting up to seven instances
  • PureVideo feature set K hardware video decoding with AV1 hardware decoding[10] for the GeForce 30 series and feature set J for A100
  • 5 NVDEC for A100
  • New hardware-based 5-core JPEG decode engine (NVJPG) supporting YUV420, YUV422, YUV444, YUV400 and RGBA; not to be confused with Nvidia NVJPEG, the GPU-accelerated library for JPEG encoding and decoding
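
The compute capability listed above can be queried at runtime through the CUDA runtime API. The following is a minimal, illustrative sketch (not taken from the cited sources) that reports whether the installed GPU is an Ampere part; the use of device index 0 is an assumption:

  // check_ampere.cu - report the CUDA compute capability of device 0 (assumed)
  // Build with: nvcc check_ampere.cu -o check_ampere
  #include <cstdio>
  #include <cuda_runtime.h>

  int main() {
      cudaDeviceProp prop;
      if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
          std::printf("No CUDA device found\n");
          return 1;
      }
      std::printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
      if (prop.major == 8)
          std::printf("Ampere-class GPU (8.0 = GA100, 8.6 = GA10x)\n");
      return 0;
  }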

Chips

  • GA100[11]
  • GA102
  • GA103
  • GA104
  • GA106
  • GA107
  • GA10B

Comparison of Compute Capability: GP100 vs GV100 vs GA100[12]

GPU features | Nvidia Tesla P100 | Nvidia Tesla V100 | Nvidia A100
GPU codename | GP100 | GV100 | GA100
GPU architecture | Pascal | Volta | Ampere
Compute capability | 6.0 | 7.0 | 8.0
Threads / warp | 32 | 32 | 32
Max warps / SM | 64 | 64 | 64
Max threads / SM | 2048 | 2048 | 2048
Max thread blocks / SM | 32 | 32 | 32
Max 32-bit registers / SM | 65536 | 65536 | 65536
Max registers / block | 65536 | 65536 | 65536
Max registers / thread | 255 | 255 | 255
Max thread block size | 1024 | 1024 | 1024
FP32 cores / SM | 64 | 64 | 64
Ratio of SM registers to FP32 cores | 1024 | 1024 | 1024
Shared memory size / SM | 64 KB | Configurable up to 96 KB | Configurable up to 164 KB
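
The "configurable up to 164 KB" shared memory on GA100 is opt-in: by default a kernel may allocate at most 48 KB of dynamic shared memory, and larger allocations must be requested explicitly. A minimal sketch using the standard cudaFuncSetAttribute call; the kernel name and the 100 KB figure are illustrative assumptions:

  #include <cuda_runtime.h>

  // Hypothetical kernel that stages data through dynamic shared memory.
  __global__ void stageKernel(const float *in, float *out) {
      extern __shared__ float tile[];              // sized at launch time
      tile[threadIdx.x] = in[threadIdx.x];
      __syncthreads();
      out[threadIdx.x] = tile[threadIdx.x];
  }

  void launch(const float *d_in, float *d_out) {
      int smemBytes = 100 * 1024;                  // 100 KB, above the default 48 KB cap
      // Raise the per-kernel limit; GA100 allows up to ~164 KB of shared memory per SM.
      cudaFuncSetAttribute(stageKernel,
                           cudaFuncAttributeMaxDynamicSharedMemorySize,
                           smemBytes);
      stageKernel<<<1, 256, smemBytes>>>(d_in, d_out);
  }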

Comparison of Precision Support Matrix[13][14]

GPU | CUDA core precisions: FP16 FP32 FP64 INT1 INT4 INT8 TF32 BF16 | Tensor Core precisions: FP16 FP32 FP64 INT1 INT4 INT8 TF32 BF16
Nvidia Tesla P4 | No Yes Yes No No Yes No No | No No No No No No No No
Nvidia P100 | Yes Yes Yes No No No No No | No No No No No No No No
Nvidia Volta | Yes Yes Yes No No Yes No No | Yes No No No No No No No
Nvidia Turing | Yes Yes Yes No No No No No | Yes No No Yes Yes Yes No No
Nvidia A100 | Yes Yes Yes No No Yes No Yes | Yes No Yes Yes Yes Yes Yes Yes

Legend:

  • FPnn: floating point with nn bits
  • INTn: integer with n bits
  • INT1: binary
  • TF32: TensorFloat32
  • BF16: bfloat16
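
The formats above differ only in how their bits are split between sign, exponent and mantissa. As a worked reference (standard format definitions, not taken from the cited sources), the value of a normal number and the bit allocation of each format are:

  \text{value} = (-1)^{s} \cdot 2^{\,e-\text{bias}} \cdot \left(1 + \frac{m}{2^{p}}\right),
  \qquad
  \begin{array}{lccc}
  \text{Format} & \text{Exponent bits} & \text{Mantissa bits } (p) & \text{Total bits} \\
  \text{FP16} & 5 & 10 & 16 \\
  \text{BF16} & 8 & 7 & 16 \\
  \text{TF32} & 8 & 10 & 19 \\
  \text{FP32} & 8 & 23 & 32 \\
  \end{array}

TF32 therefore pairs the 8-bit exponent range of FP32 with the 10-bit mantissa of FP16, which is how the Tensor Cores accelerate FP32-formatted workloads with reduced precision rather than reduced range.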

Comparison of Decode Performance

Concurrent streams | H.264 decode (1080p30) | H.265 (HEVC) decode (1080p30) | VP9 decode (1080p30)
V100 | 16 | 22 | 22
A100 | 75 | 157 | 108

Ampere dies

Die | GA100[15] | GA102[16] | GA103[17] | GA104[18] | GA106[19] | GA107[20] | GA10B[21] | GA10F
Die size | 826 mm2 | 628 mm2 | 496 mm2 | 392 mm2 | 276 mm2 | 200 mm2 | ~447.75 mm2 (20.16 mm x 22.21 mm[22]) | ?
Transistors | 54.2B | 28.3B | 22B | 17.4B | 12B | 8.7B | 21B[23] | ?
Transistor density | 65.6 MTr/mm2 | 45.1 MTr/mm2 | 44.4 MTr/mm2 | 44.4 MTr/mm2 | 43.5 MTr/mm2 | 43.5 MTr/mm2 | ~46.9 MTr/mm2 | ?
Graphics processing clusters | 8 | 7 | 6 | 6 | 3 | 2 | 2 | 1
Streaming multiprocessors | 128 | 84 | 60 | 48 | 30 | 20 | 16 | 12
CUDA cores | 12288 | 10752 | 7680 | 6144 | 3840 | 2560 | 2048 | 1536
Texture mapping units | 512 | 336 | 240 | 192 | 120 | 80 | 64 | 48
Render output units | 192 | 112 | 96 | 96 | 48 | 32 | 32 | 16
Tensor cores | 512 | 336 | 240 | 192 | 120 | 80 | 64 | 48
RT cores | N/A | 84 | 60 | 48 | 30 | 20 | 8 | 12
L1 cache | 24 MB (192 KB per SM) | 10.5 MB (128 KB per SM) | 7.5 MB (128 KB per SM) | 6 MB (128 KB per SM) | 3 MB (128 KB per SM) | 2.5 MB (128 KB per SM) | 3 MB (192 KB per SM) | 1.5 MB (128 KB per SM)
L2 cache | 40 MB | 6 MB | 4 MB | 4 MB | 3 MB | 2 MB | 4 MB | ?
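
The transistor-density row follows directly from dividing the transistor count by the die area, for example:

  \text{GA100: } \frac{54.2 \times 10^{9}}{826\ \text{mm}^{2}} \approx 65.6\ \text{MTr/mm}^{2},
  \qquad
  \text{GA102: } \frac{28.3 \times 10^{9}}{628\ \text{mm}^{2}} \approx 45.1\ \text{MTr/mm}^{2}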

A100 accelerator and DGX A100

The Ampere-based A100 accelerator was announced and released on May 14, 2020.[9] The A100 features 19.5 teraflops of FP32 performance, 6912 FP32/INT32 CUDA cores, 3456 FP64 CUDA cores, 40 GB of graphics memory, and 1.6 TB/s of graphics memory bandwidth.[24] The A100 was initially available only in the third-generation DGX server, the DGX A100, which integrates eight A100s.[9] The DGX A100 also includes 15 TB of PCIe gen 4 NVMe storage,[24] two 64-core AMD Rome 7742 CPUs, 1 TB of RAM, and a Mellanox-powered HDR InfiniBand interconnect. Its initial price was US$199,000.[9]
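
The 19.5-teraflop FP32 figure is consistent with the CUDA core count and the 1410 MHz boost clock listed in the table below, counting each fused multiply-add (FMA) as two floating-point operations:

  6912\ \text{FP32 cores} \times 1.41\ \text{GHz} \times 2\ \text{FLOP/FMA} \approx 19.5\ \text{TFLOPS}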

Comparison of accelerators used in DGX:[25][26][27]

Model | Architecture | Socket | FP32 CUDA cores | FP64 cores (excl. tensor) | Mixed INT32/FP32 cores | INT32 cores | Boost clock | Memory clock | Memory bus width | Memory bandwidth | VRAM | Single precision (FP32) | Double precision (FP64) | INT8 (non-tensor) | INT8 dense tensor | INT32 | FP4 dense tensor | FP16 | FP16 dense tensor | bfloat16 dense tensor | TensorFloat-32 (TF32) dense tensor | FP64 dense tensor | Interconnect (NVLink) | GPU | L1 cache | L2 cache | TDP | Die size | Transistor count | Process | Launched
B200 | Blackwell | SXM6 | N/A | N/A | N/A | N/A | N/A | 8 Gbit/s HBM3e | 8192-bit | 8 TB/sec | 192 GB HBM3e | N/A | N/A | N/A | 4.5 POPS | N/A | 9 PFLOPS | N/A | 2.25 PFLOPS | 2.25 PFLOPS | 1.2 PFLOPS | 40 TFLOPS | 1.8 TB/sec | GB100 | N/A | N/A | 1000 W | N/A | 208 B | TSMC 4NP | Q4 2024 (expected)
B100 | Blackwell | SXM6 | N/A | N/A | N/A | N/A | N/A | 8 Gbit/s HBM3e | 8192-bit | 8 TB/sec | 192 GB HBM3e | N/A | N/A | N/A | 3.5 POPS | N/A | 7 PFLOPS | N/A | 1.98 PFLOPS | 1.98 PFLOPS | 989 TFLOPS | 30 TFLOPS | 1.8 TB/sec | GB100 | N/A | N/A | 700 W | N/A | 208 B | TSMC 4NP | Q4 2024 (expected)
H200 | Hopper | SXM5 | 16896 | 4608 | 16896 | N/A | 1980 MHz | 6.3 Gbit/s HBM3e | 6144-bit | 4.8 TB/sec | 141 GB HBM3e | 67 TFLOPS | 34 TFLOPS | N/A | 1.98 POPS | N/A | N/A | N/A | 990 TFLOPS | 990 TFLOPS | 495 TFLOPS | 67 TFLOPS | 900 GB/sec | GH100 | 25344 KB (192 KB × 132) | 51200 KB | 1000 W | 814 mm2 | 80 B | TSMC 4N | Q3 2023
H100 | Hopper | SXM5 | 16896 | 4608 | 16896 | N/A | 1980 MHz | 5.2 Gbit/s HBM3 | 5120-bit | 3.35 TB/sec | 80 GB HBM3 | 67 TFLOPS | 34 TFLOPS | N/A | 1.98 POPS | N/A | N/A | N/A | 990 TFLOPS | 990 TFLOPS | 495 TFLOPS | 67 TFLOPS | 900 GB/sec | GH100 | 25344 KB (192 KB × 132) | 51200 KB | 700 W | 814 mm2 | 80 B | TSMC 4N | Q3 2022
A100 80GB | Ampere | SXM4 | 6912 | 3456 | 6912 | N/A | 1410 MHz | 3.2 Gbit/s HBM2e | 5120-bit | 1.52 TB/sec | 80 GB HBM2e | 19.5 TFLOPS | 9.7 TFLOPS | N/A | 624 TOPS | 19.5 TOPS | N/A | 78 TFLOPS | 312 TFLOPS | 312 TFLOPS | 156 TFLOPS | 19.5 TFLOPS | 600 GB/sec | GA100 | 20736 KB (192 KB × 108) | 40960 KB | 400 W | 826 mm2 | 54.2 B | TSMC N7 | Q1 2020
A100 40GB | Ampere | SXM4 | 6912 | 3456 | 6912 | N/A | 1410 MHz | 2.4 Gbit/s HBM2 | 5120-bit | 1.52 TB/sec | 40 GB HBM2 | 19.5 TFLOPS | 9.7 TFLOPS | N/A | 624 TOPS | 19.5 TOPS | N/A | 78 TFLOPS | 312 TFLOPS | 312 TFLOPS | 156 TFLOPS | 19.5 TFLOPS | 600 GB/sec | GA100 | 20736 KB (192 KB × 108) | 40960 KB | 400 W | 826 mm2 | 54.2 B | TSMC N7 | Q1 2020
V100 32GB | Volta | SXM3 | 5120 | 2560 | N/A | 5120 | 1530 MHz | 1.75 Gbit/s HBM2 | 4096-bit | 900 GB/sec | 32 GB HBM2 | 15.7 TFLOPS | 7.8 TFLOPS | 62 TOPS | N/A | 15.7 TOPS | N/A | 31.4 TFLOPS | 125 TFLOPS | N/A | N/A | N/A | 300 GB/sec | GV100 | 10240 KB (128 KB × 80) | 6144 KB | 350 W | 815 mm2 | 21.1 B | TSMC 12FFN | Q3 2017
V100 16GB | Volta | SXM2 | 5120 | 2560 | N/A | 5120 | 1530 MHz | 1.75 Gbit/s HBM2 | 4096-bit | 900 GB/sec | 16 GB HBM2 | 15.7 TFLOPS | 7.8 TFLOPS | 62 TOPS | N/A | 15.7 TOPS | N/A | 31.4 TFLOPS | 125 TFLOPS | N/A | N/A | N/A | 300 GB/sec | GV100 | 10240 KB (128 KB × 80) | 6144 KB | 300 W | 815 mm2 | 21.1 B | TSMC 12FFN | Q3 2017
P100 | Pascal | SXM/SXM2 | N/A | 1792 | 3584 | N/A | 1480 MHz | 1.4 Gbit/s HBM2 | 4096-bit | 720 GB/sec | 16 GB HBM2 | 10.6 TFLOPS | 5.3 TFLOPS | N/A | N/A | N/A | N/A | 21.2 TFLOPS | N/A | N/A | N/A | N/A | 160 GB/sec | GP100 | 1344 KB (24 KB × 56) | 4096 KB | 300 W | 610 mm2 | 15.3 B | TSMC 16FF+ | Q2 2016

Products using Ampere

  • GeForce MX series
    • GeForce MX570 (mobile) (GA107)
  • GeForce 20 series
    • GeForce RTX 2050 (mobile) (GA107)
  • GeForce 30 series
    • GeForce RTX 3050 Laptop GPU (GA107)
    • GeForce RTX 3050 (GA106 or GA107)[28]
    • GeForce RTX 3050 Ti Laptop GPU (GA107)
    • GeForce RTX 3060 Laptop GPU (GA106)
    • GeForce RTX 3060 (GA106 or GA104)[29]
    • GeForce RTX 3060 Ti (GA104 or GA103)[30]
    • GeForce RTX 3070 Laptop GPU (GA104)
    • GeForce RTX 3070 (GA104)
    • GeForce RTX 3070 Ti Laptop GPU (GA104)
    • GeForce RTX 3070 Ti (GA104 or GA102)[31]
    • GeForce RTX 3080 Laptop GPU (GA104)
    • GeForce RTX 3080 (GA102)
    • GeForce RTX 3080 12 GB (GA102)
    • GeForce RTX 3080 Ti Laptop GPU (GA103)
    • GeForce RTX 3080 Ti (GA102)
    • GeForce RTX 3090 (GA102)
    • GeForce RTX 3090 Ti (GA102)
  • Nvidia Workstation GPUs (formerly Quadro)
    • RTX A1000 (mobile) (GA107)
    • RTX A2000 (mobile) (GA106)
    • RTX A2000 (GA106)
    • RTX A3000 (mobile) (GA104)
    • RTX A4000 (mobile) (GA104)
    • RTX A4000 (GA104)
    • RTX A5000 (mobile) (GA104)
    • RTX A5500 (mobile) (GA103)
    • RTX A4500 (GA102)
    • RTX A5000 (GA102)
    • RTX A5500 (GA102)
    • RTX A6000 (GA102)
    • A800 Active
  • Nvidia Data Center GPUs (formerly Tesla)
    • Nvidia A2 (GA107)
    • Nvidia A10 (GA102)
    • Nvidia A16 (4 × GA107)
    • Nvidia A30 (GA100)
    • Nvidia A40 (GA102)
    • Nvidia A100 (GA100)
    • Nvidia A100 80 GB (GA100)
    • Nvidia A100X
    • Nvidia A30X
  • Tegra SoCs
    • AGX Orin (GA10B)
    • Orin NX (GA10B)
    • Orin Nano (GA10B)
Products using Ampere (per chip)
  • GA10B: AGX Orin, Orin NX, Orin Nano
  • GA107: GeForce MX570 (mobile), GeForce RTX 2050 (mobile), GeForce RTX 3050 Laptop GPU, GeForce RTX 3050, GeForce RTX 3050 Ti Laptop GPU, RTX A1000 (mobile), Nvidia A2, Nvidia A16
  • GA106: GeForce RTX 3050, GeForce RTX 3060 Laptop GPU, GeForce RTX 3060, RTX A2000 (mobile), RTX A2000
  • GA104: GeForce RTX 3060, GeForce RTX 3060 Ti, GeForce RTX 3070 Laptop GPU, GeForce RTX 3070, GeForce RTX 3070 Ti Laptop GPU, GeForce RTX 3070 Ti, GeForce RTX 3080 Laptop GPU, RTX A3000 (mobile), RTX A4000 (mobile), RTX A4000, RTX A5000 (mobile)
  • GA103: GeForce RTX 3060 Ti, GeForce RTX 3080 Ti Laptop GPU, RTX A5500 (mobile)
  • GA102: GeForce RTX 3070 Ti, GeForce RTX 3080, GeForce RTX 3080 12 GB, GeForce RTX 3080 Ti, GeForce RTX 3090, GeForce RTX 3090 Ti, RTX A4500, RTX A5000, RTX A5500, RTX A6000, Nvidia A10, Nvidia A40
  • GA100: Nvidia A30, Nvidia A100, Nvidia A100 80 GB

References

  1. ^ "NVIDIA's New Ampere Data Center GPU in Full Production". NVIDIA Newsroom.
  2. ^ "NVIDIA Ampere Architecture In-Depth". NVIDIA Developer Blog. May 14, 2020.
  3. ^ "NVIDIA Delivers Greatest-Ever Generational Leap with GeForce RTX 30 Series GPUs". Nvidia Newsroom. September 1, 2020. Retrieved April 9, 2023.
  4. ^ "NVIDIA GeForce Ultimate Countdown". Nvidia.
  5. ^ "NVIDIA Doubles Down: Announces A100 80GB GPU, Supercharging World's Most Powerful GPU for AI Supercomputing". Nvidia Newsroom. November 16, 2020. Retrieved April 9, 2023.
  6. ^ "NVIDIA GeForce Beyond at CES 2023". NVIDIA.
  7. ^ "I.7. Compute Capability 8.x". Nvidia. Retrieved September 23, 2020.
  8. ^ Bosnjak, Dominik (September 1, 2020). "Samsung's old 8nm tech at the heart of NVIDIA's monstrous Ampere cards". SamMobile. Retrieved September 19, 2020.
  9. ^ a b c d e Smith, Ryan (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
  10. ^ Delgado, Gerardo (September 1, 2020). "GeForce RTX 30 Series GPUs: Ushering In A New Era of Video Content With AV1 Decode". Nvidia. Retrieved April 9, 2023.
  11. ^ Morgan, Timothy Prickett (May 29, 2020). "Diving Deep Into The Nvidia Ampere GPU Architecture". The Next Platform. Retrieved March 24, 2022.
  12. ^ "NVIDIA A100 Tensor Core GPU Architecture: Unprecedented Accerlation at Every Scale" (PDF). Nvidia. Retrieved September 18, 2020.
  13. ^ "NVIDIA Tensor Cores: Versatility for HPC & AI". NVIDIA.
  14. ^ "Abstract". docs.nvidia.com.
  15. ^ "NVIDIA A100 Tensor Core GPU Architecture" (PDF). NVIDIA Corporation. Retrieved April 29, 2024.
  16. ^ "NVIDIA GA102 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
  17. ^ "NVIDIA GA103 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
  18. ^ "NVIDIA GA104 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
  19. ^ "NVIDIA GA106 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
  20. ^ "NVIDIA GA107 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
  21. ^ "NVIDIA AGX Orin Series Technical Brief v1.2" (PDF). NVIDIA Corporation. Retrieved April 29, 2024.
  22. ^ "Future Nintendo Hardware & Technology Speculation & Discussion |ST| (New Staff Post, Please read) | Page 3493 | Famiboards". FamiBoards. Retrieved November 2, 2024.
  23. ^ "Future Nintendo Hardware & Technology Speculation and Discussion |OT|: "Now You're Playing with Super Power!" Nintendo - OT | Page 161 | ResetEra". ResetEra. Retrieved November 2, 2024.
  24. ^ a b Tom Warren; James Vincent (May 14, 2020). "Nvidia's first Ampere GPU is designed for data centers and AI, not your PC". The Verge.
  25. ^ Smith, Ryan (March 22, 2022). "NVIDIA Hopper GPU Architecture and H100 Accelerator Announced: Working Smarter and Harder". AnandTech.
  26. ^ Smith, Ryan (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
  27. ^ "NVIDIA Tesla V100 tested: near unbelievable GPU power". TweakTown. September 17, 2017.
  28. ^ Igor, Wallossek (February 13, 2022). "The two faces of the GeForce RTX 3050 8GB". Igor's Lab. Retrieved February 23, 2022.
  29. ^ Shilov, Anton (September 25, 2021). "Gainward and Galax List GeForce RTX 3060 Cards With GA104 GPU". Tom's Hardware. Retrieved September 23, 2022.
  30. ^ Tyson, Mark (February 23, 2022). "Zotac Debuts First RTX 3060 Ti Desktop Cards With GA103 GPU". Tom's Hardware. Retrieved September 23, 2022.
  31. ^ WhyCry (October 26, 2022). "ZOTAC launches GeForce RTX 3070 Ti with GA102-150 GPU". VideoCardz. Retrieved May 21, 2023.