NVIDIA CUDA GPU Framework
An NVIDIA CUDA GPU Framework is a GPU programming framework created by NVIDIA.
- AKA: Compute Unified Device Architecture.
- Context:
- It can (typically) allow developers to access the Virtual Instruction Set and ... Memory.
- It can (often) be used for General-Purpose Computing on Graphics Processing Units (GPGPU).
- ...
- It can be employed in ___ Computational Tasks.
- It can be supported on CUDA-enabled GPUs, such as NVIDIA GPUs.
- ...
- Example(s):
- CUDA Toolkit 8.0 (~2017-02) - Introduced support for new CUDA-enabled GPUs and provided enhanced debugging and performance profiling tools.
- CUDA Toolkit 9.2 (~2018-05-22) - Included updates to libraries like cuBLAS.
- CUDA Toolkit 12.6.0 (August 2024) - Featured advanced optimizations for AI Workloads.
- cuDNN (CUDA Deep Neural Network library) - A GPU-accelerated library for deep neural networks that significantly speeds up the training and inference of deep learning models.
- NVIDIA TensorRT - A platform for high-performance deep learning inference, optimized to deliver low latency and high throughput for deep learning applications.
- ...
- Counter-Example(s):
- OpenCL Platform, which is an open standard for cross-platform, parallel programming, and is not specific to NVIDIA GPUs.
- AMD ROCm (Radeon Open Compute), which is a framework developed by AMD for its GPUs, serving a similar purpose but not compatible with CUDA.
- See: Graphics Processing Unit, Instruction Set, GPGPU, Software Program Directive, OpenACC, LLVM, sciGPGPU, Numerical Analysis System, Theano.
References
2018
- https://developer.nvidia.com/cuda-toolkit/whatsnew
- QUOTE: CUDA 9.2 includes updates to libraries, a new library for accelerating custom linear-algebra algorithms, and lower kernel launch latency.
With CUDA 9.2, you can:
- Speed up recurrent and convolutional neural networks through cuBLAS optimizations
- Speed up FFT of prime size matrices through Bluestein kernels in cuFFT
- Accelerate custom linear algebra algorithms with CUTLASS 1.0
- Launch CUDA kernels up to 2X faster than CUDA 9 with new optimizations to the CUDA runtime
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/CUDA Retrieved:2015-1-17.
- CUDA (after the Plymouth Barracuda) [1] is a parallel computing platform and programming model created by NVIDIA and implemented by the graphics processing units (GPUs) that they produce. [2] CUDA gives developers direct access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs. Using CUDA, the GPUs can be used for general purpose processing (i.e., not exclusively graphics); this approach is known as GPGPU. Unlike CPUs, however, GPUs have a parallel throughput architecture that emphasizes executing many concurrent threads slowly, rather than executing a single thread very quickly. The CUDA platform is accessible to software developers through CUDA-accelerated libraries, compiler directives (such as OpenACC), and extensions to industry-standard programming languages, including C, C++ and Fortran. C/C++ programmers use 'CUDA C/C++', compiled with "nvcc", NVIDIA's LLVM-based C/C++ compiler. [3] Fortran programmers can use 'CUDA Fortran', compiled with the PGI CUDA Fortran compiler from The Portland Group. In addition to libraries, compiler directives, CUDA C/C++ and CUDA Fortran, the CUDA platform supports other computational interfaces, including the Khronos Group's OpenCL, Microsoft's DirectCompute, OpenGL Compute Shaders and C++ AMP. Third party wrappers are also available for Python, Perl, Fortran, Java, Ruby, Lua, Haskell, R, MATLAB, IDL, and native support in Mathematica. In the computer game industry, GPUs are used not only for graphics rendering but also in game physics calculations (physical effects such as debris, smoke, fire, fluids); examples include PhysX and Bullet. CUDA has also been used to accelerate non-graphical applications in computational biology, cryptography and other fields by an order of magnitude or more. [4] [5] CUDA provides both a low level API and a higher level API. The initial CUDA SDK was made public on 15 February 2007, for Microsoft Windows and Linux. 
Mac OS X support was later added in version 2.0, [6] which supersedes the beta released February 14, 2008. [7] CUDA works with all Nvidia GPUs from the G8x series onwards, including GeForce, Quadro and the Tesla line. CUDA is compatible with most standard operating systems. Nvidia states that programs developed for the G8x series will also work without modification on all future Nvidia video cards, due to binary compatibility.
- ↑ [Mark Ebersole, Nvidia educator, 2012 presentation]
- ↑ NVIDIA CUDA Home Page
- ↑ CUDA LLVM Compiler
- ↑ Pyrit – Google Code http://code.google.com/p/pyrit/
- ↑ Use your Nvidia GPU for scientific computing, BOINC official site (December 18, 2008)
- ↑ Nvidia CUDA Software Development Kit (CUDA SDK) – Release Notes Version 2.0 for MAC OS X
- ↑ CUDA 1.1 – Now on Mac OS X (Posted on Feb 14, 2008)
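- NOTE: The quoted description of many concurrent threads can be made concrete with a small sketch. The following is a CPU-only Python emulation of how a CUDA-style kernel launch maps logical threads to array elements. It does not call CUDA or any GPU library; the names `vector_add_kernel`, `launch`, and `vector_add` are hypothetical and chosen for illustration only. On a real GPU the (block, thread) pairs would run concurrently rather than in a loop.

```python
# CPU-only emulation of a CUDA-style grid launch.
# Assumption: no GPU or CUDA library is used; all names are illustrative.

def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Body executed by one logical thread, mirroring a typical CUDA kernel."""
    # Global index, analogous to blockIdx.x * blockDim.x + threadIdx.x.
    i = block_idx * block_dim + thread_idx
    # Bounds guard: the last block may contain threads past the array end.
    if i < len(out):
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Hypothetical helper: run every (block, thread) pair sequentially.

    On a GPU these logical threads would execute concurrently.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def vector_add(a, b, block_dim=4):
    """Add two equal-length lists using the emulated grid launch."""
    n = len(a)
    # Ceiling division: enough blocks to cover all n elements.
    grid_dim = (n + block_dim - 1) // block_dim
    out = [0] * n
    launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
    return out
```

With five elements and a block size of four, the sketch launches two blocks (eight logical threads), and the bounds guard makes the three surplus threads do nothing.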
2014
- http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using_cluster_computing.html
- QUOTE: If you require high parallel processing capability, you'll benefit from using GPU instances, which provide access to NVIDIA GPUs with up to 1,536 CUDA cores and 4 GB of video memory. You can use GPU instances to accelerate many scientific, engineering, and rendering applications by leveraging the Compute Unified Device Architecture (CUDA) or OpenCL parallel computing frameworks. You can also use them for graphics applications, including game streaming, 3-D application streaming, and other graphics workloads.
2007
- (Harish & Narayanan, 2007) ⇒ Pawan Harish, and P. J. Narayanan. (2007). “Accelerating Large Graph Algorithms on the GPU using CUDA.” In: High performance computing (HiPC 2007).
- ABSTRACT: Large graphs involving millions of vertices are common in many practical applications and are challenging to process. Practical-time implementations using high-end computers are reported but are accessible only to a few. Graphics Processing Units (GPUs) of today have high computation power and low price. They have a restrictive programming model and are tricky to use. The G80 line of Nvidia GPUs can be treated as a SIMD processor array using the CUDA programming model. We present a few fundamental algorithms – including breadth first search, single source shortest path, and all-pairs shortest path – using CUDA on large graphs. We can compute the single source shortest path on a 10 million vertex graph in 1.5 seconds using the Nvidia 8800GTX GPU costing $600. In some cases optimal sequential algorithm is not the fastest on the GPU architecture. GPUs have great potential as high-performance co-processors.
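- NOTE: The breadth first search mentioned in the abstract is commonly organized on GPUs as a level-synchronous sweep with one logical thread per vertex. The sketch below is a CPU-only Python emulation of that general idea, not the authors' code; the function name `bfs_levels`, the adjacency-list input, and the `frontier`/`visited`/`cost` arrays are assumptions made for illustration.

```python
# CPU-only sketch of level-synchronous BFS in a one-thread-per-vertex style.
# Assumption: this is an illustration, not the implementation from the paper.

def bfs_levels(adjacency, source):
    """Return the BFS level of each vertex from source (-1 if unreachable).

    adjacency: list where adjacency[v] is the list of neighbours of vertex v.
    """
    n = len(adjacency)
    cost = [-1] * n          # BFS level per vertex
    frontier = [False] * n   # vertices to expand in the current sweep
    visited = [False] * n    # vertices whose level is already final
    frontier[source] = True
    visited[source] = True
    cost[source] = 0

    while any(frontier):
        next_frontier = [False] * n
        # Each iteration of this loop stands in for one GPU thread.
        for v in range(n):
            if frontier[v]:
                for w in adjacency[v]:
                    if not visited[w]:
                        # All frontier vertices share one level, so repeated
                        # writes to cost[w] store the same value.
                        cost[w] = cost[v] + 1
                        next_frontier[w] = True
        # Mark the new frontier as visited only after the whole sweep,
        # which mimics the synchronization between kernel launches.
        for v in range(n):
            if next_frontier[v]:
                visited[v] = True
        frontier = next_frontier
    return cost
```

Each pass of the `while` loop corresponds to one kernel launch in a GPU version; the host would relaunch the kernel until no vertex remains in the frontier.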