NVIDIA® CUDA™ technology takes advantage of the massively parallel computing capacity of NVIDIA GPUs. The CUDA framework is a ground-breaking massively parallel architecture that brings NVIDIA’s world-class graphics-processor expertise to general-purpose GPU computing. CUDA-enabled GPUs are found in over one hundred million desktop and notebook computers, professional workstations, and supercomputer clusters, and applications written for the CUDA architecture can take advantage of any of them.

Thanks to the CUDA architecture and tools, developers are gaining substantial speedups in domains such as medical imaging and natural-resource exploration, and creating breakthrough applications in areas such as image recognition and real-time HD video playback and encoding.
Standard APIs such as OpenCL™ and DirectX® Compute, along with high-level programming languages such as C/C++, Fortran, Java, Python, and the Microsoft .NET Framework, make this remarkable performance accessible.

CUDA’s processing resources are organized to help optimize performance in GPU use cases. Three of the most important elements of this hierarchy are threads, thread blocks, and kernel grids.

A CUDA core is a parallel processor inside an NVIDIA GPU that performs floating-point calculations; the threads launched by a kernel execute on these cores, so all of the data a GPU processes passes through them. Modern GPUs have hundreds or thousands of CUDA cores. Each thread has its own memory registers, which other threads cannot access.

While the relationship between computing power and CUDA cores is not perfectly linear, the more CUDA cores a GPU has, the more computing power it has, all else being equal. However, there are certain exceptions.
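To make the thread hierarchy concrete, here is a minimal sketch of a CUDA kernel, assuming a hypothetical element-wise vector addition; `blockIdx`, `blockDim`, and `threadIdx` are CUDA’s built-in index variables, and the sizes chosen are illustrative:

```cuda
#include <cstdio>

// Kernel: each thread handles one pair of elements.
// The grid is made of blocks; each block is made of threads.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(a, b, c, n);  // launch the grid
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // each element should be 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The launch configuration `<<<blocks, threadsPerBlock>>>` is where the hierarchy appears in code: a grid of blocks, each containing threads.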

The CUDA Architecture

The CUDA architecture is made up of various components, which are listed below:

  1. Parallel compute engines inside NVIDIA GPUs.

  2. OS kernel-level support for hardware initialization, configuration, and so on.

  3. User-mode driver, which gives developers a device-level API.

  4. PTX instruction set architecture (ISA) for parallel computing kernels and functions.


The CUDA Software Development Environment supports two programming interfaces:

  1. A device-level programming interface, in which the application uses DirectX Compute, OpenCL, or the CUDA driver API directly to configure the GPU, launch compute kernels, and read back the results.

Developers using the device-level interface write compute kernels in separate files, in the kernel language of their chosen API. DirectX Compute kernels (also known as “compute shaders”) are written in HLSL, and OpenCL kernels are written in a C-like language called “OpenCL C”.

  2. A language-integration programming interface, in which the application uses the C runtime for CUDA.

Here developers use a small set of extensions to indicate which compute functions should be performed on the GPU instead of the CPU. Developers write their compute functions in C and C++, and the CUDA runtime sets up the GPU and runs them. This programming interface lets developers take advantage of native language features.
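To show what the device-level path looks like in practice, here is a hedged sketch using the CUDA driver API; the module file name `kernel.ptx` and kernel name `vecAdd` are assumptions for illustration, and error checking is omitted for brevity:

```cuda
#include <cuda.h>
#include <cstdio>

// Device-level interface: the host program loads a precompiled PTX
// module and launches its kernel through the driver API directly.
int main() {
    CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    // "kernel.ptx" / "vecAdd" are hypothetical names: the PTX would be
    // compiled separately (e.g. with nvcc --ptx) from a kernel file.
    cuModuleLoad(&mod, "kernel.ptx");
    cuModuleGetFunction(&fn, mod, "vecAdd");

    int n = 1024;
    CUdeviceptr d_a;
    cuMemAlloc(&d_a, n * sizeof(float));

    void *args[] = { &d_a, &n };
    cuLaunchKernel(fn, (n + 255) / 256, 1, 1,  // grid dimensions
                   256, 1, 1,                  // block dimensions
                   0, NULL, args, NULL);       // shared mem, stream, args
    cuCtxSynchronize();

    cuMemFree(d_a);
    cuCtxDestroy(ctx);
    return 0;
}
```

Compare this with the language-integration interface, where the kernel lives in the same source file and is launched with the `<<<…>>>` syntax instead of explicit module loading.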

High-level languages such as C, C++, Fortran, Java, Python, and many others are supported.

Type integration and code integration reduce algorithm complexity and development costs:

  • Type integration: standard types, vectors, and user-defined types can be integrated and used consistently across functions executed on the CPU and functions executed on the GPU.


  • Code integration: the same function can be invoked from functions executed on the CPU and from functions executed on the GPU.

The term “C for CUDA” refers to the limited handful of extensions that let developers distinguish functions that will be performed on the GPU from those performed on the CPU, state what GPU memory will be used, and specify how the implementation uses the GPU’s parallel capabilities. First released in March 2007, the free CUDA development tools are already used by thousands of software engineers, and over 100 million multi-core GPUs have been sold. These tools help solve difficult problems in various professional and consumer applications, including video and image processing.
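A minimal sketch of code integration, assuming a hypothetical helper function `clampf`; the CUDA `__host__ __device__` qualifiers allow one definition to be called from both CPU and GPU code:

```cuda
#include <cstdio>

// One definition, callable from host (CPU) and device (GPU) code alike.
__host__ __device__ float clampf(float x, float lo, float hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

// GPU kernel invoking the shared function on the device.
__global__ void clampAll(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = clampf(data[i], 0.0f, 1.0f);
}

int main() {
    // The very same function also runs on the CPU.
    printf("%f\n", clampf(1.5f, 0.0f, 1.0f));  // prints 1.000000
    return 0;
}
```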


Applications range from oil and gas exploration, product development, and medical imaging to data processing and physics simulations.

Today, developers can harness the CUDA architecture’s high performance through rich libraries, APIs, and several high-level languages, available on 32-bit and 64-bit Linux™ operating systems, macOS, and Windows.

Memory Hierarchy
Separate memory areas exist for the CPU and GPU. This means that data to be processed by the GPU must be transferred from the CPU to the GPU before the computation can begin, and the results of the computation must be returned to the CPU once the processing is finished.

  • Global memory
  • This memory is accessible to all threads as well as the host (CPU).
  • The host manages global memory allocation and deallocation.
  • It is used to set up the data that the GPU will work on.
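The host-managed transfer described above can be sketched with the CUDA runtime API; the array size and contents here are illustrative assumptions:

```cuda
#include <cstdio>

int main() {
    const int n = 256;
    float h_data[n];                       // host (CPU) memory
    for (int i = 0; i < n; ++i) h_data[i] = (float)i;

    // The host allocates global memory on the device...
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    // ...copies the input data in before the computation...
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    // (a kernel operating on d_data would be launched here)

    // ...and copies the results back out once processing is finished.
    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_data);                      // the host also deallocates
    printf("h_data[1] = %f\n", h_data[1]);
    return 0;
}
```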
