Nvidia and CUDA programming

June 18, 2025

Nvidia and CUDA programming - Let's try some CUDA programming

I am running Fedora Linux 42 on an x86-64 desktop. I had just upgraded to Fedora 42, which is a bit unfortunate, as there is a supported CUDA setup for Fedora 41 but not yet for Fedora 42.
Details of how I installed the CUDA toolkit are here:

Fedora 42 - my notes on installing the CUDA toolkit

Hello world

The following "tutorial" is more or less worthless. You will see why as you keep reading, and you will find recommendations for much better tutorials below. Working through its shortcomings was a useful, if needlessly painful, learning exercise.
I take you through the play by play on this page.

CUDA tutorial -- hello world

I try to follow the above. I copy this code into "hello.cu":

__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}

int main() {
    cuda_hello<<<1,1>>>();
    return 0;
}

I set up a Makefile like this:

#NOPTS = -allow-unsupported-compiler -Wno-deprecated-gpu-targets
NOPTS = -Wno-deprecated-gpu-targets

NVCC = /usr/local/cuda-12.9/bin/nvcc $(NOPTS)

all: hello

hello: hello.cu
	$(NVCC) hello.cu -o hello

I set up a bash script (I call it "cuda_make") that sets a bunch of environment variables and then invokes "make". It looks like the following. (I later do away with this script; see below.)

#!/bin/bash

# You could put all this into some bash script ...
export CUDAHOSTCXX=/usr/bin/g++-14
export CC=/usr/bin/gcc-14
export CXX=/usr/bin/g++-14
export NVCC_CCBIN=/usr/bin/g++-14

# And perhaps this also.
export LD_LIBRARY_PATH=/usr/local/cuda-12.9/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
export CPATH=/usr/local/cuda-12.9/targets/x86_64-linux/include:$CPATH
export PATH=/usr/local/cuda-12.9/bin:$PATH

make

So now I type "cuda_make" and I get:

/usr/local/cuda-12.9/bin/nvcc -Wno-deprecated-gpu-targets hello.cu -o hello
/usr/include/bits/mathcalls.h(79): error: exception specification is incompatible with that of previous function "cospi" (declared at line 2601 of /usr/local/cuda-12.9/bin/../targets/x86_64-linux/include/crt/math_functions.h)
   extern double cospi (double __x) noexcept (true); extern double __cospi (double __x) noexcept (true);

hello.cu(2): error: identifier "printf" is undefined
      printf("Hello World from GPU!\n");
      ^

5 errors detected in the compilation of "hello.cu".
make: *** [Makefile:24: hello] Error 2

There are several errors involving "math_functions.h". These are mentioned in the discussion of setting things up for Fedora 42. I just show one of them.

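The CUDA header declares functions such as cospi without an exception specification, while /usr/include/bits/mathcalls.h (a regular part of Linux, shipped with glibc) now declares them with "noexcept (true)", so the two declarations conflict. The fix is something like this. Replace the first line with the second: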


extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ double                 sinpi(double x);
extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ double                 sinpi(double x) noexcept (true);

The trick is appending the "noexcept (true)" business. The author of that discussion calls this an "ugly hack" because the game is to edit the following file, which belongs to the toolkit itself, and add this to the declarations of sinpi, sinpif, cospi, and cospif.

cd /usr/local/cuda-12.9/targets/x86_64-linux/include/crt
vi math_functions.h

After this, we are left with the complaint about printf. The thing to do is the usual:

#include <stdio.h>

Add this to the start of the file and we get no more compile errors.

We get the file "hello", and running it does nothing. No output. Just like the tutorial said. So what is the point? It did help us grind our way through compiling something with nvcc. Nonetheless, a printf() that doesn't produce output is pretty disappointing.

As an experiment, I changed my file to this, compiled it, and ran it. No output. And it returns to the command line in about a second. It should be in an infinite loop. There are clearly strange things going on.

#include <stdio.h>

__global__ void cuda_hello(){
    for ( ;; )
        printf("Hello World from GPU!\n");
}

int main() {
    cuda_hello<<<1,1>>>();
    return 0;
}

We try something else (nothing like experimenting).

#include <stdio.h>

__global__ void cuda_hello(){
    for ( ;; )
        printf("Hello World from GPU!\n");
}

int main() {
    cuda_hello<<<1,1>>>();
    printf ( "Goodbye\n" );
    return 0;
}

Here I see the "Goodbye" message after about 1 second. I do a bit of reading and learn that the __global__ marker indicates code that will run in the CUDA world, that is, on the GPU. The main() function in the above is just good old C code running on the x86.

I found the solution!

First of all the tutorial I started following sucks, when all is said and done. I ran this:

#include <stdio.h>

__global__ void cuda_hello(){
    int i;

    for ( i=0 ; i<4; i++ )
        printf("Hello World %d from GPU!\n", i+1 );
}

int main() {
    cuda_hello<<<1,1>>>();
    cudaDeviceSynchronize();
    printf ( "Goodbye\n" );
    return 0;
}

And now I get the following, as expected:

Hello World 1 from GPU!
Hello World 2 from GPU!
Hello World 3 from GPU!
Hello World 4 from GPU!
Goodbye

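The necessary ingredient was the call to cudaDeviceSynchronize(). A kernel launch is asynchronous: it returns to the code in main() right away, without waiting for the GPU. Without the synchronize call, main() simply returned and the process exited before the CUDA code could run and before its printf output could be flushed back to the host. That is also why the "infinite loop" version came back to the command line after about a second.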
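Silent failures like the ones above are easier to spot if the status returned by the CUDA runtime is checked. The following is a minimal sketch and is not part of the tutorial: the check() helper is a hypothetical name made up for this example, while cudaGetLastError(), cudaDeviceSynchronize() and cudaGetErrorString() are standard CUDA runtime calls.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper (not from the tutorial): print a message and exit
// if a CUDA runtime call did not return cudaSuccess.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

__global__ void cuda_hello()
{
    printf("Hello World from GPU!\n");
}

int main()
{
    // The launch itself returns immediately; the kernel runs asynchronously.
    cuda_hello<<<1,1>>>();

    // Errors in the launch itself (bad configuration, no usable device, ...)
    // are reported here.
    check(cudaGetLastError(), "kernel launch");

    // Wait for the kernel to finish. This is also a point where device-side
    // printf output is flushed, and where errors raised while the kernel
    // was running are reported.
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    printf("Goodbye\n");
    return 0;
}
```

Built the same way as hello.cu, this should behave like the working version on a healthy setup. If the launch or the kernel fails, it prints the runtime's error string and exits instead of quietly producing no output.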

MUCH better tutorials

These are on the Nvidia site and have stood the test of time.

A better Makefile

For the time being, I have settled on the following, which does away with the need for my cuda_make wrapper script:

# Build a cuda hello world
# Tom Trebisky  6-19-2025
# Hot! these days. over 110 yesterday.

# We could do this:
## .EXPORT_ALL_VARIABLES:

export CUDAHOSTCXX=/usr/bin/g++-14
export CC=/usr/bin/gcc-14
export CXX=/usr/bin/g++-14
export NVCC_CCBIN=/usr/bin/g++-14

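# These won't work as written (and don't seem to be needed).
# They fail because of how the previous values are referenced:
# make reads $LD_LIBRARY_PATH as $(L) followed by "D_LIBRARY_PATH".
# In a Makefile the reference would have to be $(LD_LIBRARY_PATH).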
# In lieu of CPATH we could use a -I line on gcc (but not nvcc)

#export LD_LIBRARY_PATH=/usr/local/cuda-12.9/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
##export CPATH=/usr/local/cuda-12.9/targets/x86_64-linux/include:$CPATH
#export PATH=/usr/local/cuda-12.9/bin:$PATH

#NOPTS = -allow-unsupported-compiler -Wno-deprecated-gpu-targets
NOPTS = -Wno-deprecated-gpu-targets

# This nice option doesn't seem to work for us
#NVCC = /usr/local/cuda-12.9/bin/nvcc --std=c++14 $(NOPTS)
NVCC = /usr/local/cuda-12.9/bin/nvcc $(NOPTS)

all: hello

hello: hello.cu
	$(NVCC) hello.cu -o hello

clean:
	rm -f hello

# THE END

Time to do my homework

The following is long and big, but I should roll up my sleeves and start reading.

CUDA C programming guide

Feedback? Questions? Drop me a line!

Tom's Computer Info / tom@mmto.org