Problem description	The First Stage
Semestral Project Announcement
Jiří Filipovič
Fall 2024
Savings Accounting
The goal of the project is to implement GPU-accelerated code that keeps track of the savings of c customers over p discrete periods
• the input is a 2D array containing the money sent to the savings account of each customer
• a column of the array contains info about one customer
• a row of the array contains money added in one time period
• the output is a 2D array containing the account balances and a 1D array summing all money per period
Jiří Filipovič        Semestral Project Announcement
void solveCPU(int *changes, int *account, int *sum,
              int clients, int periods) {
    // balances: running total of each client's changes over the periods
    for (int i = 0; i < clients; i++)
        account[i] = changes[i];  // the first change is copied
    for (int j = 1; j < periods; j++) {
        for (int i = 0; i < clients; i++) {
            account[j*clients + i] = account[(j-1)*clients + i]
                + changes[j*clients + i];
        }
    }
    // sums: total balance of all clients in each period
    for (int j = 0; j < periods; j++) {
        int s = 0;
        for (int i = 0; i < clients; i++) {
            s += account[j*clients + i];
        }
        sum[j] = s;
    }
}
Implementation
You get a framework, which does all the boring stuff:
• creates the input and copies it into GPU memory
• checks the result of the CUDA implementation against the non-optimized CPU code
• benchmarks your code
Your work
• you are expected to write CUDA code (the kernel and the code calling it, in the file kernel.cu)
• you can take inspiration (and the precise specification) from the unoptimized code in kernel_CPU.C
• compilation: nvcc -o framework framework.cu; you don't need to use a Makefile
Project Rules
What will be tested?
• the input size predefined in framework.cu can be changed (the input can be rectangular)
• the size will be divisible by 128 in each dimension
• the code should run on compute capability 3.0 and newer
• your code can expect that the output arrays are zeroed
What is forbidden?
• collaboration (discuss general questions, not your code)
Project Stages
The project has three stages:
• running parallel implementation (till Nov 4th): 25p
• efficient implementation (till Dec 2nd; the required performance is given in The Second Stage): 25p
• the final competition (till Dec 12th): up to 20p for above-average implementations
Submit all stages via IS.
The First Stage
Write a correct implementation in C for CUDA.
• the performance is not relevant
• but it must be efficiently parallelized (computation in multiple blocks, each having multiple threads, all doing something useful :-))
• till November 4th (any time of day)
• the points will be assigned according to the functionality of your code (check different input sizes!); if delayed, -2 points for each day of delay
• I highly recommend starting with optimization immediately after you have functional code
The Second Stage
Write an efficient implementation in C for CUDA.
• tested on input size 8192 x 8192
• required performance on airacuda (GeForce GTX 1070): 20,000 megavalues/s
• required performance on barracuda (GeForce RTX 2080 Ti): 40,000 megavalues/s
• you have to deliver fast and correct code (also for input sizes other than 8192 x 8192)
• I will compile the code with the AIRACUDA or BARRACUDA macro defined, so you can use it if you want different optimizations for {aira,barra}cuda
• till December 2nd (any time of day)
• the points will be assigned according to the speed of your code (you have to match the performance on both machines); if delayed, -2 points for each day of delay
The Third Stage
Submit your best code.
• tested on input sizes 8192 x 8192 and 512 x 512
• the score will be computed as the sum of performances over all combinations of inputs and machines
• the code has to be correct, otherwise a zero score is assigned
• students with an above-average score will get from 1 to 20 points according to their ranking
• till December 12th (any time of day); there is no possibility to submit your code after the deadline (fair play)
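Written out, assuming the same two machines as in the second stage and writing P_m(n) for the measured performance (in megavalues/s) on machine m with an n x n input, the score is a sum of four terms:

```latex
S = \sum_{n \in \{512,\,8192\}} \; \sum_{m \in \{\mathrm{airacuda},\,\mathrm{barracuda}\}} P_m(n)
```

Because the 512 x 512 input holds only 1/256 of the values of the large one, fixed costs such as kernel launch overhead are likely to weigh more on its two terms.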