Semestral Project Announcement
Jiří Filipovič
Fall 2024

Savings Accounting

The goal of the project is to implement GPU-accelerated code that accounts the savings of c customers over p discrete periods.
• the input is a 2D array containing the money sent to a savings account for each customer
  • a column of the array contains the data for one customer
  • a row of the array contains the money added in one time period
• the output is a 2D array containing the account balances and a 1D array holding, for each period, the sum of all account balances

    void solveCPU(int *changes, int *account, int *sum, int clients, int periods) {
        // period 0: the first change is copied
        for (int i = 0; i < clients; i++)
            account[i] = changes[i];
        // later periods: previous balance plus the change in this period
        for (int j = 1; j < periods; j++) {
            for (int i = 0; i < clients; i++) {
                account[j*clients + i] = account[(j-1)*clients + i] + changes[j*clients + i];
            }
        }
        // per-period sum of the balances of all clients
        for (int j = 0; j < periods; j++) {
            int s = 0;
            for (int i = 0; i < clients; i++) {
                s += account[j*clients + i];
            }
            sum[j] = s;
        }
    }

Implementation

You get a framework, which does all the boring stuff:
• creates the input and copies it into GPU memory
• checks the result of the CUDA implementation against non-optimized CPU code
• benchmarks your code

Your work
• you are expected to write CUDA code (the kernel and the code calling the kernel, in file kernel.cu)
• you can get inspiration (and the precise specification) from the unoptimized code in kernel_CPU.C
• compilation: nvcc -o framework framework.cu; you don't need to use a Makefile

Project Rules

What will be tested?
• the input size predefined in framework.cu can be changed (the input can be rectangular)
• the size will be divisible by 128 in each dimension
• the code should run on compute capability 3.0 and newer
• your code can expect that the output arrays are zeroed

What is forbidden?
• collaboration (discuss general questions, not your code)

Project Stages

The project has three stages:
• running parallel implementation (till Nov 4th): 25p
• efficient implementation (till Dec 2nd, required performance given under The Second Stage): 25p
• the final competition (till Dec 12th): up to 20p for above-average implementations
Submit all stages via IS.

The First Stage

Write a correct implementation in C for CUDA.
• the performance is not relevant
• but it must be efficiently parallelized (computation in multiple blocks, each having multiple threads, all doing something useful :-))
• till November 4th (any time of day)
• the points will be assigned according to the functionality of your code (check different input sizes!); if delayed, -2 points for each day of delay
• I highly recommend starting with optimization immediately after you have functional code

The Second Stage

Write an efficient implementation in C for CUDA.
• tested on input size 8192 x 8192
• performance on airacuda (GeForce GTX 1070): 20,000 megavalues/s
• performance on barracuda (GeForce RTX 2080 Ti): 40,000 megavalues/s
• you have to deliver fast and correct code (also for input sizes other than 8192 x 8192)
• I will compile the code with the AIRACUDA/BARRACUDA macro, so you can use it if you want different optimizations for {aira,barra}cuda
• till December 2nd (any time of day)
• the points will be assigned according to the speed of your code (you have to match the required performance on both machines); if delayed, -2 points for each day of delay

The Third Stage

Submit your best code.
• tested on input sizes 8192 x 8192 and 512 x 512
• the score will be computed as the sum of performances over all combinations of inputs and machines
• the code has to be correct, otherwise a zero score is assigned
• the students with an above-average score will get from 1 to 20 points according to their position
• till December 12th (any time of day); there is no possibility to submit your code after the deadline (fair play)
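
Since correctness is required in every stage and different input sizes should be checked, the specification given by solveCPU can be restated as a small consistency check. The following is only a sketch: check_accounting is a hypothetical helper, not part of the provided framework, and it assumes the same row-major layout as solveCPU (period j, client i stored at index j*clients + i).

```c
#include <stddef.h>

/* Hypothetical helper, not part of the provided framework.
 * Returns 1 if `account` and `sum` are consistent with `changes`, 0 otherwise.
 * Assumed layout (as in solveCPU): period j, client i is at j*clients + i. */
int check_accounting(const int *changes, const int *account, const int *sum,
                     int clients, int periods)
{
    for (int j = 0; j < periods; j++) {
        int s = 0;
        for (int i = 0; i < clients; i++) {
            size_t idx = (size_t)j * clients + i;
            /* balance before this period; zero before the first period */
            int prev = (j == 0) ? 0 : account[idx - clients];
            if (account[idx] != prev + changes[idx])
                return 0;   /* balance does not follow from the change */
            s += account[idx];
        }
        if (sum[j] != s)
            return 0;       /* per-period sum of balances is wrong */
    }
    return 1;
}
```

For example, with 2 clients and 3 periods, the changes (1, 2), (3, 4), (5, 6) give the balances (1, 2), (4, 6), (9, 12) and the per-period sums 3, 10, 21; the check accepts exactly these outputs for that input.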