PV227 GPU programming
Marek Vinkler
Department of Computer Graphics and Design

Motivation
Figure: Taken from shoraspot.com
Figure: Taken from cgsociety.org

Course
- no more than 2 absences,
- final test (on-the-spot programming),
- first lectures more theoretical, then mostly practical.

Course
- new course → active participation,
- only major language features are introduced,
- graphics change fast → help me ;-)

Contact
- Office C420
- xvinkl@fi.muni.cz

Why GPU?
- graphics computations are costly,
- graphics are “embarrassingly parallel”,
- increasing model complexity, screen resolution, ...
- the GPU is a parallel co-processor.

Why GPU?
Figure: Taken from docs.nvidia.com
Figure: Taken from docs.nvidia.com

Shaders
Shaders are small programs that can alter the processing of the input data. The hardware units they target are called processors. They come in various flavours:
- vertex shader: modifies individual vertices,
- geometry shader: operates on whole primitives, can create new primitives,
- tessellation shader: similar to the geometry shader, specific to tessellation,
- fragment shader: modifies individual pixel fragments,
- compute shader: arbitrary parallel computations.

Fragment vs. Pixel
- A pixel represents the contents of the frame buffer at a specific location.
- A fragment is the state required to potentially update a particular pixel.
- A fragment has an associated pixel location, a depth value, and a set of interpolated parameters.

Brief history: 1980’s
- integrated framebuffer, draw to display,
- tightly CPU controlled,
- addition of shaded solids, vertex lighting, rasterization of filled polygons, depth buffer,
- IRIS GL, the predecessor of OpenGL (OpenGL 1.0 itself followed in 1992),
- beginning of the graphics pipeline.
Brief history: 1990’s
Generation 0
- fixed graphics pipeline,
- half the pipeline on CPU, half on GPU,
- 1 pixel per cycle,
- easy to overload → multiple pipelines,
- dawn of “cheap” game hardware: 3DFX (Voodoo), NVIDIA (TNT), ATI (Rage),
- development driven by games: Quake, Doom, ...

Brief history: 1990’s
Generation I
- no 2D graphics acceleration; only 3D,
- transform part of the pipeline on CPU,
- rendering part on GPU (texture mapping, z-buffering, rasterization),
- 3DFX Voodoo.

Brief history: 1990’s
Generation II
- entire pipeline on GPU,
- term “GPU” introduced for GeForce 256,
- AGP instead of PCI bus,
- new features: multi-texturing, bump mapping, hardware T&L,
- fixed function pipeline.

Brief history: 2000–2002
Generation III
- programmable pipeline (NVIDIA GeForce 3, ATI Radeon 8500),
- parts of the pipeline can be changed with a custom program,
- only vertex shaders,
- small assembly language “kernels”.

Brief history: 2002–2004
Generation IV
- “fully” programmable pipeline (NVIDIA GeForce FX, ATI Radeon 9700),
- vertex and fragment (pixel) shaders,
- dedicated vertex and fragment processors,
- floating point support, advanced texture processing → GPGPU.

Brief history: 2004–2006
Generation V
- faster than Moore’s law growth,
- PCI-express bus (NVIDIA GeForce 6, ATI Radeon X800),
- multiple rendering targets, increased GPU memory,
- high level GPU languages with dynamic flow control (Brook, Sh).

Brief history: 2006–2009
Generation VI
- massively parallel processors,
- unified shaders (NVIDIA GeForce 8),
- streaming multiprocessor (SM),
- addition of geometry shaders,
- new general purpose languages: CUDA, OpenCL.
Unified shaders
- before: different instruction sets and capabilities for each shader type,
- now they can do the same (almost; the differences come from their position in the pipeline),
- gradual merging of instruction sets,
- HLSL perspective (http://en.wikipedia.org/wiki/High-level_shader_language),
- currently Shader model 5.0 (compute).

Brief history: 2009–?
Generation VII
- even more programmability,
- cache hierarchy, ECC, unified memory address space,
- focus on general computations,
- debuggers and profilers.

Brief future :D
Generation Vxx
- slower rate of performance growth,
- more CPU like,
- emphasis on better programming languages and tools,
- merge of graphics and general purpose APIs.

Graphics pipeline
Figure: Taken from goanna.cs.rmit.edu.au

Graphics pipeline
Figure: Taken from lighthouse3d.com
The graphics pipeline is a sequence of stages operating in parallel and in a fixed order. Each stage receives its input from the prior stage and sends its output to the subsequent stage.

Why programmable pipeline?
- The fixed pipeline is limited to algorithms hard-coded into the graphics chips → narrow class of effects.
- Programmability gives the developer almost limitless possibilities.
- Fixed and programmable processing cannot be combined within one stage: once a shader is active, it is responsible for the entire stage.

Shaders continued
Typical tasks done in shaders:
- vertex shader: animation, deformation, lighting,
- geometry shader: mesh processing,
- tessellation shader: tessellation,
- fragment shader: shading ;-),
- compute shader: almost anything.

Shader languages
- Cg (C for Graphics), NVIDIA,
- HLSL (High Level Shading Language), Microsoft,
- GLSL (OpenGL Shading Language), Khronos Group.
Shader languages comparison
- almost the same capabilities,
- conversion tools between them,
- Cg and HLSL very similar (different setup),
- HLSL DirectX only, GLSL OpenGL only, Cg for both → different platforms supported.

Shader languages comparison
- HLSL needs DirectX, Cg needs Cg toolkit [DirectX], GLSL comes with the driver,
- HLSL & Cg: toolkit compiler → “same” binary code for all vendors → translation to machine code,
- GLSL: vendor compiler → “faster” machine code, inconsistencies, harder to deal with varying hardware,
- Cg may have compiler issues on ATI cards.

Shader languages comparison
We will use GLSL:
- open standard (same as OpenGL),
- no install needed,
- all platforms, all vendors.
We will use GLSL 3.30 for OpenGL 3.3 (NVIDIA 9600 GT is an OpenGL 2.1/3.3 card). Newer features will be mentioned but not demonstrated.

OpenGL evolution
Figure: Taken from news.cnet.com

Hands-on shading
- http://pixelshaders.com/
- http://glsl.heroku.com/
- http://www.kickjs.org/example/shader_editor/shader_editor.html
- http://www.iquilezles.org/default.html
- http://www.iquilezles.org/live/index.htm

Coordinate spaces and transforms
- the pipeline transforms 3D objects into a 2D image,
- it is divided into several coordinate spaces, each beneficial for different tasks,
- the transformation starts with a polygon representation of the model,
- the model is represented in object space (local space),
- origin and units are chosen according to the model.

Coordinate spaces and transforms
Figure: Taken from yaldex.com
- objects are composed in a single scene (share a single world),
- represented in world space (model space),
- origin and units are chosen according to the scene,
- objects are transformed into this space by the modeling transformation as defined by the model matrix,
- spatial relations of objects are known afterwards.
Coordinate spaces and transforms
Figure: Taken from yaldex.com
- the scene is viewed by a camera,
- the view is represented in eye space (camera space),
- origin at the eye position, looking down the negative Z axis,
- objects are transformed into this space by the viewing transformation as defined by the view matrix,
- spatial relations of objects are unchanged,
- model and view matrix are combined into the modelview matrix: modelview = view × model.

Coordinate spaces and transforms
Figure: Taken from yaldex.com
- the camera defines a viewing volume, the space visible in the final image,
- the view is represented as an axis-aligned cube in clip space: −w ≤ x ≤ w, −w ≤ y ≤ w, −w ≤ z ≤ w,
- objects are transformed into this space by the projection transformation as defined by the projection matrix,
- beneficial for frustum clipping of polygons outside the axis-aligned cube.

Coordinate spaces and transforms
Figure: Taken from yaldex.com
- the clip space is compressed into the [-1,1] range with the perspective divide,
- achieved by dividing by w → only 3 coordinates left,
- the resulting space is called normalized device coordinate space,
- beneficial for mapping visible primitives to arbitrarily sized viewports.

Coordinate spaces and transforms
Figure: Taken from yaldex.com
- pixel coordinates range from 0 to (width-1) and from 0 to (height-1), i.e. the window coordinate system (screen space),
- the viewport transformation maps the [-1,1] range into this system,
- primitives are rasterized in this system.

Coordinate spaces and transforms
- all variables used in one computation must be in the same space, e.g. vertices, normals and light positions in eye space,
- the vertex shader must output the clip coordinates.

GLSL shader setup

    #include <stdio.h>     /* printf */
    #include <stdlib.h>    /* exit */
    #include <GL/glew.h>   /* assumed: header names are missing on the slide */
    #include <GL/glut.h>   /* assumed: header names are missing on the slide */

    int main(int argc, char **argv)
    {
        glutInit(&argc, argv);
        /* ... */
        glewInit();

        if (glewIsSupported("GL_VERSION_3_3"))
        {
            printf("Ready for OpenGL 3.3\n");
        }
        else
        {
            printf("OpenGL 3.3 not supported\n");
            exit(1);
        }
        setShaders();
        initGL();

        glutMainLoop();
        return 0;
    }

GLSL shader setup
Figure: Taken from lighthouse3d.com

Creating shader
Figure: Taken from lighthouse3d.com

    GLuint glCreateShader(GLenum shaderType);

    shaderType − GL_{VERTEX|FRAGMENT|GEOMETRY|TESS_CONTROL|TESS_EVALUATION|COMPUTE}_SHADER.

- Creates a shader object of the specified type that acts as a container.
- Returns the handle for that container.

Creating shader
Figure: Taken from lighthouse3d.com

    void glShaderSource(GLuint shader, GLsizei count, const GLchar **string, const GLint *length);

    shader − the handle to the shader.
    count − the number of strings in the arrays.
    string − the array of strings.
    length − an array with the length of each string, or NULL, meaning that the strings are NULL terminated.

- Replaces the source code of the shader.
- A single string can be used instead of an array.
- Multiple strings can define common pieces of code, third-party library functions, ...

Creating shader
Figure: Taken from lighthouse3d.com

    void glCompileShader(GLuint shader);

    shader − the handle to the shader.

- Compiles the shader.
- Checks its validity.

Creating program
Figure: Taken from lighthouse3d.com

    GLuint glCreateProgram(void);

- Creates a program object that acts as a container.
- Returns the handle for that container.
- Any number of programs can be created and used in a single frame.
- Programs can be switched at runtime.
- No program used → fixed pipeline.

Creating program
Figure: Taken from lighthouse3d.com

    void glAttachShader(GLuint program, GLuint shader);

    program − the handle to the program.
    shader − the handle to the shader you want to attach.

- Attaches a shader to the program.
- The shader needs neither to be compiled nor to have source code at this point.
- Any number of shaders can be attached, but only one main for each shader type.
- A single shader can be attached to many programs.

Creating program
Figure: Taken from lighthouse3d.com

    void glLinkProgram(GLuint program);

    program − the handle to the program.

- Links the program, resolves cross-shader references.
- Shaders must be compiled at this point.
- Afterwards the shaders can be modified & recompiled.
- Uniform variables are assigned locations and set to 0.

Creating program
Figure: Taken from lighthouse3d.com

    void glUseProgram(GLuint program);

    program − the handle to the program; zero to use fixed functionality.

- Sets the program for use in rendering.
- Relinking a program that is in use also sets it for use.

Cleanup

    void glDetachShader(GLuint program, GLuint shader);

    program − the program to detach from.
    shader − the shader to detach.

- Detaches the shader from a program.

    void glDeleteShader(GLuint id);
    void glDeleteProgram(GLuint id);

    id − the handle of the shader / program to erase.

- When an attached shader or a program in use is deleted, it is only “marked for deletion” and is fully deleted when no longer used.
- Shaders may be deleted as soon as they are attached; everything will be cleaned up when the program is deleted.

GLSL setup example

    GLuint v, f, p;   /* shader and program handles; declarations are not shown on the slide */

    void setShaders()
    {
        char *vs, *fs;

        // Setup
        v = glCreateShader(GL_VERTEX_SHADER);
        f = glCreateShader(GL_FRAGMENT_SHADER);

        vs = textFileRead("simple.vert");
        fs = textFileRead("simple.frag");

        const char *vv = vs;
        const char *ff = fs;

        glShaderSource(v, 1, &vv, NULL);
        glShaderSource(f, 1, &ff, NULL);

        // The sources were copied into the shader objects, so the buffers can go.
        free(vs);
        free(fs);

        glCompileShader(v);
        glCompileShader(f);

GLSL setup example (cont.)

        p = glCreateProgram();

        glAttachShader(p, v);
        glAttachShader(p, f);

        glLinkProgram(p);
        glUseProgram(p);

        /* ... */

        // Clean up
        glDetachShader(p, v);
        glDetachShader(p, f);

        glDeleteShader(v);
        glDeleteShader(f);

        glUseProgram(0);
        glDeleteProgram(p);
    }

State query

    void glGetShaderiv(GLuint shader, GLenum pname, GLint *params);

    shader − the shader to query.
    pname − parameter to query.
    params − queried state.

pname:
- GL_SHADER_TYPE – type of the shader,
- GL_DELETE_STATUS – marked for deletion?,
- GL_COMPILE_STATUS – last compile successful?,
- GL_INFO_LOG_LENGTH – length of the information log,
- GL_SHADER_SOURCE_LENGTH – length of the concatenated shader source.

State query

    void glGetProgramiv(GLuint program, GLenum pname, GLint *params);

    program − the program to query.
    pname − parameter to query.
    params − queried state.

pname (not all shown):
- GL_LINK_STATUS – last link successful?,
- GL_DELETE_STATUS – marked for deletion?,
- GL_VALIDATE_STATUS – last validation successful?,
- GL_INFO_LOG_LENGTH – length of the information log,
- information on the number of shaders attached, number of attribute values and uniform variables.

State query

    void glGetShaderInfoLog(GLuint shader, GLsizei maxLength, GLsizei *length, GLchar *infoLog);

    shader − the shader to query.
    maxLength − maximal length of the output buffer.
    length − actual length of the log.
    infoLog − the shader log.

- updated during shader compile,
- may contain diagnostic messages, errors, warnings etc. (implementation specific).
State query

    void glGetProgramInfoLog(GLuint program, GLsizei maxLength, GLsizei *length, GLchar *infoLog);

    program − the program to query.
    maxLength − maximal length of the output buffer.
    length − actual length of the log.
    infoLog − the program log.

- updated during program validation or link,
- may contain diagnostic messages, errors, warnings etc. (implementation specific).

State query

    void glValidateProgram(GLuint program);

    program − the program to validate.

- checks whether the program can execute given the current OpenGL state,
- updates the program log,
- only for development (slow).

GLSL query example

    void printShaderInfoLog(GLuint obj)
    {
        int infologLength = 0;
        int charsWritten = 0;
        char *infoLog;

        // The reported length includes the terminating null character.
        glGetShaderiv(obj, GL_INFO_LOG_LENGTH, &infologLength);

        if (infologLength > 0)
        {
            infoLog = (char *)malloc(infologLength);
            glGetShaderInfoLog(obj, infologLength, &charsWritten, infoLog);
            printf("%s\n", infoLog);
            free(infoLog);
        }
    }

GLSL query example

    void printProgramInfoLog(GLuint obj)
    {
        int infologLength = 0;
        int charsWritten = 0;
        char *infoLog;

        // The reported length includes the terminating null character.
        glGetProgramiv(obj, GL_INFO_LOG_LENGTH, &infologLength);

        if (infologLength > 0)
        {
            infoLog = (char *)malloc(infologLength);
            glGetProgramInfoLog(obj, infologLength, &charsWritten, infoLog);
            printf("%s\n", infoLog);
            free(infoLog);
        }
    }