Lesson 7 – Particle systems
Compute shaders, Geometry shaders
PV227 – GPU Rendering
Jiří Chmelík, Jan Čejka
Fakulta informatiky Masarykovy univerzity
31. 10. 2016
PV227 – GPU Rendering (FI MUNI) Lesson 7 – Particle system 31. 10. 2016 1 / 31
Particle systems
Particle systems are used for many effects:
fire, smoke, water, wind, explosions, debris, leaves, birds, ...
N-body simulation
Physics behind
Force between particles:
$F = G \frac{m_1 m_2}{r^2}$
Acceleration:
$a = \frac{F}{m}$
Position (acceleration integrated twice over time):
$x = \iint a \, dt^2$
Physics behind
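Force from particle $p_{other}$ on particle $p$:
$|F| = \frac{\text{constant}}{\lVert p_{other} - p \rVert^2}$
direction of $F$ = direction of $(p_{other} - p)$
Acceleration:
$a = \text{constant} \cdot F$
Position and velocity:
$x_1 = x_0 + v_0 \Delta t + \frac{1}{2} a \Delta t^2$
$v_1 = v_0 + a \Delta t$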
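As a quick sanity check of the update rule, here is one step worked in one dimension. The values $x_0 = 0$, $v_0 = 1$, $a = 2$ and $\Delta t = 0.1$ are assumptions chosen only for illustration; they do not come from the lesson code.

```latex
x_1 = x_0 + v_0 \Delta t + \tfrac{1}{2} a \Delta t^2
    = 0 + 1 \cdot 0.1 + \tfrac{1}{2} \cdot 2 \cdot 0.1^2 = 0.11
\qquad
v_1 = v_0 + a \Delta t = 1 + 2 \cdot 0.1 = 1.2
```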
Physics – pseudocode
foreach particle p do
    x0 ← read p’s position
    v0 ← read p’s velocity
    accel ← (0, 0, 0)
    foreach other particle other do
        xother ← read other’s position
        direction ← xother − x0
        dist2 ← dot(direction, direction)
        if dist2 > threshold then
            accel ← accel + normalize(direction) / dist2
        end
    end
    accel ← accel · accel_factor
    x1 ← x0 + v0·∆t + 1/2·accel·∆t²
    v1 ← v0 + accel·∆t
    store x1
    store v1
end
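The pseudocode above could be written on the CPU roughly as follows. This is only a sketch under assumptions: the `Vec3` type, the function name `nbody_step` and its parameter names are hypothetical and are not the names used in the lesson's C++ code. It reads from one pair of arrays and writes to another, as the task requires.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal 3D vector type; the lesson code likely uses its own.
struct Vec3 { float x, y, z; };

static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// One simulation step: reads pos_in/vel_in, writes pos_out/vel_out.
void nbody_step(const std::vector<Vec3>& pos_in, const std::vector<Vec3>& vel_in,
                std::vector<Vec3>& pos_out, std::vector<Vec3>& vel_out,
                float dt, float accel_factor, float threshold)
{
    assert(pos_in.size() == vel_in.size());
    const std::size_t n = pos_in.size();
    pos_out.resize(n);
    vel_out.resize(n);

    for (std::size_t i = 0; i < n; ++i) {
        const Vec3 x0 = pos_in[i];
        const Vec3 v0 = vel_in[i];
        Vec3 accel{0.0f, 0.0f, 0.0f};

        for (std::size_t j = 0; j < n; ++j) {
            const Vec3 direction = pos_in[j] - x0;
            const float dist2 = dot(direction, direction);
            // The threshold also skips the particle itself (dist2 == 0).
            if (dist2 > threshold) {
                // normalize(direction) / dist2 == direction / (|direction| * dist2)
                accel = accel + direction * (1.0f / (std::sqrt(dist2) * dist2));
            }
        }

        accel = accel * accel_factor;
        pos_out[i] = x0 + v0 * dt + accel * (0.5f * dt * dt);
        vel_out[i] = v0 + accel * dt;
    }
}
```

After each step the caller would swap the input and output arrays, so that the next step reads the freshly written state.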
Task: Implement N-body simulation
Task 1: Implement N-body simulation on CPU
See the comments in the C++ code for the names of variables and
constants
Don’t forget there are two arrays with particle positions, one to read
from and one to write into
The complexity is $O(n^2)$, so test with a small number of particles. Once it all
works, try the Release build.
General Purpose GPU (GPGPU)
Motivation: Use the many threads available on the GPU to speed up our
computation.
In this lecture, we will describe the very basics of GPGPU. For
more information:
Look up CUDA or OpenCL on the Internet
See PV197 GPU Programming
History of GPGPU
Brief history:
Since circa 2000: fragment shaders
Since circa 2006: CUDA, OpenCL
Now: Compute shaders
Basic principles of compute shaders
Similar to vertex/fragment shaders:
Many (mostly independent) threads
Threads do (mostly) the same work
Different from vertex/fragment shaders:
A vertex/fragment shader invocation processes exactly one vertex/fragment
Compute shaders may process arbitrary data
Each thread may process any number of items
Threads can share the mid-results of the computation
Reading and writing data
Buffers via SSBO
Textures via image load/store
Atomic operations
These are available in other shaders too
Can do (mostly) whatever, so beware of bugs in the code
Support in OpenGL
GLSL code like in other shaders:
Access to uniform variables, UBOs, SSBOs, textures
Types vec4, mat4, ...
Functions dot, cross, ...
Runs the code in main function
Loaded and used like other shaders:
glCreateShader(GL_COMPUTE_SHADER)
Attached to programs and used via programs as usual
Runs outside the rendering pipeline:
Use glDispatchCompute instead of glDraw*
Organization of threads
Threads are organized into work groups:
[Figure: 6 work groups, 24 threads]
Threads in work group can share data via shared memory
Threads can be organized in 1D, 2D, and 3D. We will use 1D.
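At least 1024 threads per work group are guaranteed (query GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS for the actual limit).
At least 65535 work groups per dimension are guaranteed (query GL_MAX_COMPUTE_WORK_GROUP_COUNT for the actual limit).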
Indexing of threads
Specifying number of threads in work group:
In GLSL: layout (local_size_x = 256) in;
Specifying number of work groups:
In C++: glDispatchCompute(#_of_work_groups_in_x, 1, 1);
Index of a thread in its work group:
In GLSL: gl_LocalInvocationID.x
Index of a thread in all work groups:
In GLSL: gl_GlobalInvocationID.x
Index of the work group a thread is a part of:
In GLSL: gl_WorkGroupID.x
Size of one work group (as specified with layout):
In GLSL: gl_WorkGroupSize.x
Number of work groups (as specified with glDispatchCompute):
In GLSL: gl_NumWorkGroups.x
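The relationship between these built-in variables can be mirrored on the CPU. The sketch below is illustrative only: the helper names `work_group_count` and `global_invocation_id` are hypothetical and not part of OpenGL. It shows the ceiling division commonly used to pick the argument of glDispatchCompute, and how the global index follows from the work group index, the work group size and the local index in 1D.

```cpp
#include <cassert>
#include <cstdint>

// Number of work groups needed so that every item gets a thread
// (ceiling division of item_count by the local size).
// Assumes item_count + local_size - 1 does not overflow 32 bits.
std::uint32_t work_group_count(std::uint32_t item_count, std::uint32_t local_size)
{
    assert(local_size > 0);
    return (item_count + local_size - 1) / local_size;
}

// CPU mirror of the 1D relation
// gl_GlobalInvocationID.x = gl_WorkGroupID.x * gl_WorkGroupSize.x + gl_LocalInvocationID.x
std::uint32_t global_invocation_id(std::uint32_t work_group_id,
                                   std::uint32_t work_group_size,
                                   std::uint32_t local_invocation_id)
{
    assert(local_invocation_id < work_group_size);
    return work_group_id * work_group_size + local_invocation_id;
}
```

With a local size of 256, 1000 particles need 4 work groups, which start 1024 threads. The last 24 threads have no particle, so the shader should compare gl_GlobalInvocationID.x against the particle count before reading or writing.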
Task: Rewrite to compute shaders
Task 2: Implement N-body simulation in compute shaders
See the comments in the code for the names of variables and
constants
Use one thread to compute one particle.
Copy and paste the code from C++ and make minor changes
Sharing data between threads
Data is shared via shared memory, which is visible only to threads
in the same work group.
Specification in GLSL:
shared variable_type variable_name;
Stored values are visible to other threads
Threads run in parallel (!), so we must synchronize the threads
GLSL function barrier()
Calling thread waits until all other threads in the work group reach
the barrier
After the barrier, all threads can read the new values in shared
variables
After the barrier, no threads will need the old data in shared
variables
Sharing data between threads – pseudocode
We will use shared memory to reduce the number of reads from global memory.
foreach particle p do
    ...
    foreach block of gl_WorkGroupSize.x other particles do
        read the position of one particle into shared memory
        barrier() – wait until all other threads have stored their positions
        foreach other particle other in shared memory do
            process the particle
        end
        barrier() – wait until all other threads finish processing the data
    end
    ...
end
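The tiled loop above can be emulated on the CPU to check that it visits the same particles as the plain double loop. The sketch below is not the lesson's solution: `P3`, `accumulate` and `tiled_accelerations` are hypothetical names, the `tile` vector merely stands in for a GLSL `shared` array, and the two sequential inner loops play the role of the code before and after barrier().

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct P3 { float x, y, z; };  // hypothetical position type

// Adds the contribution of one other particle, using the same rule as the physics pseudocode.
static void accumulate(P3& accel, P3 self, P3 other, float threshold)
{
    const float dx = other.x - self.x, dy = other.y - self.y, dz = other.z - self.z;
    const float dist2 = dx * dx + dy * dy + dz * dz;
    if (dist2 > threshold) {
        const float s = 1.0f / (std::sqrt(dist2) * dist2);
        accel.x += dx * s; accel.y += dy * s; accel.z += dz * s;
    }
}

// Emulates ONE work group: returns the unscaled acceleration of each of its threads.
std::vector<P3> tiled_accelerations(const std::vector<P3>& pos, std::size_t group,
                                    std::size_t group_size, float threshold)
{
    assert(group_size > 0);
    const std::size_t n = pos.size();
    std::vector<P3> accel(group_size, P3{0.0f, 0.0f, 0.0f});
    std::vector<P3> tile(group_size);  // stands in for a `shared` array

    for (std::size_t tile_start = 0; tile_start < n; tile_start += group_size) {
        const std::size_t tile_len = std::min(group_size, n - tile_start);

        // Phase 1: every thread copies one position from "global" memory into the tile.
        for (std::size_t local = 0; local < tile_len; ++local)
            tile[local] = pos[tile_start + local];
        // barrier() would go here: the tile must be complete before anyone reads it.

        // Phase 2: every thread processes the whole tile from "shared" memory.
        for (std::size_t local = 0; local < group_size; ++local) {
            const std::size_t self = group * group_size + local;
            if (self >= n) continue;  // thread without a particle does no work
            for (std::size_t k = 0; k < tile_len; ++k)
                accumulate(accel[local], pos[self], tile[k], threshold);
        }
        // barrier() would go here: nobody may overwrite the tile while others still read it.
    }
    return accel;
}
```

Each position is read from global memory once per work group instead of once per thread. In an actual compute shader, threads without a particle should skip the work but still reach both barriers, because barrier() expects all threads of the work group to call it.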
Task: Share data between threads
Task 3: Share the positions between threads in work group
Copy the code from nbody_compute.glsl to
nbody_shared_compute.glsl and rewrite it
See the comments in the code for the names of variables and
constants
Pros and cons of using compute shaders
When compared to CPU:
Pros: many threads, the data stays on GPU
Cons: threads must run mostly the same code
When compared to other shaders:
Pros: more flexible
Cons: more difficult
When compared to CUDA / OpenCL:
Pros: native access to buffers / textures
Cons: less flexible
glMemoryBarrier
When the data is updated using outputs from vertex/fragment
shaders, memory copies etc., OpenGL knows which data is
updated, which operations must wait and which operations may be
executed in parallel.
When we load/store the data using SSBO or texture images (in
compute or other shaders), OpenGL does not know what was
done. Delaying all operations may not be necessary.
Use glMemoryBarrier to tell OpenGL which memory reads
depend on the results of shaders (not only compute shaders).
Look up its usage in Cv7_main.cpp.
Geometry shaders
New programmable stage (optional)
Between vertex shader and fragment shader
Takes a whole primitive as input
Creates new primitives as output
Use GL_GEOMETRY_SHADER in C++ to create a geometry
shader
Input Primitives
Defined in GLSL code:
layout (primitive_type) in;
Five supported types, each corresponding to a different number of
vertices visible on input:
primitive             #vertices
points                1
lines                 2
lines_adjacency       4
triangles             3
triangles_adjacency   6
Primitive type must match the draw command
Input triangles, drawing triangles: OK
Input triangles, drawing triangle strip: OK
Input points, drawing triangles: not OK
Additional OpenGL primitives
GL_LINES_ADJACENCY
GL_LINE_STRIP_ADJACENCY
GL_TRIANGLES_ADJACENCY
GL_TRIANGLE_STRIP_ADJACENCY
Output Primitives
Three options: points, line_strip, triangle_strip
Geometry shader must also specify maximum number of vertices
that can be generated.
Specification in GLSL:
layout (triangle_strip, max_vertices = 4) out;
The input primitive need not correspond to the output primitive
The input primitive is discarded
Input Data
Data from vertex shader, in arrays.
Size of the array corresponds to the number of vertices of the
input primitive.
Built-in variables are in the array gl_in, e.g.:
gl_in[0].gl_Position
Other variables must be defined as arrays, e.g.:
in VertexData { ... } inData[];
The size of the array may either be left unspecified, or must correspond
to the number of vertices of the primitive.
Output Data
Output data is specified in the same way as in a vertex shader.
Once all data of a vertex is specified, call EmitVertex()
Always define values of all output variables!
Primitive can be closed and restarted with EndPrimitive()
Example: Render points as textured quads
Use geometry shaders to render quads with texture in place of
points.
Input primitive is point
Output primitive is one triangle strip of four vertices
Positions and texture coordinates are easy to compute in
view space.
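One way to see why view space is convenient: the camera looks along the z axis there, so a camera-facing quad is obtained by offsetting the point only in x and y. The C++ sketch below shows the arithmetic that a geometry shader would perform per point. The names `QuadVertex`, `expand_point_to_quad` and `half_size` are hypothetical, and the bottom-left origin for texture coordinates is an assumption.

```cpp
#include <array>
#include <cassert>

// Hypothetical vertex: view-space position plus texture coordinate.
struct QuadVertex { float x, y, z; float u, v; };

// Expands a view-space point into four vertices in triangle-strip order
// (bottom-left, bottom-right, top-left, top-right).
std::array<QuadVertex, 4> expand_point_to_quad(float cx, float cy, float cz, float half_size)
{
    assert(half_size > 0.0f);
    return {{
        {cx - half_size, cy - half_size, cz, 0.0f, 0.0f},  // bottom-left
        {cx + half_size, cy - half_size, cz, 1.0f, 0.0f},  // bottom-right
        {cx - half_size, cy + half_size, cz, 0.0f, 1.0f},  // top-left
        {cx + half_size, cy + half_size, cz, 1.0f, 1.0f},  // top-right
    }};
}
```

In the geometry shader, each of the four positions would then be multiplied by the projection matrix to obtain gl_Position, followed by a call to EmitVertex().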
Task: Render points as textured quads
Task 4: Use geometry shaders to render points as quads
In the vertex shader, transform the position into view space, and pass
the color.
In the geometry shader, derive the position, texture coordinate and
color of each vertex, and compute gl_Position
The fragment shader is already implemented.
More on geometry shaders
In the next lecture ...