There are two important aspects of dynamic programming: (1) breaking a problem into smaller subproblems, and (2) storing, or
“memoizing”, results to avoid redundant work. While
common examples are often optimizations of recursive solutions, these
two concepts can be applied to many things, and here I present
a real-world implementation.
Instead of saying “memoizing”, I believe a more common term is
simply caching. Large programs (and computers in general) cache
many different things to avoid redundant work, and it is no different
here. For my example of dynamic programming, I will describe how I
have used it to calculate shadow and lighting when rendering
graphics.
Without Dynamic Programming
First, a quick description of how shadow and light could be calculated
without the aid of dynamic programming. Consider a scene containing various
geometry and an array of lights with various properties set.
1. Set a shadow render target
2. Render all shadows cast from the geometry based on each light's
position
3. Set a scene render target
4. Bind any material, normal, and color data
5. Render light, minus shadow, and read other data in the
calculation
6. Repeat for all lights
Breaking down the cost, texture sampling is the most
expensive operation: for each fragment that a light is
drawn to, the shadow and other texture data must be read. In the
above that means a sample rate of 4 per fragment, per
light. Fragments are a bit different from “just pixels”, but for simplicity
let's say it is 1-to-1 at a resolution of 1920×1080. That means 2,073,600
fragments, and a total of 8,294,400 samples per light. I stress per
light because the following exposes how costly that is.
With Dynamic Programming
So, how could dynamic programming break this down into smaller
problems while caching as much information as possible to save on
work?
1. Set a shadow render target
2. Render a batch of shadow data at a single time (**explained later)
3. Set material, normal, and color render targets
4. Render all three at once
5. Set a scene render target
6. Bind the three textures from step 3 and the shadow data
7. Render the entire batch of lights at once while reading the
shadow and other data
Looking at the sample rate now, it is still 8,294,400, but for
as many lights as can fit into a single batch, no longer per
light. At a minimum that is 32 lights, with the red, green, blue, and
alpha (RGBA) channels of a texel set at 1 byte each (4 bytes, or 32 bits,
one bit per light). The first algorithm would
cost 265,420,800 samples to render 32 lights, but with caching and
breaking things apart into smaller steps (shadow, materials, and final
render) the difference is vast. This keeps the sample rate at a constant
4 as long as the scene uses 32 or fewer lights at a time. For
reference, a sample rate of 4 is average for a common mobile device
with graphics capabilities. Using only 2 lights in the non-dynamic
solution would double that.
Bitwise Shadow Maps**
Rendering an entire batch of shadow data into a texture at once is a
solution I developed myself. I mentioned the number of
bytes in a texel because the key is packing corresponding shadow
values into the separate bits of the RGBA channels. Each light is given an index
from 1 to 32, and when all lights render in a batch, the index is used with a
bitwise operation to set the corresponding bit. Later, when the
light is rendered, the same indexed light can unpack that same bit with
another bitwise operation to determine where shadows should be. At
the cost of memory, it is possible to use texels with more than 32 bits
each.
Another important step when rendering the shadow bits is to use
depth testing with an “equal to” comparison. This prevents a single
light index from casting overlapping shadows upon itself, which would
ruin the entire process.
If we take a peek at what this bitwise shadow map looks like after it is
compiled, it is the following:
Captured on the exact same frame as that shadow map is the final
render, which reads it for shadow data:
In the images only a few lights are being rendered, and they use the green
channel alone to pack bits. This is on purpose, just for the example
images, because it would be hard to make out any corresponding
shapes if all 32 bits were in use, although the computer would have no
problem reading it.
The second set of pseudo-code steps above is a type of deferred
rendering. My own addition is the bitwise shadow maps, which further
reduce the problem size and cache more data at once than the normal
deferred rendering pipeline.
If you are curious how deferred rendering works, here is an
implementation I wrote in WebGL: https://badwrongg.github.io/webgldeferred-pbr/ (warning: large texture loading). You can see the small
images on the left where I break out the cached data to show it for
example purposes. It too is an example of breaking up an entire job
into smaller steps where data is cached. In this case it is called a g-buffer, which stores information for later use.