Previously, we were able to make beautiful 3D objects using the robustness of meshes. Now that we have an organized way to represent 3D objects, we naturally move on to the next feature to implement:
Rendering with Lighting
To render objects with light, we will use light equations derived from physics and implement the ray-tracing rendering technique with acceleration optimizations. This section covers generating camera rays, building a bounding volume hierarchy (BVH) data structure, computing intersection queries for the BVH, bounding boxes, and primitives (e.g. triangles and spheres), global illumination, and adaptive sampling.
I personally found that this section increases in difficulty because each part is heavily reliant on the previous tasks, and debugging with thousands of rays offers very little information about floating-point errors, short-circuiting, and artifacts. This section requires a solid understanding of the many light and intersection equations and of ray-tracing behavior. Furthermore, since much of the assignment is mathematical and precise, many variables offer very small degrees of freedom and have extreme effects on the rendered image if implemented incorrectly. Overall, this section shows how one may convert light theory into code and quickly render realistic-looking images.
To begin, one needs to know how to generate sample rays from a pixel to shoot into the scene in order to collect and average the Spectrum (the integral of radiance) of the pixel. A ray simply has an origin vector and a direction vector: r(t) = origin + direction*t, where t parameterizes how far the ray has traveled from its origin. Thus, we first implement the ray-generation stage of the rendering pipeline.
In image space, we define one pixel with a height and width of 1, where the bottom left corner is (x, y) and the top right corner is (x+1, y+1). The origin of image space is located at the bottom left corner (0, 0). Thus, given a pixel, we will generate N random rays locally inside the pixel and calculate the Monte Carlo estimate of that pixel's Spectrum value. By generating enough random rays in the pixel, the Monte Carlo estimate is sufficient, effective, and quick enough to approximate the mathematical integral of radiance. With a random sampler (uniform distribution on a unit square), we can normalize our random position within the pixel and map it to a point on the sensor plane in camera space. In camera space, the ray originates at the camera position and passes through that point on the sensor. Once we have a ray in camera space, we can simply transform it into world space via a matrix transformation (the camera-to-world matrix). Below demonstrates this idea:
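As a rough sketch of this step (assuming a pinhole camera with fields of view hFov and vFov in degrees, a camera-to-world rotation c2w, a camera position pos, and Ray/Vector3D types like those in the starter code; the names here are illustrative, not necessarily the exact API):

```cpp
// Sketch only: maps a normalized image-space sample (x, y) in [0, 1]^2 (already jittered
// inside the pixel) to a world-space ray. hFov/vFov, c2w, pos, nClip, and fClip are assumed
// camera fields; this is an illustration, not the project's exact code.
Ray generate_ray(double x, double y) {
  // Half-extents of the sensor plane sitting at z = -1 in camera space.
  double tan_half_hfov = tan(0.5 * hFov * PI / 180.0);
  double tan_half_vfov = tan(0.5 * vFov * PI / 180.0);

  // (0, 0) maps to the bottom-left sensor corner, (1, 1) to the top-right.
  Vector3D sensor_point((2.0 * x - 1.0) * tan_half_hfov,
                        (2.0 * y - 1.0) * tan_half_vfov,
                        -1.0);

  // Rotate the direction into world space; the ray originates at the camera position.
  Ray r(pos, (c2w * sensor_point).unit());
  r.min_t = nClip;   // near clipping plane
  r.max_t = fClip;   // far clipping plane
  return r;
}
```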
Great! Now for every pixel in the image space, we can generate random rays and convert them from the image space to the world space! The next thing to do is to develop a primitive intersection algorithm (we will only focus on triangle and sphere primitives) to find whether our rays intersected our mesh.
Given a ray, a single primitive in the mesh, and an intersection object, we will return a boolean for our intersection query, as well as record/update information about the nearest intersection point known so far in the intersection object and the ray (time of intersection, surface normal at the intersection, the primitive intersected, and the primitive's BSDF). If the function returns false, nothing is updated.
To detect if a ray hits the selected triangle, we will use the Möller-Trumbore algorithm. By setting the ray equation equal to the barycentric interpolation of the triangle's vertices, we can obtain the time t of intersection with the triangle's plane and the barycentric coordinates of that intersection. Barycentric coordinates (α, β, γ) are "area" coordinates with respect to a triangle.
With that, we simply check whether the intersection time t is within the ray's valid range and whether the barycentric coordinates all lie in [0, 1] (any barycentric coordinate outside this range means the point is not inside the triangle anymore!). If min_t <= t <= max_t and 0 <= α, β, γ <= 1, we update the intersection record. Otherwise, the ray failed to intersect the triangle.
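A sketch of this test, written as a free function for clarity (p1, p2, p3 are the triangle's vertex positions, n1, n2, n3 its vertex normals, and Ray/Intersection/Vector3D are assumed to look like the starter code's types, with a writable max_t):

```cpp
// Sketch of the Moller-Trumbore ray/triangle test. In the real member function, the
// intersected primitive and its BSDF would also be recorded in 'isect'; names are illustrative.
bool moller_trumbore(Ray& r, Intersection* isect,
                     const Vector3D& p1, const Vector3D& p2, const Vector3D& p3,
                     const Vector3D& n1, const Vector3D& n2, const Vector3D& n3) {
  Vector3D e1 = p2 - p1, e2 = p3 - p1, s = r.o - p1;
  Vector3D s1 = cross(r.d, e2), s2 = cross(s, e1);

  double denom = dot(s1, e1);
  if (denom == 0.0) return false;            // ray is parallel to the triangle's plane

  double t  = dot(s2, e2)  / denom;          // intersection "time" along the ray
  double b2 = dot(s1, s)   / denom;          // barycentric weight of p2
  double b3 = dot(s2, r.d) / denom;          // barycentric weight of p3
  double b1 = 1.0 - b2 - b3;                 // barycentric weight of p1

  // Reject hits outside the ray's valid range or outside the triangle.
  if (t < r.min_t || t > r.max_t) return false;
  if (b1 < 0 || b2 < 0 || b3 < 0) return false;

  // Record the nearest hit so far; shrinking max_t makes later tests reject farther hits.
  r.max_t = t;
  isect->t = t;
  isect->n = (b1 * n1 + b2 * n2 + b3 * n3).unit();  // normal interpolated via barycentrics
  return true;
}
```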
To detect if a ray hits the selected sphere, we take the ray equation and substitute it into the sphere equation. Then we can obtain the time(s) of intersection (a ray can hit a sphere at most twice!) through the quadratic formula.
Once again, like the triangle intersection test, we check whether the time(s) are valid and update only if the closer intersection (the smaller time if there are two intersections) is nearer than the current nearest intersection. Otherwise, we return false without updating.
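Concretely, substituting r(t) = o + t*d into the sphere equation |p - c|² = R² gives a quadratic a*t² + b*t + c = 0 with a = d·d, b = 2(o - c)·d, and c = (o - c)·(o - c) - R². A minimal sketch of the test (same assumed types as above; illustrative names):

```cpp
// Sketch of the ray/sphere test via the quadratic formula. 'center' and 'radius' describe
// the sphere; in the real member function the normal, primitive, and BSDF would also be
// written into an Intersection. Illustrative, not the exact project code.
bool sphere_intersect(Ray& r, const Vector3D& center, double radius, double* t_hit) {
  Vector3D oc = r.o - center;
  double a = dot(r.d, r.d);
  double b = 2.0 * dot(oc, r.d);
  double c = dot(oc, oc) - radius * radius;

  double disc = b * b - 4.0 * a * c;
  if (disc < 0.0) return false;              // the ray misses the sphere entirely

  double sqrt_disc = sqrt(disc);
  double t1 = (-b - sqrt_disc) / (2.0 * a);  // nearer root
  double t2 = (-b + sqrt_disc) / (2.0 * a);  // farther root

  // Prefer the nearer valid root; fall back to the farther one (origin inside the sphere).
  double t = (t1 >= r.min_t && t1 <= r.max_t) ? t1 : t2;
  if (t < r.min_t || t > r.max_t) return false;

  r.max_t = t;                               // record the nearest hit so far, as before
  *t_hit = t;
  return true;
}
```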
Below are images of simple .dae files with normal shading.
Sure, now that we can generate random rays, we could simply shoot ONE ray into the scene and query ALL primitives for intersection to obtain the closest intersection for that ray. However, as you may guess, with thousands of rays and meshes of higher complexity, this would be painfully slow and likely make rendering impractical! Thus enters the bounding volume hierarchy (BVH) data structure.
This data structure is essentially a binary tree. Given a root BVHNode, we can construct a BVH tree, where each BVHNode holds a list of primitives, a left and right child BVHNode, and a bounding box. A bounding box is simply a way to group and keep track of the primitives in the BVHNode spatially. Important invariants are:
1) For interior nodes, the left and right nodes are non-NULL.
2) For leaf nodes, the list of primitives is non-NULL (meaning when traversing through the tree, only the leaves actually have primitives to iterate through).
3) For leaf nodes, the size of the list of primitives cannot exceed max_leaf_size. A good max_leaf_size can make ray tracing very quick!
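As a sketch, a node satisfying these invariants might look like the following (field names are illustrative; the starter code's BVHNode may differ in detail):

```cpp
#include <vector>

// Sketch of a BVH node mirroring the invariants above. BBox and Primitive are assumed
// starter-code types; the field names are illustrative.
struct BVHNode {
  BBox bb;                                    // bounds every primitive in this subtree
  BVHNode* l = nullptr;                       // invariant 1: non-NULL for interior nodes
  BVHNode* r = nullptr;
  std::vector<Primitive*>* prims = nullptr;   // invariant 2: non-NULL only for leaf nodes
                                              // invariant 3: size() <= max_leaf_size
  bool isLeaf() const { return l == nullptr && r == nullptr; }
};
```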
With this, we can begin. At each recursive step, we make a bounding box over ALL primitives in the node's scope. With this bounding box, we pick a heuristic that determines how we split the box into two new BVHNodes/bounding boxes, and we hand each of the two children its respective primitives in the next recursive call. To split the box, I personally chose to split along the longest axis of the bounding box via bbox.extents. If the number of primitives in scope is at or below max_leaf_size, I set the BVHNode's list of primitives and return. Otherwise, I make a left AND a right BVHNode and give both of them their respective lists of primitives after the split. With this, we will have recursively created a BVH tree.
There are many different heuristics one can use for picking the splitting point along an axis of a BVHNode. The heuristic I chose uses primitive centroids. By iterating through ALL primitives and collecting their centroids, I create a bounding box over these centroids and split at that bounding box's own centroid. In most cases, this efficiently gives me non-empty left and right BVHNodes.
Please be aware that some heuristics have corner cases where one child node (the left or right BVHNode) may end up with 0 primitives after the split! If this happens, the program will go into an infinite loop. When I encountered this issue, I simply checked both the left and right nodes for emptiness after splitting and moved one primitive from the non-empty node over to the empty node, giving the empty node a single primitive (it becomes a leaf node immediately). The only potential issue is that a pathological mesh can cause the BVH to become spindly, degrading traversal to linear runtime. But for my needs, this was sufficient for many different types of meshes and scenes.
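A condensed sketch of this construction, reusing the BVHNode sketch above (BBox, Primitive, and Vector3D are assumed starter-code types; the helper names are mine):

```cpp
// Tiny helper: the axis-th component of a vector (0 = x, 1 = y, 2 = z).
static double axis_of(const Vector3D& v, int axis) {
  return axis == 0 ? v.x : (axis == 1 ? v.y : v.z);
}

// Sketch of recursive BVH construction: split along the longest axis at the centroid-box
// centroid, with a fallback that moves one primitive if a child comes out empty.
BVHNode* construct_bvh(std::vector<Primitive*>& prims, size_t max_leaf_size) {
  // Bounding box over ALL primitives in scope, plus a box over their centroids.
  BBox bbox, centroid_box;
  for (Primitive* p : prims) {
    bbox.expand(p->get_bbox());
    centroid_box.expand(p->get_bbox().centroid());
  }

  BVHNode* node = new BVHNode();
  node->bb = bbox;

  // Leaf case: few enough primitives, so store them and stop recursing.
  if (prims.size() <= max_leaf_size) {
    node->prims = new std::vector<Primitive*>(prims);
    return node;
  }

  // Split along the longest axis of the primitive bounding box, at the centroid box's centroid.
  Vector3D ext = bbox.extents;               // per-axis box dimensions (name as in the write-up)
  int axis = 0;
  if (ext.y > ext.x) axis = 1;
  if (ext.z > axis_of(ext, axis)) axis = 2;
  double split = axis_of(centroid_box.centroid(), axis);

  std::vector<Primitive*> left, right;
  for (Primitive* p : prims) {
    if (axis_of(p->get_bbox().centroid(), axis) < split) left.push_back(p);
    else right.push_back(p);
  }

  // Corner case: one side empty. Move one primitive over so the recursion always terminates.
  if (left.empty())  { left.push_back(right.back());  right.pop_back(); }
  if (right.empty()) { right.push_back(left.back());  left.pop_back();  }

  node->l = construct_bvh(left, max_leaf_size);
  node->r = construct_bvh(right, max_leaf_size);
  return node;
}
```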
Below are images of large .dae files that can only render with normal shading (under a practical amount of time) using BVH acceleration.
Now with a BVH tree (ideally bushy), we can reduce the ray-intersection complexity from O(N) to O(log(N)), where N is the number of primitives in the mesh. The BVH allows us to traverse the scene and quickly discard collections of primitives inside any bounding box that a particular ray is guaranteed not to intersect (via the bounding-box intersection test). The worst-case scenario is a spindly BVH tree (explained under "Implementation Caution" above) where each leaf node contains one primitive; this degrades the BVH traversal to O(N), as we are no longer sectioning our primitives/bounding boxes into useful spatial groups that we can skip. However, this BVH is sufficient to speed up the rendering process quite well.
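A sketch of the traversal that realizes this pruning (bb.intersect is assumed to be a slab test against the ray's valid range; names are again illustrative):

```cpp
// Sketch of BVH traversal: skip any subtree whose bounding box the ray cannot enter, and
// only run primitive tests at the leaves. bb.intersect is assumed to be a slab test that
// reports the entry/exit times; names are illustrative.
bool bvh_intersect(BVHNode* node, Ray& r, Intersection* isect) {
  double t0 = r.min_t, t1 = r.max_t;
  if (!node->bb.intersect(r, t0, t1)) return false;   // prune this whole subtree

  if (node->isLeaf()) {
    bool hit = false;
    for (Primitive* p : *node->prims) {
      // Each primitive hit shrinks r.max_t, so isect ends up holding the nearest hit.
      if (p->intersect(r, isect)) hit = true;
    }
    return hit;
  }

  // Interior node: visit both children, since the nearest hit could lie in either.
  bool hit_left  = bvh_intersect(node->l, r, isect);
  bool hit_right = bvh_intersect(node->r, r, isect);
  return hit_left || hit_right;
}
```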
Below are some runtime comparisons on some .dae files:
1) "CBlucy.dae":
(with BVH) -> 0.1220s to make BVH; time to render 0.0435s
(without BVH) -> time to render 974.0313s
2) "dragon.dae":
(with BVH) -> 0.1106s to make BVH; time to render 0.0752s
(without BVH) -> time to render 759.4424s
3) "maxplanck.dae":
(with BVH) -> 0.1220s to make BVH; time to render 0.0435s
(without BVH)-> time to render 364.4386s
Without a BVH, we expect O(N) runtime per ray, meaning rendering is very slow even with multiple threads to parallelize the process! For all three complex geometries above, rendering without a BVH took a VERY long time with 8 threads on a HIVE machine! This is because we need to query every single primitive one-by-one for the closest intersection. However, once we implemented the BVH, we were able to render the same .dae files at lightning speed! This is because when we shoot a ray into the scene, we ignore every bounding box that the ray does not intersect. Also, because the geometry in these models is distributed fairly evenly, our BVH tree ends up bushy, helping us achieve O(log(N)) time. For example, notice that "maxplanck.dae" is split roughly evenly into a top and a bottom bounding box. If the ray only hits the top bounding box, we have already saved time by NOT querying half of the list of primitives!
There are two implementations of direct lighting: direct lighting with uniform hemisphere sampling and direct lighting by importance sampling the lights. Both use the idea of shooting a ray into the scene, then sampling N "shadow rays" (rays whose origin is the hit point). How we determine the shadow-ray directions depends on the type of sampling.
Once we have a hit point in the scene, we will take N = (# of lights * samples per light) samples/shadow rays and estimate the Spectrum for the pixel. Since we are sampling uniformly over the hemisphere, the pdf of picking any direction on the hemisphere is 1 / (2 * PI). With each sampled direction, we create a shadow ray originating at the hit point and pointing in that direction. With this shadow ray, we query whether we intersect a primitive (this assumes the light source is an intersectable primitive, like an area light; some lights, like point lights, are not). If we do, we grab the emission/radiance of the primitive we hit (zero for non-light primitives, non-zero for light primitives). Gathering N total samples this way, we average them for our Monte Carlo estimate of the reflected light at the hit point (equation below). Each term of the sum is the hit point's BSDF (for a diffuse surface the BRDF is simply the constant reflectance / PI) times the incoming radiance times cos(theta) (which we can get from the shadow ray's z component, w_in.z, in local space, since the surface normal in that space is (0, 0, 1)), all divided by the pdf of sampling uniformly over the hemisphere. We then divide the sum by N and return.
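In symbols, the estimator is
L_out ≈ (1 / N) * Σ f(w_out, w_j) * L_i(w_j) * cos(θ_j) / pdf,   with pdf = 1 / (2π).
A sketch of this in code, written as if inside a path-tracer class that already has the scene, the BVH, and a uniform hemisphere sampler available (ns_area_light, hemisphereSampler, and make_coord_space mirror the starter code but should be treated as assumptions):

```cpp
// Sketch of direct lighting via uniform hemisphere sampling. o2w/w2o convert between world
// space and the local frame at the hit point where the normal is (0, 0, 1). Illustrative.
Spectrum estimate_direct_hemisphere(const Ray& r, const Intersection& isect) {
  Matrix3x3 o2w;
  make_coord_space(o2w, isect.n);            // build the local frame from the surface normal
  Matrix3x3 w2o = o2w.T();

  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = w2o * (-r.d);             // outgoing direction, local space

  int num_samples = scene->lights.size() * ns_area_light;
  double pdf = 1.0 / (2.0 * PI);             // uniform over the hemisphere
  Spectrum L_out;

  for (int i = 0; i < num_samples; i++) {
    Vector3D w_in = hemisphereSampler->get_sample();   // local space, z >= 0

    Ray shadow_ray(hit_p, o2w * w_in);
    shadow_ray.min_t = EPS_F;                // avoid re-hitting the surface we start on

    Intersection light_isect;
    if (bvh->intersect(shadow_ray, &light_isect)) {
      // Emission is zero unless the shadow ray happened to hit a light primitive.
      Spectrum emission = light_isect.bsdf->get_emission();
      L_out += isect.bsdf->f(w_out, w_in) * emission * w_in.z / pdf;
    }
  }
  return L_out / (double)num_samples;
}
```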
With the same setup (a hit point in the scene), we will now sample differently. Instead of arbitrarily sampling a random direction of the hemisphere and querying whether we happened to hit a light source, we sample directly ON the lights (no misses)! So with the handy function scene->lights.at(i)->sample_L(..), we can directly sample a random position on the light source. With this, we must consider two light types: delta lights and non-delta lights. While iterating through the list of lights in the scene, we ask whether each is a delta light. If it is, we take only one sample from it (taking more samples is redundant since the light is just a point) and compute the exact same Spectrum quantity inside the Monte Carlo summation as we did in uniform hemisphere sampling.
However, now that we are creating shadow rays pointing directly at a point on the light source from the hit point, we have extra conditions. If the shadow ray does NOT intersect any primitive (i.e. nothing blocks its path to the light source: !bvh->intersect(shadow_ray, &intsct)) and the light source is NOT behind the surface at the hit point ((w2o * w_in).z >= 0), we calculate the contribution and add it to our overall Spectrum.
As for non-delta lights, we do the exact same thing, except we make (samples_per_light) shadow rays toward randomly sampled points on the light source, and then average their total Spectrum by (samples_per_light).
With this, we can calculate the total spectrum of a ray by the combination of these two (if there happens to be both delta and nondelta lights in the scene) and return our result.
Note: we need to set (shadow_ray.min_t = EPS_F) AND (shadow_ray.max_t = dist_to_light - EPS_F) to account for floating-point error and to avoid hitting the surface we originate from (or counting the light itself as a blocker). Also, if the direct-lighting estimate comes back zero because our hit point is actually on the light itself, we can simply return the zero-bounce value (the emission at the hit point directly). This helps light up primitives that are light sources. When I implemented this, I had not realized I needed these details until my images came out different from the reference images.
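A sketch of the light-sampling version, folding in the delta-light shortcut and the epsilon offsets from the note above (sample_L is assumed to return the incoming radiance and fill in the world-space direction, the distance to the light, and the pdf; other names are illustrative):

```cpp
// Sketch of direct lighting by importance sampling the lights: every shadow ray is aimed at
// a sampled point on a light, so none are wasted. sample_L, is_delta_light, and ns_area_light
// mirror the starter code but are assumptions here.
Spectrum estimate_direct_importance(const Ray& r, const Intersection& isect) {
  Matrix3x3 o2w;
  make_coord_space(o2w, isect.n);
  Matrix3x3 w2o = o2w.T();

  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = w2o * (-r.d);
  Spectrum L_out;

  for (SceneLight* light : scene->lights) {
    // One sample suffices for a delta (point) light; otherwise take ns_area_light samples.
    int ns = light->is_delta_light() ? 1 : ns_area_light;
    Spectrum L_light;

    for (int i = 0; i < ns; i++) {
      Vector3D wi_world;
      double dist_to_light, pdf;
      Spectrum radiance = light->sample_L(hit_p, &wi_world, &dist_to_light, &pdf);

      Vector3D w_in = w2o * wi_world;
      if (w_in.z < 0) continue;              // sampled light point is behind the surface

      // Shadow ray toward the sampled point, clipped so it neither re-hits the surface it
      // starts on nor registers the light itself as a blocker.
      Ray shadow_ray(hit_p, wi_world);
      shadow_ray.min_t = EPS_F;
      shadow_ray.max_t = dist_to_light - EPS_F;

      Intersection blocked;
      if (!bvh->intersect(shadow_ray, &blocked)) {
        L_light += isect.bsdf->f(w_out, w_in) * radiance * w_in.z / pdf;
      }
    }
    L_out += L_light / (double)ns;
  }
  return L_out;
}
```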
Below are some images rendered with both implementations of the direct lighting function.
As one can see above, because uniform hemisphere sampling is purely random within the encapsulating hemisphere, our shadow rays will often miss the light source. Since it is probability based, a small number of samples makes the image appear very noisy, with sharp Spectrum fluctuations between pixels. As we can see in "CBspheres.dae" (left), there are some small black grainy spots, showing that those positions missed the light several times, darkening the spots as a result. Naturally, as in statistics, the more samples we take, the more accurate our Spectrum estimate is compared to the true integral of radiance. With more samples, we can see our image converge, with less high-frequency noise between neighboring pixels. However, even with the high sampling rate above (1024 samples per pixel), we can still see some noise in the picture. On the other hand, light sampling does a far better job at converging with far fewer samples per pixel than uniform hemisphere sampling. This is because we always prioritize ray directions that can contribute to the total Spectrum. With this, importance sampling doesn't waste its shadow rays and makes for a better render. Also, because uniform hemisphere sampling relies on intersecting primitives, we are unable to render "dragon.dae", since its light sources are not intersectable! Light sampling is superior here, as the technique has direct access to these lights in a list and generates rays toward them.
Below is a comparison of rendered views for "CBspheres.dae" with at least one area (on the ceiling of the Cornell Box). We compare the noise levels in soft shadows when rendering with 1, 4, 16, and 64 light rays and with 1 sample per pixel using light sampling, not uniform hemisphere sampling.
Here we notice that, naturally, with more samples, we converge faster toward our ideal image. Indeed, we can see that 1, 4, and 16 light rays leave noticeable noise on the spheres' shadows and the walls. However, a key observation is that even with 1 sample per pixel and 1 light sample, light importance sampling converged a LOT better than uniform hemisphere sampling did at a much higher sample-per-pixel and light-sample count. Certainly, importance sampling focuses on the important/contributing parts of the scene!
Note: if, after implementing this function, you notice your render is off, understand that the problem could come from another piece of code (mine did)! This is why this project is very difficult to debug! Make sure to test on multiple DIFFERENT .dae files to ensure the function really works as it should.
Now that we have implemented ray bouncing functions of zero and one bounce, we can now start implementing more than one bounce! The result of this is an indirect lighting function, which will help us achieve global illumination. With global illumination, we can represent and render images that look similar to real life and adhere to the physics of light. We will see that indirect lighting is a recursive call to the one bounce function.
Of course, being recursive, we cannot have our recursive function run forever. Also, the contribution of higher bounces decreases roughly exponentially, as energy dissipates with each bounce. Thus, we will use a kill probability that works like Russian roulette. In this section, we give a ray a continuation probability of 0.7, meaning a ray has a 0.3 probability of stopping and returning. We also have a max_ray_depth, at which a ray stops and returns once it reaches that hard limit (assuming the ray survives that long under the kill probability!). Each ray's depth is initialized to 0 and is incremented on every bounce/recursive call.
Now, given a ray and the current hit point in the scene, we generate the Spectrum for one bounce and store it. If max_ray_depth is less than 1, we simply return this Spectrum. Otherwise, if the continuation probability allows the ray to keep bouncing and ray.depth <= max_ray_depth, we generate a new ray (with its depth incremented by 1!) going in the direction w_in. To generate the direction, we use the probabilistically sampled unit vector w_in, which gives the incoming-radiance direction based on the primitive's specific BSDF. With this new ray, we query whether it intersects a primitive. If it does, we recurse; otherwise, we stop recursing and return our Spectrum value. Note: the Spectrum contribution we calculate in this function is no different from the other bounce functions, except that we also divide the recursive term by the continuation probability (in my case, 0.7) to average the Spectrum properly in accordance with that probability.
Finally, in our wrapper function, we will add together our zero bounce function (direct) and at least one bounce function (indirect) to get our resulting spectrum for the ray.
Note: When I implemented this, I totally forgot to divide the spectrum by the continuation probability! Make sure to do this.
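Putting these pieces together, a condensed sketch of the recursion (one_bounce_radiance and zero_bounce_radiance are the earlier direct-lighting functions; coin_flip and sample_f mirror the starter code but are assumptions here):

```cpp
// Sketch of the recursive indirect-lighting estimator with Russian roulette.
// one_bounce_radiance() is the direct-lighting function from the previous part.
Spectrum at_least_one_bounce_radiance(const Ray& r, const Intersection& isect) {
  Spectrum L_out = one_bounce_radiance(r, isect);   // direct lighting at this hit point

  const double cpdf = 0.7;                          // continuation probability
  if (r.depth >= max_ray_depth) return L_out;       // hard depth limit reached
  if (!coin_flip(cpdf)) return L_out;               // Russian roulette says stop

  Matrix3x3 o2w;
  make_coord_space(o2w, isect.n);
  Matrix3x3 w2o = o2w.T();
  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = w2o * (-r.d);

  // Sample an incoming direction according to this surface's BSDF.
  Vector3D w_in;
  double pdf;
  Spectrum f = isect.bsdf->sample_f(w_out, &w_in, &pdf);

  // Bounce a new ray in that direction, one level deeper.
  Ray bounce(hit_p, o2w * w_in);
  bounce.min_t = EPS_F;
  bounce.depth = r.depth + 1;

  Intersection next_isect;
  if (bvh->intersect(bounce, &next_isect)) {
    Spectrum L_in = at_least_one_bounce_radiance(bounce, next_isect);
    // Divide by the BSDF pdf AND the continuation probability to keep the estimate unbiased.
    L_out += f * L_in * w_in.z / pdf / cpdf;
  }
  return L_out;
}

// Wrapper: total radiance = light seen directly (zero bounce) + everything after one or
// more bounces.
Spectrum est_radiance_global_illumination(const Ray& r, const Intersection& isect) {
  return zero_bounce_radiance(r, isect) + at_least_one_bounce_radiance(r, isect);
}
```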
Below are some images rendered with global (direct and indirect) illumination.
Below is a comparison of rendered views with only direct illumination, and then only indirect illumination.
Below is a comparison of rendered views for "CBbunny.dae" with max_ray_depth (# of bounces) set to 0, 1, 2, 3, and 100.
The zero-, one-, and two-bounce renders are quite distinct. Zero bounce means tracing a ray into the scene and taking the emission at the hit point directly, so only the light source itself is lit. One bounce means hitting a point in the scene and then sampling shadow rays out of the hit point to query whether they reach a light source. Global illumination includes both direct and indirect illumination, bouncing multiple times, taking the Spectrum at every bounce, and appropriately averaging the results. Interestingly, for our scene, as we keep increasing the number of bounces, the differences become hard to see. This simply means that after only a few bounces, most of the scene has converged to its ideal lighting state, and further bounces produce small (and uninteresting) differences. Below is a gif to help show the very small differences as we go from 2 to 3 to 100 bounces. Moreover, because we also have a kill probability, the higher-numbered bounces are very likely to be killed off (it is VERY unlikely that a ray keeps bouncing to its 100th bounce). Thus, when we render .dae files with higher bounce limits, most rays have terminated well before the limit, which explains why the higher-bounce renders all finished in a similar time (about 26 minutes)!
Below is a comparison of rendered views for "CBspheres.dae" with various sample-per-pixel rates of 1, 2, 4, 8, 16, 64, and 1024, with 4 light rays with global illumination. Notice that naturally, the more samples/information we take, the more our image converges to the ideal state of lighting.
Now that we are able to perform global illumination with BVH acceleration, it seems like we have optimized our ray-tracing program to make realistic, high-resolution images in a reasonable amount of time! But we can do even better. With adaptive sampling, we avoid using a fixed number of samples per pixel by cutting off sampling early once we see that a pixel has already converged. This way, we can concentrate samples in the more difficult parts of the image and get a less noisy, better image!
Thus, while generating the N rays for a pixel in my iterative loop, every samplesPerBatch rays I compute a convergence measure I (the smaller, the more likely the pixel has converged) from the mean and variance of all the samples generated so far. If I <= maxTolerance * mean, I simply break out of the loop and stop generating rays. Probability is very powerful for programs built on random sampling!
I = 1.96 * (σ / √(samples so far))
I ≤ maxTolerance * μ
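A sketch of that per-pixel loop (s1 and s2 accumulate the sum and sum of squares of each sample's illuminance; ns_aa, samplesPerBatch, maxTolerance, gridSampler, and sampleBuffer mirror the write-up/starter code but are assumptions):

```cpp
// Sketch of adaptive sampling inside the per-pixel loop. Every samplesPerBatch samples we
// recompute the mean and variance of the scalar illuminance seen so far and stop early
// if the 95% confidence interval I is small relative to the mean. Illustrative names.
void raytrace_pixel(size_t x, size_t y) {
  Spectrum total;
  double s1 = 0.0, s2 = 0.0;                 // running sum and sum of squares
  int n = 0;

  for (int i = 0; i < ns_aa; i++) {          // ns_aa = maximum samples per pixel
    Vector2D offset = gridSampler->get_sample();   // random position inside the pixel
    Ray r = generate_ray((x + offset.x) / width, (y + offset.y) / height);
    r.depth = 0;

    Spectrum radiance;
    Intersection isect;
    if (bvh->intersect(r, &isect))
      radiance = est_radiance_global_illumination(r, isect);
    total += radiance;

    double illum = radiance.illum();         // scalar brightness of this sample
    s1 += illum;
    s2 += illum * illum;
    n++;

    if (n % samplesPerBatch == 0 && n > 1) {
      double mean = s1 / n;
      double var  = (s2 - s1 * s1 / n) / (n - 1);
      double I    = 1.96 * sqrt(var / n);
      if (I <= maxTolerance * mean) break;   // the pixel has converged, stop sampling
    }
  }
  // Average over the samples actually taken, not the maximum budget; n is also recorded
  // per pixel to produce the sampling-rate visualization shown below.
  sampleBuffer.update_pixel(total / (double)n, x, y);
}
```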
Below is a rendered, noise-free result of "CBbunny.dae" with global illumination, 2048 samples per pixel, 1 sample per light, and a max ray depth of 5. Next to it is the rendered image's corresponding sampling-rate image, showing differences in sampling rate across regions and pixels (red -> blue == high -> low sampling rate). Here, adaptive sampling allowed for a faster runtime, as some places (e.g., the walls) took fewer samples to converge. [Note that the sample-rate image is better than the reference solution!]