In this project, I implemented the main parts of a ray/path tracer:

- generating rays that originate from a camera,
- ray-sphere and ray-triangle intersection tests, the latter using the Möller-Trumbore algorithm,
- Bounding Volume Hierarchy (BVH) construction with the Surface Area Heuristic (SAH) to accelerate ray/path tracing,
- global illumination from direct and indirect lighting, using importance sampling for lower-noise images and more efficient rendering, and
- adaptive sampling, which improves performance by detecting when a pixel has converged close enough to its final value to stop sampling it early.
I encountered several problems while working on this project. The first was that my initial BVH split heuristic made traversal far too slow, resulting in very long render times; switching to the SAH brought the times down significantly. The remaining problems were visual bugs in my render outputs, mostly caused by incorrectly averaging samples. To isolate these bugs, I commented out or added code to narrow down where the error appeared, and re-read my code for mistakes such as missing parentheses that changed the order in which math operations were evaluated. The best example was the render of dragon.dae at the end of Part 3, where the shadow cast by the point light was missing. I narrowed the problem down by removing all lights except the point light, found that the shadow then appeared, and concluded that the area light was overpowering the point light. That led me to the root cause: I was averaging the area-light and point-light samples together. Once I separated the lights by type and averaged each type's samples before summing them, the shadow rendered correctly.
Ray generation produces some number of rays that pass through a given pixel of image space. Each ray is randomly jittered so that it passes through a random point within the pixel rather than always through the center or the bottom-left corner. Next, the ray's direction is mapped into camera space using the lerp equation: $$v_0 + x(v_1 - v_0) \text{ where } v_0 = -\tan(0.5 \cdot hFov),\; v_1 = \tan(0.5 \cdot hFov) \text{ for the x coordinate}$$ $$v_0 + y(v_1 - v_0) \text{ where } v_0 = -\tan(0.5 \cdot vFov),\; v_1 = \tan(0.5 \cdot vFov) \text{ for the y coordinate}$$ The ray's direction is then transformed into world space using the c2w matrix. The ray's origin needs no transformation, since the camera position is already in world space. This world-space ray is tested against the triangles and spheres in the scene; if any intersections exist, the closest one is kept, and the normal at that intersection point determines the color used to shade the pixel.
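As a concrete illustration, here is a minimal sketch of that ray-generation logic. The `Camera`, `Ray`, and `Vector3D` types below are simplified stand-ins for the project's actual classes, so the names and fields are assumptions:

```cpp
#include <cmath>

struct Vector3D {
    double x, y, z;
    Vector3D operator*(double s) const { return {x * s, y * s, z * s}; }
    Vector3D operator+(const Vector3D& v) const { return {x + v.x, y + v.y, z + v.z}; }
};

inline Vector3D normalize(const Vector3D& v) {
    double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

struct Ray {
    Vector3D o, d;        // origin and (normalized) direction
    double min_t, max_t;  // valid range along the ray
};

// Hypothetical camera: fields loosely mirror the assignment's camera class.
struct Camera {
    double hFov, vFov;    // fields of view, in radians
    double nClip, fClip;  // near/far clip distances
    Vector3D pos;         // camera position, already in world space
    Vector3D c2w[3];      // columns of the camera-to-world rotation

    // (x, y) are normalized image coordinates in [0,1]^2, already
    // jittered within the pixel by the caller.
    Ray generate_ray(double x, double y) const {
        // Lerp across the sensor plane at z = -1 in camera space:
        // v0 + t*(v1 - v0) with v0 = -tan(fov/2), v1 = +tan(fov/2),
        // which simplifies to (2t - 1) * tan(fov/2).
        double cx = (2.0 * x - 1.0) * std::tan(0.5 * hFov);
        double cy = (2.0 * y - 1.0) * std::tan(0.5 * vFov);
        Vector3D d_cam{cx, cy, -1.0};

        // Rotate only the direction into world space; the origin is
        // the camera position, which is already in world space.
        Vector3D d_world = c2w[0] * d_cam.x + c2w[1] * d_cam.y + c2w[2] * d_cam.z;
        return {pos, normalize(d_world), nClip, fClip};
    }
};
```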
My ray-triangle intersection test uses the Möller-Trumbore algorithm to calculate $t$, $b_1$, and $b_2$, where $t$ is the time of intersection and $b_1$ and $b_2$ are barycentric coordinates. The third barycentric coordinate is $b_0 = 1 - b_1 - b_2$. To determine whether the intersection lies inside the triangle, I first checked that the barycentric coordinates sum to 1, then that $b_0$, $b_1$, and $b_2$ all lie in the range $[0, 1]$. Finally, I checked that $t$ is positive and falls between the near and far clipping planes, so that the triangle is neither behind the ray's origin nor outside the clipping range. If all checks pass, the ray hits the triangle.
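A sketch of the test itself, reusing the `Ray` and `Vector3D` types from the previous snippet (`operator-`, `dot`, and `cross` helpers on `Vector3D` are assumed):

```cpp
// Möller-Trumbore: solves o + t*d = (1 - b1 - b2)*p0 + b1*p1 + b2*p2
// for (t, b1, b2) using Cramer's rule.
bool intersect_triangle(const Ray& r,
                        const Vector3D& p0, const Vector3D& p1, const Vector3D& p2,
                        double& t, double& b1, double& b2) {
    Vector3D e1 = p1 - p0, e2 = p2 - p0;
    Vector3D s  = r.o - p0;
    Vector3D s1 = cross(r.d, e2);
    Vector3D s2 = cross(s, e1);

    double denom = dot(s1, e1);
    if (std::fabs(denom) < 1e-12) return false;  // ray parallel to triangle

    t  = dot(s2, e2)  / denom;
    b1 = dot(s1, s)   / denom;
    b2 = dot(s2, r.d) / denom;

    // Inside-triangle test: b0 = 1 - b1 - b2 is in [0, 1] exactly when
    // b1 >= 0, b2 >= 0, and b1 + b2 <= 1 (the coordinates sum to 1 by
    // construction).
    if (b1 < 0.0 || b2 < 0.0 || b1 + b2 > 1.0) return false;

    // The hit must lie within the ray's valid [min_t, max_t] range,
    // i.e., between the near and far clipping planes.
    return t >= r.min_t && t <= r.max_t;
}
```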
Originally, my BVH construction algorithm simply sorted the vector of primitives by centroid along the longest axis, split them into left and right halves at the midpoint of that axis, and recursed on each half. Unfortunately, when I tried to render the test images for Part 3, I discovered that traversing the resulting BVH was too slow, most likely due to this suboptimal construction heuristic.
To speed up my implementation, I rewrote my BVH construction to use the Surface Area Heuristic, based on the code examples in Physically Based Rendering: From Theory to Implementation (Third Edition), Section 4.3.2: The Surface Area Heuristic. My final algorithm first chooses the split axis by finding the longest axis of the centroid bounding box, i.e., the box containing the centroids of all primitives being considered at this level of the BVH. Next, I slice that axis into 32 equal buckets and assign each primitive to a bucket using the centroid of its bounding box; adding a primitive to a bucket expands the bucket's bounding box by the primitive's bounding box. Once every primitive has been placed in a bucket, I choose the split bucket by evaluating the cost of each candidate split using the equation from PBRT: $$\text{cost} = 0.125 + \frac{c_0 \cdot A(b_0) + c_1 \cdot A(b_1)}{A(\text{bbox})}$$ where 0.125 is the estimated traversal cost, $c_0$ and $c_1$ are the numbers of primitives before and after the split point, $A(b_0)$ and $A(b_1)$ are the surface areas of the bounding boxes of those two groups, and $A(\text{bbox})$ is the surface area of the bounding box containing the primitives from all buckets. Once the lowest-cost split over all buckets is found, I split the primitives into left and right halves and call the construct_bvh method recursively on each half. The recursion stops once the number of primitives falls below max_leaf_size.
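A sketch of the inner SAH cost loop under those definitions. `BBox` here stands in for the assignment's bounding-box class, assumed to have an `expand` method and a `surface_area` accessor; filling the buckets happens before this call:

```cpp
#include <limits>

const int kNumBuckets = 32;

// `BBox` is assumed to match the assignment's bounding-box class.
struct Bucket {
    BBox bbox;      // union of the bboxes of the primitives in this bucket
    int count = 0;  // how many primitives landed in this bucket
};

// Given buckets filled along the chosen axis, return the boundary i with
// the lowest SAH cost (buckets [0..i] go left, (i..31] go right).
// total_area is the surface area of this node's overall bounding box.
int cheapest_split(const Bucket buckets[kNumBuckets], double total_area) {
    int best = -1;
    double best_cost = std::numeric_limits<double>::infinity();
    for (int i = 0; i < kNumBuckets - 1; ++i) {
        BBox b0, b1;
        int c0 = 0, c1 = 0;
        for (int j = 0; j <= i; ++j)           { b0.expand(buckets[j].bbox); c0 += buckets[j].count; }
        for (int j = i + 1; j < kNumBuckets; ++j) { b1.expand(buckets[j].bbox); c1 += buckets[j].count; }
        if (c0 == 0 || c1 == 0) continue;  // skip degenerate splits

        // PBRT's SAH estimate: 0.125 is the relative traversal cost;
        // each child's expected intersection work is weighted by the
        // probability a ray entering the parent also hits that child,
        // i.e., the ratio of surface areas.
        double cost = 0.125 +
            (c0 * b0.surface_area() + c1 * b1.surface_area()) / total_area;
        if (cost < best_cost) { best_cost = cost; best = i; }
    }
    return best;  // -1: no valid split; fall back to an equal-count split
}
```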
All comparisons were done on an idle hive machine with `./pathtracer -t 8 -r 800 600`.
| Without BVH | teapot.dae | cow.dae | CBcoil.dae |
|---|---|---|---|
| # of primitives | 2,464 | 5,856 | 7,884 |
| Rendering time | 17.7809 s | 41.4098 s | 56.1299 s |
| # of rays traced | 475,205 | 447,271 | 461,537 |
| Average speed (million rays/s) | 0.0267 | 0.0108 | 0.0082 |
| Intersection tests/ray | 570.685424 | 1,395.320106 | 1,847.080490 |
| With BVH | teapot.dae | cow.dae | CBcoil.dae |
|---|---|---|---|
| # of primitives | 2,464 | 5,856 | 7,884 |
| Rendering time | 0.1079 s | 0.1535 s | 0.0399 s |
| # of rays traced | 430,814 | 437,380 | 383,422 |
| Average speed (million rays/s) | 3.9931 | 2.8485 | 9.6002 |
| Intersection tests/ray | 2.013851 | 2.598349 | 0.872047 |
Based on the above table, my BVH with SAH performed very well, dramatically reducing the number of intersection tests per ray; CBcoil.dae, for example, went from 1,847 down to 0.872 intersection tests/ray. On these moderately complex scenes, the BVH lowered rendering times from tens of seconds to well under one second. This shows how important a good BVH heuristic is for ray tracing: testing a ray against every single primitive is far too inefficient when the ray will miss most of them. The BVH instead tests the ray against the bounding boxes of its nodes first, descending only into subtrees the ray can actually hit, so the final primitive intersection tests happen only at the few leaf nodes reached, each of which contains only a few primitives.
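To make that traversal order concrete, here is a sketch of the recursive intersect routine; `BVHNode`, `Intersection`, and the `bb.intersect` / primitive `intersect` signatures are assumed to follow the assignment skeleton:

```cpp
// Recursive BVH traversal: prune whole subtrees whose bounding boxes
// the ray misses, and only run primitive tests at the leaves.
bool intersect(const Ray& ray, BVHNode* node, Intersection* isect) {
    double t0 = ray.min_t, t1 = ray.max_t;
    if (!node->bb.intersect(ray, t0, t1)) return false;  // prune subtree

    if (node->isLeaf()) {
        bool hit = false;
        for (auto p = node->start; p != node->end; ++p)
            hit = (*p)->intersect(ray, isect) || hit;  // isect keeps the closest hit
        return hit;
    }
    bool hit_left  = intersect(ray, node->l, isect);
    bool hit_right = intersect(ray, node->r, isect);
    return hit_left || hit_right;
}
```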
| Midpoint BVH | teapot.dae | cow.dae | CBcoil.dae | dragon.dae | CBlucy.dae |
|---|---|---|---|---|---|
| # of primitives | 2,464 | 5,856 | 7,884 | 105,120 | 133,796 |
| Rendering time | 0.1028 s | 0.1702 s | 0.0782 s | 5.0082 s | 0.1821 s |
| # of rays traced | 387,637 | 420,536 | 413,846 | 447,995 | 420,881 |
| Average speed (million rays/s) | 3.7710 | 2.4713 | 5.3623 | 0.0895 | 2.3118 |
| Intersection tests/ray | 3.831814 | 6.322291 | 2.642814 | 172.537935 | 6.579796 |
| SAH BVH | teapot.dae | cow.dae | CBcoil.dae | dragon.dae | CBlucy.dae |
|---|---|---|---|---|---|
| # of primitives | 2,464 | 5,856 | 7,884 | 105,120 | 133,796 |
| Rendering time | 0.1079 s | 0.1535 s | 0.0399 s | 0.0585 s | 0.0381 s |
| # of rays traced | 430,814 | 437,380 | 383,422 | 402,999 | 399,009 |
| Average speed (million rays/s) | 3.9931 | 2.8485 | 9.6002 | 6.8858 | 10.4636 |
| Intersection tests/ray | 2.013851 | 2.598349 | 0.872047 | 1.822925 | 0.893251 |
Based on the above data, the SAH consistently outperforms the midpoint heuristic from small to large scenes. The dragon.dae test is the most interesting: both heuristics slow down on it relative to comparable scenes, but the SAH degrades far less (0.0585 s and 1.82 intersection tests/ray versus 5.0082 s and 172.5 tests/ray for the midpoint heuristic).
I implemented direct lighting with uniform hemisphere sampling by casting a new ray from the original hit point in a direction $\omega_j$ sampled uniformly from the hemisphere. If the new ray hits a light, the light's emission is incorporated into the Monte Carlo estimator of the reflection equation: $$\frac{1}{N} \sum_{j=1}^N \frac{f_r(p, \omega_j \rightarrow \omega_r)\, L_i(p, \omega_j) \cos\theta_j}{p(\omega_j)} \label{refEq}\tag{1}$$ where $N$ is the total number of lighting samples, $f_r$ is the BSDF of the originally hit surface, $L_i$ is the light's emission, $\cos\theta_j$ is the cosine of the angle between the surface normal and $\omega_j$, and $p(\omega_j) = \frac{1}{2\pi}$ is the probability density of a uniform hemisphere sample.
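A sketch of this estimator in code. Names like `hemisphereSampler`, `bvh`, `o2w`, `hit_p`, and `w_out` mirror the assignment skeleton and are assumptions; sampled directions live in the local frame where the surface normal is the z axis, so $\cos\theta_j$ is simply `w_in.z`:

```cpp
// One-bounce direct lighting via uniform hemisphere sampling.
// hit_p: world-space hit point; o2w: local-to-world frame at the hit;
// w_out: outgoing direction in the local frame (all assumed members).
Vector3D estimate_direct_hemisphere(const Intersection& isect) {
    Vector3D L_out{0, 0, 0};
    int num_samples = scene->lights.size() * ns_area_light;
    double pdf = 1.0 / (2.0 * PI);  // uniform hemisphere density

    for (int i = 0; i < num_samples; ++i) {
        Vector3D w_in = hemisphereSampler->get_sample();  // local frame
        Vector3D w_in_world = o2w * w_in;
        // Offset the origin slightly to avoid self-intersection.
        Ray shadow(hit_p + EPS_D * w_in_world, w_in_world);

        Intersection light_isect;
        if (bvh->intersect(shadow, &light_isect)) {
            // Nonzero only when the random ray happens to hit an
            // emitter, which is why this estimator is so noisy.
            Vector3D L_i = light_isect.bsdf->get_emission();
            L_out += isect.bsdf->f(w_out, w_in) * L_i * w_in.z / pdf;
        }
    }
    return L_out / (double)num_samples;  // the 1/N in equation (1)
}
```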
I implemented direct lighting with importance sampling by iterating over all lights in the scene and casting, from the original hit point, ns_area_light rays toward each area light but only one ray toward each point light, since a point light only needs to be sampled once. If a ray hits some object on the way to the light, the light's radiance is excluded from the final output radiance, which is what produces shadows in the final image. Otherwise, the light's radiance enters the same equation $\ref{refEq}$, but with a different $p(\omega_j)$ obtained from light->sample_L, since rays now go directly toward lights instead of in random directions over a hemisphere.
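A corresponding sketch for the importance-sampled version; again, members such as `scene->lights`, `light->sample_L`, `w2o`, and `ns_area_light` mirror the skeleton and are assumptions. Note how each light's samples are averaged separately before being accumulated, which was the fix for the missing point-light shadow described earlier:

```cpp
// One-bounce direct lighting via light importance sampling.
Vector3D estimate_direct_importance(const Intersection& isect) {
    Vector3D L_out{0, 0, 0};
    for (SceneLight* light : scene->lights) {
        // Point (delta) lights are deterministic: one sample is exact.
        int ns = light->is_delta_light() ? 1 : ns_area_light;
        Vector3D L_light{0, 0, 0};

        for (int i = 0; i < ns; ++i) {
            Vector3D wi_world; double dist_to_light, pdf;
            Vector3D L_i = light->sample_L(hit_p, &wi_world, &dist_to_light, &pdf);
            Vector3D w_in = w2o * wi_world;  // into the local frame
            if (w_in.z < 0) continue;        // light is behind the surface

            // Shadow ray: stop just short of the light so the light
            // itself does not count as an occluder.
            Ray shadow(hit_p + EPS_D * wi_world, wi_world);
            shadow.max_t = dist_to_light - EPS_D;
            Intersection tmp;
            if (!bvh->intersect(shadow, &tmp))
                L_light += isect.bsdf->f(w_out, w_in) * L_i * w_in.z / pdf;
        }
        // Average each light's samples separately before summing.
        L_out += L_light / (double)ns;
    }
    return L_out;
}
```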
In the four images above, the soft shadows become less and less noisy as the number of light rays increases; the effect is most apparent around the outer edges of the shadows.
Importance sampling beats uniform hemisphere sampling in image quality: with the same sampling settings, the importance-sampled renders are far less noisy, as can be seen in the two images in the Uniform Hemisphere Sampling vs. Importance Sampling section. Rendering time is also better for importance sampling (20.7805 s vs. 37.3683 s for hemisphere sampling) because light rays are directed toward the lights instead of mostly missing them, so importance sampling converges faster toward the desired result. Another important property of importance sampling is that it can direct rays toward point lights, whereas uniform hemisphere sampling will essentially never hit a point light, so effects such as point-light shadows would be missing from the scene.
To implement indirect lighting, I call the one_bounce_radiance function inside my at_least_one_bounce_radiance function to get the radiance from lights shining directly on the hit point, and add it to L_out. Then I call the sample_f method of the intersected surface's BSDF, which returns the BSDF value for the given $\omega_o$ and a newly sampled $\omega_j$, along with the pdf of that sample. A new ray is created with its origin at the hit point and its direction set to $\omega_j$, and is traced into the scene. If it intersects a new surface, I decide whether to continue: the very first bounce always continues, and later bounces continue with a Russian Roulette probability of 0.65 to prevent infinite recursion. The same estimator, equation $\ref{refEq}$, is used, but now $L_i$ is computed recursively by at_least_one_bounce_radiance to determine how much light arrives at the hit point from an arbitrary number of further bounces into the scene. Additionally, when Russian Roulette is used, $p(\omega_j)$ is multiplied by the continuation probability of 0.65 so that the expected value of estimator $\ref{refEq}$ stays the same. The output is added to L_out, and L_out is returned when the ray terminates, completing the recursive call chain.
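A sketch of that recursion, with `sample_f`, `one_bounce_radiance`, `coin_flip`, and the ray `depth` bookkeeping assumed to follow the skeleton; `cpdf` is the 0.65 continuation probability from above:

```cpp
// Direct lighting at this hit plus recursively gathered indirect light,
// terminated by Russian Roulette.
Vector3D at_least_one_bounce_radiance(const Ray& r, const Intersection& isect) {
    Vector3D L_out = one_bounce_radiance(r, isect);  // direct light here

    const double cpdf = 0.65;
    // Always take the first indirect bounce; afterwards, continue
    // only with probability cpdf.
    bool first_bounce = (r.depth == max_ray_depth);
    if (!first_bounce && !coin_flip(cpdf)) return L_out;

    Vector3D w_in; double pdf;
    Vector3D f = isect.bsdf->sample_f(w_out, &w_in, &pdf);

    Vector3D w_in_world = o2w * w_in;
    Ray bounce(hit_p + EPS_D * w_in_world, w_in_world);
    bounce.depth = r.depth - 1;

    Intersection next;
    if (bounce.depth > 0 && bvh->intersect(bounce, &next)) {
        Vector3D L_i = at_least_one_bounce_radiance(bounce, next);
        // Dividing by cpdf when RR was applied keeps the estimator unbiased.
        double rr = first_bounce ? 1.0 : cpdf;
        L_out += f * L_i * w_in.z / (pdf * rr);
    }
    return L_out;
}
```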
I implemented adaptive sampling by computing, for each pixel, the mean $\mu$ and standard deviation $\sigma$ of its illuminance over the samples taken so far. Every samplesPerBatch samples, I calculate $I = 1.96 \cdot \frac{\sigma}{\sqrt{n}}$, where $n$ is the current number of samples, and check whether $I \le \text{maxTolerance} \cdot \mu$. If so, the pixel is considered converged and the sampling loop exits early; otherwise, sampling continues until the convergence criterion is met or the maximum number of samples is reached.
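A sketch of the per-pixel loop, tracking $\sum x_i$ and $\sum x_i^2$ so that $\mu$ and $\sigma^2$ are cheap to compute at each batch boundary. `est_radiance`, `ns_aa`, and the buffer update are assumed names; `samplesPerBatch` and `maxTolerance` match the parameters above:

```cpp
#include <cmath>

// Adaptive sampling for one pixel: stop early once the 95% confidence
// interval I = 1.96 * sigma / sqrt(n) falls within maxTolerance * mu.
void raytrace_pixel(size_t x, size_t y) {
    double s1 = 0.0, s2 = 0.0;  // running sums of illuminance and illuminance^2
    Vector3D radiance{0, 0, 0};
    int n = 0;

    while (n < ns_aa) {         // ns_aa: maximum samples per pixel
        Vector3D sample = est_radiance(x, y);  // one jittered camera ray
        radiance += sample;
        double illum = sample.illum();
        s1 += illum;
        s2 += illum * illum;
        ++n;

        if (n > 1 && n % samplesPerBatch == 0) {
            double mu  = s1 / n;
            double var = (s2 - s1 * s1 / n) / (n - 1);  // sample variance
            double I   = 1.96 * std::sqrt(var / n);
            if (I <= maxTolerance * mu) break;  // pixel has converged
        }
    }
    sampleBuffer.update_pixel(radiance / (double)n, x, y);
}
```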
In the sample-rate images, blue marks the lowest number of samples taken for a pixel and red the highest. Note how the sampling rate varies across the image.