Assignment 3: PathTracer

Lawrence Yiu

Overview

In this project, I implemented the main parts of a ray/path tracer: generating rays that originate from a camera; implementing the ray-sphere intersection test and the ray-triangle intersection test using the Möller-Trumbore algorithm; constructing a Bounding Volume Hierarchy (BVH) with the Surface Area Heuristic (SAH) to accelerate ray/path tracing; computing global illumination with direct and indirect lighting, using importance sampling to produce lower-noise images and render more efficiently; and finally using adaptive sampling to improve performance by detecting when a pixel has converged close enough to its final value that sampling can stop early.

I encountered several problems while working on this project. The first was that my initial BVH heuristic made BVH traversal too slow, resulting in very long render times; I solved this by implementing the SAH instead, which improved render times significantly. The other problems were visual bugs in my render outputs, mostly caused by incorrect averaging. To track these down, I commented out or added code to isolate the problem, and re-read what I had written to catch mistakes such as missing parentheses that changed the order of operations in math expressions. The best example was rendering the dragon.dae file at the end of Part 3, where the shadow cast by the point light was missing. I narrowed down the problem by temporarily removing all lights except the point light and saw that the shadow appeared, which led me to conclude that the area light was overpowering the point light. The actual cause was that I was incorrectly averaging the samples of the area and point lights together. Once I averaged each light's samples separately by type and then summed the results, the problem was solved.

Part 1: Ray Generation and Intersection

Ray Generation and Primitive Intersection

Ray generation involves generating some number of rays that pass through each pixel of image space. Each ray is randomly offset so that it passes through a random point within the pixel instead of always through the center or the bottom-left corner. Next, the ray's direction is computed in camera space using the lerp equation: $$v_0 + x(v_1 - v_0), \text{ where } v_0 = -\tan(0.5 \cdot hFov),\ v_1 = \tan(0.5 \cdot hFov) \text{ for the x coordinate}$$ $$v_0 + y(v_1 - v_0), \text{ where } v_0 = -\tan(0.5 \cdot vFov),\ v_1 = \tan(0.5 \cdot vFov) \text{ for the y coordinate}$$ Then the ray's direction is transformed to world space using the c2w matrix. I did not need to transform the ray's origin to world space since the camera position was already in world space. This world-space ray is then tested against the triangles and spheres in the scene; the closest intersection, if one exists, is kept, and the normal at the intersection point determines the color used to shade the pixel.
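
Below is a minimal sketch of this mapping, assuming fields like the assignment's Camera class (hFov and vFov in radians, the camera position pos in world space, and the c2w rotation matrix); the random in-pixel offset is assumed to be applied to (x, y) by the caller, and Ray and Vector3D are illustrative types rather than the exact API.

```cpp
// Sketch only: Vector3D, Ray, hFov, vFov, pos, and c2w are assumed to come
// from the surrounding renderer and are not defined here.
Ray generate_ray(double x, double y) {
  // Lerp the normalized coordinates (x, y) in [0, 1]^2 onto the virtual
  // sensor plane at z = -1 in camera space, per the equations above.
  double cx = -tan(0.5 * hFov) + x * (2.0 * tan(0.5 * hFov));
  double cy = -tan(0.5 * vFov) + y * (2.0 * tan(0.5 * vFov));

  // Rotate the direction into world space; the origin (the camera position)
  // is already in world space, so it needs no transform.
  Vector3D dir = (c2w * Vector3D(cx, cy, -1.0)).unit();
  return Ray(pos, dir);
}
```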

Ray-Triangle Intersection Test Algorithm

My ray-triangle intersection test uses the Möller-Trumbore algorithm to calculate t, b1, and b2, where t is the ray parameter at the intersection and b1 and b2 are barycentric coordinates. The third barycentric coordinate is $b_0 = 1 - b_1 - b_2$. To determine whether the calculated intersection lies inside the triangle, I first checked that the barycentric coordinates sum to 1, then that b0, b1, and b2 each lie in the range [0, 1]. Finally, I checked that t is positive and falls between the near and far clipping planes, so that the triangle is neither behind the ray's origin nor outside the clip range. If all checks pass, the ray hits the triangle.
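
As a concrete reference, here is a minimal, self-contained sketch of the test; the Vec3 helper and the function signature are illustrative, not the assignment's classes.

```cpp
#include <cmath>

// Illustrative vector type standing in for the renderer's vector class.
struct Vec3 {
  double x, y, z;
  Vec3 operator-(const Vec3 &o) const { return {x - o.x, y - o.y, z - o.z}; }
};
static Vec3 cross(const Vec3 &a, const Vec3 &b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static double dot(const Vec3 &a, const Vec3 &b) {
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns true and fills t, b1, b2 when the ray o + t*d hits the triangle
// (p0, p1, p2) within [min_t, max_t]; b0 = 1 - b1 - b2 is implied.
bool moller_trumbore(const Vec3 &o, const Vec3 &d,
                     const Vec3 &p0, const Vec3 &p1, const Vec3 &p2,
                     double min_t, double max_t,
                     double &t, double &b1, double &b2) {
  Vec3 e1 = p1 - p0, e2 = p2 - p0, s = o - p0;
  Vec3 s1 = cross(d, e2), s2 = cross(s, e1);
  double denom = dot(s1, e1);
  if (std::fabs(denom) < 1e-12) return false;  // ray parallel to triangle plane
  double inv = 1.0 / denom;
  t  = dot(s2, e2) * inv;
  b1 = dot(s1, s) * inv;
  b2 = dot(s2, d) * inv;
  // Barycentric coordinates must lie in [0, 1] and t within the clip range.
  return b1 >= 0.0 && b2 >= 0.0 && b1 + b2 <= 1.0 && t >= min_t && t <= max_t;
}
```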

CBempty.dae rendering
CBspheres_lambertian.dae rendering
CBgems.dae rendering

Part 2: Bounding Volume Hierarchy

BVH Construction Algorithm with SAH

Originally, my BVH construction algorithm simply sorted the vector of primitives by centroid along the longest axis, split the primitives into left and right halves at the midpoint of that axis, and recursed on each half. Unfortunately, when I tried to render test images for Part 3, I discovered that traversing this BVH was too slow, most likely due to the suboptimal construction heuristic.

To speed up my implementation, I rewrote my BVH construction algorithm to use the Surface Area Heuristic, based on code examples from Physically Based Rendering: From Theory to Implementation (Third Edition), Section 4.3.2: The Surface Area Heuristic. My final BVH construction algorithm first determines which axis to split on by finding the longest axis of the centroid bounding box, i.e., the box containing the centroids of all primitives being considered at this level of the BVH. Next, I sliced that axis into 32 equal buckets and determined, for each primitive, which bucket it belongs to using the centroid of the primitive's bounding box. To add the primitive to a bucket, I expanded the bucket's bounding box with the primitive's bounding box. Once every primitive has been placed in a bucket, I computed where to split by evaluating the cost of splitting at each bucket boundary with the equation from PBRT: $$cost = 0.125 + \frac{c_0 \cdot b_0.\text{surface\_area} + c_1 \cdot b_1.\text{surface\_area}}{bbox.\text{surface\_area}}$$ where 0.125 is the estimated traversal cost, $c_0$ and $c_1$ are the numbers of primitives before and after the split point respectively, $b_0.\text{surface\_area}$ and $b_1.\text{surface\_area}$ are the surface areas of the bounding boxes containing the primitives before and after the split point respectively, and $bbox.\text{surface\_area}$ is the surface area of the bounding box containing the primitives from all buckets. Once the lowest cost over all buckets determines the optimal split point, I split the primitives into left and right halves and called the construct_bvh method recursively on each half. The recursion stops once the number of primitives falls below max_leaf_size.
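
Here is a minimal, self-contained sketch of the bucket cost scan, assuming 32 buckets as above; the BBox and Bucket types are illustrative stand-ins for the assignment's classes.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Illustrative axis-aligned box; the renderer's BBox class plays this role.
struct BBox {
  double lo[3] = { INFINITY,  INFINITY,  INFINITY};
  double hi[3] = {-INFINITY, -INFINITY, -INFINITY};
  void expand(const BBox &b) {
    for (int i = 0; i < 3; i++) {
      lo[i] = std::min(lo[i], b.lo[i]);
      hi[i] = std::max(hi[i], b.hi[i]);
    }
  }
  double surface_area() const {
    double dx = hi[0] - lo[0], dy = hi[1] - lo[1], dz = hi[2] - lo[2];
    if (dx < 0 || dy < 0 || dz < 0) return 0.0;  // empty box
    return 2.0 * (dx * dy + dy * dz + dz * dx);
  }
};

struct Bucket { int count = 0; BBox box; };

// Given the 32 populated buckets and the box over all primitives, return the
// cheapest split: buckets [0, best] go to the left child, the rest go right.
int cheapest_sah_split(const Bucket buckets[32], const BBox &total) {
  int best = 0;
  double best_cost = std::numeric_limits<double>::infinity();
  for (int split = 0; split < 31; split++) {
    BBox b0, b1;
    int c0 = 0, c1 = 0;
    for (int i = 0; i <= split; i++)  { b0.expand(buckets[i].box); c0 += buckets[i].count; }
    for (int i = split + 1; i < 32; i++) { b1.expand(buckets[i].box); c1 += buckets[i].count; }
    // PBRT's estimate: 0.125 traversal cost plus area-weighted intersection cost.
    double cost = 0.125 + (c0 * b0.surface_area() + c1 * b1.surface_area())
                          / total.surface_area();
    if (cost < best_cost) { best_cost = cost; best = split; }
  }
  return best;
}
```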

Scenes Only Renderable with BVH

CBdragon.dae rendering with 100,012 primitives
CBlucy.dae rendering with 133,796 primitives
dragon.dae rendering with 105,120 primitives

With and Without BVH Comparison

All comparisons were done on an idle hive machine with ./pathtracer -t 8 -r 800 600.

Without BVH:

| Scene | # of primitives | Rendering time (s) | # of rays traced | Avg. speed (million rays/s) | Avg. intersection tests/ray |
|---|---|---|---|---|---|
| teapot.dae | 2,464 | 17.7809 | 475,205 | 0.0267 | 570.685424 |
| cow.dae | 5,856 | 41.4098 | 447,271 | 0.0108 | 1,395.320106 |
| CBcoil.dae | 7,884 | 56.1299 | 461,537 | 0.0082 | 1,847.080490 |

With BVH:

| Scene | # of primitives | Rendering time (s) | # of rays traced | Avg. speed (million rays/s) | Avg. intersection tests/ray |
|---|---|---|---|---|---|
| teapot.dae | 2,464 | 0.1079 | 430,814 | 3.9931 | 2.013851 |
| cow.dae | 5,856 | 0.1535 | 437,380 | 2.8485 | 2.598349 |
| CBcoil.dae | 7,884 | 0.0399 | 383,422 | 9.6002 | 0.872047 |

Based on the above tables, my BVH with SAH performed very well, dramatically reducing the number of intersection tests per ray; for example, CBcoil.dae went from 1,847 down to 0.872 intersection tests/ray. On these moderately complex scenes, the BVH lowered rendering times from tens of seconds to well under a second, which shows how important a BVH with a good heuristic is for ray tracing: checking a ray against every single primitive is far too inefficient when the ray will miss most of them. The BVH instead tests a ray against the bounding boxes of its nodes first, descending only into subtrees whose boxes are hit, so the final primitive intersection tests happen only at the few leaf nodes reached, each of which contains only a few primitives.
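
A sketch of this traversal logic, with the node fields (bb, l, r, and the leaf's primitive list) assumed to mirror the assignment's BVHNode rather than matching its exact API:

```cpp
// Sketch only: Ray, Intersection, Primitive, and BVHNode are the surrounding
// renderer's types; BBox::intersect and Primitive::intersect are assumed.
bool bvh_intersect(const Ray &r, Intersection *i, BVHNode *node) {
  double t0 = r.min_t, t1 = r.max_t;
  if (!node->bb.intersect(r, t0, t1))
    return false;                          // miss: prune the whole subtree
  if (node->isLeaf()) {
    bool hit = false;
    for (Primitive *p : node->primitives)  // only a handful of tests per leaf
      hit = p->intersect(r, i) || hit;     // intersect() shrinks r.max_t on hit
    return hit;
  }
  // Interior node: descend into both children (each may shrink r.max_t,
  // tightening the test for the other).
  bool hit_l = bvh_intersect(r, i, node->l);
  bool hit_r = bvh_intersect(r, i, node->r);
  return hit_l || hit_r;
}
```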

Extra Credit: Comparison Between Midpoint Heuristic and SAH

Midpoint heuristic:

| Scene | # of primitives | Rendering time (s) | # of rays traced | Avg. speed (million rays/s) | Avg. intersection tests/ray |
|---|---|---|---|---|---|
| teapot.dae | 2,464 | 0.1028 | 387,637 | 3.7710 | 3.831814 |
| cow.dae | 5,856 | 0.1702 | 420,536 | 2.4713 | 6.322291 |
| CBcoil.dae | 7,884 | 0.0782 | 413,846 | 5.3623 | 2.642814 |
| dragon.dae | 105,120 | 5.0082 | 447,995 | 0.0895 | 172.537935 |
| CBlucy.dae | 133,796 | 0.1821 | 420,881 | 2.3118 | 6.579796 |

SAH:

| Scene | # of primitives | Rendering time (s) | # of rays traced | Avg. speed (million rays/s) | Avg. intersection tests/ray |
|---|---|---|---|---|---|
| teapot.dae | 2,464 | 0.1079 | 430,814 | 3.9931 | 2.013851 |
| cow.dae | 5,856 | 0.1535 | 437,380 | 2.8485 | 2.598349 |
| CBcoil.dae | 7,884 | 0.0399 | 383,422 | 9.6002 | 0.872047 |
| dragon.dae | 105,120 | 0.0585 | 402,999 | 6.8858 | 1.822925 |
| CBlucy.dae | 133,796 | 0.0381 | 399,009 | 10.4636 | 0.893251 |

Based on the above data, the SAH performs consistently better than the midpoint heuristic, from small scenes to large ones. The dragon.dae test is particularly interesting: the midpoint heuristic slows down dramatically on that scene (5.0082 s and roughly 173 intersection tests/ray), while the SAH barely slows down at all (0.0585 s and roughly 1.8 intersection tests/ray).

Part 3: Direct Illumination

Uniform Hemisphere Sampling for Direct Lighting

I implemented direct lighting with uniform hemisphere sampling by casting a new ray from the original hit point in a new direction $(\omega_j)$ sampled from the uniform hemisphere. If the new ray hits a light, the light's emission is incorporated into the Monte Carlo estimator of the reflection equation: $$\frac{1}{N} \sum_{j=1}^N \frac{f_r(p, \omega_j \rightarrow \omega_r)\,L_i(p, \omega_j) \cos(\theta_j)}{p(\omega_j)} \label{refEq}\tag{1}$$ where N is the total number of lighting samples, $f_r$ is the BSDF of the surface originally hit, $L_i$ is the emission of the light, $\cos(\theta_j)$ is the cosine of the angle between the original surface normal and $\omega_j$, and $p(\omega_j)$ is the probability density of a uniform hemisphere sample ($\frac{1}{2\pi}$).
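
A minimal sketch of this estimator, assuming a local coordinate frame where the surface normal is +z; hemisphereSampler, o2w (local-to-world rotation), EPS_F, and the bvh calls are assumed stand-ins rather than the assignment's exact API:

```cpp
// Sketch only: Vector3D, Ray, Intersection, PI, and the names noted above
// come from the surrounding renderer and are not defined here.
Vector3D estimate_direct_hemisphere(const Intersection &isect,
                                    const Vector3D &w_out_local,
                                    int num_samples) {
  Vector3D L_out(0, 0, 0);
  double pdf = 1.0 / (2.0 * PI);                 // uniform hemisphere pdf
  for (int j = 0; j < num_samples; j++) {
    Vector3D w_j_local = hemisphereSampler->get_sample();
    double cos_theta = w_j_local.z;              // normal is +z in local space
    Ray shadow(isect.hit_p + EPS_F * (o2w * w_j_local), o2w * w_j_local);
    Intersection light_isect;
    if (bvh->intersect(shadow, &light_isect)) {
      // Only samples that happen to hit an emissive surface contribute.
      Vector3D L_i = light_isect.bsdf->get_emission();
      L_out += isect.bsdf->f(w_out_local, w_j_local) * L_i * cos_theta / pdf;
    }
  }
  return L_out / (double)num_samples;            // the 1/N in equation (1)
}
```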

Light Sampling for Direct Lighting

I implemented direct lighting with importance sampling by iterating over all lights in the scene and, from the original hit point, casting ns_area_light rays toward each area light but only one ray toward each point light, since a point light only needs to be sampled once. If a shadow ray hits some object on the way to the light, that light's radiance is not included in the final luminance output, producing the shadows in the final image. Otherwise, the light's radiance is plugged into the same equation $\ref{refEq}$ with a different $p(\omega_j)$, obtained from light->sample_L, since rays now travel only toward lights instead of in random directions over the hemisphere.
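
A sketch of the per-light loop, including the per-light averaging fix mentioned in the overview; sample_L and is_delta_light follow the SceneLight interface described here, while w2o, EPS_F, ns_area_light, and the bvh calls are illustrative stand-ins:

```cpp
// Sketch only: Vector3D, Ray, Intersection, SceneLight, and the names noted
// above come from the surrounding renderer and are not defined here.
Vector3D estimate_direct_importance(const Intersection &isect,
                                    const Vector3D &w_out_local) {
  Vector3D L_out(0, 0, 0);
  for (SceneLight *light : scene->lights) {
    // Point (delta) lights need only one sample; area lights get ns_area_light.
    int n = light->is_delta_light() ? 1 : ns_area_light;
    Vector3D L_light(0, 0, 0);
    for (int j = 0; j < n; j++) {
      Vector3D w_j;                    // world-space direction toward the light
      double dist_to_light, pdf;
      Vector3D L_i = light->sample_L(isect.hit_p, &w_j, &dist_to_light, &pdf);
      Vector3D w_j_local = w2o * w_j;
      if (w_j_local.z <= 0) continue;  // light is behind the surface
      // Shadow ray: skip the sample if anything blocks the path to the light.
      Ray shadow(isect.hit_p + EPS_F * w_j, w_j);
      shadow.max_t = dist_to_light - EPS_F;
      if (bvh->has_intersection(shadow)) continue;
      L_light += isect.bsdf->f(w_out_local, w_j_local) * L_i * w_j_local.z / pdf;
    }
    // Average each light's samples separately, then sum across lights
    // (the fix described in the overview).
    L_out += L_light / (double)n;
  }
  return L_out;
}
```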

Uniform Hemisphere Sampling vs. Importance Sampling Images

CBspheres_lambertian.dae with Hemisphere Sampling
CBspheres_lambertian.dae with Importance Sampling

1, 4, 16, and 64 Light Rays Light Sampling Images

1 Light Ray
4 Light Rays
16 Light Rays
64 Light Rays

In the above 4 images, it can be observed that the soft shadows become less and less noisy as the number of light rays used increases. This effect is most apparent around the outer edges of the shadows.

Uniform Hemisphere Sampling vs. Lighting Importance Sampling

Importance sampling is better than uniform hemisphere sampling in terms of image quality because, at the same sampling settings, images rendered with importance sampling are far less noisy, as the two images in the Uniform Hemisphere Sampling vs. Importance Sampling Images section show. Additionally, rendering time is better with importance sampling (20.7805 s vs. 37.3683 s for hemisphere sampling) because rays are directed toward the lights instead of mostly missing them, so importance sampling converges faster toward the desired result. Another important property of importance sampling is that it can direct rays toward point lights in the scene, whereas uniform hemisphere sampling will almost never hit a point light, so point-light effects such as shadows would be missing from the scene.

Part 4: Global Illumination

Indirect Lighting Implementation

To implement indirect lighting, I called the one_bounce_radiance function inside my at_least_one_bounce_radiance function to get the radiance from lights shining directly on the hit point, which is added to L_out. Then I called the sample_f method of the BSDF of the intersected surface, which returns a newly sampled $\omega_j$, the BSDF value for the given $\omega_o$ and that $\omega_j$, and the pdf of the sample. A new ray is created with the original hit point as its origin and the new $\omega_j$ as its direction, then traced into the scene and checked for an intersection. If it intersects a new surface, I decide whether the recursion should continue: it always continues on the very first bounce, and afterwards continues via Russian Roulette with a continuation probability of 0.65 to prevent infinite recursion. The same estimator equation $\ref{refEq}$ is used, but this time $L_i$ is computed recursively (by at_least_one_bounce_radiance) to determine how much light arrives at the original hit point from some arbitrary number of further bounces into the scene. Additionally, whenever Russian Roulette is used, $p(\omega_j)$ is multiplied by the continuation probability of 0.65 so that the expected value of the original estimator $\ref{refEq}$ remains the same. The output of the equation is added to L_out, and L_out is returned when the ray terminates, completing the recursive call chain.
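
A sketch of the recursion under the assumptions above, where ray depth counts down from the maximum and coin_flip, o2w, w2o, and EPS_F are illustrative stand-ins rather than the exact API:

```cpp
// Sketch only: Vector3D, Ray, Intersection, one_bounce_radiance,
// max_ray_depth, and the names noted above come from the surrounding renderer.
Vector3D at_least_one_bounce_radiance(const Ray &r, const Intersection &isect) {
  Vector3D L_out = one_bounce_radiance(r, isect);  // direct lighting here

  const double cpdf = 0.65;                        // continuation probability
  bool first_bounce = (r.depth == max_ray_depth);
  if (r.depth <= 1) return L_out;                  // out of bounces
  if (!first_bounce && !coin_flip(cpdf)) return L_out;  // roulette terminated

  Vector3D w_out = w2o * (-r.d);                   // outgoing dir, local space
  Vector3D w_j;                                    // sampled incoming dir
  double pdf;
  Vector3D f = isect.bsdf->sample_f(w_out, &w_j, &pdf);

  Ray bounce(isect.hit_p + EPS_F * (o2w * w_j), o2w * w_j);
  bounce.depth = r.depth - 1;
  Intersection next;
  if (bvh->intersect(bounce, &next)) {
    Vector3D L_i = at_least_one_bounce_radiance(bounce, next);
    // Dividing the pdf by cpdf after a roulette pass keeps the estimator's
    // expected value unchanged.
    double divisor = first_bounce ? pdf : pdf * cpdf;
    L_out += f * L_i * w_j.z / divisor;            // w_j.z = cos(theta_j)
  }
  return L_out;
}
```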

Images with Global Illumination

CBspheres_lambertian.dae
CBbunny.dae rendering
banana.dae rendering

Direct Illumination vs. Indirect Illumination Scene

CBspheres_lambertian.dae with Direct illumination
CBspheres_lambertian.dae with indirect illumination

0, 1, 2, 3, and 100 Ray Depth Images

Ray depth 0
Ray depth 1
Ray depth 2
Ray depth 3
Ray depth 100

1, 2, 4, 8, 16, 64, and 1024 Samples/Pixel Rate with 4 Light Rays

1 sample/pixel
2 samples/pixel
4 samples/pixel
8 samples/pixel
16 samples/pixel
64 samples/pixel
1024 samples/pixel

Part 5: Adaptive Sampling

Adaptive Sampling Implementation

I implemented adaptive sampling by computing, for each pixel, the mean $(\mu)$ and standard deviation $(\sigma)$ of its illuminance over the samples taken so far. Then, every samplesPerBatch samples, I calculated $I = 1.96 \cdot \frac{\sigma}{\sqrt{n}}$, where n is the current number of samples, and checked whether $I \le maxTolerance \cdot \mu$. If true, the pixel is considered converged and the sampling loop exits early. If false, sampling continues until the convergence criterion is met or the maximum number of samples is reached.
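
A self-contained sketch of this loop using running sums, where next_sample_illuminance() is a hypothetical stand-in for tracing one camera ray through the pixel and measuring the illuminance of the returned radiance:

```cpp
#include <cmath>

double next_sample_illuminance();  // hypothetical: trace one ray, return its illuminance

// Returns the number of samples actually taken before the pixel converged;
// ns_aa, samplesPerBatch, and maxTolerance mirror the write-up's parameters.
int sample_until_converged(int ns_aa, int samplesPerBatch, double maxTolerance) {
  double s1 = 0.0, s2 = 0.0;  // running sums of illuminance and its square
  int n = 0;
  for (; n < ns_aa; n++) {
    if (n > 0 && n % samplesPerBatch == 0) {
      double mu  = s1 / n;
      double var = (s2 - s1 * s1 / n) / (n - 1);   // unbiased sample variance
      double I   = 1.96 * std::sqrt(var) / std::sqrt((double)n);
      if (I <= maxTolerance * mu) break;           // 95% confidence: converged
    }
    double x = next_sample_illuminance();
    s1 += x;
    s2 += x * x;
  }
  return n;
}
```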

2048 Samples/Pixel Image with Sampling Rate Image

CBspheres_lambertian.dae noise free rendering
CBspheres_lambertian.dae rate image

Blue is the lowest number of samples for a pixel and red the highest. Note how the sampling rate varies across the image.