In a project that felt as long as its rendering times, we learned just how cool (and easily bug-ifiable) ray tracing was.
We explored primitive 3D geometric intersections (ray-cube, ray-sphere, and ray-triangle intersections), which are applicable not just to ray tracing but also to other areas such as collision detection in game engines or autonomous driving. And not just the math, but also the CS: the data structures that speed up collision detection.
We then tackled the problem of estimating how light behaves in real life. We couldn't simply trace all possible paths light could take, because there are infinitely many of them, so we used statistics and sampling to model this. We first modeled how a single ray of light would look if it didn't bounce, then used a recursive algorithm that simulated light bouncing to generate our images. In this recursive algorithm, in a rather clever application of statistics, we first sampled uniformly over the directions light could bounce in, and used that to estimate how much light came in overall.
We then extended this to importance sampling, since we knew light was more likely to come from light sources, creating an unbiased estimator that converged faster than our naive uniform sampling. We then used these pieces to simulate arbitrary numbers of bounces, creating beautiful color-bleeding images, and lastly used the fact that we had an underlying probability distribution to cut down on rendering times, since some pixels converge to realistic values faster than others.
Ray generation was done by taking each point (x, y) that we wanted to render and transforming it into camera space, then using the provided camera-to-world matrix to convert to world space. (The details of this transformation are below, in the image taken from the spec.)
Hence, it went image -> camera -> world space. We first translated (x, y) by (-0.5, -0.5) so that the center was at (0, 0) like in the image, then scaled by 2 * (tan(hFov/2), tan(vFov/2)): after the translation our top corner was at (0.5, 0.5), so multiplying by 2 set it back to (1, 1), and scaling by the respective tangents mapped it onto the camera-space sensor plane, like in the image. Lastly, since the camera-to-world matrix was provided to us, we simply multiplied by it to convert to world coordinates. This gave us a way to project a ray from our image, the screen, into the world.
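A minimal sketch of this transformation, assuming the starter code's conventions (hFov/vFov stored in degrees, a radians() helper, the camera position pos, and near/far clip values nClip/fClip are assumptions, not necessarily our exact code):

```cpp
// Map a normalized image coordinate (x, y) in [0,1]^2 to a world-space ray.
Ray Camera::generate_ray(double x, double y) const {
  // Translate so the image center is at the origin, then scale by the
  // camera-space sensor extents 2*tan(hFov/2) and 2*tan(vFov/2).
  double cx = (x - 0.5) * 2.0 * tan(radians(hFov) * 0.5);
  double cy = (y - 0.5) * 2.0 * tan(radians(vFov) * 0.5);

  // The sensor plane sits at z = -1 in camera space.
  Vector3D dir_camera(cx, cy, -1.0);

  // Rotate into world space with the provided camera-to-world matrix.
  Vector3D dir_world = (c2w * dir_camera).unit();

  Ray r(pos, dir_world);
  r.min_t = nClip;  // near clipping plane
  r.max_t = fClip;  // far clipping plane
  return r;
}
```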
We next did random sampling for each pixel in the image. Since each pixel was a 1x1 square, we sampled ns_aa rays within this unit box, averaged the radiance of these ns_aa sampled rays, and colored the pixel using that average radiance.
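Roughly, the per-pixel loop looked like the sketch below (gridSampler, sampleBuffer, max_ray_depth, and est_radiance_global_illumination are names from the starter code as we remember it, so treat them as assumptions):

```cpp
// Average ns_aa jittered samples inside the pixel's unit square.
void PathTracer::raytrace_pixel(size_t x, size_t y) {
  Vector3D total(0, 0, 0);
  for (size_t i = 0; i < ns_aa; i++) {
    Vector2D offset = gridSampler->get_sample();   // random point in [0,1)^2
    double px = (x + offset.x) / sampleBuffer.w;   // normalized image coords
    double py = (y + offset.y) / sampleBuffer.h;
    Ray r = camera->generate_ray(px, py);
    r.depth = max_ray_depth;                       // allow the full number of bounces
    total += est_radiance_global_illumination(r);
  }
  sampleBuffer.update_pixel(total / (double)ns_aa, x, y);
}
```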
For intersection, there were two primitive shapes: spheres and triangles.
For triangles, we first used a naive approach, where we solved for t using the plane equation and the ray equation, used that t to find the point of intersection, then checked whether that point was inside the triangle using the point-in-triangle test from Project 1 (this was all done in has_intersection).
To speed it up, we used the Möller–Trumbore algorithm, which uses some algebraic manipulation of the naive equations to reduce the number of multiply and add operations needed to find t, and as a bonus gives us the barycentric coordinates along with t. We could thus easily tell whether the point was in the triangle, since if any barycentric weight was outside the interval [0, 1], the point wasn't in the triangle. It also let us easily compute barycentrically weighted normals.
Here is the Möller–Trumbore algorithm:
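In code form, a minimal sketch (p1, p2, p3 are the triangle's vertices; the variable names and the free cross()/dot() helpers are illustrative assumptions):

```cpp
#include <cmath>

// Solve for t and the barycentric coordinates (b1, b2) in one pass of
// cross/dot products; the third weight is 1 - b1 - b2.
bool moller_trumbore(const Ray &r, const Vector3D &p1, const Vector3D &p2,
                     const Vector3D &p3, double &t, double &b1, double &b2) {
  Vector3D e1 = p2 - p1;
  Vector3D e2 = p3 - p1;
  Vector3D s  = r.o - p1;
  Vector3D s1 = cross(r.d, e2);
  Vector3D s2 = cross(s, e1);

  double denom = dot(s1, e1);
  if (fabs(denom) < 1e-12) return false;  // ray parallel to the triangle plane

  t  = dot(s2, e2) / denom;
  b1 = dot(s1, s)   / denom;
  b2 = dot(s2, r.d) / denom;

  // Inside the triangle iff every barycentric weight lies in [0, 1],
  // and the hit must lie within the ray's valid t range.
  return b1 >= 0 && b2 >= 0 && (b1 + b2) <= 1 &&
         t >= r.min_t && t <= r.max_t;
}
```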
We then updated the ray's max_t to be this t, and filled in the intersection data accordingly, setting the bsdf, primitive, t, and normal for the ray's intersection with this triangle.
For spheres, here is how to solve for t: substituting the ray o + t*d into the implicit sphere equation |p - center|^2 = R^2 gives a quadratic a*t^2 + b*t + c = 0, with a = d . d, b = 2 (o - center) . d, and c = (o - center) . (o - center) - R^2.
Of course, since this involved solving a quadratic, there were three cases, based on the discriminant (a code sketch follows the cases below). Note that for all of the following cases, if none of the solved t values were in the [min_t, max_t] range, we immediately knew that we didn't intersect.
1) There's no intersection, indicated by a negative discriminant. We return false.
2) There's one intersection, where the discriminant is 0 (hence 1 real t solution). We return that t and update the ray's max_t accordingly.
3) There are two intersections, since the discriminant is > 0 (so 2 real t solutions). We set max_t to the closer of the two intersections (given they're both valid); otherwise we take the valid one.
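A sketch of the sphere test, assuming members like o (center) and r2 (squared radius) from the starter code:

```cpp
// Solve the quadratic and report the two roots; return whether a valid hit
// exists inside [min_t, max_t].
bool Sphere::test(const Ray &ray, double &t1, double &t2) const {
  // o and r2 are the sphere's center and squared radius (member variables).
  Vector3D oc = ray.o - o;
  double a = dot(ray.d, ray.d);
  double b = 2.0 * dot(oc, ray.d);
  double c = dot(oc, oc) - r2;
  double disc = b * b - 4.0 * a * c;   // the discriminant picks one of the 3 cases
  if (disc < 0) return false;          // case 1: no real roots, no hit

  double sq = sqrt(disc);
  t1 = (-b - sq) / (2.0 * a);          // closer root (equal to t2 when disc == 0)
  t2 = (-b + sq) / (2.0 * a);

  // Take the closest root that lies within [min_t, max_t], if any.
  double t = (t1 >= ray.min_t) ? t1 : t2;
  return (t >= ray.min_t && t <= ray.max_t);
}
```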
We first start off by creating a node with an axis-aligned bounding box that contains all the primitives. We took advantage of this loop over the primitives to also calculate their average centroid, for a speedup later.
If the number of primitives is at most our max_leaf_size limit, we end the function by making our node a leaf containing all these primitives.
Otherwise, we used a heuristic to split our primitives into left and right halves.
We originally used the largest axis as our heuristic, since dividing along the largest axis is a simple way to split a box in two such that the primitives generally end up fairly evenly distributed.
Later, however, we tried a surface-area entropy heuristic for choosing the axis to split on, to see if we could get a faster BVH. Entropy is
-sum(p(x) * log(p(x)))
Therefore, we would split along each axis, calculate the entropy of that split, then choose the axis with the highest entropy (and thus the most balanced split).
For our purposes, p(x) was the ratio of the surface area of primitive x to the total surface area of the node. Essentially, this is the probability that we hit that primitive given that we hit this node.
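As a sketch, here is one way a split can be scored consistently with this description: sum each side's share of the hit probability (its surface-area share) and take the entropy of that two-way split, so a perfectly balanced split scores highest. The helper names are illustrative, and bounding-box areas stand in for primitive surface areas:

```cpp
#include <cmath>
#include <vector>

// Surface area of an axis-aligned box from its extent (max - min).
static double bbox_surface_area(const BBox &b) {
  Vector3D e = b.extent;
  return 2.0 * (e.x * e.y + e.y * e.z + e.z * e.x);
}

// Score a candidate split along `axis` at `split_point`: sum each side's share
// of the total area, then return -(pL*log(pL) + pR*log(pR)). This is maximized
// when the hit probability is split as evenly as possible between the children.
double split_entropy(const std::vector<Primitive *> &prims,
                     double split_point, int axis) {
  double left = 0.0, right = 0.0;
  for (Primitive *p : prims) {
    BBox b = p->get_bbox();
    if (b.centroid()[axis] < split_point) left += bbox_surface_area(b);
    else right += bbox_surface_area(b);
  }
  double total = left + right;
  if (total == 0.0) return 0.0;
  double pL = left / total, pR = right / total;
  double H = 0.0;
  if (pL > 0.0) H -= pL * std::log(pL);
  if (pR > 0.0) H -= pR * std::log(pR);
  return H;  // choose the axis with the highest H
}
```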
Interestingly, our new heuristic did well for Lucy, but did worse than the naive heuristic on most of the other scenes we tested. With more time, we could have experimented further with an entropy/probability-based heuristic that beats the naive method, for example by changing the log base or reworking our probability definition.
Here is a table of results:
| DAE file | NAIVE's average intersections per ray | ENTROPY's average intersections per ray |
|---|---|---|
| Cow | 1.94 | 2.59 |
| Dragon | 1.98 | 6.27 |
| Lucy | 2.12 | 1.64 |
| Tea | 1.55 | 1.98 |
| MaxPlanck | 2.57 | 8.61 |
| Empty | 1.48 | 1.45 |
Reference screenshots:
*(Screenshots of the Cow, Dragon, Lucy, Tea, MaxPlanck, and Empty scenes rendered under the naive and entropy heuristics.)*
We made sure that the only new heap memory we created was for the BVHNode itself; rather than creating two new heap vectors to store the primitives for the left and right sides, we used temporary local vectors to rearrange our original list in place. We did this by, after choosing our axis, pushing each primitive into the appropriate left or right temporary, then copying the left elements back into the original list starting from its first index, and, continuing from where that left off, copying the right elements in after them.
This avoided a lot of wasted memory: if our original list had size 8, we avoided duplicating it when we split into 4 and 4 elements, then again when we split into 2, 2, 2, 2, and so forth.
Mathematically, this saves O(n log n) memory, where n is the number of primitives (and log n is the depth of the BVH tree). Pretty good, I'd say!
We then recursed to create the left and right branches.
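Putting it together, the construction looked roughly like the sketch below. BVHNode's members (bb, l, r, start, end), the iterator-based primitive range, and the choose_split_axis() helper are assumptions/illustrative, not our exact code; the point is the in-place partition via two temporary vectors that are freed when the call returns.

```cpp
#include <algorithm>
#include <vector>

BVHNode *BVHAccel::construct_bvh(std::vector<Primitive *>::iterator start,
                                 std::vector<Primitive *>::iterator end,
                                 size_t max_leaf_size) {
  BBox bbox;
  Vector3D centroid_sum(0, 0, 0);
  for (auto it = start; it != end; ++it) {
    bbox.expand((*it)->get_bbox());
    centroid_sum += (*it)->get_bbox().centroid();  // for the average centroid
  }

  BVHNode *node = new BVHNode(bbox);   // the only heap allocation per node
  size_t n = end - start;
  if (n <= max_leaf_size) {            // small enough: make a leaf
    node->start = start;
    node->end = end;
    return node;
  }

  int axis = choose_split_axis(bbox);  // largest axis, or the entropy heuristic
  double split = centroid_sum[axis] / (double)n;

  std::vector<Primitive *> left, right;  // temporaries, freed on return
  for (auto it = start; it != end; ++it) {
    if ((*it)->get_bbox().centroid()[axis] < split) left.push_back(*it);
    else right.push_back(*it);
  }
  if (left.empty() || right.empty()) {   // degenerate split: just make a leaf
    node->start = start;
    node->end = end;
    return node;
  }

  // Rearrange the original range in place: left elements first, then right.
  auto mid = std::copy(left.begin(), left.end(), start);
  std::copy(right.begin(), right.end(), mid);

  node->l = construct_bvh(start, mid, max_leaf_size);
  node->r = construct_bvh(mid, end, max_leaf_size);
  return node;
}
```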
Images only accelerated BVH can render (in reasonable time):
Accelerated vs Non-Accelerated:
As is evident in the rendering times (e.g., the cow had 1.95 vs. 1067.14 average intersections per ray, and roughly a 916x speedup when accelerated), the accelerated BVH greatly reduced traversal time. Mathematically, if we split everything well, we'd only need to traverse roughly log(n) nodes, where n is the total number of primitives.
This is because a "balanced" BVH tree lets us rule out half of the primitives with each step down the tree. Of course, this is just in theory; with varying geometries we'd get varying results, since factors like large gaps between nodes, regions of concentrated geometry, and so on can lead to different results for different scenes.
For example, if we naively always split on the x axis, a ray along the x axis would intersect every bounding box, so we'd search all n primitives. However, the same split would mean a ray along the y or z axis would be very efficient, intersecting only a single bounding box.
Our implementation aims to be a little more general than a simple largest-axis split, since the entropy of the probability distribution pushes the split toward making the probability of intersecting either side as equal as possible, and the speedups we observed suggest it does reasonably well.
For intersections, we checked whether the ray hit the node's bounding box (if it didn't, we could stop), then recursed on the left and right children. If we arrived at a leaf node, we checked whether any of its primitives intersected the ray.
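A sketch of that traversal (the node members and the BBox::intersect signature are assumptions about the starter code):

```cpp
// If the ray misses a node's box we prune that whole subtree; at a leaf we
// test every primitive, which also updates ray.max_t on the closest hit.
bool BVHAccel::intersect(Ray &ray, Intersection *isect, BVHNode *node) const {
  double t0 = ray.min_t, t1 = ray.max_t;
  if (!node->bb.intersect(ray, t0, t1)) return false;  // prune this subtree

  if (node->isLeaf()) {
    bool hit = false;
    for (auto it = node->start; it != node->end; ++it)
      hit = (*it)->intersect(ray, isect) || hit;   // no short-circuit: keep the closest
    return hit;
  }

  bool hit_l = intersect(ray, isect, node->l);
  bool hit_r = intersect(ray, isect, node->r);
  return hit_l || hit_r;
}
```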
Bounding box calculations were done with axis-aligned bounding boxes (they could potentially be tightened with non-axis-aligned boxes). For all 6 sides of the bounding box, we found the t where the ray hit that side, and from there returned only hits whose t was in the ray's [min_t, max_t] range and actually inside the box. We did not update the ray's max_t, because hitting a bounding box does not guarantee we will actually hit a primitive.
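Concretely, this is the standard axis-aligned slab test; a sketch (the BBox members min/max and the in/out parameters are assumptions):

```cpp
#include <algorithm>

// For each axis, intersect the ray with the two parallel planes of the box,
// then keep the overlap of the three [tmin, tmax] intervals and of
// [ray.min_t, ray.max_t].
bool BBox::intersect(const Ray &r, double &t0, double &t1) const {
  double tmin = r.min_t, tmax = r.max_t;
  for (int axis = 0; axis < 3; axis++) {
    double inv_d = 1.0 / r.d[axis];
    double ta = (min[axis] - r.o[axis]) * inv_d;
    double tb = (max[axis] - r.o[axis]) * inv_d;
    if (ta > tb) std::swap(ta, tb);    // handle rays pointing in the -axis direction
    tmin = std::max(tmin, ta);
    tmax = std::min(tmax, tb);
    if (tmin > tmax) return false;     // the slabs don't overlap: miss
  }
  t0 = tmin;
  t1 = tmax;
  // We deliberately do NOT touch r.max_t here; a box hit doesn't guarantee
  // a primitive hit.
  return true;
}
```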
At a high level, direct illumination meant implementing zero-bounce illumination, which is just light coming from the light source itself, and one-bounce illumination, which is zero if the bounce doesn't hit a light. Uniform hemisphere sampling is the "naive" approach: a ray projected from, for example, the camera hits a point, which we'll call the hit point.
A random direction within the hemisphere around that hit point is chosen for the next bounce (though for one bounce, there's no further bouncing). The amount of light, the radiance, is calculated using the reflectance equation
f(wi->wo) * Li * cos(theta) / p(wi)
Because it was uniform, p(wi) was a constant 1/2pi, since the surface area of a unit hemisphere is 2pi.
A succinct formula (although very incomplete in terms of the implementation details) for this would be

L_out(p, w_r) ≈ (1/N) * sum over j of [ f(p, w_j -> w_r) * L_i(p, w_j) * cos(theta_j) / p(w_j) ]

where N is the number of lights times the number of samples per light, p is our hit point, w_j is wi, and w_r is wo.
Importance sampling was more or less the same, in fact using the same equation, except we took advantage of the fact that most light comes directly from the lights, so we sampled from a nonuniform distribution that more often gave a wi pointing towards the light(s). We also had to check that cos(theta) was positive, i.e., that the sampled light direction wasn't behind the surface; without this check, those samples would incorrectly light up shadowed regions (so shadows would come out white).
We also set the ray's min_t to be slightly positive so it didn't intersect the surface it started on, and its max_t to just under the distance to the light, so the shadow ray wouldn't register an intersection with the light source itself.
Below are some comparisons between the two methods:
Above we see a vast difference between uniform hemisphere sampling and light sampling. The image from uniform hemisphere sampling is noisier and darker than the one produced by light sampling, because its samples are less accurate: they are uniform and tend to bounce anywhere, instead of towards the important directions (namely, the light sources).
We can also see a difference in the light above the bunny, where the light box is clear and crisp with light sampling but much less so with uniform hemisphere sampling, again because the samples are less accurate.
The same goes for the shadows on the bunny, where it's harder to see where the shadows are with uniform sampling; with light sampling there is a clear difference between the lit parts of the bunny and the shadowed parts, again due to the accuracy of the samples. Overall, the image produced by light sampling converges faster and is less grainy in both quality and lighting.

Below we show varying numbers of light rays. Because we are using 1 sample per pixel, the results are very noisy, but you can see that the tops of the spheres, which bounce towards the light, get less noisy with more light rays.
At a lower level, the hemisphere direct lighting function was implemented by iterating through all samples and, on each loop, sampling an object-space ray direction, wi, from the hemisphere sampler. The wi direction is converted into world space and, along with hit_p (which is already in world space), is used to instantiate a new ray object. The ray's min_t is set to EPS_F in order to alleviate numerical precision issues. We created a new intersection object, and if this ray hits a light-emitting surface, new_intersect is set to the appropriate values from that intersection and we compute the Monte Carlo estimate of the reflectance equation for the ray and add it to the sum of L_outs over all samples. To evaluate the reflectance equation, we get the f term from the input intersection object's bsdf with the inputs w_out and our sampled wi, the cosine theta term from the dot product of the world-space wi direction with the input intersection's normal (equivalently, the z component of wi in object space), and the L term from the emission of our new intersection object's bsdf. After summing L_out over all samples, L_out is normalized by dividing by the total number of samples and by the uniform pdf of 1/(2 * PI) (i.e., multiplying by 2 * PI). Using this algorithm, we can successfully estimate the direct lighting from a uniform hemisphere of directions.
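A sketch of this estimator (member and helper names such as hemisphereSampler, make_coord_space, ns_area_light, and EPS_F follow the starter code as we remember it, so treat them as assumptions; radiance colors are RGB triples that multiply componentwise):

```cpp
Vector3D PathTracer::estimate_direct_lighting_hemisphere(const Ray &r,
                                                         const Intersection &isect) {
  Matrix3x3 o2w;
  make_coord_space(o2w, isect.n);      // object->world frame at the hit point
  Matrix3x3 w2o = o2w.T();

  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = w2o * (-r.d);

  int num_samples = scene->lights.size() * ns_area_light;
  Vector3D L_out(0, 0, 0);

  for (int i = 0; i < num_samples; i++) {
    Vector3D wi = hemisphereSampler->get_sample();   // object space, pdf = 1/(2*PI)
    Ray shadow(hit_p, o2w * wi);
    shadow.min_t = EPS_F;                            // avoid self-intersection

    Intersection light_isect;
    if (bvh->intersect(shadow, &light_isect)) {
      // Only emissive surfaces contribute for one-bounce lighting.
      Vector3D Li = light_isect.bsdf->get_emission();
      Vector3D f  = isect.bsdf->f(w_out, wi);
      L_out += f * Li * wi.z;                        // wi.z = cos(theta) in object space
    }
  }
  // Divide by N and by the uniform pdf 1/(2*PI).
  return L_out * (2.0 * PI) / num_samples;
}
```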
Our importance direct lighting function was implemented by iterating through every light present in the scene and sampling it. We first checked if the light was a delta (point) light, and if so, we sampled it only once, since all samples from a point light will be the same. Otherwise, we took the number of samples per area light given to us. We sample the light num_samples times by getting the radiance, wi (world space), distToLight, and pdf from the light's sample_L() function. Using wi (the incoming ray direction) and the given intersection object's normal, we calculate cosine theta in object space, which is then used to check that the light is not behind the surface at the hit point (cos_w > 0). If so, a new shadow ray is cast with its min_t set to EPS_F and its max_t set to distToLight - EPS_F to avoid intersecting with the light itself, and a new intersection object is instantiated. We then check whether this shadow ray hits anything on its way to the light; if it doesn't (the light is unoccluded), we get the f value from the given intersection's bsdf and add to L_out using the reflection equation given in the spec, normalized by the pdf. At the end of the sampling for each light, we normalize its contribution to L_out by the number of samples. This algorithm also allows us to render images that have only point lights.
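A sketch of importance (light) sampling; the SceneLight::sample_L interface and the EPS_F offsets follow the spec as we remember it, so treat the exact signatures as assumptions:

```cpp
Vector3D PathTracer::estimate_direct_lighting_importance(const Ray &r,
                                                         const Intersection &isect) {
  Matrix3x3 o2w;
  make_coord_space(o2w, isect.n);
  Matrix3x3 w2o = o2w.T();

  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = w2o * (-r.d);
  Vector3D L_out(0, 0, 0);

  for (SceneLight *light : scene->lights) {
    // A delta (point) light always returns the same sample, so take just one.
    int ns = light->is_delta_light() ? 1 : ns_area_light;
    Vector3D L_light(0, 0, 0);

    for (int i = 0; i < ns; i++) {
      Vector3D wi_world;
      double dist_to_light, pdf;
      Vector3D Li = light->sample_L(hit_p, &wi_world, &dist_to_light, &pdf);

      Vector3D wi = w2o * wi_world;
      if (wi.z <= 0) continue;                 // light is behind the surface

      Ray shadow(hit_p, wi_world);
      shadow.min_t = EPS_F;                    // don't hit ourselves
      shadow.max_t = dist_to_light - EPS_F;    // don't hit the light itself

      Intersection blocker;
      if (!bvh->intersect(shadow, &blocker)) { // nothing in between: lit
        Vector3D f = isect.bsdf->f(w_out, wi);
        L_light += f * Li * wi.z / pdf;
      }
    }
    L_out += L_light / ns;
  }
  return L_out;
}
```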
GI builds off of the previous part. Part 3 was just 0 and 1 bounces, but this part bounces recursively, using the following algorithm:
If our ray's depth is 0, we are done bouncing, so we just return a zero vector (zero-bounce radiance is already handled in est_radiance_global_illumination).
Otherwise, we calculate the one-bounce radiance for the ray, then use Russian roulette to speed up our implementation. Our roulette continuation probability is 0.65, chosen somewhat arbitrarily, but we wanted it above 0.6 so that we wouldn't have to take as many samples (a lower continuation probability results in darker, noisier images, due to less bouncing around).
We then sampled a random incoming direction, w_i, based on w_o, the outgoing light direction. We set up a new ray with direction w_i to be our potential light bounce, and set its min_t to a tiny positive number so that we didn't intersect with ourselves during the next recursion.
We also tested whether our new w_i ray left the scene (in which case we just returned the one-bounce radiance for this level); otherwise, we recursed, scaling the recursed value by the reflectance of the intersected material (fr), Lambert's cosine law (cos_w, the dot product of wi with the normal in object space), and the reciprocals of the sampling pdf and the Russian roulette probability.
The first two factors make the reflectance/bounciness of the light look correct, and the last two ensure our estimate is unbiased. Here is a sketch of this recursion, followed by some images rendered using global illumination.
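(The helper names coin_flip, sample_f, and one_bounce_radiance, and the exact signatures, are assumptions based on the starter code we remember; this is a sketch, not our literal implementation.)

```cpp
Vector3D PathTracer::at_least_one_bounce_radiance(const Ray &r,
                                                  const Intersection &isect) {
  Matrix3x3 o2w;
  make_coord_space(o2w, isect.n);
  Matrix3x3 w2o = o2w.T();

  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = w2o * (-r.d);

  if (r.depth == 0) return Vector3D(0, 0, 0);       // zero-bounce handled elsewhere

  Vector3D L_out = one_bounce_radiance(r, isect);   // direct lighting at this hit

  double cpdf = 0.65;                               // Russian roulette continuation
  if (r.depth > 1 && coin_flip(cpdf)) {
    Vector3D wi;
    double pdf;
    Vector3D fr = isect.bsdf->sample_f(w_out, &wi, &pdf);  // sample the BSDF

    Ray bounce(hit_p, o2w * wi);
    bounce.min_t = EPS_F;                           // avoid self-intersection
    bounce.depth = r.depth - 1;

    Intersection next;
    if (bvh->intersect(bounce, &next)) {
      double cos_w = wi.z;                          // wi dot normal, object space
      L_out += at_least_one_bounce_radiance(bounce, next)
               * fr * cos_w / pdf / cpdf;           // unbias by pdf and cpdf
    }
  }
  return L_out;
}
```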
1024 sample per pixel images, full global illumination
Here are images of indirect vs direct illumination. Direct is zero and one bounce, so no color bleeds, whereas indirect is two or more bounces.
Only Indirect vs Only Direct illumination (All 1024 samples per pixel, 16 light rays)
Here are images with different max depths (the number of times light can bounce). The more bounces, the brighter the image (and the more color bleeds).
Differing depths (all 1024 samples per pixel, 16 light rays)
Here are varying sample counts per pixel. Fewer samples render faster, at the cost of being less accurate/realistic, since it takes many samples to converge to a "realistic" image.
Differing samples per pixel (all 4 light rays, 6 bounces)
At a high level, we track some statistics of the sample radiances we've computed so far for a pixel. Since distributions with smaller standard deviations converge to their mean more quickly, and more samples tighten the estimate, we set I to be 1.96 (this magic number comes from the fact that this corresponds to a 95% confidence interval, in statistical terms) times the standard deviation, divided by the square root of the number of samples.
To be more clear, here are the formulas, in terms of the running sums s1 (sum of sample illuminances) and s2 (sum of squared illuminances) over the n samples so far:

mu = s1 / n
sigma^2 = (1 / (n - 1)) * (s2 - s1^2 / n)
I = 1.96 * sigma / sqrt(n)

where mu is the mean of the samples so far and sigma is the standard deviation of the samples so far (note that these are NOT the population parameters). The pixel is considered converged once I <= maxTolerance * mu.
Over time, our sample standard deviation approaches the true standard deviation (as does the mean), and as the sample count increases, I shrinks, so eventually we drop below maxTolerance * mu. Given the properties mentioned above, this means faster-converging pixels reach an I below the threshold sooner, so we can stop sampling them earlier.
To visualize, red sample rates mean that the pixel converged more slowly, and blue means it converged more quickly. We see that the light converged quickly, because it is a light, so we know its value very quickly. Shadows, however, did not bounce towards the light as often, so they converged more slowly, as did corners, where light can get "trapped" bouncing back and forth.
At a lower level, our implementation of adaptive sampling adjusts the loop created in Part 1.2 in raytrace_pixel(). The first thing we did was create the variables s1 and s2 to keep track of running sums for the mean and variance calculations. For every sample, we add the illuminance of the sample's L_out (generated by est_radiance_global_illumination()) to s1 and the squared illuminance to s2. At the start of each loop iteration, we check for convergence every samplesPerBatch samples using the condition count != 0 && count % samplesPerBatch == 0. In the convergence check, we calculate the mean, standard deviation, and I value as defined by the equations in the spec using s1 and s2, after converting our sample count variable to a double. If I ends up less than or equal to maxTolerance * mean, we break out of the loop and stop tracing rays for this pixel. At the end, we average the total L_out (the sum of all sampled L_out colors) by the number of samples we actually took (the variable count), and we also write that actual sample count into sampleCountBuffer for the pixel.
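A sketch extending the per-pixel loop from earlier with the s1/s2 bookkeeping (samplesPerBatch, maxTolerance, sampleCountBuffer, and the illum() helper follow the starter code as we remember it, so treat those names as assumptions):

```cpp
void PathTracer::raytrace_pixel(size_t x, size_t y) {
  Vector3D total(0, 0, 0);
  double s1 = 0.0, s2 = 0.0;   // running sums of illuminance and its square
  size_t count = 0;

  for (; count < ns_aa; count++) {
    // Every samplesPerBatch samples, check whether this pixel has converged.
    if (count != 0 && count % samplesPerBatch == 0) {
      double n = (double)count;
      double mu = s1 / n;
      double var = (s2 - s1 * s1 / n) / (n - 1.0);
      double I = 1.96 * sqrt(var / n);
      if (I <= maxTolerance * mu) break;          // converged: stop tracing rays
    }

    Vector2D offset = gridSampler->get_sample();
    Ray r = camera->generate_ray((x + offset.x) / sampleBuffer.w,
                                 (y + offset.y) / sampleBuffer.h);
    r.depth = max_ray_depth;
    Vector3D L = est_radiance_global_illumination(r);
    total += L;

    double illum = L.illum();                     // scalar illuminance of this sample
    s1 += illum;
    s2 += illum * illum;
  }

  // Average over the samples we actually took, and record that count.
  sampleBuffer.update_pixel(total / (double)count, x, y);
  sampleCountBuffer[x + y * sampleBuffer.w] = count;
}
```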
We collaborated by taking turns "piloting": we discussed approaches, then coded one at a time. However, we didn't always have time to meet together, so sometimes, after discussing approaches, one of us would implement the idea without the other person's supervision. We then came back together when the other became stuck and tried to debug. For example, one of us would notice the other person's render was slightly darker than the spec's, and we'd trace through the code (in that case the bug was adaptive sampling not setting ray.max_t correctly) until we found the problem. It went relatively well; there were some hiccups with merge conflicts that led to weird bugs, but luckily we knew, or learned, how to resolve them. We learned more about remote software development, about debugging each other's code, and about writing more understandable code, especially mathematical code. It was quite a challenge, since debugging graphics code is itself difficult, but we learned a lot about writing code for more than one person, especially as it relates to graphics!