November 2023
With Neural Radiance Fields (NeRF) we can recreate 2D photos using machine learning. For Part 1 of the project I used a neural network with the following architecture:
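As a rough companion to that architecture, here is a minimal sketch of how such a network might look in PyTorch, assuming a sinusoidal positional encoding with L frequencies feeding a plain MLP that outputs RGB through a sigmoid. The widths, depth, and exact encoding here are illustrative assumptions, not the exact configuration used.

```python
import math
import torch
import torch.nn as nn

def positional_encoding(x, L=10):
    """Map each coordinate to [sin(2^k * pi * x), cos(2^k * pi * x)] for k = 0..L-1."""
    freqs = (2.0 ** torch.arange(L, device=x.device)) * math.pi     # (L,)
    angles = x[..., None] * freqs                                   # (..., dim, L)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1) # (..., dim, 2L)
    return enc.flatten(start_dim=-2)                                # (..., dim * 2L)

class NeuralField2D(nn.Module):
    """MLP that maps a 2D pixel coordinate (x, y) to an RGB color."""
    def __init__(self, L=10, hidden=256, depth=4):
        super().__init__()
        self.L = L
        in_dim = 2 * 2 * L                                          # 2 coords * (sin, cos) * L freqs
        layers = [nn.Linear(in_dim, hidden), nn.ReLU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(hidden, hidden), nn.ReLU()]
        layers += [nn.Linear(hidden, 3), nn.Sigmoid()]              # RGB in [0, 1]
        self.mlp = nn.Sequential(*layers)

    def forward(self, xy):
        return self.mlp(positional_encoding(xy, L=self.L))
```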
Here is the original fox photo we will try to recreate
With a learning rate of 1e-2 and L = 10 for the positional encoding, this achieved a PSNR of roughly 27 on the fox image.
With a learning rate of 1e-3, L = 10 for the positional encoding, and 512 hidden channels, it achieved a PSNR of roughly 28.
With a learning rate of 1e-3, L = 15 for the positional encoding, and 512 hidden channels, it achieved a PSNR of roughly 28 (slightly higher than with L = 10).
PSNR was defined as follows:
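The definition isn't reproduced inline here; the standard form, assuming pixel values normalized to [0, 1], is:

```latex
\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{1}{\mathrm{MSE}}\right),
\qquad
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\bigl(I_i - \hat{I}_i\bigr)^2
```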
Below is my training PSNR across all the iterations for the 3 different hyperparameter settings
Here is my own photo of my friend that I will try to recreate
Here are the reconstructed images at different iterations of the training loop
Below is my training PSNR across all the iterations for the 3 different hyperparameter settings
In this step we create rays from the cameras. The first step is to define the camera-to-world transformation matrices for each view in the scene. We implement a function x_w = transform(c2w, x_c) that transforms a point from camera space to world space. Given a world-to-camera matrix you can go from world coordinates to camera coordinates; in my function I inverted the world-to-camera transformation and applied the inverse to the camera coordinates to get their corresponding world coordinates. Next, we implement a function that transforms a point from the pixel coordinate system back to the camera coordinate system: x_c = pixel_to_camera(K, uv, s). These points are then converted from the camera coordinate system to the 3D world coordinate system. Given K you can go from camera coordinates to pixel coordinates scaled by s, so here I simply apply the inverse of that mapping. Lastly, we implement a function that converts a pixel coordinate to a ray with an origin and a normalized direction: ray_o, ray_d = pixel_to_ray(K, c2w, uv). This function is necessary so we can sample points along these rays and then do the volumetric rendering. A rough sketch of these three functions is shown below.
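This is a sketch rather than the exact implementation, assuming a 3x3 pinhole intrinsic matrix K, a 4x4 camera-to-world matrix c2w, and pixel coordinates uv of shape (N, 2); axis and sign conventions are simplified.

```python
import torch

def transform(c2w, x_c):
    """Apply a 4x4 camera-to-world matrix to points in camera space (homogeneous coords)."""
    x_h = torch.cat([x_c, torch.ones_like(x_c[..., :1])], dim=-1)   # (N, 4)
    return (x_h @ c2w.T)[..., :3]

def pixel_to_camera(K, uv, s):
    """Invert the intrinsics: pixel coords (u, v) -> camera-space point at depth s."""
    uv_h = torch.cat([uv, torch.ones_like(uv[..., :1])], dim=-1)    # (N, 3)
    return s * (uv_h @ torch.linalg.inv(K).T)                       # (N, 3)

def pixel_to_ray(K, c2w, uv):
    """Return ray origin and normalized direction in world space for each pixel."""
    ray_o = c2w[:3, 3].expand(uv.shape[0], 3)                       # camera center
    x_w = transform(c2w, pixel_to_camera(K, uv, s=1.0))             # a point along the ray
    ray_d = x_w - ray_o
    ray_d = ray_d / ray_d.norm(dim=-1, keepdim=True)
    return ray_o, ray_d
```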
In this step we sample rays from our collection of images. I flatten all the pixels from all the images and take a random global sample of N pixels, then pass these into pixel_to_ray to get their corresponding rays. We also need to sample points along these rays, which I do by creating an interval of 64 values of t from 2 to 6, where for each t we take x = ray_o + ray_d * t. We also jitter the t values so the network doesn't always see the same fixed coordinates, which helps prevent overfitting. A sketch of this sampling step follows.
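Here is a minimal sketch of the point sampling along rays, assuming 64 samples between near = 2.0 and far = 6.0 with a small random jitter added to each sample; names and shapes are illustrative.

```python
import torch

def sample_along_rays(ray_o, ray_d, n_samples=64, near=2.0, far=6.0, perturb=True):
    """Sample 3D points x = ray_o + ray_d * t for t in [near, far], with optional jitter."""
    t = torch.linspace(near, far, n_samples, device=ray_o.device)            # (n_samples,)
    if perturb:
        # jitter each sample within its bin so the network doesn't always see fixed depths
        t = t + torch.rand(ray_o.shape[0], n_samples, device=ray_o.device) * (far - near) / n_samples
    else:
        t = t.expand(ray_o.shape[0], n_samples)
    pts = ray_o[:, None, :] + ray_d[:, None, :] * t[..., None]               # (N, n_samples, 3)
    return pts, t
```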
I made a class RaysData that has a sample_rays function (a rough sketch of it is shown below). Below that is the visualization of these rays with their corresponding cameras, using 100 rays.
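This outline reuses the hypothetical pixel_to_ray sketch above; the attribute names and tensor shapes are assumptions, not the real class.

```python
import torch

class RaysData:
    """Holds all training images and cameras, and samples rays globally across them."""
    def __init__(self, images, K, c2ws):
        self.images = images          # (n_images, H, W, 3)
        self.K = K                    # (3, 3)
        self.c2ws = c2ws              # (n_images, 4, 4)
        self.n_images, self.H, self.W, _ = images.shape

    def sample_rays(self, N):
        # pick N (image, row, column) triples uniformly over all pixels of all images
        img = torch.randint(0, self.n_images, (N,))
        v = torch.randint(0, self.H, (N,))
        u = torch.randint(0, self.W, (N,))
        uv = torch.stack([u, v], dim=-1).float() + 0.5    # offset to pixel centers
        pixels = self.images[img, v, u]                   # ground-truth colors, (N, 3)
        rays_o = torch.empty(N, 3)
        rays_d = torch.empty(N, 3)
        for j in range(N):                                # batched version omitted for clarity
            ro, rd = pixel_to_ray(self.K, self.c2ws[img[j]], uv[j:j + 1])
            rays_o[j], rays_d[j] = ro[0], rd[0]
        return rays_o, rays_d, pixels
```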
I implemented the neural network for 3D NeRF using the original architecture, shown below.
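As a companion to the diagram, here is a compressed sketch of that kind of network in PyTorch: the positionally encoded position and view direction go in, density and view-dependent color come out, and a skip connection re-injects the encoded position partway through. The exact widths and depths here are assumptions, and the positional_encoding helper from the Part 1 sketch is reused.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim, hidden, n_layers):
    layers = [nn.Linear(in_dim, hidden), nn.ReLU()]
    for _ in range(n_layers - 1):
        layers += [nn.Linear(hidden, hidden), nn.ReLU()]
    return nn.Sequential(*layers)

class NeRF(nn.Module):
    """MLP mapping a positionally encoded 3D point and view direction to (density, RGB)."""
    def __init__(self, L_x=10, L_d=4, hidden=256):
        super().__init__()
        self.L_x, self.L_d = L_x, L_d
        in_x, in_d = 3 * 2 * L_x, 3 * 2 * L_d
        self.trunk1 = make_mlp(in_x, hidden, 4)
        self.trunk2 = make_mlp(hidden + in_x, hidden, 4)    # skip connection re-injects PE(x)
        self.sigma_head = nn.Sequential(nn.Linear(hidden, 1), nn.ReLU())          # density >= 0
        self.feat = nn.Linear(hidden, hidden)
        self.rgb_head = nn.Sequential(nn.Linear(hidden + in_d, hidden // 2), nn.ReLU(),
                                      nn.Linear(hidden // 2, 3), nn.Sigmoid())    # color in [0, 1]

    def forward(self, x, d):
        ex = positional_encoding(x, L=self.L_x)
        ed = positional_encoding(d, L=self.L_d)
        h = self.trunk1(ex)
        h = self.trunk2(torch.cat([h, ex], dim=-1))
        sigma = self.sigma_head(h)
        rgb = self.rgb_head(torch.cat([self.feat(h), ed], dim=-1))
        return sigma, rgb
```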
I simply followed the given volume rendering formula, using torch.cumsum() to calculate T_i. Below are the renders of one validation image at different iterations. To make training faster (around 10 minutes) I used a batch size of 4096, and I was able to achieve a PSNR of around 23 after 1000 iterations. Finally, there is the spherical rendering GIF from the model after training for 1000 iterations.
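For reference, here is a minimal sketch of that rendering step, computing T_i with torch.cumsum over sigma_i * delta_i; variable names and shapes are illustrative, and a constant step size between samples is assumed.

```python
import torch

def volrend(sigmas, rgbs, step_size):
    """Discrete volume rendering:
       C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
       T_i = exp(-sum_{j<i} sigma_j * delta_j)
       with sigmas of shape (N_rays, n_samples, 1) and rgbs of shape (N_rays, n_samples, 3)."""
    deltas = sigmas * step_size                      # sigma_i * delta_i
    alphas = 1.0 - torch.exp(-deltas)                # per-sample opacity
    accum = torch.cumsum(deltas, dim=1)              # inclusive cumulative sum
    T = torch.exp(-(accum - deltas))                 # exclusive sum, so T_1 = 1
    weights = T * alphas                             # (N_rays, n_samples, 1)
    return (weights * rgbs).sum(dim=1)               # rendered color, (N_rays, 3)
```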