CS180 Project 5

Kyle Wong

November 2023

Part 1: Fit a Neural Field to a 2D Image

With a neural field we can recreate a 2D photo using machine learning: a network learns to map each pixel coordinate to its RGB color, which is the first building block of a full Neural Radiance Field (NeRF). For part 1 of the project I used a neural network with the following architecture:

2dnerf
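As a rough sketch of what this might look like in code (assuming PyTorch; the layer count and 256-channel width here are illustrative defaults, not necessarily my exact settings):

```python
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sinusoidal encoding: x -> [x, sin(2^0 pi x), cos(2^0 pi x), ..., sin(2^(L-1) pi x), cos(2^(L-1) pi x)]."""
    def __init__(self, L=10):
        super().__init__()
        self.register_buffer("freqs", (2.0 ** torch.arange(L)) * torch.pi)

    def forward(self, x):                           # x: (N, D), coords normalized to [0, 1]
        xf = x[..., None] * self.freqs              # (N, D, L)
        enc = torch.cat([torch.sin(xf), torch.cos(xf)], dim=-1)   # (N, D, 2L)
        return torch.cat([x, enc.flatten(-2)], dim=-1)            # (N, D + 2*D*L)

class NeuralField2D(nn.Module):
    """MLP mapping a pixel coordinate (u, v) to an RGB color."""
    def __init__(self, L=10, channels=256):
        super().__init__()
        self.pe = PositionalEncoding(L)
        in_dim = 2 + 2 * 2 * L
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, channels), nn.ReLU(),
            nn.Linear(channels, channels), nn.ReLU(),
            nn.Linear(channels, channels), nn.ReLU(),
            nn.Linear(channels, 3), nn.Sigmoid(),   # colors constrained to [0, 1]
        )

    def forward(self, uv):                          # uv: (N, 2)
        return self.mlp(self.pe(uv))
```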

Here is the original fox photo we will try to recreate

fox

With a learning rate of 1e-2 and L = 10 for the positional encoding, this achieved a PSNR of roughly 27 on the fox image.

foxorig

With a learning rate of 1e-3, L = 10 for the positional encoding, and 512 hidden channels, this achieved a PSNR of roughly 28 on the fox image.

foxchannels

With a learning rate of 1e-3, L = 15 for the positional encoding, and 512 hidden channels, this achieved a PSNR of roughly 28 (slightly higher than with L = 10).

foxlayers

PSNR was defined as follows:

psnr
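For reference, the standard definition (for pixel values normalized to [0, 1]) is

$$\mathrm{PSNR} = 10 \cdot \log_{10}\left(\frac{1}{\mathrm{MSE}}\right)$$

where MSE is the mean squared error between the reconstructed and ground-truth pixels.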

Below is my training PSNR across iterations for the three different hyperparameter settings.

psnr

Here is my own photo of my friend that I will try to recreate

man

Here are the reconstructions at different iterations of the training loop.

man

Below is my training PSNR across iterations for the three different hyperparameter settings.

psnr

Part 2: Fit a Neural Radiance Field from Multi-view Images

Part 2.1: Create Rays from Cameras

In this step we create rays from cameras. The first step is to define the camera-to-world transformation matrix for each view in the scene. I implemented a function x_w = transform(c2w, x_c) that transforms a point from camera space to world space; since we are given world-to-camera matrices, I inverted them to obtain the camera-to-world transformations and applied those to the camera coordinates to get their corresponding world coordinates. Next, I implemented a function x_c = pixel_to_camera(K, uv, s) that transforms a point from the pixel coordinate system back to the camera coordinate system. The intrinsic matrix K maps camera coordinates to pixel coordinates scaled by s, so this function simply applies its inverse. Lastly, I implemented a function ray_o, ray_d = pixel_to_ray(K, c2w, uv) that converts a pixel coordinate to a ray with an origin and a normalized direction. This function is necessary so we can sample points along these rays and perform volumetric rendering.
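A minimal sketch of these three functions (assuming PyTorch, points batched as (N, 3) tensors, and a 4x4 c2w matrix; conventions such as pixel-center offsets may differ from my actual implementation):

```python
import torch

def transform(c2w, x_c):
    """Apply a 4x4 camera-to-world matrix to (N, 3) camera-space points."""
    x_h = torch.cat([x_c, torch.ones_like(x_c[:, :1])], dim=-1)  # homogeneous (N, 4)
    return (x_h @ c2w.T)[:, :3]

def pixel_to_camera(K, uv, s):
    """Invert the intrinsics: pixel coords (N, 2) at depth s -> camera coords (N, 3)."""
    uv_h = torch.cat([uv, torch.ones_like(uv[:, :1])], dim=-1)   # (N, 3), rows [u, v, 1]
    return s * (uv_h @ torch.linalg.inv(K).T)

def pixel_to_ray(K, c2w, uv):
    """Convert pixel coords to rays with an origin and a normalized direction."""
    ray_o = c2w[:3, 3].expand(uv.shape[0], 3)            # camera center in world space
    x_w = transform(c2w, pixel_to_camera(K, uv, s=1.0))  # a world point along each ray
    ray_d = x_w - ray_o
    ray_d = ray_d / ray_d.norm(dim=-1, keepdim=True)
    return ray_o, ray_d
```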

Part 2.2: Sampling

In this step we sample rays from our collection of images. I flattened all the pixels from all the images, drew a random global sample of N pixels, and passed these into pixel_to_ray to get their corresponding rays. We also need to sample points along these rays, which I did by taking 64 evenly spaced values of t from 2 to 6 and computing x = r_o + r_d * t for each. I also jitter the t values during training so the network never sees a fixed set of 3D coordinates, which helps prevent overfitting.
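A sketch of the point sampling, with the jitter applied only during training (near/far bounds of 2 and 6 and 64 samples per ray, as above):

```python
import torch

def sample_points_along_rays(ray_o, ray_d, near=2.0, far=6.0, n_samples=64, perturb=True):
    """Sample 3D points x = ray_o + ray_d * t for t in [near, far]."""
    t = torch.linspace(near, far, n_samples)                   # (n_samples,)
    if perturb:
        # jitter each t within its bin so the network sees new coordinates every batch
        t = t + torch.rand(ray_o.shape[0], n_samples) * (far - near) / n_samples
    else:
        t = t.expand(ray_o.shape[0], n_samples)
    x = ray_o[:, None, :] + ray_d[:, None, :] * t[..., None]   # (N, n_samples, 3)
    return x, t
```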

Part 2.3: Putting the Dataloading All Together

I made a class RaysData with a sample_rays function. Below is a visualization of 100 sampled rays along with their corresponding cameras, followed by a sketch of the sampling code.

rays
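A sketch of what sample_rays might look like (assuming images, intrinsics, and c2w matrices are stored as tensors; the flattening scheme here is illustrative):

```python
import torch

class RaysData:
    def __init__(self, images, K, c2ws):
        """images: (M, H, W, 3), K: (3, 3) intrinsics, c2ws: (M, 4, 4) camera-to-world."""
        self.images, self.K, self.c2ws = images, K, c2ws
        self.M, self.H, self.W = images.shape[:3]

    def sample_rays(self, N):
        # global sample over every pixel of every image
        idx = torch.randint(0, self.M * self.H * self.W, (N,))
        img = idx // (self.H * self.W)
        v = (idx % (self.H * self.W)) // self.W
        u = idx % self.W
        uv = torch.stack([u + 0.5, v + 0.5], dim=-1)   # offset to pixel centers
        pixels = self.images[img, v, u]                # ground-truth colors (N, 3)
        rays_o, rays_d = [], []
        for i in range(N):                             # per-ray c2w lookup; vectorizable
            o, d = pixel_to_ray(self.K, self.c2ws[img[i]], uv[i:i+1])
            rays_o.append(o); rays_d.append(d)
        return torch.cat(rays_o), torch.cat(rays_d), pixels
```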

Part 2.4: Neural Radiance Field

I implemented the neural network for NeRF in 3D, following the original architecture shown below.

nerf3d
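A condensed sketch of a NeRF-style MLP, reusing the PositionalEncoding sketch from part 1 (the widths, L values, and skip connection follow the original paper and may not match my exact network):

```python
import torch
import torch.nn as nn

class NeRF(nn.Module):
    def __init__(self, L_x=10, L_d=4, W=256):
        super().__init__()
        self.pe_x = PositionalEncoding(L_x)   # encodes 3D position
        self.pe_d = PositionalEncoding(L_d)   # encodes 3D view direction
        in_x, in_d = 3 + 2 * 3 * L_x, 3 + 2 * 3 * L_d
        self.block1 = nn.Sequential(nn.Linear(in_x, W), nn.ReLU(),
                                    *[m for _ in range(3) for m in (nn.Linear(W, W), nn.ReLU())])
        # skip connection: re-inject the encoded position halfway through the network
        self.block2 = nn.Sequential(nn.Linear(W + in_x, W), nn.ReLU(),
                                    *[m for _ in range(3) for m in (nn.Linear(W, W), nn.ReLU())])
        self.sigma_head = nn.Sequential(nn.Linear(W, 1), nn.ReLU())   # density must be >= 0
        self.feat = nn.Linear(W, W)
        self.rgb_head = nn.Sequential(nn.Linear(W + in_d, W // 2), nn.ReLU(),
                                      nn.Linear(W // 2, 3), nn.Sigmoid())

    def forward(self, x, d):
        ex, ed = self.pe_x(x), self.pe_d(d)
        h = self.block1(ex)
        h = self.block2(torch.cat([h, ex], dim=-1))
        sigma = self.sigma_head(h)
        rgb = self.rgb_head(torch.cat([self.feat(h), ed], dim=-1))
        return sigma, rgb
```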

Part 2.5: Volume Rendering

I followed the volume rendering formula from the project spec, using torch.cumsum() to calculate T_i; a sketch of this computation appears below. To make training faster (around 10 minutes total) I used a batch size of 4096, which achieved a PSNR of around 23 after 1000 iterations. After the sketch are the renders of one validation image at different iterations, the training PSNR curve, and finally the spherical rendering gif from the model after training for 1000 iterations.
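The discrete rendering equation (from the original NeRF paper) is

$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - \exp(-\sigma_i \delta_i)\right) \mathbf{c}_i, \qquad T_i = \exp\!\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big)$$

and here is a sketch of it in PyTorch (assuming per-ray samples are ordered near to far with a uniform step size):

```python
import torch

def volrend(sigmas, rgbs, step_size):
    """sigmas: (N, S, 1) densities, rgbs: (N, S, 3) colors, step_size: spacing between samples."""
    delta = sigmas.squeeze(-1) * step_size          # sigma_i * delta_i, shape (N, S)
    alpha = 1.0 - torch.exp(-delta)                 # per-sample opacity
    # T_i = exp(-sum_{j<i} sigma_j delta_j): exclusive cumulative sum (shifted right by one)
    shifted = torch.cat([torch.zeros_like(delta[:, :1]), delta[:, :-1]], dim=-1)
    T = torch.exp(-torch.cumsum(shifted, dim=-1))
    weights = T * alpha                             # (N, S)
    return (weights[..., None] * rgbs).sum(dim=1)   # rendered color per ray, (N, 3)
```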

volrend
lego1
lego2
psnrlego
animation