November 2023
With Neural Radiance Fields (NeRF) we can recreate 2D photos using machine learning. For Part 1 of the project I used a neural network with the following architecture:
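As a rough companion to that architecture, here is a minimal sketch of how such a network might look in PyTorch, assuming a sinusoidal positional encoding with L frequencies feeding a plain MLP that outputs RGB through a sigmoid. The widths, depth, and exact encoding here are illustrative assumptions, not the exact configuration used.

```python
import math
import torch
import torch.nn as nn

def positional_encoding(x, L=10):
    """Map each coordinate to [sin(2^k * pi * x), cos(2^k * pi * x)] for k = 0..L-1."""
    freqs = (2.0 ** torch.arange(L, device=x.device)) * math.pi     # (L,)
    angles = x[..., None] * freqs                                   # (..., dim, L)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1) # (..., dim, 2L)
    return enc.flatten(start_dim=-2)                                # (..., dim * 2L)

class NeuralField2D(nn.Module):
    """MLP that maps a 2D pixel coordinate (x, y) to an RGB color."""
    def __init__(self, L=10, hidden=256, depth=4):
        super().__init__()
        self.L = L
        in_dim = 2 * 2 * L                                          # 2 coords * (sin, cos) * L freqs
        layers = [nn.Linear(in_dim, hidden), nn.ReLU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(hidden, hidden), nn.ReLU()]
        layers += [nn.Linear(hidden, 3), nn.Sigmoid()]              # RGB in [0, 1]
        self.mlp = nn.Sequential(*layers)

    def forward(self, xy):
        return self.mlp(positional_encoding(xy, L=self.L))
```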
Here is the original fox photo we will try to recreate
With a learning rate of 1e-2 and L = 10 for the positional encoding, this achieved a PSNR of roughly 27 on the fox image.
With a learning rate of 1e-3, L = 10 for the positional encoding, and 512 hidden channels, it achieved a PSNR of roughly 28.
With a learning rate of 1e-3, L = 15 for the positional encoding, and 512 hidden channels, it achieved a PSNR of roughly 28 (slightly higher than with L = 10).
PSNR was defined as follows:
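The definition isn't reproduced inline here; the standard form, assuming pixel values normalized to [0, 1], is:

```latex
\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{1}{\mathrm{MSE}}\right),
\qquad
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\bigl(I_i - \hat{I}_i\bigr)^2
```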
Below is my training PSNR across all the iterations for the 3 different hyperparameter settings
Here is my own photo of my friend that I will try to recreate
Here are the reconstructed images at different iterations of the training loop
Below is my training PSNR across all the iterations for the 3 different hyperparameter settings
In this step we create rays from the cameras. The first step is to define the camera-to-world transformation matrices for each view in the scene. We implement a function x_w = transform(c2w, x_c) that transforms a point from camera space to world space. Given a world-to-camera matrix you can go from world coordinates to camera coordinates; in my function I inverted the world-to-camera transformation and applied the inverse to the camera coordinates to get their corresponding world coordinates. Next, we implement a function that transforms a point from the pixel coordinate system back to the camera coordinate system: x_c = pixel_to_camera(K, uv, s). These points are then converted from the camera coordinate system to the 3D world coordinate system. Given K you can go from camera coordinates to pixel coordinates scaled by s, so here I simply apply the inverse of that mapping. Lastly, we implement a function that converts a pixel coordinate to a ray with an origin and a normalized direction: ray_o, ray_d = pixel_to_ray(K, c2w, uv). This function is necessary so we can sample points along these rays and then do the volumetric rendering. A rough sketch of these three functions is shown below.
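This is a sketch rather than the exact implementation, assuming a 3x3 pinhole intrinsic matrix K, a 4x4 camera-to-world matrix c2w, and pixel coordinates uv of shape (N, 2); axis and sign conventions are simplified.

```python
import torch

def transform(c2w, x_c):
    """Apply a 4x4 camera-to-world matrix to points in camera space (homogeneous coords)."""
    x_h = torch.cat([x_c, torch.ones_like(x_c[..., :1])], dim=-1)   # (N, 4)
    return (x_h @ c2w.T)[..., :3]

def pixel_to_camera(K, uv, s):
    """Invert the intrinsics: pixel coords (u, v) -> camera-space point at depth s."""
    uv_h = torch.cat([uv, torch.ones_like(uv[..., :1])], dim=-1)    # (N, 3)
    return s * (uv_h @ torch.linalg.inv(K).T)                       # (N, 3)

def pixel_to_ray(K, c2w, uv):
    """Return ray origin and normalized direction in world space for each pixel."""
    ray_o = c2w[:3, 3].expand(uv.shape[0], 3)                       # camera center
    x_w = transform(c2w, pixel_to_camera(K, uv, s=1.0))             # a point along the ray
    ray_d = x_w - ray_o
    ray_d = ray_d / ray_d.norm(dim=-1, keepdim=True)
    return ray_o, ray_d
```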
In this step we sample rays from our collection of images. I flatten all the pixels from all the images and take a random global sample of N pixels, then pass these into pixel_to_ray to get their corresponding rays. We also need to sample points along these rays, which I do by creating an interval of 64 values of t from 2 to 6, where for each t we take x = ray_o + ray_d * t. We also jitter the t values so the network doesn't always see the same fixed coordinates, which helps prevent overfitting. A sketch of this sampling step follows.
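Here is a minimal sketch of the point sampling along rays, assuming 64 samples between near = 2.0 and far = 6.0 with a small random jitter added to each sample; names and shapes are illustrative.

```python
import torch

def sample_along_rays(ray_o, ray_d, n_samples=64, near=2.0, far=6.0, perturb=True):
    """Sample 3D points x = ray_o + ray_d * t for t in [near, far], with optional jitter."""
    t = torch.linspace(near, far, n_samples, device=ray_o.device)            # (n_samples,)
    if perturb:
        # jitter each sample within its bin so the network doesn't always see fixed depths
        t = t + torch.rand(ray_o.shape[0], n_samples, device=ray_o.device) * (far - near) / n_samples
    else:
        t = t.expand(ray_o.shape[0], n_samples)
    pts = ray_o[:, None, :] + ray_d[:, None, :] * t[..., None]               # (N, n_samples, 3)
    return pts, t
```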
I made a class RaysData that has a sample_rays function (a rough sketch of it is shown below). Below that is the visualization of these rays with their corresponding cameras, using 100 rays.
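This outline reuses the hypothetical pixel_to_ray sketch above; the attribute names and tensor shapes are assumptions, not the real class.

```python
import torch

class RaysData:
    """Holds all training images and cameras, and samples rays globally across them."""
    def __init__(self, images, K, c2ws):
        self.images = images          # (n_images, H, W, 3)
        self.K = K                    # (3, 3)
        self.c2ws = c2ws              # (n_images, 4, 4)
        self.n_images, self.H, self.W, _ = images.shape

    def sample_rays(self, N):
        # pick N (image, row, column) triples uniformly over all pixels of all images
        img = torch.randint(0, self.n_images, (N,))
        v = torch.randint(0, self.H, (N,))
        u = torch.randint(0, self.W, (N,))
        uv = torch.stack([u, v], dim=-1).float() + 0.5    # offset to pixel centers
        pixels = self.images[img, v, u]                   # ground-truth colors, (N, 3)
        rays_o = torch.empty(N, 3)
        rays_d = torch.empty(N, 3)
        for j in range(N):                                # batched version omitted for clarity
            ro, rd = pixel_to_ray(self.K, self.c2ws[img[j]], uv[j:j + 1])
            rays_o[j], rays_d[j] = ro[0], rd[0]
        return rays_o, rays_d, pixels
```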
I implemented the neural network for 3D NeRF using the original architecture, shown below.
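As a companion to the diagram, here is a compressed sketch of that kind of network in PyTorch: the positionally encoded position and view direction go in, density and view-dependent color come out, and a skip connection re-injects the encoded position partway through. The exact widths and depths here are assumptions, and the positional_encoding helper from the Part 1 sketch is reused.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim, hidden, n_layers):
    layers = [nn.Linear(in_dim, hidden), nn.ReLU()]
    for _ in range(n_layers - 1):
        layers += [nn.Linear(hidden, hidden), nn.ReLU()]
    return nn.Sequential(*layers)

class NeRF(nn.Module):
    """MLP mapping a positionally encoded 3D point and view direction to (density, RGB)."""
    def __init__(self, L_x=10, L_d=4, hidden=256):
        super().__init__()
        self.L_x, self.L_d = L_x, L_d
        in_x, in_d = 3 * 2 * L_x, 3 * 2 * L_d
        self.trunk1 = make_mlp(in_x, hidden, 4)
        self.trunk2 = make_mlp(hidden + in_x, hidden, 4)    # skip connection re-injects PE(x)
        self.sigma_head = nn.Sequential(nn.Linear(hidden, 1), nn.ReLU())          # density >= 0
        self.feat = nn.Linear(hidden, hidden)
        self.rgb_head = nn.Sequential(nn.Linear(hidden + in_d, hidden // 2), nn.ReLU(),
                                      nn.Linear(hidden // 2, 3), nn.Sigmoid())    # color in [0, 1]

    def forward(self, x, d):
        ex = positional_encoding(x, L=self.L_x)
        ed = positional_encoding(d, L=self.L_d)
        h = self.trunk1(ex)
        h = self.trunk2(torch.cat([h, ex], dim=-1))
        sigma = self.sigma_head(h)
        rgb = self.rgb_head(torch.cat([self.feat(h), ed], dim=-1))
        return sigma, rgb
```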
I simply followed the given volume rendering formula, using torch.cumsum() to calculate T_i. Below are the renders of one validation image at different iterations. To make training faster (around 10 minutes) I used a batch size of 4096, and I was able to achieve a PSNR of around 23 after 1000 iterations. Finally, there is the spherical rendering GIF from the model after training for 1000 iterations.
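For reference, here is a minimal sketch of that rendering step, computing T_i with torch.cumsum over sigma_i * delta_i; variable names and shapes are illustrative, and a constant step size between samples is assumed.

```python
import torch

def volrend(sigmas, rgbs, step_size):
    """Discrete volume rendering:
       C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
       T_i = exp(-sum_{j<i} sigma_j * delta_j)
       with sigmas of shape (N_rays, n_samples, 1) and rgbs of shape (N_rays, n_samples, 3)."""
    deltas = sigmas * step_size                      # sigma_i * delta_i
    alphas = 1.0 - torch.exp(-deltas)                # per-sample opacity
    accum = torch.cumsum(deltas, dim=1)              # inclusive cumulative sum
    T = torch.exp(-(accum - deltas))                 # exclusive sum, so T_1 = 1
    weights = T * alphas                             # (N_rays, n_samples, 1)
    return (weights * rgbs).sum(dim=1)               # rendered color, (N_rays, 3)
```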