ViT+CLIP+NeRF: Few-Shot Learning, Putting NeRF on a Diet
Is anyone interested in the computer vision field?
Our team's project goal is to implement this paper: Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis.
In this project, we will implement a 3D neural scene representation (NeRF: Neural Radiance Fields) estimated from only a few images. The approach is based on extracting semantic information with a pre-trained visual encoder such as CLIP, a Vision Transformer.
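The core idea above can be sketched as a semantic consistency loss: a view rendered from a novel pose and a ground-truth view of the same scene should have similar CLIP image embeddings. Below is a minimal NumPy sketch of that loss; the function name is our own, the cosine-distance form follows the paper's description, and in the real pipeline the embedding vectors would come from CLIP's image encoder rather than being passed in directly.

```python
import numpy as np

def semantic_consistency_loss(emb_rendered: np.ndarray, emb_target: np.ndarray) -> float:
    """Cosine distance between two image embeddings (e.g., CLIP features).

    A sketch of DietNeRF's semantic consistency loss: embeddings of a
    rendered novel view and a reference view of the same scene should
    point in the same direction, so we penalize 1 - cosine similarity.
    """
    # Normalize each embedding to unit length.
    a = emb_rendered / np.linalg.norm(emb_rendered)
    b = emb_target / np.linalg.norm(emb_target)
    # Identical directions give loss 0; orthogonal embeddings give loss 1.
    return 1.0 - float(np.dot(a, b))
```

In training, this loss would be added to NeRF's usual photometric (MSE) reconstruction loss, with gradients flowing through the differentiable renderer into the scene representation.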
1. Languages and Skills
Languages/Frameworks : Python, PyTorch, JAX
Other Skills : Git, GitHub, cloud computing experience
2. Model / Baseline Code
ViT, CLIP : https://github.com/openai/CLIP
Meta-learning NeRF (Learned Initialization paper; JAX-based NeRF code) : https://github.com/tancik/learnit
NeRF (PyTorch-based NeRF code) : https://github.com/yenchenlin/nerf-pytorch
@valhalla Thank you for the fast response. Could I reply after checking the paper in more detail? For now, I expect that both CLIP and NeRF are available as JAX code, but connecting CLIP with NeRF and training the unified model on new data will be a challenge for our team. We will also work out the implementation details from the paper.