3D Volumetric Rendering with NeRF in Keras 3

Authors: Aritra Roy Gosthipaty, Ritwik Raha
Date created: 2021/08/09
Last modified: 2023/11/13
Description: Minimal implementation of volumetric rendering as shown in NeRF.

This example uses Keras 3.

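Introduction

In this example, we present a minimal implementation of the research paper NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis by Ben Mildenhall et al. The authors propose an ingenious way to synthesize novel views of a scene by modelling the volumetric scene function through a neural network.

To help you understand this intuitively, let's start with the following question: would it be possible to give a neural network the position of a pixel in an image and ask the network to predict the color at that position?

Figure 1: A neural network being given the coordinates of an image as input and asked to predict the color at those coordinates.

The neural network would hypothetically memorize (overfit on) the image. This means that our neural network would have encoded the entire image in its weights. We could query the neural network with each position, and it would eventually reconstruct the entire image.

Figure 2: The trained neural network recreates the image from scratch.

Now the question arises: how do we extend this idea to learn a 3D volumetric scene? Implementing a process similar to the one above would require knowledge of every voxel (volume pixel). This turns out to be quite a challenging task.

The authors of the paper propose a minimal and elegant way to learn a 3D scene using only a few images of the scene. They discard the use of voxels for training. The network learns to model the volumetric scene, and can therefore generate novel views (images) of the 3D scene that the model was not shown at training time.

There are a few prerequisites one needs to understand to fully appreciate the process. We structure the example so that you will have all the required knowledge before starting the implementation.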

Setup

import os

os.environ["KERAS_BACKEND"] = "tensorflow"

# Setting random seed to obtain reproducible results.
import tensorflow as tf

tf.random.set_seed(42)

import keras
from keras import layers

import glob
import imageio.v2 as imageio
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt

# Initialize global variables.
AUTO = tf.data.AUTOTUNE
BATCH_SIZE = 5
NUM_SAMPLES = 32
POS_ENCODE_DIMS = 16
EPOCHS = 20

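Download and load the data

The npz data file contains images, camera poses, and a focal length. The images are taken from multiple camera angles, as shown in Figure 3.

Figure 3: Multiple camera angles
Source: NeRF

To understand camera poses in this context, we first have to think of a camera as a mapping between the real world and the 2-D image.

Figure 4: 3-D world to 2-D image mapping through a camera
Source: Mathworks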

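Consider the following equation:

x = PX

where x is the 2-D image point, X is the 3-D world point, and P is the camera matrix. P is a 3 x 4 matrix that plays the crucial role of mapping the real-world object onto an image plane.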
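As a rough illustration of the projection equation, the sketch below builds a 3 x 4 camera matrix in NumPy and projects one world point. The intrinsics `K`, the extrinsics `Rt`, the helper `project`, and all numeric values are made up for this example and do not come from the dataset. It also uses the common textbook convention in which the camera looks along +z, which differs from the -z convention described later in this example.

```python
import numpy as np

# Hypothetical intrinsics: focal length 100 px, principal point at (50, 50).
K = np.array(
    [
        [100.0, 0.0, 50.0],
        [0.0, 100.0, 50.0],
        [0.0, 0.0, 1.0],
    ]
)

# Hypothetical extrinsics [R | t]: identity rotation and zero translation.
Rt = np.hstack([np.eye(3), np.zeros((3, 1))])

# The 3 x 4 camera matrix P.
P = K @ Rt


def project(P, X_world):
    """Project a 3-D world point to 2-D pixel coordinates using x = PX."""
    X_h = np.append(X_world, 1.0)  # homogeneous 3-D point, shape (4,)
    x_h = P @ X_h  # homogeneous 2-D point, shape (3,)
    return x_h[:2] / x_h[2]  # divide by depth to get pixel coordinates


pixel = project(P, np.array([0.5, -0.25, 2.0]))
print(pixel)
```

The division by the third homogeneous coordinate is what makes distant points land closer to the principal point.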

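The camera matrix is an affine transform matrix that is concatenated with a 3 x 1 column [image height, image width, focal length] to produce the pose matrix. This matrix has dimensions 3 x 5, where the first 3 x 3 block is expressed in the camera's point of view. The axes are [down, right, backwards], or [-y, x, z], where the camera faces forward along -z.

Figure 5: The affine transformation.

The COLMAP frame is [right, down, forwards], or [x, -y, -z]. See the COLMAP documentation for more details.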
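To make the 3 x 5 layout concrete, here is a minimal sketch that assembles a pose matrix from made-up values and splits it back into its parts. The names `rotation`, `translation`, `hwf`, and `reordered` are illustrative only. The final column reordering assumes the [down, right, backwards] = [-y, x, z] layout described above and rearranges it into [x, y, z]. Treat that step as an assumption, not as part of this example's pipeline.

```python
import numpy as np

# Hypothetical pose components (values are made up for illustration).
rotation = np.eye(3)  # 3 x 3 block, columns ordered [down, right, backwards]
translation = np.array([[0.0], [0.0], [4.0]])  # 3 x 1 camera position
hwf = np.array([[100.0], [100.0], [138.0]])  # [height, width, focal length]

# Affine camera matrix (3 x 4) concatenated with the hwf column -> 3 x 5.
pose = np.hstack([rotation, translation, hwf])

# Split the pose matrix back into its parts.
camera_to_world = pose[:, :4]
height, width, focal = pose[:, 4]

# Reorder rotation columns from [-y, x, z] to [x, y, z]:
# x is the second column, y is the negated first column, z is unchanged.
R = camera_to_world[:, :3]
reordered = np.stack([R[:, 1], -R[:, 0], R[:, 2]], axis=1)

print(pose.shape, camera_to_world.shape, focal)
```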

# Download the data if it does not already exist.
url = (
    "http://cseweb.ucsd.edu/~viscomp/projects/LF/papers/ECCV20/nerf/tiny_nerf_data.npz"
)
data = keras.utils.get_file(origin=url)

data = np.load(data)
images = data["images"]
im_shape = images.shape
(num_images, H, W, _) = images.shape
(poses, focal) = (data["poses"], data["focal"])

# Plot a random image from the dataset for visualization.
plt.imshow(images[np.random.randint(low=0, high=num_images)])
plt.show()


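Data pipeline

Now that you understand the notion of the camera matrix and the mapping from a 3D scene to 2D images, let's talk about the inverse mapping, i.e. from a 2D image to the 3D scene.

We need to talk about volumetric rendering with ray casting and ray tracing, which are common computer graphics techniques. This section will help you get up to speed with these techniques.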

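Consider an image with N pixels. We shoot a ray through each pixel and sample some points on the ray. A ray is commonly parameterized by the equation r(t) = o + td, where t is the parameter, o is the origin, and d is the unit directional vector, as shown in Figure 6.

Figure 6: `r(t) = o + td` where t is 3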
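The ray equation can be evaluated for many values of t at once with broadcasting. The following is a small NumPy sketch; the helper name `points_on_ray` and the bounds 2.0 and 6.0 are assumptions chosen for illustration.

```python
import numpy as np


def points_on_ray(origin, direction, t_vals):
    """Evaluate r(t) = o + t * d for every t in t_vals.

    Returns an array of shape (len(t_vals), 3).
    """
    origin = np.asarray(origin, dtype=np.float64)
    direction = np.asarray(direction, dtype=np.float64)
    # Normalize so that d is a unit vector and t measures distance.
    direction = direction / np.linalg.norm(direction)
    return origin[None, :] + t_vals[:, None] * direction[None, :]


# Five evenly spaced parameters between assumed near and far bounds.
t_vals = np.linspace(2.0, 6.0, 5)
points = points_on_ray([0.0, 0.0, 0.0], [0.0, 0.0, -2.0], t_vals)
print(points)
```

Because the direction is normalized, the point at t = 4 sits exactly four units from the origin along -z.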

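In Figure 7, we consider a ray and sample some random points on it. Each of these sample points has a unique location (x, y, z), and the ray has a viewing angle (theta, phi). The viewing angle is particularly interesting because we can shoot a ray through a single pixel in many different ways, each with a unique viewing angle. Another thing to notice here is the noise that is added to the sampling process. We add uniform noise to each sample so that the samples correspond to a continuous distribution. In Figure 7, the blue points are the evenly distributed samples, and the white points (t1, t2, t3) are randomly placed between the samples.

Figure 7: Sampling the points from a ray.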
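One common way to realize this noisy sampling is to start from evenly spaced values of t and shift each one by a uniform offset no larger than one bin width. The sketch below assumes that scheme. The helper `sample_t`, its parameters, and the bounds are hypothetical names and values for illustration.

```python
import numpy as np


def sample_t(near, far, num_samples, rng=None):
    """Return sample parameters t along a ray.

    Without `rng`, the samples are evenly spaced between `near` and `far`.
    With `rng`, each sample is shifted forward by uniform noise in
    [0, bin_width), so repeated calls cover the ray continuously.
    """
    t_vals = np.linspace(near, far, num_samples)
    if rng is not None:
        bin_width = (far - near) / num_samples
        t_vals = t_vals + rng.uniform(0.0, bin_width, size=num_samples)
    return t_vals


even = sample_t(2.0, 6.0, 4)
jittered = sample_t(2.0, 6.0, 4, rng=np.random.default_rng(0))
print(even)
print(jittered)
```

Note that with this particular scheme the last jittered sample can land slightly beyond the far bound.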

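Figure 8 showcases the entire sampling process in 3D, where you can see the rays coming out of the white image. This means that each pixel has its own corresponding ray, and each ray is sampled at distinct points.

Figure 8: Shooting rays from all the pixels of an image in 3-D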
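To get a feel for the shapes involved, the following NumPy sketch shoots one ray per pixel of a tiny made-up image and samples every ray at the same set of t values. It assumes an identity camera pose with the camera at the origin looking along -z. The image size, focal length, and bounds are hypothetical.

```python
import numpy as np

# Hypothetical tiny image, focal length, and number of samples per ray.
H, W, focal = 4, 6, 5.0
num_samples = 8

# Pixel grid: i indexes columns (x), j indexes rows (y); both have shape (H, W).
i, j = np.meshgrid(
    np.arange(W, dtype=np.float64),
    np.arange(H, dtype=np.float64),
    indexing="xy",
)

# One direction per pixel in camera space (not normalized), looking along -z.
directions = np.stack(
    [(i - W * 0.5) / focal, -(j - H * 0.5) / focal, -np.ones_like(i)],
    axis=-1,
)  # shape (H, W, 3)

# With an identity pose, every ray starts at the origin.
origins = np.zeros_like(directions)

# Sample every ray at the same t values: r(t) = o + t * d.
t_vals = np.linspace(2.0, 6.0, num_samples)
points = origins[..., None, :] + directions[..., None, :] * t_vals[:, None]

print(directions.shape, points.shape)
```

The result holds H x W x num_samples points, each with three coordinates, which is the set of locations a model would be queried at for one image.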

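These sampled points act as the input to the NeRF model. The model is then asked to predict the RGB color and the volume density at each point.

Figure 9: Data pipeline
Source: NeRF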

def encode_position(x):
    """Encodes the position into its corresponding Fourier feature.

    Args:
        x: The input coordinate.

    Returns:
        Fourier features tensors of the position.
    """
    positions = [x]
    for i in range(POS_ENCODE_DIMS):
        for fn in [tf.sin, tf.cos]:
            positions.append(fn(2.0**i * x))
    return tf.concat(positions, axis=-1)


def get_rays(height, width, focal, pose):
    """Computes origin point and direction vector of rays.

    Args:
        height: Height of the image.
        width: Width of the image.
        focal: The focal length between the images and the camera.
        pose: The pose matrix of the camera.

    Returns:
        Tuple of origin point and direction vector for rays.
    """
    # Build a meshgrid for the rays.
    i, j = tf.meshgrid(
        tf.range(width, dtype=tf.float32),
        tf.range(height, dtype=tf.float32),
        indexing="xy",
    )

    # Normalize the x axis coordinates.
    transformed_i = (i - width * 0.5) / focal

    # Normalize the y axis coordinates.
    transformed_j = (j - height * 0.5) / focal

    # Create the direction unit vectors.
    directions = tf.stack([transformed_i, -transformed_j, -tf.ones_like(i)], axis=-1)

    # Get the camera matrix.
    camera_matrix = pose[:3, :3]
    height_width_focal = pose[:3, -1]

    # Get origins and directions for the rays.
    transformed_dirs = directions[..., None, :]
    camera_dirs = transformed_dirs * camera_matrix
    ray_directions = tf.reduce_sum(camera_dirs, axis=-1)
    ray_origins = tf.broadcast_to(height_width_focal, tf.shape(ray_directions))

    # Return the origins and directions.
    return (ray_origins, ray_directions)


def render_flat_rays(ray_origins, ray_directions, near, far, num_samples, rand=False):
    """Renders the rays and flattens it.

    Args:
        ray_origins: The origin points for rays.
        ray_directions: The direction unit vectors for the rays.
        near: The near bound of the volumetric scene.
        far: The far bound of the volumetric scene.
        num_samples: Number of sample points in a ray.
        rand: Choice for randomising the sampling strategy.

    Returns:
       Tuple of flattened rays and sample points on each rays.
    """
    # Compute 3D query points.
    # Equation: r(t) = o+td -> Building the "t" here.
    t_vals =