【手把手教你】搭建神经网络（3D点云分类）

大家好，我是羽峰，今天要和大家分享的是一个基于PointNet的3D点云分类研究。文章会把整个代码进行分割讲解，完整看完，相信你一定会有所收获。该示例实现了开创性的点云深度学习论文PointNet (Qi et al., 2017)。有关PointNet的详细介绍，请参阅this blog post。欢迎关注“羽峰码字”目录1. 3D点云分类简介1.1 何为点云[1]1.2 点云的获取[1]1.

羽峰码字

4568人浏览 · 2021-05-23 10:28:34

羽峰码字 · 2021-05-23 10:28:34 发布

大家好，我是羽峰，今天要和大家分享的是一个基于PointNet的3D点云分类研究。文章会把整个代码进行分割讲解，完整看完，相信你一定会有所收获。

该示例实现了开创性的点云深度学习论文PointNet (Qi et al., 2017)。有关PointNet的详细介绍，请参阅 this blog post。

欢迎关注“羽峰码字”

2. 使用PointNet进行点云分类[2]

1. 3D点云分类简介

无序3D点集（即点云）的分类，检测和分割是计算机视觉中的核心问题。

1.1 何为点云[1]

我们在做 3D 视觉的时候，处理的主要是点云，点云就是一些点的集合。相对于图像，点云有其不可替代的优势——深度，也就是说三维点云直接提供了三维空间的数据，而图像则需要通过透视几何来反推三维数据。

其实点云是某个坐标系下的点的数据集。点包含了丰富的信息，包括三维坐标 X，Y，Z、颜色、分类值、强度值、时间等等。点云在组成特点上分为两种，一种是有序点云，一种是无序点云。
有序点云：一般由深度图还原的点云，有序点云按照图方阵一行一行的，从左上角到右下角排列，当然其中有一些无效点因为。有序点云按顺序排列，可以很容易的找到它的相邻点信息。有序点云在某些处理的时候还是很便利的，但是很多情况下是无法获取有序点云的。
无序点云：无序点云就是其中的点的集合，点排列之间没有任何顺序，点的顺序交换后没有任何影响。是比较普遍的点云形式，有序点云也可看做无序点云来处理。

1.2 点云的获取[1]

点云不是通过普通的相机拍摄得到的，一般是通过三维成像传感器获得，比如双目相机、三维扫描仪、RGB-D 相机等。目前主流的 RGB-D 相机有微软的 Kinect 系列、Intel 的 realsense 系列、structure sensor（需结合 iPad 使用）等。点云可通过扫描的 RGB-D 图像，以及扫描相机的内在参数创建点云，方法是通过相机校准，使用相机内在参数计算真实世界的点（x，y）。因此，RGB-D 图像是网格对齐的图像，而点云则是更稀疏的结构。此外，获得点云的较好方法还包括 LiDAR 激光探测与测量，主要通过星载、机载和地面三种方式获取。

根据激光测量原理得到的点云，包括三维坐标（XYZ）和激光反射强度（Intensity），强度信息与目标的表面材质、粗糙度、入射角方向以及仪器的发射能量、激光波长有关。根据摄影测量原理得到的点云，包括三维坐标（XYZ）和颜色信息（RGB）。结合激光测量和摄影测量原理得到点云，包括三维坐标（XYZ）、激光反射强度（Intensity）和颜色信息（RGB）。

1.3 点云的属性[1]

空间分辨率、点位精度、表面法向量等。
点云可以表达物体的空间轮廓和具体位置，我们能看到街道、房屋的形状，物体距离摄像机的距离也是可知的；其次，点云本身和视角无关，可以任意旋转，从不同角度和方向观察一个点云，而且不同的点云只要在同一个坐标系下就可以直接融合。

2. 使用PointNet进行点云分类[2]

2.1 基础API配置

如果使用colab，请先使用！pip安装trimesh来安装trimesh。

import os
import glob
import trimesh
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from matplotlib import pyplot as plt

tf.random.set_seed(1234)

2.2 下载数据及数据预处理

我们使用ModelNet10模型数据集，即ModelNet40数据集的较小的10类版本。首先下载数据：

DATA_DIR = tf.keras.utils.get_file(
    "modelnet.zip",
    "http://3dvision.princeton.edu/projects/2014/3DShapeNets/ModelNet10.zip",
    extract=True,
)
DATA_DIR = os.path.join(os.path.dirname(DATA_DIR), "ModelNet10")

Downloading data from http://3dvision.princeton.edu/projects/2014/3DShapeNets/ModelNet10.zip
473407488/473402300 [==============================] - 13s 0us/step

我们可以使用trimesh包来读取和可视化.off网格文件。

mesh = trimesh.load(os.path.join(DATA_DIR, "chair/train/chair_0001.off"))
mesh.show()

要将网格文件转换为点云，我们首先需要对网格表面上的点进行采样。 .sample（）执行统一的随机采样。在这里，我们在2048个位置采样并在matplotlib中可视化。

points = mesh.sample(2048)

fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111, projection="3d")
ax.scatter(points[:, 0], points[:, 1], points[:, 2])
ax.set_axis_off()
plt.show()

要生成tf.data.Dataset（），我们需要首先通过ModelNet数据文件夹进行解析。每个网格都被加载并采样到点云中，然后再添加到标准python列表中并转换为numpy数组。我们还将当前枚举索引值存储为对象标签，并使用字典稍后对其进行调用。

def parse_dataset(num_points=2048):

    train_points = []
    train_labels = []
    test_points = []
    test_labels = []
    class_map = {}
    folders = glob.glob(os.path.join(DATA_DIR, "[!README]*"))

    for i, folder in enumerate(folders):
        print("processing class: {}".format(os.path.basename(folder)))
        # store folder name with ID so we can retrieve later
        class_map[i] = folder.split("/")[-1]
        # gather all files
        train_files = glob.glob(os.path.join(folder, "train/*"))
        test_files = glob.glob(os.path.join(folder, "test/*"))

        for f in train_files:
            train_points.append(trimesh.load(f).sample(num_points))
            train_labels.append(i)

        for f in test_files:
            test_points.append(trimesh.load(f).sample(num_points))
            test_labels.append(i)

    return (
        np.array(train_points),
        np.array(test_points),
        np.array(train_labels),
        np.array(test_labels),
        class_map,
    )

将点数设置为样本和批处理大小，然后解析数据集。这可能需要5分钟才能完成。

NUM_POINTS = 2048
NUM_CLASSES = 10
BATCH_SIZE = 32

train_points, test_points, train_labels, test_labels, CLASS_MAP = parse_dataset(
    NUM_POINTS
)

processing class: bathtub
processing class: desk
processing class: monitor
processing class: sofa
processing class: chair
processing class: toilet
processing class: dresser
processing class: table
processing class: bed
processing class: night_stand

现在，我们的数据可以读取到tf.data.Dataset（）对象中。我们将改组缓冲区大小设置为数据集的整个大小，因为在此之前按类对数据进行排序。在使用点云数据时，数据扩充很重要。我们创建一个增强函数来抖动和随机化火车数据集。

def augment(points, label):
    # jitter points
    points += tf.random.uniform(points.shape, -0.005, 0.005, dtype=tf.float64)
    # shuffle points
    points = tf.random.shuffle(points)
    return points, label


train_dataset = tf.data.Dataset.from_tensor_slices((train_points, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((test_points, test_labels))

train_dataset = train_dataset.shuffle(len(train_points)).map(augment).batch(BATCH_SIZE)
test_dataset = test_dataset.shuffle(len(test_points)).batch(BATCH_SIZE)

2.3 建立模型

每个卷积和完全连接的层（端层除外）由卷积/密集->批归一化-> ReLU激活组成。

def conv_bn(x, filters):
    x = layers.Conv1D(filters, kernel_size=1, padding="valid")(x)
    x = layers.BatchNormalization(momentum=0.0)(x)
    return layers.Activation("relu")(x)


def dense_bn(x, filters):
    x = layers.Dense(filters)(x)
    x = layers.BatchNormalization(momentum=0.0)(x)
    return layers.Activation("relu")(x)

PointNet由两个核心组件组成。主MLP网络和变压器网络（T-net）。 T-net旨在通过自己的小型网络学习仿射变换矩阵。 T网被使用了两次。第一次将输入要素（n，3）转换为规范表示。第二个是仿射变换，用于在特征空间（n，3）中对齐。根据原始论文，我们将变换约束为接近正交矩阵（即|| X * X ^ T-I || = 0）。

class OrthogonalRegularizer(keras.regularizers.Regularizer):
    def __init__(self, num_features, l2reg=0.001):
        self.num_features = num_features
        self.l2reg = l2reg
        self.eye = tf.eye(num_features)

    def __call__(self, x):
        x = tf.reshape(x, (-1, self.num_features, self.num_features))
        xxt = tf.tensordot(x, x, axes=(2, 2))
        xxt = tf.reshape(xxt, (-1, self.num_features, self.num_features))
        return tf.reduce_sum(self.l2reg * tf.square(xxt - self.eye))

然后，我们可以定义一个通用功能来构建T-net层。

def tnet(inputs, num_features):

    # Initalise bias as the indentity matrix
    bias = keras.initializers.Constant(np.eye(num_features).flatten())
    reg = OrthogonalRegularizer(num_features)

    x = conv_bn(inputs, 32)
    x = conv_bn(x, 64)
    x = conv_bn(x, 512)
    x = layers.GlobalMaxPooling1D()(x)
    x = dense_bn(x, 256)
    x = dense_bn(x, 128)
    x = layers.Dense(
        num_features * num_features,
        kernel_initializer="zeros",
        bias_initializer=bias,
        activity_regularizer=reg,
    )(x)
    feat_T = layers.Reshape((num_features, num_features))(x)
    # Apply affine transformation to input features
    return layers.Dot(axes=(2, 1))([inputs, feat_T])

然后，可以使用将t-net微型模型放在图中的各层中的相同方式来实现主网络。在这里，我们复制了原始论文中发布的网络体系结构，但由于使用的是10类较小的ModelNet数据集，因此每层的权重只有一半。

inputs = keras.Input(shape=(NUM_POINTS, 3))

x = tnet(inputs, 3)
x = conv_bn(x, 32)
x = conv_bn(x, 32)
x = tnet(x, 32)
x = conv_bn(x, 32)
x = conv_bn(x, 64)
x = conv_bn(x, 512)
x = layers.GlobalMaxPooling1D()(x)
x = dense_bn(x, 256)
x = layers.Dropout(0.3)(x)
x = dense_bn(x, 128)
x = layers.Dropout(0.3)(x)

outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = keras.Model(inputs=inputs, outputs=outputs, name="pointnet")
model.summary()

2.4 训练模型

一旦定义了模型，就可以像使用任何其他标准分类模型一样使用.compile（）和.fit（）对其进行训练。

model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    metrics=["sparse_categorical_accuracy"],
)

model.fit(train_dataset, epochs=20, validation_data=test_dataset)

Epoch 1/20
125/125 [==============================] - 28s 221ms/step - loss: 3.5897 - sparse_categorical_accuracy: 0.2724 - val_loss: 5804697916006203392.0000 - val_sparse_categorical_accuracy: 0.3073
Epoch 2/20
125/125 [==============================] - 27s 215ms/step - loss: 3.1970 - sparse_categorical_accuracy: 0.3443 - val_loss: 836343949164544.0000 - val_sparse_categorical_accuracy: 0.3425
Epoch 3/20
125/125 [==============================] - 27s 215ms/step - loss: 2.8959 - sparse_categorical_accuracy: 0.4260 - val_loss: 15107376738729984.0000 - val_sparse_categorical_accuracy: 0.3084
Epoch 4/20
125/125 [==============================] - 27s 215ms/step - loss: 2.7148 - sparse_categorical_accuracy: 0.4939 - val_loss: 6823221.0000 - val_sparse_categorical_accuracy: 0.3304
Epoch 5/20
125/125 [==============================] - 27s 215ms/step - loss: 2.5500 - sparse_categorical_accuracy: 0.5560 - val_loss: 675110905872323182592.0000 - val_sparse_categorical_accuracy: 0.4493
Epoch 6/20
125/125 [==============================] - 27s 215ms/step - loss: 2.3595 - sparse_categorical_accuracy: 0.6081 - val_loss: 600389124096.0000 - val_sparse_categorical_accuracy: 0.5749
Epoch 7/20
125/125 [==============================] - 27s 215ms/step - loss: 2.2485 - sparse_categorical_accuracy: 0.6394 - val_loss: 680423464582760103936.0000 - val_sparse_categorical_accuracy: 0.4912
Epoch 8/20
125/125 [==============================] - 27s 215ms/step - loss: 2.1945 - sparse_categorical_accuracy: 0.6575 - val_loss: 44108689408.0000 - val_sparse_categorical_accuracy: 0.6410
Epoch 9/20
125/125 [==============================] - 27s 215ms/step - loss: 2.1318 - sparse_categorical_accuracy: 0.6725 - val_loss: 873314112.0000 - val_sparse_categorical_accuracy: 0.6112
Epoch 10/20
125/125 [==============================] - 27s 215ms/step - loss: 2.0140 - sparse_categorical_accuracy: 0.7018 - val_loss: 13168980992.0000 - val_sparse_categorical_accuracy: 0.6784
Epoch 11/20
125/125 [==============================] - 27s 215ms/step - loss: 1.9929 - sparse_categorical_accuracy: 0.7056 - val_loss: 36888236785664.0000 - val_sparse_categorical_accuracy: 0.6586
Epoch 12/20

2.5 可视化预测

我们可以使用matplotlib可视化我们训练有素的模型性能。

data = test_dataset.take(1)

points, labels = list(data)[0]
points = points[:8, ...]
labels = labels[:8, ...]

# run test data through model
preds = model.predict(points)
preds = tf.math.argmax(preds, -1)

points = points.numpy()

# plot points with predicted class and label
fig = plt.figure(figsize=(15, 10))
for i in range(8):
    ax = fig.add_subplot(2, 4, i + 1, projection="3d")
    ax.scatter(points[i, :, 0], points[i, :, 1], points[i, :, 2])
    ax.set_title(
        "pred: {:}, label: {:}".format(
            CLASS_MAP[preds[i].numpy()], CLASS_MAP[labels.numpy()[i]]
        )
    )
    ax.set_axis_off()
plt.show()