Introduction

These are my notes on SampleNet: Differentiable Point Cloud Sampling. The paper proposes a point cloud sampling method that can be embedded in a neural network and trained end to end.

Method

The overall structure is as shown in Fig. 3. The task is to sample $m\lt n$ points $R^\ast\in\mathbb{R}^{m\times 3}$ from a 3D input of $n$ points $P\in\mathbb{R}^{n\times3}$.

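The proposed SampleNet takes $P$ as input and uses a PointNet-style architecture to generate a point set $Q$ that is smaller than $P$ (this step is generation, not sampling). It then applies an operation called soft projection, which uses the generated $Q$ and the original point cloud $P$ to obtain points that are plausible with respect to $P$.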

The following objective is minimized.

$\displaystyle \mathcal{L}^{samp}_{total}=\mathcal{L}_{task}(R)+\alpha\mathcal{L}_{simplify}(Q,P)+\lambda\mathcal{L}_{project}$

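$\mathcal{L}_{task}(R)$ is the loss of the task we actually want to solve (such as classification or segmentation) and is independent of the sampling itself, $\mathcal{L}_{simplify}(Q,P)$ is the distance between the generated point cloud $Q$ and $P$, and $\mathcal{L}_{project}$ is the loss of the soft projection mentioned above.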

Simplify

The idea behind sampling here is to pick, from the point cloud $P\in\mathbb{R}^{n\times3}$, a subset $R^\ast\in\mathbb{R}^{m\times 3}$ that satisfies the following.

$\displaystyle R^\ast=\underset{R}{\mathrm{arg}\min}\mathcal{F}(T(R)),R\subseteq P,|R|=m\leq n$

$T$ is the model for some task and $\mathcal{F}$ is the objective function of that task. In short, we want to find the subset that minimizes the loss of the task we care about.
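For intuition, the arg min above can be solved by exhaustive search on a tiny point cloud. The sketch below is only an illustration: `toy_task_loss` is a hypothetical stand-in for $\mathcal{F}(T(R))$ (squared distance between the centroid of the subset and the centroid of $P$), not a loss from the paper, and `centroid` and `best_subset` are names made up for this example.

```python
from itertools import combinations
from math import comb

def centroid(points):
    # Mean of a list of 3D points given as tuples.
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(3))

def toy_task_loss(subset, full):
    # Hypothetical stand-in for F(T(R)): squared distance between centroids.
    cs, cf = centroid(subset), centroid(full)
    return sum((a - b) ** 2 for a, b in zip(cs, cf))

def best_subset(P, m):
    # Exhaustive arg min over every subset R of P with |R| = m.
    return min(combinations(P, m), key=lambda R: toy_task_loss(R, P))

P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
R_star = best_subset(P, 2)
n_candidates = comb(len(P), 2)  # number of subsets the search had to visit
print(R_star, n_candidates)
```

The number of candidate subsets is $\binom{n}{m}$, so this search is only feasible for toy sizes, and the subset selection itself is not differentiable. Both points motivate a learned, differentiable approximation.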

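Here, a neural network with a PointNet-style architecture generates a point set $Q$ that stands in for $R^\ast$. To make the generated $Q$ plausible, first define the following nearest neighbor loss, averaged over the points of $Q$.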

$\displaystyle \mathcal{L}_{a}(Q,P)=\frac{1}{|Q|}\sum_{\mathbf{q}\in Q}\min_{\mathbf{p}\in P}\|\mathbf{q}-\mathbf{p}\|^2_2$

Next, define the following maximal nearest neighbor loss.

$\displaystyle \mathcal{L}_m(Q,P)=\max_{\mathbf{q}\in Q}\min_{\mathbf{p}\in P}\|\mathbf{q}-\mathbf{p}\|_2^2$

Using these, $\mathcal{L}_{simplify}(Q,P)$ is defined as follows.

$\displaystyle \mathcal{L}_{simplify}(Q,P)=\mathcal{L}_a(Q,P)+\beta\mathcal{L}_m(Q,P)+(\gamma+\delta|Q|)\mathcal{L}_a(P,Q)$
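The three terms can be checked on a small example. This is a minimal pure-Python sketch using brute-force nearest neighbor search. The function names and the default values of `beta`, `gamma`, and `delta` are placeholders chosen for illustration, not values from the paper.

```python
def sq_dist(a, b):
    # Squared Euclidean distance between two 3D points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def avg_nn_loss(X, Y):
    # L_a(X, Y): mean over x in X of the squared distance to its nearest y in Y.
    return sum(min(sq_dist(x, y) for y in Y) for x in X) / len(X)

def max_nn_loss(X, Y):
    # L_m(X, Y): largest squared nearest neighbor distance from X to Y.
    return max(min(sq_dist(x, y) for y in Y) for x in X)

def simplify_loss(Q, P, beta=1.0, gamma=1.0, delta=0.0):
    # beta, gamma, delta are placeholder weights (assumed, not from the paper).
    return (avg_nn_loss(Q, P)
            + beta * max_nn_loss(Q, P)
            + (gamma + delta * len(Q)) * avg_nn_loss(P, Q))

P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
Q_on = [P[0], P[3]]                            # points taken from P
Q_off = [(0.0, 0.0, 0.5), (1.0, 1.0, 0.5)]     # same points lifted off P

print(simplify_loss(Q_on, P), simplify_loss(Q_off, P))
```

When every point of $Q$ coincides with a point of $P$, the first two terms vanish and only the coverage term $\mathcal{L}_a(P,Q)$ remains, so `Q_on` scores lower than `Q_off`.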

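This loss is expected to push $Q$ toward satisfying $Q\subseteq P$. In addition, to reflect the goal of minimizing the task loss, the model that generates $Q$ (the simplification network) is trained with the following loss.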

$\displaystyle \mathcal{L}_s(Q,P)=\mathcal{L}_{task}(Q)+\alpha\mathcal{L}_{simplify}(Q,P)$

Project

Because $Q$ consists of generated points, it cannot be an exact sample from $P$. Soft projection is therefore applied to bring $Q$ as close as possible to $Q\subseteq P$.

The procedure is very simple: for each point $\mathbf{q}\in Q$, compute its $k$ nearest neighbors in $P$. A new point $\mathbf{r}$ is then generated as a weighted linear combination of those neighbors, $\mathbf{r}=\sum_{i\in\mathcal{N}_P(\mathbf{q})}w_i\mathbf{p}_i$. The weights $w_i$ are computed as follows.

$\displaystyle w_i=\frac{\exp(-d^2_i/t^2)}{\sum_{j\in\mathcal{N}_P(\mathbf{q})}\exp(-d^2_j/t^2)}$

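Here $d_i=\|\mathbf{q}-\mathbf{p}_i\|_2$ and $t$ is a learnable temperature parameter (it is squared to keep the value non-negative). As the temperature approaches 0, the weights concentrate on the nearest neighbor, so in the limit the operation is equivalent to sampling a subset of $P$. For this reason, the soft projection loss is $\mathcal{L}_{project}=t^2$.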
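The effect of the temperature can be seen in a small sketch of the projection for a single query point. This is a pure-Python illustration with brute-force neighbor search; `soft_project` and its arguments are names assumed for this example. In the actual method $t$ is a learned parameter, whereas here it is simply passed in as a nonzero number.

```python
import math

def sq_dist(a, b):
    # Squared Euclidean distance between two 3D points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def soft_project(q, P, k, t):
    # k nearest neighbors of q in P, sorted by squared distance (t must be nonzero).
    neighbors = sorted(P, key=lambda p: sq_dist(q, p))[:k]
    d2 = [sq_dist(q, p) for p in neighbors]
    # Subtract the smallest d^2 before exponentiating. The normalized weights
    # are unchanged, but the sum can no longer underflow to zero for small t.
    d2_min = d2[0]
    e = [math.exp(-(d - d2_min) / t ** 2) for d in d2]
    z = sum(e)
    w = [x / z for x in e]
    # Projected point r: weighted combination of the neighbors.
    r = tuple(sum(wi * p[dim] for wi, p in zip(w, neighbors)) for dim in range(3))
    return r, w

P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
q = (0.1, 0.2, 0.0)

r_warm, w_warm = soft_project(q, P, k=3, t=1.0)    # spread over the neighbors
r_cold, w_cold = soft_project(q, P, k=3, t=0.01)   # collapses onto the nearest
project_loss = 0.01 ** 2                           # L_project = t^2 for the cold case
print(r_warm, r_cold)
```

With a large temperature, $\mathbf{r}$ is a blend of the neighbors and generally not a point of $P$. With a small temperature, the weight of the nearest neighbor goes to 1 and $\mathbf{r}$ lands on that point, which is why minimizing $t^2$ drives the output toward an actual subset of $P$.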