The task of Novel Class Discovery (NCD) in semantic segmentation entails training a model able to accurately segment unlabelled (novel) classes, relying on the available supervision from annotated (base) classes.
Although extensively investigated in 2D image data, the extension of the NCD task to the domain of 3D point clouds represents a pioneering effort, characterized by assumptions and challenges that are not present in the 2D case.
This paper advances the analysis of point cloud data in four directions. Firstly, it introduces the novel task of NCD for point cloud semantic segmentation. Secondly, it demonstrates that directly transposing the only existing NCD method for 2D image semantic segmentation to 3D data yields suboptimal results. Thirdly, it presents a new NCD approach based on online clustering, uncertainty estimation, and semantic distillation. Lastly, it proposes a novel evaluation protocol to rigorously assess the performance of NCD in point cloud semantic segmentation.
Through comprehensive evaluations on the SemanticKITTI, SemanticPOSS, and S3DIS datasets, the paper demonstrates substantial superiority of the proposed method over the considered baselines.
We employ two specialized UNet-like deep neural networks optimized for 3D data to extract features from input point clouds.
The primary network (composed of the backbone \( f_g \) and the heads \( f_n \), \( f_b \), and \( f_s \)) is initially untrained and is the model trained to segment base and novel classes concurrently. The secondary network \( f_a \) is auxiliary and pre-trained for task-agnostic 3D scene understanding.
For base class points, traditional supervised training is used with human annotations.
Training for novel classes pursues two objectives: aligning features with the semantic
knowledge of the auxiliary network and applying a self-supervised approach for pseudo-label
generation.
A class-balanced queue is used to maintain an equal representation of novel classes in processed
batches, regardless of their presence in point clouds.
Pseudo-label confidences are used to select high-quality points for the queue.
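The class-balanced queue described above can be illustrated with a minimal sketch. It assumes one fixed-length FIFO buffer per novel class and a fixed confidence threshold; the class name `ClassBalancedQueue` and the parameters `max_per_class` and `min_confidence` are hypothetical and are not taken from the paper.

```python
from collections import deque


class ClassBalancedQueue:
    """Hypothetical per-class FIFO store of confident novel-class features."""

    def __init__(self, num_novel_classes, max_per_class, min_confidence):
        # Assumption: a fixed threshold decides which points are "high quality".
        self.min_confidence = min_confidence
        # One bounded buffer per novel class; old entries are dropped first.
        self.queues = [deque(maxlen=max_per_class) for _ in range(num_novel_classes)]

    def push(self, features, pseudo_labels, confidences):
        """Store only the points whose pseudo-label confidence is high enough."""
        for feat, label, conf in zip(features, pseudo_labels, confidences):
            if conf >= self.min_confidence:
                self.queues[label].append(feat)

    def sample(self, per_class):
        """Return up to `per_class` of the most recent features for each class."""
        batch = []
        for label, queue in enumerate(self.queues):
            recent = list(queue)[-per_class:] if per_class > 0 else []
            batch.extend((feat, label) for feat in recent)
        return batch
```

Because every class has its own buffer, a class that is absent from the current point cloud can still contribute stored features to the batch, which is the balancing effect the text describes.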
The optimization objective combines two components: the segmentation loss \( \ell_\text{S} \) and the alignment loss \( \ell_\text{A} \). The segmentation loss is supervised by ground-truth labels for base classes and by pseudo-labels for novel classes. The alignment loss encourages the features to agree with the semantic knowledge obtained from the auxiliary network.
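As a rough illustration of how the two loss terms could be combined, the sketch below uses cross-entropy for \( \ell_\text{S} \) and a cosine distance for \( \ell_\text{A} \). These specific choices, the weighted sum, and the names `total_loss` and `align_weight` are assumptions made for illustration; the text above does not specify the exact form of either term.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class for one point."""
    return -math.log(max(probs[target], 1e-12))


def cosine_distance(u, v):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / max(norm, 1e-12)


def total_loss(probs, targets, feats, aux_feats, align_weight=1.0):
    """Combine a segmentation term and an alignment term (assumed forms).

    probs/targets: per-point class probabilities and labels, where the labels
    are ground truth for base points and pseudo-labels for novel points.
    feats/aux_feats: paired features from the primary and auxiliary networks.
    """
    seg = sum(cross_entropy(p, t) for p, t in zip(probs, targets)) / len(targets)
    align = sum(cosine_distance(f, a) for f, a in zip(feats, aux_feats)) / len(feats)
    return seg + align_weight * align, seg, align
```

The function returns the individual terms alongside the total so that each can be monitored separately during training.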
@article{riz2023snops,
title={Novel Class Discovery for 3D Point Cloud Semantic Segmentation},
author={Riz, Luigi and Saltori, Cristiano and Wang, Yiming and Ricci, Elisa and Poiesi, Fabio},
journal={},
year={2023}
}