Image-segmentation-using-RGB-D

Learning depth-based semantic segmentation of street scenes

This project is maintained by kangkanbharadwaj

Abstract

This work addresses multi-class semantic segmentation of street scenes by exploring depth information alongside RGB data. Our dataset consists of street images from Berlin, captured from four different camera angles and scanned with a laser scanner; the resulting 3D point clouds were later projected to create the depth images. We also propose an architecture, consisting of a Residual Network encoder and a UNet decoder, that learns good-quality feature representations on the Berlin set. We achieve a mean accuracy of 58.35%, a mean pixel accuracy of 94.36%, and a mean IoU (Intersection over Union) of 51.91% on the test set. We further analyze the benefits the model exhibits on certain classes when it is trained on RGB data with depth, compared with a model trained on RGB information alone. We also study an alternative approach that feeds the depth information through a separate encoder, to see how segmentation performance varies and whether it brings any significant improvement in quality. Finally, we compare the performance of our network with one of the state-of-the-art models on our dataset.
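The three metrics quoted in the abstract can all be derived from a per-class confusion matrix. The sketch below is illustrative only and is not taken from the project's code. It assumes that "mean pixel accuracy" means the fraction of all pixels classified correctly, that "mean accuracy" means per-class accuracy averaged over classes, and that classes absent from the ground truth are skipped when averaging. The function names `confusion_matrix` and `segmentation_metrics` are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: cm[t][p] = pixels with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return pixel accuracy, mean per-class accuracy, and mean IoU."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix contains no pixels")
    correct = sum(cm[i][i] for i in range(n))
    per_class_acc, per_class_iou = [], []
    for i in range(n):
        tp = cm[i][i]
        true_i = sum(cm[i])                       # pixels labelled as class i
        pred_i = sum(cm[j][i] for j in range(n))  # pixels predicted as class i
        union = true_i + pred_i - tp
        if true_i > 0:   # assumption: skip classes absent from the ground truth
            per_class_acc.append(tp / true_i)
        if union > 0:    # assumption: skip classes that never occur at all
            per_class_iou.append(tp / union)
    return {
        "pixel_accuracy": correct / total,
        "mean_accuracy": sum(per_class_acc) / len(per_class_acc),
        "mean_iou": sum(per_class_iou) / len(per_class_iou),
    }


# Usage: flattened label maps for a tiny two-class example.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
metrics = segmentation_metrics(cm)
```

A large gap between pixel accuracy and mean IoU, as in the reported numbers, usually indicates that a few frequent classes dominate the pixel count while rarer classes are segmented less well.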

Introduction

[Figures: image segmentation vs semantic segmentation; image segmentation using depth]

Motivation


Goals

  1. Achieve good-quality segmentation using RGB-D
  2. Compare RGB-D segmentation with RGB-only segmentation
  3. Explore an alternative approach to feeding depth
  4. Compare our model with the state of the art

Approach (Data acquisition)

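The abstract states that the depth images were created from 3D point clouds by projection. The project's actual projection pipeline is not described here, so the following is only a minimal sketch under stated assumptions: a pinhole camera model, points already expressed in the camera frame with z pointing forward, and a z-buffer that keeps the nearest point per pixel. The function name `project_to_depth_image` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
def project_to_depth_image(points, fx, fy, cx, cy, width, height):
    """Project camera-frame 3D points (x, y, z) onto a depth image.

    Assumes a pinhole camera; 0.0 marks pixels with no measurement.
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # point lies behind the camera
        u = int(round(fx * x / z + cx))  # column index
        v = int(round(fy * y / z + cy))  # row index
        if 0 <= u < width and 0 <= v < height:
            # z-buffer: keep the closest surface seen through this pixel
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth


# Usage: two points fall on the same pixel; the nearer one wins.
points = [(0.0, 0.0, 2.0), (0.0, 0.0, 5.0), (1.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
depth_image = project_to_depth_image(points, fx=1.0, fy=1.0, cx=1.0, cy=1.0,
                                     width=3, height=3)
```

Laser-scan point clouds are sparse relative to image resolution, so a real pipeline would typically also need to handle pixels that receive no point; this sketch simply leaves them at 0.0.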

List of labels to work with


Approach (Architecture ResNet-34 fused with UNet decoder)


Experiments and Results

ResNet-34 vs ResNet-50 vs ResNet-101


ResNet-34 on the test set


RGB-D vs RGB segmentation


Reproducing FuseNet approach on ResNet-34 fused with UNet decoder

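The two ways of feeding depth can be contrasted with a small sketch. It is illustrative rather than the project's implementation and rests on two assumptions: that early fusion means stacking depth onto the RGB image as a fourth input channel, and that late fusion follows the FuseNet idea of running depth through a separate encoder and adding its feature maps element-wise to the RGB encoder's feature maps. Images and feature maps are plain nested lists here, and both function names are hypothetical.

```python
def early_fusion_input(rgb, depth):
    """Stack depth as a fourth channel: H x W x 3 plus H x W -> H x W x 4."""
    if len(rgb) != len(depth) or any(len(r) != len(d) for r, d in zip(rgb, depth)):
        raise ValueError("RGB and depth images must have the same height and width")
    return [[list(pixel) + [d] for pixel, d in zip(rgb_row, depth_row)]
            for rgb_row, depth_row in zip(rgb, depth)]


def fuse_feature_maps(rgb_features, depth_features):
    """Element-wise sum of two same-shaped H x W x C feature maps."""
    return [[[a + b for a, b in zip(px_rgb, px_depth)]
             for px_rgb, px_depth in zip(row_rgb, row_depth)]
            for row_rgb, row_depth in zip(rgb_features, depth_features)]


# Usage: a 1 x 2 image with RGB values and one depth value per pixel.
rgbd = early_fusion_input([[(1, 2, 3), (4, 5, 6)]], [[0.5, 0.7]])

# Usage: a 1 x 1 feature map with two channels from each encoder.
fused = fuse_feature_maps([[[1, 2]]], [[[10, 20]]])
```

With early fusion only the first convolution changes (four input channels instead of three), whereas late fusion roughly doubles the encoder parameters because depth gets its own branch.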

Early Fusion vs Late Fusion


Early Fusion (smaller model) vs Late Fusion


Early Fusion vs Late Fusion1 vs Late Fusion2


ResNet-34 vs DDNet

Incorporating dense connections in UNet decoder


Results


Swapping encoders and decoders of both architectures

DPDB-UNet vs Res-DDNet


DPDB-UNet variations

Results


Conclusions

  1. The proposed architecture learns good-quality feature representations
  2. Depth can deliver a performance gain
  3. Late fusion is counterproductive
  4. On the Berlin set, ResNet-34 performs better than DDNet