# Image Segmentation
## 1. What is it?
![[Pasted image 20210317223631.png]]
Assign a class to every pixel. Pixels in the same class share similar characteristics.
Three steps to image segmentation:
1. determine the shape of each object in the image
2. partition the image into multiple segments, each associated with an object
3. classify each pixel in the image
## 2. Two types of image segmentation
- Semantic Segmentation
	- all objects of the same type form a single class
	- for example, all people form one class and all tables form a second class
	- Popular models: DeepLab, U-Net
- Instance Segmentation
	- each individual object instance is labeled separately
	- for example, each person is a separate object
	- Popular model: Mask R-CNN
![[Pasted image 20210213144156.png]]
![[Pasted image 20210319234423.png]]
## 3. Architectures
Typically constructed as an encoder-decoder (a minimal sketch follows the list below)
- encoder: feature extractor (typically a CNN); downsamples the image
- decoder: generates the pixel-wise label map; upsamples back to the input resolution
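A minimal sketch of this pattern in Keras. The layer widths, depth, input shape and `NUM_CLASSES` are illustrative assumptions, not taken from any particular model:

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 12  # illustrative value, not from the source

def tiny_encoder_decoder(input_shape=(128, 128, 3)):
    """Minimal encoder-decoder: downsample with conv+pool, upsample back."""
    inputs = tf.keras.Input(shape=input_shape)
    # Encoder: feature extraction + downsampling
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(x)                      # 128 -> 64
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)                      # 64 -> 32
    # Decoder: upsample back to input resolution, predict a class per pixel
    x = layers.UpSampling2D(2)(x)                      # 32 -> 64
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D(2)(x)                      # 64 -> 128
    outputs = layers.Conv2D(NUM_CLASSES, 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```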
### 3.1. Fully Convolutional Networks (FCN)
![[Pasted image 20210317225738.png]]
- Can reuse the "feature extractors" of well-known image classification models such as VGG-16, ResNet-50, etc.
- The decoder is denoted FCN-32, FCN-16 or FCN-8. The number is the stride of the final upsampling: the smaller the stride, the finer the resolution of the final pixel map (sharper transitions between object and background).
- Remember that pooling reduces the spatial size of the output; a stride greater than 1 does as well.
- For FCN-16 and FCN-8, feature maps from earlier (higher-resolution) layers are combined with the upsampled deeper layers to produce the final output, which gives better resolution (a sketch follows the figure below).
- ![[Pasted image 20210317232332.png]]
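A rough sketch of the FCN-8 fusion idea, assuming `pool3`, `pool4`, and `conv7` are backbone feature maps at 1/8, 1/16 and 1/32 of the input resolution (the function and argument names are illustrative):

```python
from tensorflow.keras import layers

def fcn8_decoder(pool3, pool4, conv7, num_classes):
    """FCN-8-style decoder: fuse coarse conv7 with earlier, finer pool layers."""
    # Score each feature map at its own resolution
    s7 = layers.Conv2D(num_classes, 1)(conv7)
    s4 = layers.Conv2D(num_classes, 1)(pool4)
    s3 = layers.Conv2D(num_classes, 1)(pool3)
    # Upsample conv7 scores 2x and add pool4 scores
    x = layers.Conv2DTranspose(num_classes, 4, strides=2, padding="same")(s7)
    x = layers.Add()([x, s4])
    # Upsample 2x again and add pool3 scores
    x = layers.Conv2DTranspose(num_classes, 4, strides=2, padding="same")(x)
    x = layers.Add()([x, s3])
    # Final 8x upsampling back to the input resolution
    return layers.Conv2DTranspose(num_classes, 16, strides=8, padding="same",
                                  activation="softmax")(x)
```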
#### 3.1.1. Upsampling Method
Two types of layers in TensorFlow (both shown in the sketch after the list):
1. Simple Scaling `UpSampling2D`
	- Two interpolation modes: nearest (copy the value from the nearest pixel) and bilinear (linear interpolation from nearby pixels)
2. Transposed Convolution (Deconvolution) -`Conv2DTranspose`
- Reverse of a convolution operation
- ![[Pasted image 20210317232953.png]]
	- The result will be close to the original input but not exact, because data is lost at the edges of the image
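A short sketch comparing the two layer types on a dummy feature map (shapes are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([1, 8, 8, 16])  # a small feature map

# 1. Simple scaling: no learned weights, just copies or interpolates values
nearest  = layers.UpSampling2D(size=2, interpolation="nearest")(x)
bilinear = layers.UpSampling2D(size=2, interpolation="bilinear")(x)

# 2. Transposed convolution: learned upsampling (reverse of a strided conv)
deconv = layers.Conv2DTranspose(16, kernel_size=3, strides=2, padding="same")(x)

print(nearest.shape, bilinear.shape, deconv.shape)  # all (1, 16, 16, 16)
```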
### 3.2. SegNet
![[Pasted image 20210317230300.png]]
The decoder is a mirror image of the encoder; the max-pooling indices saved during encoding are reused to upsample in the decoder (see the sketch below)
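A minimal sketch of capturing pooling indices in TensorFlow; the decoder-side unpooling that consumes `argmax` is omitted here:

```python
import tensorflow as tf

x = tf.random.normal([1, 4, 4, 3])
# Max-pool while recording the flat index of each maximum ("pooling indices")
pooled, argmax = tf.nn.max_pool_with_argmax(x, ksize=2, strides=2, padding="SAME")
# A SegNet-style decoder reuses `argmax` to scatter values back to their
# original positions when upsampling, instead of learning the upsampling.
print(pooled.shape, argmax.shape)  # (1, 2, 2, 3) (1, 2, 2, 3)
```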
### 3.3. U-Net
![[Pasted image 20210317230401.png]]
U-Net is also a fully convolutional network. Skip connections pass encoder feature maps to the decoder at matching resolutions (see the sketch below)
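A sketch of one U-Net-style decoder step in Keras, assuming `skip` is the matching encoder feature map at the target resolution (the function name and filter counts are illustrative):

```python
from tensorflow.keras import layers

def up_block(x, skip, filters):
    """One U-Net-style decoder step: upsample, then concatenate the encoder skip."""
    x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
    x = layers.Concatenate()([x, skip])   # skip connection from the encoder
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x
```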
### 3.4. Mask R-CNN
![[Pasted image 20210317230436.png]]
Builds on Faster R-CNN. Mask R-CNN adds another branch after feature extraction that upsamples the region features into a pixel mask for each detected object.
It can be used for instance segmentation.
## 4. Loss function for Image segmentation
### 4.1. Terms
Area of overlap = sum(true positives), i.e. the intersection A ∩ B
Combined area = total pixels in the predicted segment + total pixels in the true segment, i.e. |A| + |B|
Area of union = combined area - area of overlap, i.e. A ∪ B
```python
import numpy as np

# y_true and y_pred are integer label maps of the same shape
for i in range(classes):
    intersection = np.sum((y_pred == i) * (y_true == i))
    y_true_area = np.sum(y_true == i)
    y_pred_area = np.sum(y_pred == i)
    combined_area = y_true_area + y_pred_area
    union_area = combined_area - intersection
```
### 4.2. IOU
`iou = intersection / union_area`
A small smoothing factor is usually added to the numerator and denominator to avoid division by zero and reduce noise: `iou = (intersection + smooth) / (union_area + smooth)`
### 4.3. DICE Score
DICE score = 2 * Area of overlap / Combined Area
Dice tends to reflect average performance, while IoU skews towards worst-case performance (poor predictions penalize IoU more heavily). A sketch implementing both follows.
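Putting the terms together, a minimal sketch of per-class IoU and Dice computed from integer label maps, with a smoothing factor to avoid division by zero (the `iou_dice` name and `smooth` default are illustrative assumptions):

```python
import numpy as np

def iou_dice(y_true, y_pred, classes, smooth=1e-6):
    """Per-class IoU and Dice score from integer label maps."""
    ious, dices = [], []
    for i in range(classes):
        intersection = np.sum((y_pred == i) & (y_true == i))
        combined_area = np.sum(y_true == i) + np.sum(y_pred == i)
        union_area = combined_area - intersection
        ious.append((intersection + smooth) / (union_area + smooth))
        dices.append((2 * intersection + smooth) / (combined_area + smooth))
    return ious, dices
```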