# Object Detection

There are (usually) two steps in object detection:

1. Region proposal
2. Object detection and classification

Techniques:

1. Sliding window - slide a box over the image until you detect an object. Use boxes of multiple sizes, scale the image to multiple dimensions, etc., to improve detection.
2. Selective search - introduced in this paper: http://www.huppelen.nl/publications/selectiveSearchDraft.pdf. It proposes regions where an object could be detected; once the object is detected, the regions covering it are coalesced into one. It's slow (by modern standards).

IOU - intersection over union: the overlap area of two boxes divided by the area of their union.
NMS - non-maximum suppression: among heavily overlapping detections (high IOU), keep the highest-scoring box and suppress the rest.
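These two ideas come up everywhere, so here is a minimal plain-Python sketch of both (the `[x1, y1, x2, y2]` box format and the 0.5 threshold are illustrative assumptions, not from the paper above):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in [x1, y1, x2, y2] format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] - the second box overlaps the first too much
```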
## Object Detection Architectures

### 1. R-CNN
Implements selective search with CNNs.

1. Takes an image and makes region proposals using the selective search method (~2000 regions).
2. Merges regions together based on color, texture, similarity, etc.
3. Extracts features from each of those region proposals with a (pretrained) CNN. The regions may need to be warped to fit the CNN's input size.
4. Classifies each region (using an SVM) and runs a regression model to predict the bounding box.

![[Pasted image 20210221203352.png]]

### 2. Fast R-CNN
Removes the selective search algorithm (which is slow).

1. The algorithm expects the region proposals as inputs and doesn't generate them itself.
2. The features the CNN outputs are called a "feature map" because the features keep the same positions relative to each other as in the original image.
3. Takes the input region proposals and extracts the region of interest from the feature map, creating a "region proposal feature map". This is called region of interest projection. There is one "region proposal feature map" per region and one "feature map" per image.
4. Down-samples each "region proposal feature map" to a consistent height and width (using a pooling layer).
5. Flattens it (the ROI feature vector) and runs it through fully connected layers to classify and create bounding boxes.

![[Pasted image 20210221203428.png]]

### 3. Faster R-CNN
Adds a region proposal network (RPN), a fully convolutional network, to Fast R-CNN. It uses anchors (or priors).

### Example
[[Tensorflow Hub]]

## Object Detection API
The Object Detection API in TensorFlow helps process images, run predictions, and visualize object detection tasks.

First clone the tensorflow models repo: `git clone https://github.com/tensorflow/models`

There is an `object_detection` folder in the `research` folder that contains the API for object detection. It can be installed with:

```
sudo apt install -y protobuf-compiler
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .
```

Once the Object Detection API is installed, the following utilities can be used:

```python
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.utils import ops as utils_ops
```

`label_map_util` maps index numbers to category names. Category names are stored in a standard format in a `pbtxt` file.
`viz_utils` makes drawing bounding boxes simpler. `viz_utils.visualize_boxes_and_labels_on_image_array` helps plot the boxes around the detected objects.
`utils_ops` provides APIs to mask images when doing segmentation.
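As a concrete example of the `label_map_util` piece, the `category_index` consumed by the visualization call below can be built from the COCO label map that ships with the repo (a minimal sketch; the path assumes the `models` clone from above):

```python
from object_detection.utils import label_map_util

# COCO label map (pbtxt) that ships with the cloned models repo
PATH_TO_LABELS = 'models/research/object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(
    PATH_TO_LABELS, use_display_name=True)

print(category_index[1])  # {'id': 1, 'name': 'person'}
```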
## Using the Object Detection API
The results of a model that uses the API follow some standard conventions.

```python
results = hub_model(image)
result = {key: value.numpy() for key, value in results.items()}
result.keys()
```

might produce the following output:

```python
detection_scores          # this is standard
detection_keypoint_scores
detection_classes         # this is standard
detection_keypoints
num_detections
detection_boxes           # this is standard
```

Call the visualization utils to plot it:

```python
viz_utils.visualize_boxes_and_labels_on_image_array(
    image=image_np_with_detections[0],
    boxes=result['detection_boxes'][0],
    classes=(result['detection_classes'][0] + label_id_offset).astype(int),
    scores=result['detection_scores'][0],
    category_index=category_index,
    use_normalized_coordinates=True,
    min_score_thresh=0.40
)
```

## Retrain a RetinaNet
In a RetinaNet there is a "class subnet" in the output layer where the classification happens; the "box subnet" is where the boxes are created. Each of these is also called a "head". We are going to retrain the "class subnet" but keep the "box subnet".

1. Use a config file to create the model. Here is the config file corresponding to the checkpoint we are loading (https://github.com/tensorflow/models/blob/master/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config):

```python
pipeline_config = 'models/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config'
checkpoint_path = 'models/research/object_detection/test_data/checkpoint/ckpt-0'
```

```python
from object_detection.utils import config_util       # load config files
from object_detection.builders import model_builder  # build the model

configs = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = configs['model']
model_config.ssd.num_classes = num_classes   # override number of classes
model_config.ssd.freeze_batchnorm = True     # freeze batch norm if you want
detection_model = model_builder.build(
    model_config=model_config, is_training=True)
```

2. Use a checkpoint to load weights. This site has links to the various model zoos: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md. You can decompress this archive and you will see the checkpoints: http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz

3. Use the Object Detection API:

```python
from object_detection.utils import label_map_util
from object_detection.utils import config_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.utils import colab_utils  # for colab
from object_detection.builders import model_builder
```

4. Load the checkpoint:

![[Pasted image 20210310234222.png]]

```python
# this is a throwaway predictor: it only references the box head, so the
# class head's weights are NOT restored and can be retrained
fake_box_predictor = tf.compat.v2.train.Checkpoint(
    _base_tower_layers_for_heads=detection_model._box_predictor._base_tower_layers_for_heads,
    _box_prediction_head=detection_model._box_predictor._box_prediction_head)
fake_model = tf.compat.v2.train.Checkpoint(
    _feature_extractor=detection_model._feature_extractor,
    _box_predictor=fake_box_predictor)
ckpt = tf.compat.v2.train.Checkpoint(model=fake_model)
ckpt.restore(checkpoint_path).expect_partial()

# run the model through a dummy image so the variables are created
images, shapes = detection_model.preprocess(tf.zeros([1, 640, 640, 3]))
prediction_dict = detection_model.predict(images, shapes)
_ = detection_model.postprocess(prediction_dict, shapes)
```

5. Custom training:

```python
# fine-tune only the head layers of the box predictor
prefixes_to_train = [
    'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead',
    'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead']
vars_to_fine_tune = []
for var in detection_model.trainable_variables:
    if any([var.name.startswith(prefix) for prefix in prefixes_to_train]):
        vars_to_fine_tune.append(var)

def train_step_fn(image_tensors, shapes):
    # groundtruth must already be set via detection_model.provide_groundtruth(...)
    with tf.GradientTape() as tape:
        preprocessed_images = tf.concat(
            [detection_model.preprocess(image_tensor)[0]
             for image_tensor in image_tensors], axis=0)
        prediction_dict = detection_model.predict(preprocessed_images, shapes)
        losses_dict = detection_model.loss(prediction_dict, shapes)
        total_loss = (losses_dict['Loss/localization_loss'] +
                      losses_dict['Loss/classification_loss'])
    gradients = tape.gradient(total_loss, vars_to_fine_tune)
    optimizer.apply_gradients(zip(gradients, vars_to_fine_tune))
    return total_loss
```
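A sketch of the outer loop that drives this step function. The batch size, learning rate, and number of batches are illustrative assumptions, and `train_image_tensors`, `gt_box_tensors`, and `gt_class_tensors` are hypothetical prepared lists of per-image tensors (boxes normalized, classes one-hot):

```python
import random
import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
batch_size, num_batches = 4, 100
shapes = tf.constant(batch_size * [[640, 640, 3]], dtype=tf.int32)

for idx in range(num_batches):
    # sample a mini-batch from the hypothetical prepared data
    keys = random.sample(range(len(train_image_tensors)), batch_size)
    image_tensors = [train_image_tensors[k] for k in keys]

    # tell the model the groundtruth for this batch before computing the loss
    detection_model.provide_groundtruth(
        groundtruth_boxes_list=[gt_box_tensors[k] for k in keys],
        groundtruth_classes_list=[gt_class_tensors[k] for k in keys])

    total_loss = train_step_fn(image_tensors, shapes)
    if idx % 10 == 0:
        print(f'batch {idx}: loss = {total_loss.numpy():.4f}')
```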
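After fine-tuning, detection on a new image goes through the same preprocess → predict → postprocess path used for the dummy image above. A sketch, assuming `image_np` is an HxWx3 numpy image (the `@tf.function` wrapper is optional):

```python
@tf.function
def detect(input_tensor):
    """Run detection on an input image tensor of shape [1, height, width, 3]."""
    preprocessed_image, shapes = detection_model.preprocess(input_tensor)
    prediction_dict = detection_model.predict(preprocessed_image, shapes)
    return detection_model.postprocess(prediction_dict, shapes)

detections = detect(
    tf.convert_to_tensor(image_np, dtype=tf.float32)[tf.newaxis, ...])
# detections['detection_boxes'], detections['detection_classes'], and
# detections['detection_scores'] follow the standard result format above
```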