Interpretability of Models - Vivek's Digital Garden

# Interpretability of Models

You need to know which features the model is picking up; otherwise you can end up with a model that is grossly overfitted. For example, a model could be classifying an image based on background noise instead of the main subject of the image. This is a deep topic that could fill a whole course by itself.

## Class Activation Map

A class activation map (CAM) is a matrix that shows which parts of the image the model was paying attention to when it classified the image. Class activation maps can be generated from the output of the last convolution layer and the [[Pooling#Global Average Pooling|Global Average Pooling]] layer.

Steps to create a CAM are:
1. Take the feature maps from the last convolution layer of the CNN
2. Zoom/upscale the feature maps to the size of the input image using `scipy.ndimage.zoom()`
3. Get the prediction of the model (if it is a classification problem)
4. Get the weights of the dense layer connected to the CNN that lead to the specific class prediction
5. Take the dot product of the zoomed-in features and the weights of the prediction

## Saliency Map

A saliency map shows how important each pixel is, not just where the model is looking. It is the gradient of the loss with respect to the input image.
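Written as a formula (a sketch; the symbols $x$, $L$ and $S$ are notation assumed here, not taken from the original notes), for an input image $x$ with pixel position $(i, j)$ and color channel $c$, and a loss $L$, one common way to turn the gradient into a single value per pixel is:

$$
S_{ij} = \sum_{c} \left| \frac{\partial L}{\partial x_{ijc}} \right|
$$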
This tells you how the loss would change for small changes in the pixel values.

```python
import tensorflow as tf

# Assumes `model` (a Keras classifier with a softmax output) and `image`
# (a preprocessed batch of shape (batch, height, width, 3)) are already defined.

# Siberian Husky's class ID in ImageNet
class_index = 251

# If you downloaded the cat, use this line instead
# class_index = 282  # Tabby Cat in ImageNet

# number of classes in the model's training data
num_classes = 1001

# convert to one hot representation to match our softmax activation in the model definition
expected_output = tf.one_hot([class_index] * image.shape[0], num_classes)

with tf.GradientTape() as tape:
    # cast image to float
    inputs = tf.cast(image, tf.float32)

    # watch the input pixels
    tape.watch(inputs)

    # generate the predictions
    predictions = model(inputs)

    # get the loss
    loss = tf.keras.losses.categorical_crossentropy(expected_output, predictions)

# get the gradient with respect to the inputs
gradients = tape.gradient(loss, inputs)

# reduce the RGB image to grayscale
grayscale_tensor = tf.reduce_sum(tf.abs(gradients), axis=-1)

# normalize the pixel values to be in the range [0, 255].
# the max value in the grayscale tensor will be pushed to 255.
# the min value will be pushed to 0.
normalized_tensor = tf.cast(
    255
    * (grayscale_tensor - tf.reduce_min(grayscale_tensor))
    / (tf.reduce_max(grayscale_tensor) - tf.reduce_min(grayscale_tensor)),
    tf.uint8,
)

# remove the remaining size-1 dimension (the batch dimension, for a single image)
# to make the tensor a 2d tensor
normalized_tensor = tf.squeeze(normalized_tensor)
```

As you can see from the code, the key is to run the model inside a gradient tape with the inputs registered as a watched tensor. You can then calculate the gradients with respect to the inputs, combine them across the color channels, and plot the result on top of the original image to visualize the regions that have the most impact (the largest gradient).

![[Pasted image 20210320183125.png]]

## GradCAM

GradCAM combines Class Activation Maps and Saliency Maps: it is a gradient-weighted class activation map.
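As a rough sketch of what the code that follows computes (the symbols are notation assumed here, not taken from the original notes): let $A^k_{ij}$ be the activation of channel $k$ at position $(i, j)$ in the chosen convolution layer, $L$ the loss, $H \times W$ the spatial size of that layer and $K$ its number of channels. For a single image, the channel weights $\alpha_k$ and the heatmap $M$ are:

$$
\alpha_k = \frac{1}{HW} \sum_{i,j} \frac{\partial L}{\partial A^k_{ij}}, \qquad M_{ij} = \max\left(0,\ \frac{1}{K} \sum_{k} \alpha_k A^k_{ij}\right)
$$

The heatmap $M$ is then divided by its maximum so that its values lie in $[0, 1]$.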
```python
import numpy as np
import tensorflow as tf

# Assumes `model` is an already trained Keras cat-vs-dog classifier with two
# output probabilities, defined earlier.


def get_CAM(processed_image, actual_label, layer_name='block5_conv3'):
    # model that returns both the chosen conv layer's output and the predictions
    model_grad = tf.keras.Model(
        inputs=model.inputs,
        outputs=[model.get_layer(layer_name).output, model.output],
    )

    with tf.GradientTape() as tape:
        conv_output_values, predictions = model_grad(processed_image)

        # watch the conv_output_values
        tape.watch(conv_output_values)

        ## Use binary cross entropy loss
        ## actual_label is 0 if cat, 1 if dog
        # get prediction probability of dog
        # If model does well,
        # pred_prob should be close to 0 if cat, close to 1 if dog
        pred_prob = predictions[:, 1]

        # make sure actual_label is a float, like the rest of the loss calculation
        actual_label = tf.cast(actual_label, dtype=tf.float32)

        # add a tiny value to avoid log of 0
        smoothing = 0.00001

        # Calculate loss as binary cross entropy
        loss = -1 * (actual_label * tf.math.log(pred_prob + smoothing)
                     + (1 - actual_label) * tf.math.log(1 - pred_prob + smoothing))
        print(f"binary loss: {loss}")

    # get the gradient of the loss with respect to the outputs of the last conv layer
    grads_values = tape.gradient(loss, conv_output_values)
    # average the gradients over the batch and spatial axes: one weight per channel
    grads_values = tf.reduce_mean(grads_values, axis=(0, 1, 2))

    conv_output_values = np.squeeze(conv_output_values.numpy())
    grads_values = grads_values.numpy()

    # weight the convolution outputs with the computed gradients
    # (broadcasts over the channel axis, so the channel count is not hard-coded)
    conv_output_values = conv_output_values * grads_values
    heatmap = np.mean(conv_output_values, axis=-1)

    # keep only positive contributions and scale to [0, 1]
    heatmap = np.maximum(heatmap, 0)
    max_value = heatmap.max()
    if max_value > 0:
        heatmap /= max_value

    del model_grad, conv_output_values, grads_values, loss

    return heatmap
```
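For comparison, the plain CAM steps listed in the Class Activation Map section can be sketched with NumPy and `scipy.ndimage.zoom()`. This is a minimal illustration, not the notes' own implementation: `compute_cam` is a hypothetical helper, and the random arrays stand in for the last convolution layer's output and for the weights of a dense layer that sits directly after global average pooling.

```python
import numpy as np
from scipy.ndimage import zoom


def compute_cam(features, dense_weights, class_index, image_size):
    """Hypothetical helper that builds a class activation map for one image.

    features:      (h, w, channels) output of the last convolution layer
    dense_weights: (channels, num_classes) weights of the dense layer after
                   global average pooling
    class_index:   class to explain (for example the predicted class)
    image_size:    (height, width) of the input image
    """
    h, w, _ = features.shape
    # Step 2: upscale every feature map to the input image size
    scale = (image_size[0] / h, image_size[1] / w, 1)
    upscaled = zoom(features, scale, order=1)
    # Step 4: weights that connect each feature map to the chosen class
    class_weights = dense_weights[:, class_index]
    # Step 5: dot product over the channel axis gives one value per pixel
    return np.dot(upscaled, class_weights)


# Stand-in data (assumption): a 7x7x16 feature volume and a 10-class dense layer
rng = np.random.default_rng(0)
features = rng.random((7, 7, 16))
dense_weights = rng.normal(size=(16, 10))

# Step 3: the prediction, here computed as global average pooling + dense layer
logits = features.mean(axis=(0, 1)) @ dense_weights
predicted_class = int(np.argmax(logits))

cam = compute_cam(features, dense_weights, predicted_class, (224, 224))
print(cam.shape)  # (224, 224)
```

Unlike GradCAM, this version needs no gradients, but it only applies when the convolution layers feed a global average pooling layer followed directly by the dense classification layer.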