# Gradient Tape

Gradient tape is key to automatic differentiation and [[backward propagation|backprop]] in [[Tensorflow]]. Using a gradient "tape" you can record the forward-prop operations, which tells Tensorflow what needs to be done during backward prop.

"Taping" works only on variables with `trainable` set to `True` (the default).

```python
import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x**2

dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())  # should print 6.0
```

By default, the resources held by a `GradientTape` are released as soon as the `GradientTape.gradient()` method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. This allows multiple calls to the `gradient()` method; the resources are released when the tape object is garbage collected (see the persistent-tape sketch at the end of this note).

By default a gradient tape only watches trainable `tf.Variable`s. To record gradients with respect to a `tf.Tensor`, you need to call `GradientTape.watch(x)`:

```python
x = tf.constant(3.0)

with tf.GradientTape() as tape:
    tape.watch(x)
    y = x**2

# dy = 2x * dx
dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())  # should print 6.0
```

When a target is not connected to a source you will get a gradient of `None`. Integers and strings are not differentiable (see the sketch at the end of this note).

## Example

```python
import random

import tensorflow as tf

# training data
x_train = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]  # y = 2*x - 1

# trainable variables
w = tf.Variable(random.random(), trainable=True)
b = tf.Variable(random.random(), trainable=True)

# loss function
def simple_loss(real_y, pred_y):
    return tf.abs(real_y - pred_y)

# learning rate
learning_rate = 0.001

def fit_data(real_x, real_y):
    with tf.GradientTape(persistent=True) as tape:
        # forward prop - make prediction
        pred_y = w * real_x + b
        # calculate loss
        reg_loss = simple_loss(real_y, pred_y)

    # calculate gradients with respect to the variables
    w_grad = tape.gradient(reg_loss, w)
    b_grad = tape.gradient(reg_loss, b)

    # update variables
    w.assign_sub(w_grad * learning_rate)
    b.assign_sub(b_grad * learning_rate)

for _ in range(500):
    fit_data(x_train, y_train)

print(f"y = {w.numpy()} * x + {b.numpy()}")
```

## Gradient descent with Gradient Tape

```python
# assumes model, loss_object, optimizer and loss_history are defined elsewhere
def train_step(images, labels):
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        loss_value = loss_object(labels, logits)
    loss_history.append(loss_value.numpy().mean())

    grads = tape.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

A self-contained setup for this snippet is sketched at the end of this note.

## Higher order gradients

Consider the following

$$
\begin{align}
y &= x^3 \\
\frac{\partial{y}}{\partial{x}} &= 3x^2 \\
\frac{\partial^2{y}}{\partial{x^2}} &= 6x
\end{align}
$$

```python
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x
    # the first-order gradient is computed inside tape_2's context
    # so that tape_2 records it
    dy_dx = tape_1.gradient(y, x)

d2y_dx2 = tape_2.gradient(dy_dx, x)
print(d2y_dx2.numpy())  # should print 6.0
```
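## Persistent gradient tape

A minimal sketch of the persistent-tape behaviour described earlier; the variables `x`, `y` and `z` here are placeholder choices of my own:

```python
import tensorflow as tf

x = tf.Variable(3.0)

# persistent=True keeps the recorded operations around after the first
# gradient() call, so the same tape can be queried more than once
with tf.GradientTape(persistent=True) as tape:
    y = x**2
    z = y**2  # z = x^4

dy_dx = tape.gradient(y, x)  # 2x   -> 6.0
dz_dx = tape.gradient(z, x)  # 4x^3 -> 108.0
print(dy_dx.numpy(), dz_dx.numpy())

del tape  # drop the reference so the tape's resources can be released
```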
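## Unconnected and non-differentiable sources

A small sketch of the `None` gradients mentioned earlier; the variable names are my own, and Tensorflow may additionally log a warning for the integer source:

```python
import tensorflow as tf

x = tf.Variable(2.0)   # trainable and connected to y
z = tf.Variable(5.0)   # trainable, but never used to compute y
n = tf.Variable(7)     # integer variable - not differentiable

with tf.GradientTape() as tape:
    y = x**2

dy_dx, dy_dz, dy_dn = tape.gradient(y, [x, z, n])
print(dy_dx.numpy())  # 4.0
print(dy_dz)          # None - z is not connected to y
print(dy_dn)          # None - integer dtypes are not differentiable
```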
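## Minimal setup for the training step

A self-contained sketch of what the `train_step` snippet above assumes. The tiny model, SGD optimizer, sparse-categorical loss and dummy data are placeholder choices of my own; the step function is repeated here only so the block runs on its own:

```python
import tensorflow as tf

# stand-in model, loss, optimizer and history for the train_step sketch
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10),
])
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_history = []

def train_step(images, labels):
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        loss_value = loss_object(labels, logits)
    loss_history.append(loss_value.numpy().mean())
    grads = tape.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

# one optimization step on a random dummy batch
images = tf.random.normal((32, 8))                            # 32 examples, 8 features
labels = tf.random.uniform((32,), maxval=10, dtype=tf.int32)  # 10 classes
train_step(images, labels)
print(loss_history)
```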