# Gradient Tape
Gradient tape is key to automatic differentiation and [[backward propagation|backprop]] in [[Tensorflow]]. Using a gradient "tape" you record the forward-prop operations as they execute; TensorFlow then replays that record in reverse to work out what backward prop needs to compute.
"Taping" only works on variables with `trainable` set to `True` (the default).
```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x**2  # forward prop is recorded on the tape
dy_dx = tape.gradient(y, x)  # dy/dx = 2x
dy_dx.numpy()  # 6.0
```
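Conversely, a variable created with `trainable=False` is not recorded, so asking for its gradient returns `None`. A minimal sketch:
```python
v = tf.Variable(3.0, trainable=False)  # not watched by the tape
with tf.GradientTape() as tape:
    y = v**2
print(tape.gradient(y, v))  # None: the op on v was never recorded
```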
By default, the resources held by a `GradientTape` are released as soon as the `GradientTape.gradient()` method is called. To compute multiple gradients over the same computation, create a persistent gradient tape (see the sketch below). This allows multiple calls to the `gradient()` method; the resources are instead released when the tape object is garbage collected.
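A minimal sketch of a persistent tape reused for two gradient calls:
```python
x = tf.Variable(3.0)
with tf.GradientTape(persistent=True) as tape:
    y = x**2
    z = y**2  # z = x**4
dy_dx = tape.gradient(y, x)  # 2x    -> 6.0
dz_dx = tape.gradient(z, x)  # 4x**3 -> 108.0
del tape  # drop the reference so the tape's resources can be released
```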
By default, a gradient tape only watches trainable `tf.Variable`s.
To record gradients with respect to a `tf.Tensor`, you need to call `GradientTape.watch(x)`:
```python
x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)  # explicitly watch the constant tensor
    y = x**2
# dy/dx = 2x
dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())  # should give 6.0
```
When a target is not connected to a source, you get a gradient of `None`. Integers and strings are not differentiable; gradients taken with respect to them also come back as `None`.
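A small sketch of the disconnected case:
```python
x = tf.Variable(2.0)
z = tf.Variable(5.0)
with tf.GradientTape() as tape:
    y = z**2  # y has no dependence on x
print(tape.gradient(y, x))  # None: x is not connected to y
```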
## Example
```python
import random

import tensorflow as tf

# training data
x_train = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]  # y = 2*x - 1

# trainable variables
w = tf.Variable(random.random(), trainable=True)
b = tf.Variable(random.random(), trainable=True)

# loss function
def simple_loss(real_y, pred_y):
    return tf.abs(real_y - pred_y)

# learning rate
learning_rate = 0.001

def fit_data(real_x, real_y):
    with tf.GradientTape(persistent=True) as tape:
        # forward prop - make prediction
        pred_y = w * real_x + b
        # calculate loss
        reg_loss = simple_loss(real_y, pred_y)
    # calculate gradients with respect to the variables
    # (the persistent tape allows two separate gradient() calls)
    w_grad = tape.gradient(reg_loss, w)
    b_grad = tape.gradient(reg_loss, b)
    # update variables
    w.assign_sub(w_grad * learning_rate)
    b.assign_sub(b_grad * learning_rate)

for _ in range(500):
    fit_data(x_train, y_train)

print(f"y = {w.numpy()} * x + {b.numpy()}")
```
## Gradient descent with Gradient Tape
```python
def train_step(images, labels):
    with tf.GradientTape() as tape:
        # forward prop and loss must run inside the tape context
        logits = model(images, training=True)
        loss_value = loss_object(labels, logits)
    loss_history.append(loss_value.numpy().mean())
    # backward prop: gradients of the loss w.r.t. the model's trainable variables
    grads = tape.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```
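As a usage sketch, `train_step` is called in a loop over batches. The `model` and `dataset` objects below are assumptions (not defined in the snippet above), and the loss/optimizer choices are placeholders:
```python
# hypothetical setup; the snippet above assumes these already exist
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()
loss_history = []

for epoch in range(3):
    for images, labels in dataset:  # assumed tf.data.Dataset of (images, labels) batches
        train_step(images, labels)
```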
## Higher order gradients
Consider the following
$$
\begin{aligned}
y &= x^3 \\
\frac{\partial y}{\partial x} &= 3x^2 \\
\frac{\partial^2 y}{\partial x^2} &= 6x
\end{aligned}
$$
```python
x = tf.Variable(1.0)
with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x
    # first derivative (3x^2), computed inside tape_2 so it is recorded too
    dy_dx = tape_1.gradient(y, x)
# second derivative: 6x
d2y_dx2 = tape_2.gradient(dy_dx, x)
print(d2y_dx2.numpy())  # should print 6.0
```