Mini-batch gradient descent vs Momentum vs Adam
In [22]:
# train 3-layer model
layers_dims = [train_X.shape[0], 5, 2, 1]
parameters = model(train_X, train_Y, layers_dims, optimizer = "gd")
# Predict
predictions = predict(train_X, train_Y, parameters)
# Plot decision boundary
plt.title("Model with Gradient Descent optimization")
axes = plt.gca()
axes.set_xlim([-1.5,2.5])
axes.set_ylim([-1,1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
Cost after epoch 0: 0.702405
Cost after epoch 1000: 0.668101
Cost after epoch 2000: 0.635288
Cost after epoch 3000: 0.600491
Cost after epoch 4000: 0.573367
Cost after epoch 5000: 0.551977
Cost after epoch 6000: 0.532370
Cost after epoch 7000: 0.514007
Cost after epoch 8000: 0.496472
Cost after epoch 9000: 0.468014
Accuracy: 0.796666666667
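For orientation, optimizer = "gd" applies the plain gradient-descent rule to each mini-batch: every parameter simply moves against its gradient. Here is a minimal sketch of that update, assuming the "W1"/"dW1" dictionary naming common in this notebook series (the helper name and default learning rate are illustrative, not the notebook's own):

```python
def update_with_gd(parameters, grads, learning_rate=0.0007):
    """One plain mini-batch gradient-descent step.
    Illustrative sketch, not the notebook's own helper."""
    L = len(parameters) // 2  # number of layers
    for l in range(1, L + 1):
        # Move each parameter directly against its raw gradient
        parameters["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] -= learning_rate * grads["db" + str(l)]
    return parameters
```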
In [23]:
# train 3-layer model
layers_dims = [train_X.shape[0], 5, 2, 1]
parameters = model(train_X, train_Y, layers_dims, beta = 0.9, optimizer = "momentum")
# Predict
predictions = predict(train_X, train_Y, parameters)
# Plot decision boundary
plt.title("Model with Momentum optimization")
axes = plt.gca()
axes.set_xlim([-1.5,2.5])
axes.set_ylim([-1,1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
Cost after epoch 0: 0.702413
Cost after epoch 1000: 0.668167
Cost after epoch 2000: 0.635388
Cost after epoch 3000: 0.600591
Cost after epoch 4000: 0.573444
Cost after epoch 5000: 0.552058
Cost after epoch 6000: 0.532458
Cost after epoch 7000: 0.514101
Cost after epoch 8000: 0.496652
Cost after epoch 9000: 0.468160
Accuracy: 0.796666666667
In [24]:
# train 3-layer model
layers_dims = [train_X.shape[0], 5, 2, 1]
parameters = model(train_X, train_Y, layers_dims, optimizer = "adam")
# Predict
predictions = predict(train_X, train_Y, parameters)
# Plot decision boundary
plt.title("Model with Adam optimization")
axes = plt.gca()
axes.set_xlim([-1.5,2.5])
axes.set_ylim([-1,1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
Cost after epoch 0: 0.702166
Cost after epoch 1000: 0.167845
Cost after epoch 2000: 0.141316
Cost after epoch 3000: 0.138788
Cost after epoch 4000: 0.136066
Cost after epoch 5000: 0.134240
Cost after epoch 6000: 0.131127
Cost after epoch 7000: 0.130216
Cost after epoch 8000: 0.129623
Cost after epoch 9000: 0.129118
Accuracy: 0.94
5.4 - Summary
| Optimization method | Accuracy | Cost shape |
| --- | --- | --- |
| Gradient descent | 79.7% | oscillations |
| Momentum | 79.7% | oscillations |
| Adam | 94% | smoother |
Momentum usually helps, but given the small learning rate and the simple dataset, its impact here is almost negligible. The large oscillations you see in the cost come from the fact that some mini-batches are harder than others for the optimization algorithm.
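To make the comparison concrete, the momentum step replaces the raw gradient with an exponentially weighted average of past gradients, which damps mini-batch noise. A minimal sketch, again assuming the "W1"/"dW1" dictionary naming (the function name and defaults are illustrative):

```python
def update_with_momentum(parameters, grads, v, beta=0.9, learning_rate=0.0007):
    """One momentum step; `v` holds the running velocity per parameter.
    Illustrative sketch, not the notebook's own helper."""
    L = len(parameters) // 2
    for l in range(1, L + 1):
        # Exponentially weighted average of past gradients (the "velocity")
        v["dW" + str(l)] = beta * v["dW" + str(l)] + (1 - beta) * grads["dW" + str(l)]
        v["db" + str(l)] = beta * v["db" + str(l)] + (1 - beta) * grads["db" + str(l)]
        # Step in the smoothed direction instead of the raw gradient
        parameters["W" + str(l)] -= learning_rate * v["dW" + str(l)]
        parameters["b" + str(l)] -= learning_rate * v["db" + str(l)]
    return parameters, v
```

With beta = 0.9 the velocity averages over roughly the last 10 gradients, which is why its effect is modest when the learning rate is already small.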
Adam, on the other hand, clearly outperforms mini-batch gradient descent and Momentum. If you run the model for more epochs on this simple dataset, all three methods will eventually reach very good results; however, Adam converges a lot faster (a sketch of its update rule follows the list below).
Some advantages of Adam include:
- Relatively low memory requirements (though higher than gradient descent and gradient descent with momentum)
- Usually works well even with little tuning of hyperparameters (except the learning rate α)
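For reference, here is a sketch of the Adam update for a single parameter array: it combines a momentum-style first moment with an RMS-scaled second moment, plus bias correction for the early iterations. The helper name and defaults are illustrative, not the notebook's update function:

```python
import numpy as np

def adam_step(w, dw, m, v, t, learning_rate=0.0007,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update for parameter array `w` at iteration `t` (t >= 1).
    Illustrative sketch, not the notebook's own helper."""
    m = beta1 * m + (1 - beta1) * dw          # first moment: momentum-like average
    v = beta2 * v + (1 - beta2) * dw ** 2     # second moment: average of squared grads
    m_hat = m / (1 - beta1 ** t)              # bias correction (moments start at zero)
    v_hat = v / (1 - beta2 ** t)
    w = w - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)  # adaptive step
    return w, m, v
```

The division by np.sqrt(v_hat) gives each parameter its own effective step size, which is largely why Adam needs so little hyperparameter tuning beyond α.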