@acl21
Last active October 14, 2020 09:57
Nesterov Accelerated Gradient Descent
def do_nesterov_accelerated_gradient_descent():
    w, b, eta = init_w, init_b, 1.0
    prev_v_w, prev_v_b, gamma = 0, 0, 0.9
    for i in range(max_epochs):
        dw, db = 0, 0
        # do partial update
        v_w = gamma * prev_v_w
        v_b = gamma * prev_v_b
        for x, y in zip(X, Y):
            # calculate gradients after partial update
            dw += grad_w(w - v_w, b - v_b, x, y)
            db += grad_b(w - v_w, b - v_b, x, y)
        # now do the full update
        v_w = gamma * prev_v_w + eta * dw
        v_b = gamma * prev_v_b + eta * db
        w = w - v_w
        b = b - v_b
        prev_v_w = v_w
        prev_v_b = v_b
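
The function above relies on names it does not define (`init_w`, `init_b`, `max_epochs`, `X`, `Y`, `grad_w`, `grad_b`). The following is a minimal runnable sketch of the same loop, assuming a single sigmoid neuron trained with squared error on two toy points. The model, the data, the starting values, and the helper names `f`, `error`, and `run_nesterov` are all assumptions made for illustration and are not part of the gist.

```python
import math

# Hypothetical toy data and starting point (not defined in the gist).
X = [0.5, 2.5]
Y = [0.2, 0.9]
init_w, init_b, max_epochs = -2.0, -2.0, 1000


def f(w, b, x):
    """Output of an assumed single sigmoid neuron."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def error(w, b):
    """Half the summed squared error over the toy data."""
    return sum(0.5 * (f(w, b, x) - y) ** 2 for x, y in zip(X, Y))


def grad_w(w, b, x, y):
    """Derivative of 0.5 * (f - y)**2 with respect to w."""
    fx = f(w, b, x)
    return (fx - y) * fx * (1 - fx) * x


def grad_b(w, b, x, y):
    """Derivative of 0.5 * (f - y)**2 with respect to b."""
    fx = f(w, b, x)
    return (fx - y) * fx * (1 - fx)


def run_nesterov(eta=1.0, gamma=0.9):
    w, b = init_w, init_b
    prev_v_w, prev_v_b = 0.0, 0.0
    for _ in range(max_epochs):
        # Look-ahead point: where the decayed velocity alone would take us.
        look_w = w - gamma * prev_v_w
        look_b = b - gamma * prev_v_b
        # Gradients are evaluated at the look-ahead point, not at (w, b).
        dw = sum(grad_w(look_w, look_b, x, y) for x, y in zip(X, Y))
        db = sum(grad_b(look_w, look_b, x, y) for x, y in zip(X, Y))
        # Full update: decayed velocity plus the look-ahead gradient step.
        v_w = gamma * prev_v_w + eta * dw
        v_b = gamma * prev_v_b + eta * db
        w, b = w - v_w, b - v_b
        prev_v_w, prev_v_b = v_w, v_b
    return w, b


if __name__ == "__main__":
    w, b = run_nesterov()
    print("final error:", error(w, b))
```

Returning `w` and `b` is a choice made here so the result can be inspected; the gist's function returns nothing.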
@sanfernoronha

Can you explain why you're calculating v_w and v_b on lines 7 and 8 when you are not using them anywhere within the loop, and then overwriting them with something else on lines 14 and 15?

@DaisyGAN

DaisyGAN commented Oct 14, 2020

Can you explain why you're calculating v_w and v_b on lines 7 and 8 when you are not using them anywhere within the loop, and then overwriting them with something else on lines 14 and 15?

Because it's wrong; this is just regular SGD with momentum (SGDM). If you wish to better understand NAG, I suggest referring to:
https://jamesmccaffrey.wordpress.com/2017/07/24/neural-network-nesterov-momentum/#jp-carousel-6124

The author is still active on GitHub, so I assume he is aware of this mistake. There are other apparent mistakes in his uploads, although without a comment explaining the choice one cannot verify that and can only assume each is a mistake. Then again, when one has to infer the procedure from algebraic notation, I find most people get these sorts of implementations wrong.

Including myself.
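
To make the distinction in the comment above concrete, here is a small sketch of one update step of classical momentum next to one step of NAG. The only difference is the point at which the gradient is evaluated. The objective `f(w) = 0.5 * w**2` and the names `grad`, `momentum_step`, and `nesterov_step` are hypothetical, chosen only for illustration.

```python
def grad(w):
    """Gradient of the hypothetical objective f(w) = 0.5 * w**2."""
    return w


def momentum_step(w, v, eta=0.1, gamma=0.9):
    # Classical momentum: the gradient is taken at the current point w.
    v = gamma * v + eta * grad(w)
    return w - v, v


def nesterov_step(w, v, eta=0.1, gamma=0.9):
    # NAG: the gradient is taken at the look-ahead point w - gamma * v.
    v = gamma * v + eta * grad(w - gamma * v)
    return w - v, v


if __name__ == "__main__":
    print(momentum_step(1.0, 0.5))
    print(nesterov_step(1.0, 0.5))
```

If the partial update is computed but the gradient is still taken at `(w, b)`, the loop reduces to `momentum_step`. With zero initial velocity the two steps coincide, so the difference only shows up once some velocity has accumulated.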

@acl21
Author

acl21 commented Oct 14, 2020

I missed the earlier comment from @sanfernoronha, my apologies. Please see my changes to lines 11 to 12. It should make sense now.
