
@sagarmainkar
Created August 25, 2018 04:40

sivi299 commented Aug 3, 2021

Hi,
There seems to be a flaw in the cost function

cost = (1/2*m) * np.sum(np.square(predictions-y))

Shouldn't it be

cost = 1/(2*m) * np.sum(np.square(predictions-y))
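
A quick way to see the difference is Python's operator precedence (a minimal check, not code from the gist):

m = 100
print(1/2*m)    # 50.0   -> parsed as (1/2) * m
print(1/(2*m))  # 0.005  -> the intended 1/(2m)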

Nice walkthrough


kenwyee commented Aug 4, 2021

I was just about to make the same observation as sivi299 regarding the cost function.
In this case, since m is fixed from iteration to iteration during gradient descent, I don't think it matters for optimizing the theta variable. As written, the cost is proportional to the mean squared error, so it should still converge towards the same theta.

The relative magnitudes of the cost history curves differ between gradient_descent and minibatch_gradient_descent because cal_cost is called with different batch sizes, but since each algorithm uses the same number of points from one iteration to the next internally, it should be OK.
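
To illustrate the point about the constant factor, here is a minimal sketch (made-up data and a simplified loop, not the gist's cal_cost or gradient_descent): scaling the logged cost by a constant changes its magnitude but not the theta the descent converges to.

import numpy as np

np.random.seed(0)
m = 100
X = np.c_[np.ones((m, 1)), 2 * np.random.rand(m, 1)]   # bias column + one feature
y = 4 + 3 * X[:, 1:2] + np.random.randn(m, 1)

def run_gd(cost_scale, lr=0.05, iterations=1000):
    theta = np.zeros((2, 1))
    for _ in range(iterations):
        predictions = X.dot(theta)
        theta -= lr * (1 / m) * X.T.dot(predictions - y)          # gradient uses the correct 1/m
        cost = cost_scale * np.sum(np.square(predictions - y))    # only the logged value is scaled
    return theta, cost

theta_a, cost_a = run_gd(cost_scale=1 / 2 * m)    # as written: m/2
theta_b, cost_b = run_gd(cost_scale=1 / (2 * m))  # intended: 1/(2m)
print(np.allclose(theta_a, theta_b))  # True  -> same optimum either way
print(cost_a / cost_b)                # ~m**2 -> costs differ only by a constant factor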

@paocarvajal1912

Hi, this is fantastic material; thanks so much.
I think there is a typo in equation (8). Shouldn't the subscript on X be j, i.e. X_j instead of X_0?
Regards,
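
For reference, if equation (8) is the per-parameter update rule (an assumption on my part, since the notebook itself isn't shown here), the corrected form with the j subscript would read:

\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

with x_0^{(i)} = 1 appearing only in the special case of the bias term.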
