@bhishanpdl
Last active December 14, 2018 05:57

Dependent Sample

e.g. a drug test on the same subjects, measured before and after treatment

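import numpy as np
import scipy.stats

# df['diff'] holds the per-subject differences between the two measurements
xbar = df['diff'].mean()
s = df['diff'].std()   # pandas defaults to ddof=1 (sample standard deviation)
n = df.shape[0]        # e.g. n = 10
std_err = s/np.sqrt(n)

mu0 = 0.0              # hypothesized mean difference under the null
T_score = (xbar - mu0) / std_err

nsided = 1             # 1 for a one-sided test, 2 for a two-sided test
dof = n-1
p_value = scipy.stats.t.sf(np.abs(T_score), dof)*nsided # sf is the survival function, 1 - cdf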

# Reject the null hypothesis if p_value < significance level (alpha).
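
The steps above can be checked against SciPy's built-in paired test. The following is a minimal sketch, not the gist author's code: the helper name `paired_t_test` and the `before`/`after` measurements are hypothetical, invented only for illustration.

```python
import numpy as np
from scipy import stats

def paired_t_test(before, after, mu0=0.0, nsided=2):
    """Paired t-test on the per-subject differences (after - before)."""
    diff = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    n = diff.size
    std_err = diff.std(ddof=1) / np.sqrt(n)  # ddof=1: sample standard deviation
    t_score = (diff.mean() - mu0) / std_err
    p_value = stats.t.sf(np.abs(t_score), n - 1) * nsided
    return t_score, p_value

# Hypothetical measurements for 6 subjects (made-up numbers).
before = [72.0, 80.0, 65.0, 90.0, 77.0, 84.0]
after = [70.0, 75.0, 66.0, 84.0, 73.0, 80.0]

t_manual, p_manual = paired_t_test(before, after)

# scipy.stats.ttest_rel runs the same two-sided test on (after - before).
t_scipy, p_scipy = stats.ttest_rel(after, before)

print(t_manual, p_manual)
print(t_scipy, p_scipy)
```

With `nsided=1`, the p-value is only meaningful when the sign of the T-score matches the direction of the alternative hypothesis, because the formula uses `abs(T)`.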

Independent samples, known variance

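# standard error of the difference between the two sample means
SE = np.sqrt(sigma1**2/n1 + sigma2**2/n2)
# mu0 is the hypothesized difference mu1 - mu2 (usually 0)
Z = ((xbar1 - xbar2) - mu0) / SE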
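
As a rough sketch of the known-variance case, the Z-score can be turned into a p-value with the standard normal survival function. The function name `two_sample_z` and the numbers passed to it are assumptions made for illustration, not values from the gist.

```python
import math
from scipy import stats

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, mu0=0.0, nsided=2):
    """Two-sample z-test with known population standard deviations."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = ((xbar1 - xbar2) - mu0) / se
    p_value = stats.norm.sf(abs(z)) * nsided  # normal tail area, not Student's t
    return z, p_value

# Hypothetical summary statistics (made-up numbers).
z, p = two_sample_z(xbar1=5.0, xbar2=4.0, sigma1=2.0, sigma2=2.0, n1=50, n2=50)
print(z, p)
```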

Independent samples, unknown but equal variances

# e.g. nx, ny = 10, 8
# e.g. apple prices in NY and LA

# pooled variance (the whole numerator is divided by the degrees of freedom)
sp2 = ((nx-1) * sx**2 + (ny-1) * sy**2) / (nx + ny - 2)

# standard error
std_err = np.sqrt(sp2/nx + sp2/ny)

# T-score
T = (xbar - ybar) / std_err

# p-value
nsided = 2
dof = nx + ny - 2
p_value = scipy.stats.t.sf(np.abs(T), dof)*nsided
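
The pooled-variance calculation can be cross-checked against `scipy.stats.ttest_ind` with `equal_var=True`. This is a minimal sketch under stated assumptions: the two arrays are hypothetical prices invented for illustration, with sample sizes 10 and 8 to match the comment above.

```python
import numpy as np
from scipy import stats

# Hypothetical price samples (made-up numbers), nx = 10 and ny = 8.
x = np.array([3.8, 3.6, 4.1, 3.9, 4.0, 3.7, 4.2, 3.5, 3.9, 4.0])
y = np.array([3.0, 3.4, 3.3, 3.1, 3.6, 3.2, 3.5, 3.3])

nx, ny = x.size, y.size
sx2, sy2 = x.var(ddof=1), y.var(ddof=1)  # sample variances

# Pooled variance, standard error, T-score, and two-sided p-value.
sp2 = ((nx - 1) * sx2 + (ny - 1) * sy2) / (nx + ny - 2)
std_err = np.sqrt(sp2 / nx + sp2 / ny)
T = (x.mean() - y.mean()) / std_err
dof = nx + ny - 2
p_value = stats.t.sf(np.abs(T), dof) * 2

# SciPy's Student t-test with equal variances assumed.
t_scipy, p_scipy = stats.ttest_ind(x, y, equal_var=True)

print(T, p_value)
print(t_scipy, p_scipy)
```

If the equal-variance assumption is doubtful, `equal_var=False` switches `ttest_ind` to Welch's test, which uses a different standard error and degrees of freedom than the pooled formula here.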

Rules of thumb

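  • Use the T-score if the sample size is small or the population variances are not known.

  • As a rough guide, reject the null hypothesis when the absolute T-score is bigger than about 2 (close to the two-sided 5% critical value for moderate to large samples; small samples need a larger value).

  • Generally, for Z and T, an absolute value higher than 4 is extremely significant.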
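
To see where these rules of thumb come from, one can print the two-sided 5% critical values for a few degrees of freedom. This is an illustrative sketch: the chosen `alpha` and the list of degrees of freedom are assumptions, not part of the original notes.

```python
from scipy import stats

alpha = 0.05  # assumed significance level for a two-sided test

# Critical |T| for several degrees of freedom; it shrinks toward the
# normal critical value as the sample grows.
t_crit = {dof: stats.t.ppf(1 - alpha / 2, dof) for dof in (5, 10, 30, 100)}
z_crit = stats.norm.ppf(1 - alpha / 2)

for dof, crit in t_crit.items():
    print(f"dof={dof:>3}: reject if |T| > {crit:.3f}")
print(f"normal : reject if |Z| > {z_crit:.3f}")

# Two-sided p-value for a score of 4 under the normal distribution.
p_at_4 = stats.norm.sf(4.0) * 2
print(f"two-sided p-value at |Z| = 4: {p_at_4:.2e}")
```

The cutoff of 2 is only an approximation: for small samples the exact critical value is noticeably larger, so computing the p-value directly is the safer choice.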
