Nice gist! Regarding the term that should be added and subtracted: I think you accidentally typed the square inside the expectation rather than outside of it. Putting the square outside the expectation is necessary for the "rearranging terms, we may write the right-hand side above as" step to carry through.
@bradyneal, I understand what @sdangi meant. My issue is that I don't see why the suggested change is correct. I could be wrong, though. Happy to look over the algebra if you or @sdangi are interested in providing it. Best, Peter
@bradyneal @sdangi: you are correct! Thanks! I've updated the Jupyter notebook accordingly.
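For anyone following the thread, here is a sketch of the corrected step in generic notation (the gist's own symbols may differ): with $\bar{h}(x) = \mathbb{E}_D[h(x)]$ denoting the hypothesis averaged over training sets $D$, the term added and subtracted is $\bar{h}$, and the square sits outside the expectation in the bias term:

$$
\mathbb{E}_D\big[(f - h)^2\big]
= \mathbb{E}_D\big[(f - \bar{h} + \bar{h} - h)^2\big]
= \underbrace{(f - \bar{h})^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\big[(\bar{h} - h)^2\big]}_{\text{variance}}
+ 2\,(f - \bar{h})\,\underbrace{\mathbb{E}_D[\bar{h} - h]}_{=\,0}.
$$

The cross term vanishes because $\mathbb{E}_D[h] = \bar{h}$ by definition.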
In the section 'Reducible and irreducible error', why is $E_\epsilon[2\epsilon (f - h)]$ equal to 0?
I agree that $E_\epsilon[\epsilon f] = 0$, but why is $E_\epsilon[\epsilon h] = 0$? The hypothesis $h$ is learned from a training set $(X, Y)$, $Y$ depends on $\epsilon$, and so do the parameters of the learned model. Thus one cannot simply write $E_\epsilon[\epsilon h] = h \, E_\epsilon[\epsilon]$.
Could you explain this?
@rafgonsi : ... In performing the triple integral over
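To spell out the usual resolution in generic notation (which may not match the gist's triple-integral presentation exactly): the $\epsilon$ appearing in the cross term is the noise attached to the new test observation, and it is assumed independent of the training set $D$ from which $h$ was learned. Taking the expectation over both $D$ and the test noise, the cross term factors:

$$
\mathbb{E}_{D,\epsilon}\big[\epsilon\,(f - h)\big]
= \mathbb{E}_{\epsilon}[\epsilon]\;\mathbb{E}_{D}\big[f - h\big]
= 0 \cdot \mathbb{E}_{D}[f - h]
= 0 .
$$

So $h$ does depend on the noise in the training labels, but not on the noise $\epsilon$ of the held-out point at which the error is evaluated, which is all the cross-term argument needs.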
An entirely analogous result to the one outlined in this gist is obtained when one computes the error of an estimator of a parameter. Namely, the mean squared error of any estimator equals its variance plus the square of its bias. See Section 7.7 of https://www.sciencedirect.com/science/article/pii/B9780123948113500071
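As a quick numerical check of that identity (a minimal simulation sketch; the shrinkage estimator, sample size, and seed below are arbitrary choices, not taken from the gist), a biased estimator of a Gaussian mean shows the MSE splitting into variance plus squared bias:

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma, n = 2.0, 1.0, 10   # true mean, noise scale, sample size
trials = 200_000              # Monte Carlo repetitions

# Shrinkage estimator of the mean: divides by (n + 1) instead of n,
# which biases it toward zero but reduces its variance.
samples = rng.normal(mu, sigma, size=(trials, n))
estimates = samples.sum(axis=1) / (n + 1)

mse = np.mean((estimates - mu) ** 2)
bias = np.mean(estimates) - mu
variance = np.var(estimates)

print(f"MSE               : {mse:.5f}")
print(f"variance + bias^2 : {variance + bias ** 2:.5f}")  # agrees with MSE up to Monte Carlo error
```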
In active machine learning, we assume that the learner is unbiased and focus on algorithms that minimize the learner's variance, as shown in Cohn et al. (1996): https://arxiv.org/abs/cs/9603104 (Eq. 4 is difficult to interpret precisely, though, in the absence of further reading).
The analysis presented in this gist has also been published on Cross Validated: https://stats.stackexchange.com/a/287904/146385
Also see the section entitled "The Bias-Variance Decomposition" in Christopher Bishop's 2006 book: https://link.springer.com/book/9780387310732
@sdangi, I don't see why.