Nice gist! Regarding the term that should be added and subtracted: I think you accidentally typed the square inside the expectation rather than outside of it. Putting the square outside the expectation is necessary for the "rearranging terms, we may write the right-hand side above as" step to carry through.
@bradyneal, I understand what @sdangi meant. My issue is that I don't see why the suggested change is correct. I could be wrong, though. Happy to look over the algebra if you or @sdangi are interested in providing it. Best, Peter
@bradyneal @sdangi: you are correct! Thanks! I've updated the Jupyter notebook accordingly.
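For anyone following the thread, here is a sketch of the corrected step in generic notation (the gist's own symbols may differ): with $\bar{h}(x) = \mathbb{E}_D[h(x)]$ denoting the hypothesis averaged over training sets $D$, the term added and subtracted is $\bar{h}$, and the square sits outside the expectation in the bias term:

$$
\mathbb{E}_D\big[(f - h)^2\big]
= \mathbb{E}_D\big[(f - \bar{h} + \bar{h} - h)^2\big]
= \underbrace{(f - \bar{h})^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\big[(\bar{h} - h)^2\big]}_{\text{variance}}
+ 2\,(f - \bar{h})\,\underbrace{\mathbb{E}_D[\bar{h} - h]}_{=\,0}.
$$

The cross term vanishes because $\mathbb{E}_D[h] = \bar{h}$ by definition.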
In the section 'Reducible and irreducible error', why is $E_\epsilon[2\epsilon (f - h)]$ equal to 0?
I agree that $E_\epsilon[\epsilon f] = 0$, but why is $E_\epsilon[\epsilon h] = 0$? The hypothesis $h$ is learned from a training set $(X, Y)$, $Y$ depends on $\epsilon$, and so do the parameters of the learned model. Thus one cannot simply write $E_\epsilon[\epsilon h] = h \, E_\epsilon[\epsilon]$.
Could you explain this?
@rafgonsi : ... In performing the triple integral over
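To spell out the usual resolution in generic notation (which may not match the gist's triple-integral presentation exactly): the $\epsilon$ appearing in the cross term is the noise attached to the new test observation, and it is assumed independent of the training set $D$ from which $h$ was learned. Taking the expectation over both $D$ and the test noise, the cross term factors:

$$
\mathbb{E}_{D,\epsilon}\big[\epsilon\,(f - h)\big]
= \mathbb{E}_{\epsilon}[\epsilon]\;\mathbb{E}_{D}\big[f - h\big]
= 0 \cdot \mathbb{E}_{D}[f - h]
= 0 .
$$

So $h$ does depend on the noise in the training labels, but not on the noise $\epsilon$ of the held-out point at which the error is evaluated, which is all the cross-term argument needs.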
An entirely analogous result to the one outlined in this gist is obtained when one computes the error of an estimator of a parameter. Namely, the mean squared error of any estimator equals its variance plus the square of its bias. See Section 7.7 of https://www.sciencedirect.com/science/article/pii/B9780123948113500071
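As a quick numerical check of that identity (a minimal simulation sketch; the shrinkage estimator, sample size, and seed below are arbitrary choices, not taken from the gist), a biased estimator of a Gaussian mean shows the MSE splitting into variance plus squared bias:

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma, n = 2.0, 1.0, 10   # true mean, noise scale, sample size
trials = 200_000              # Monte Carlo repetitions

# Shrinkage estimator of the mean: divides by (n + 1) instead of n,
# which biases it toward zero but reduces its variance.
samples = rng.normal(mu, sigma, size=(trials, n))
estimates = samples.sum(axis=1) / (n + 1)

mse = np.mean((estimates - mu) ** 2)
bias = np.mean(estimates) - mu
variance = np.var(estimates)

print(f"MSE               : {mse:.5f}")
print(f"variance + bias^2 : {variance + bias ** 2:.5f}")  # agrees with MSE up to Monte Carlo error
```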
In active machine learning, we assume that the learner is unbiased and focus on algorithms that minimize the learner's variance, as shown in Cohn et al. (1996): https://arxiv.org/abs/cs/9603104 (Eq. 4 is difficult to interpret precisely, though, in the absence of further reading).
The analysis presented in this gist has also been published on Cross Validated: https://stats.stackexchange.com/a/287904/146385
Also see the section entitled "The Bias-Variance Decomposition" in Christopher Bishop's 2006 book: https://link.springer.com/book/9780387310732
@sdangi, I don't see why.