Measuring machine learning performance

Loss function

Loss function: measures the difference between a model's predictions and reality, i.e. how inaccurate the model is

  • Remember: when training a model, we use an optimization algorithm to pick parameters that minimize the loss function
  • The same loss function can be used to measure the model's performance against the test set
  • In a way, minimax is a loss function!

What kinds of loss functions are there? Common examples include mean squared error for regression and 0-1 loss or cross-entropy for classification (see the sketch below).
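
As a concrete illustration, here is a small numpy sketch of two common loss functions: mean squared error for regression and cross-entropy for binary classification. The data values are made up for demonstration.

```python
import numpy as np

# Hypothetical regression example: mean squared error
y_true = np.array([3.0, -0.5, 2.0, 7.0])   # reality
y_pred = np.array([2.5, 0.0, 2.0, 8.0])    # predictions
mse = np.mean((y_true - y_pred) ** 2)

# Hypothetical binary classification example: cross-entropy (log) loss
labels = np.array([1, 0, 1, 1])            # ground truth classes
probs = np.array([0.9, 0.2, 0.7, 0.6])     # predicted probability of class 1
eps = 1e-12                                 # avoid log(0)
cross_entropy = -np.mean(labels * np.log(probs + eps)
                         + (1 - labels) * np.log(1 - probs + eps))

print(f"MSE: {mse:.3f}, cross-entropy: {cross_entropy:.3f}")
```

In both cases, a smaller value means the predictions are closer to reality, which is why training minimizes the loss.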

Cross-validation

Remember: labeled input data is randomly partitioned into training and test sets. The test data is used to validate the model.

Why?

  • Training error: calculating the loss function on the training data
  • Test error: calculating the loss function on the test data
  • Training error only measures how well the model fits the training data
  • Test error provides an estimate for how well the model will perform on new data, i.e. generalizability (see the sketch after this list)
  • Overfitting: when a model fits the training data too closely and performs badly on test data
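
A minimal scikit-learn sketch of these ideas, using a synthetic dataset (the dataset, model, and sizes are illustrative assumptions): the same loss function is computed on the training set (training error) and on the held-out test set (test error).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic labeled data stands in for a real dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Holdout method: one random partition into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Training error: loss on the data the model was fit to
train_error = log_loss(y_train, model.predict_proba(X_train))
# Test error: loss on held-out data, an estimate of generalizability
test_error = log_loss(y_test, model.predict_proba(X_test))

print(f"training error: {train_error:.3f}, test error: {test_error:.3f}")
```

A large gap between a low training error and a high test error is the usual symptom of overfitting.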

Cross-validation: partitioning labeled input data into training and test sets one or more times to assess generalizability

  • Holdout method: doing this partition once
    • Easy to do
    • But the measured error can change dramatically if the data is partitioned differently or if the ratio of training to test set sizes changes
  • Leave-one-out (LOO) cross-validation:
    • Given n data points in labeled input data, split so that there is one data point in the test set and the rest are in the training set
    • Repeat, taking out a different data point for the test set each time
    • Average the loss function calculation across the n iterations
    • Exhaustive but computationally expensive
  • k-fold cross-validation:
    • Split labeled input data into k equally sized subsamples so that one subsample is used for the test set and k - 1 subsamples are used for the training set
    • Repeat, taking out a different subsample for the test set each time
    • Average the loss function calculation across the k iterations
    • Less comprehensive than leave-one-out but faster (see the sketch after this list)
    • Fernández-Delgado et al. (2014) use 4-fold cross-validation to evaluate their classifiers
  • See Wikipedia and scikit-learn documentation for more variations
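
A sketch of k-fold and leave-one-out cross-validation with scikit-learn, on synthetic data (the dataset and model choices are illustrative). cross_val_score averages a score across the iterations; the default score for a classifier is accuracy, but a loss such as log loss can be requested via the scoring argument.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# k-fold with k = 4, as in Fernández-Delgado et al. (2014): each subsample
# serves once as the test set, and scores are averaged over the 4 iterations
kfold_scores = cross_val_score(
    model, X, y, cv=KFold(n_splits=4, shuffle=True, random_state=0))
print(f"4-fold mean accuracy: {kfold_scores.mean():.3f}")

# Leave-one-out: n iterations, one data point held out each time
# (exhaustive but computationally expensive; manageable here since n = 200)
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(f"LOO mean accuracy: {loo_scores.mean():.3f}")
```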