Most of the following is paraphrased from The Elements of Statistical Learning, which is available as a PDF.
Variables can be inputs or outputs
- Inputs can also be called: predictors, independent variables, features
- Outputs can also be called: responses, dependent variables
Variables can be quantitative or qualitative
- Qualitative variables are also known as categorical variables, discrete variables, factors
- Mathematically, qualitative variables are represented through numerical codes.
- When there are only two categories or classes, the variable can be represented as a boolean with values of 0 or 1.
- There are multiple methods for coding multiclass categorical variables. See: dummy variables and scikit-learn's encoder classes (a minimal sketch follows this list).
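A rough sketch of both coding approaches, assuming pandas and scikit-learn are installed; the toy "color" column is made up, and OneHotEncoder is just one of scikit-learn's encoder classes:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# A qualitative (categorical) variable with three categories.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Dummy coding with pandas: one 0/1 indicator column per category.
print(pd.get_dummies(df["color"]))

# The same coding with one of scikit-learn's encoder classes.
enc = OneHotEncoder()
codes = enc.fit_transform(df[["color"]]).toarray()
print(enc.categories_)
print(codes)
```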
Supervised learning: "using inputs to predict the values of the outputs"
- Regression: predicting quantitative outputs
- Classification: predicting qualitative outputs
- Classifier: algorithm used to solve a classification problem
- The term classification usually refers specifically to the supervised setting
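A small illustration of the regression/classification split, assuming scikit-learn; the toy data and the choice of LinearRegression and LogisticRegression are arbitrary, not from the book:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # inputs (predictors/features)

# Regression: the output is quantitative (a real number).
y_quantitative = np.array([1.1, 1.9, 3.2, 3.9])
reg = LinearRegression().fit(X, y_quantitative)
print(reg.predict([[2.5]]))  # a real-valued prediction

# Classification: the output is qualitative (a class coded 0 or 1).
y_qualitative = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y_qualitative)
print(clf.predict([[2.5]]))  # a predicted class label
```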
Unsupervised learning: "describe associations and patterns among a set of inputs"
- Clustering: using unsupervised learning to group inputs into categories
- Unsupervised learning is often used to generate new input variables for supervised learning.
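A minimal clustering sketch, assuming scikit-learn's KMeans on made-up data; the last step shows the common trick of feeding cluster assignments back in as a new input variable for a supervised model:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # inputs only, with no known outputs

# Clustering: group the inputs into categories without any labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])  # the discovered group for each input row

# The cluster assignment can then serve as a new (categorical) input
# variable for a downstream supervised model.
X_augmented = np.column_stack([X, kmeans.labels_])
print(X_augmented[:3])
```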
How supervised learning works:
- Training data: set of input data with known output values used to learn a prediction rule or model
- Test data: set of input data with known output values used to evaluate the performance of a given model
- A model encodes a hypothesis about the relationship between the inputs and outputs, usually in the form of a mathematical formula
- This formula has parameters or coefficients that need to be fit to the training data
- Loss function: used to measure the fit, i.e. how far the model's predictions are from the known output values
- Training a model: finding the set of parameters that optimizes the fit by minimizing the loss function
- There are many optimization algorithms available; one of the most popular is stochastic gradient descent (a worked sketch follows this list)
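To make those pieces concrete, here is a hand-rolled sketch: a linear model fit to synthetic training data by minimizing a squared-error loss with plain (full-batch) gradient descent. The data, learning rate, and iteration count are arbitrary assumptions; a stochastic variant would compute the gradient on random mini-batches instead of the full training set.

```python
import numpy as np

# Training data: inputs X with known output values y (synthetic here).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)

# Model (hypothesis): y ≈ X @ w, with parameters w to be fit.
w = np.zeros(3)

def loss(w):
    # Squared-error loss: how far predictions are from the known outputs.
    return np.mean((X @ w - y) ** 2)

# Training: adjust w to minimize the loss via gradient descent.
learning_rate = 0.1
for _ in range(500):
    gradient = 2 * X.T @ (X @ w - y) / len(y)
    w -= learning_rate * gradient

print(loss(w))  # should be close to the noise level
print(w)        # should be close to true_w
```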
Fernández-Delgado et al. (2014) use the term two-class data sets to describe binary classification problems, meaning the output being predicted has only two categories. Multiclass classification is a trickier problem: the output being predicted has more than two categories.
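A tiny illustration of the two settings, assuming scikit-learn's LogisticRegression (which accepts both binary and multiclass targets) and made-up data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])

# Two-class (binary) problem: the output has exactly two categories.
y_binary = np.array([0, 0, 0, 1, 1, 1])
print(LogisticRegression().fit(X, y_binary).predict([[2.5]]))

# Multiclass problem: the output has more than two categories, which
# generally needs extra machinery (e.g. one-vs-rest) or a natively
# multiclass model.
y_multiclass = np.array([0, 0, 1, 1, 2, 2])
print(LogisticRegression().fit(X, y_multiclass).predict([[2.5]]))
```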
Some useful background context: the no free lunch theorem states that no single type of classifier is best for all classification problems.
The paper evaluates 179 classifiers on 121 classification problems to compare their overall performance empirically.