Supervised Learning / Evaluation

Model Evaluation

How to trust your machine learning models before deploying them into production.

Why Evaluate Models?

Once you have trained a machine learning model, the most critical question remains: how well does it actually perform? Testing your model on the exact same data it was trained on will yield an artificially optimistic score, because the model has already "seen" those answers. A model that scores well only because it has memorized its training data is said to overfit.

Proper model evaluation requires testing the model against unseen data. By establishing robust evaluation pipelines, we gain confidence that models will generalize to the real world.

Train / Test Split

The most fundamental evaluation technique: divide your dataset into two distinct parts, typically 80% for training the model and a held-out 20% for testing it afterward.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
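A minimal end-to-end sketch of the idea, using scikit-learn and the toy iris dataset (the dataset and classifier are stand-ins, not part of the original text): train on 80% of the data and score on the hidden 20%.

```python
# Hold out 20% of the data, train on the rest, score on the unseen portion.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on data the model never saw
```

Fixing `random_state` makes the split reproducible, which matters when comparing models.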

K-Fold Cross Validation

To eliminate the luck of a single "good split", K-Fold divides the data into K equal chunks (folds). The model is trained K times, each time holding out a different fold as the test set and training on the remaining K−1 folds. The K scores are then averaged for a more reliable estimate.

[Figure: fold 1 held out as the test set, the remaining folds used for training]
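The procedure above can be sketched with scikit-learn's `cross_val_score` helper (the dataset and model here are illustrative choices, not from the original text):

```python
# 5-fold cross-validation: each fold serves once as the test set.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)  # one score per fold
print(scores.mean(), scores.std())           # average and spread of the 5 scores
```

Reporting the standard deviation alongside the mean shows how sensitive the model is to which data it was trained on.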

Hyperparameter Tuning

Models have knobs and levers called hyperparameters (such as the depth of a tree, or the learning rate). To find the best settings, we perform a search such as Grid Search or Random Search, evaluating candidate combinations of settings and keeping the configuration that scores best.

Grid Search

Exhaustively evaluates every combination in a user-specified parameter grid.
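A short sketch using scikit-learn's `GridSearchCV` (the grid values and model are illustrative assumptions):

```python
# Exhaustive search: every combination in the grid is cross-validated.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
param_grid = {"max_depth": [2, 3, 4], "min_samples_split": [2, 5]}

# 3 x 2 = 6 combinations, each scored with 5-fold cross-validation.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The cost grows multiplicatively with each added parameter, which is why exhaustive search only suits small grids.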

Random Search

Samples a fixed number of random combinations from specified ranges. Faster than an exhaustive grid and often just as good.
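The same idea with scikit-learn's `RandomizedSearchCV` (model, distributions, and budget are illustrative assumptions): instead of trying every combination, only a fixed number of random draws are evaluated.

```python
# Random search: sample n_iter combinations rather than trying them all.
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)
param_dist = {"max_depth": randint(2, 10), "n_estimators": randint(10, 100)}

# Only 5 random draws are cross-validated, regardless of the search space size.
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_dist, n_iter=5, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)
```

Because the budget (`n_iter`) is fixed, random search scales gracefully as you add more hyperparameters.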

Bayesian Optimization

Builds a probabilistic model of past results to pick the most promising parameter combination to try next.
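A minimal sketch of the idea using only scikit-learn: a Gaussian process models the score as a function of one hypothetical hyperparameter, and an upper-confidence-bound rule picks the next value to try. The `objective` below is a toy stand-in for a real cross-validation score, not anything from the original text.

```python
# Bayesian-optimization sketch: surrogate model + acquisition function.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(c):
    # Stand-in for an expensive cross-validation score; peaks near c = 0.3.
    return -(c - 0.3) ** 2

candidates = np.linspace(0, 1, 101).reshape(-1, 1)
tried_x, tried_y = [[0.0], [1.0]], [objective(0.0), objective(1.0)]

for _ in range(8):
    # Fit a probabilistic surrogate to the results observed so far.
    gp = GaussianProcessRegressor(alpha=1e-6).fit(tried_x, tried_y)
    mean, std = gp.predict(candidates, return_std=True)
    ucb = mean + 1.96 * std           # upper confidence bound acquisition
    nxt = candidates[np.argmax(ucb)]  # most promising point to try next
    tried_x.append(list(nxt))
    tried_y.append(objective(nxt[0]))

best = tried_x[int(np.argmax(tried_y))][0]
print(round(best, 2))  # should land near the true optimum at 0.3
```

In practice a dedicated library (e.g. Optuna or scikit-optimize) handles the surrogate and acquisition machinery; the loop above only illustrates the try / model / pick-next cycle.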