Training, Validation, Test Datasets

This week I have the flu. I moved my desk closer to the bed and… blacked out, then… binge-watched a few movies 😄, and after that… studied a lot actually.

Some of my learning sessions felt blurry, but satisfying nonetheless.

I continued with a Machine Learning course I recently started, and one concept in it struck me as potentially confusing: how datasets are used in ML.

Training, validation, and test datasets are mostly associated with supervised learning. The general idea of splitting data for training, evaluation, and testing can extend to unsupervised learning in some cases, and even to reinforcement learning, though the approach may differ.
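To make the idea concrete, here is a minimal sketch of a three-way split in plain Python. The function name split_dataset, the 70/15/15 proportions, and the fixed seed are my own assumptions for illustration, not something taken from the course.

```python
import random


def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle rows and cut them into train, validation, and test lists.

    The fractions and the seed are illustrative defaults (assumptions).
    """
    if not 0 < train_frac + val_frac < 1:
        raise ValueError("train_frac + val_frac must leave room for a test set")

    shuffled = list(rows)                   # copy so the caller's data stays untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever is left becomes the test set
    return train, val, test


rows = list(range(100))                     # stand-in for real examples
train, val, test = split_dataset(rows)
print(len(train), len(val), len(test))
```

Shuffling before cutting matters when the data arrives in some order (by date, by label), because otherwise the three sets could end up looking quite different from each other.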

The chart below illustrates the outline presented in the course.

Training, validation, and test datasets chart

As I understand it, this split approach is used in supervised learning. We trained models this way, using the Linear Regression and Gradient Boosting Regression algorithms.
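A rough sketch of how the three sets might be used with those two algorithms, assuming scikit-learn is installed. The synthetic data from make_regression, the roughly 60/20/20 split, and the choice of mean squared error as the metric are my assumptions, not the course's actual setup.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the course dataset (an assumption).
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# Hold out about 20% as the test set, then take 25% of the remainder
# as the validation set, which gives roughly 60/20/20 overall.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

models = {
    "linear_regression": LinearRegression(),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training set only, and compare them on the validation set.
val_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    val_mse[name] = mean_squared_error(y_val, model.predict(X_val))

# Pick the model with the lowest validation error, then touch the test set once.
best_name = min(val_mse, key=val_mse.get)
test_mse = mean_squared_error(y_test, models[best_name].predict(X_test))

print("validation MSE:", val_mse)
print("chosen model:", best_name, "test MSE:", test_mse)
```

The point of the sketch is the order of operations: the training set fits the models, the validation set picks between them, and the test set is only consulted at the end to estimate how the chosen model does on unseen data.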

I’m excited to see how other algorithms work and whether my view of this chart will evolve over time.