
Publications
Trusting the Black Box: A Practical Approach to Model Reliability
(Version 1.00 – January 15th, 2026) Mr. Jorge RODRIGUEZ When data scientists build a model, the first stop is usually the evaluation metric: mean squared error, log-loss, F1 score, and the like. We track performance across training, validation, and test sets to check that the model isn’t just memorizing patterns but can actually generalize. On paper, that’s what makes a model look “ready” for the real world. But the world outside of datasets is messy, and that’s where the gap
Jan 15 · 9 min read
What is the optimal evaluation metric for multilabel classification models?
(Version 1.18 – January 07, 2026) Ms. Yumi HEO What is multilabel classification? Imagine classifying the colour of a single skin mole not just as light brown or dark brown but as both light brown and dark brown simultaneously. This is the core task of multilabel classification: training a machine learning model to assign multiple relevant labels to a single input. However, evaluating such a model accurately is often more challenging than evaluating a standard classi
Jan 9 · 4 min read
