PUBLICATIONS

Where Research is Shared and Insights are Published.

Handling Class Imbalance in Image Datasets

(Version 1.00 – March 30th, 2026) Ms. Yumi HEO

The problem with imbalanced data

Imagine training a model to detect a rare disease from medical scans. Your dataset has 950 healthy scans and only 50 diseased ones. If you train naively, the model quickly learns to predict "healthy" every single time. It would reach 95% accuracy, yet it would be completely useless because it never flags a single diseased scan. This is the class imbalance problem. It is everywhere in the real world, and it is common in image datasets too.
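The arithmetic behind that 95% figure can be checked with a minimal sketch. The label encoding (0 for healthy, 1 for diseased) and the helper names `accuracy` and `recall` are assumptions made for illustration; they do not come from the article.

```python
# Hypothetical labels matching the example: 950 healthy (0), 50 diseased (1).
labels = [0] * 950 + [1] * 50

# A naive model that predicts "healthy" for every scan.
predictions = [0] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model detected."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(labels, predictions)
rec = recall(labels, predictions)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

Accuracy comes out high only because the majority class dominates the count. Recall on the diseased class is zero, which is the number that matters in this scenario.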

Mar 30 · 6 min read

Cleaning the Noise: How Signals and Sounds Become Meaningful

(Version 1.00 – February 19th, 2026) Mr. Guy THANAPONPAIBOON

Ever wonder how your phone understands your voice in a noisy room, or how music streaming services instantly recognize a song from just a few seconds of audio? The magic often lies in something called signal and sound preprocessing. It's the unsung hero that takes raw, messy audio and transforms it into something intelligent systems can actually understand and use. It's the essential preparation that transforms raw,

Feb 19 · 5 min read

Trusting the Black Box: A Practical Approach to Model Reliability

(Version 1.00 – January 15th, 2026) Mr. Jorge RODRIGUEZ

When data scientists build a model, the first stop is usually the evaluation metric: mean squared error, log-loss, F1 score, and the like. We track performance across training, validation, and test sets to check that the model isn’t just memorizing patterns but can actually generalize. On paper, that’s what makes a model look “ready” for the real world. But the world outside of datasets is messy, and that’s where the gap

Jan 15 · 9 min read

What is the optimal evaluation metric for multilabel classification models?

(Version 1.18 – January 07, 2026) Ms. Yumi HEO

What is multilabel classification?

Imagine classifying the colour of a single skin mole not just as light brown or dark brown but as both light brown and dark brown simultaneously. This is the core task of multilabel classification: training a machine learning model to assign multiple relevant labels to a single input. However, accurately evaluating such a model is often more challenging than evaluating a standard classi

Jan 9 · 4 min read