The Crisis of Confidence in Statistical Purity

Alright, we’ve journeyed through the surprising generalization of huge neural nets (first part) and navigated the murky waters of imbalanced data, using techniques like SMOTE that felt necessary but theoretically a bit… dodgy (second part). We tweaked our data $\mathcal{S}'$ or our loss function $L'$ to get better performance on metrics like the F1 score, especially for those pesky rare classes. Great! Mission accomplished? Well, hold on. When our fraud detection model, trained on SMOTE’d data, now confidently outputs $P(\text{fraud} | x) = 0....
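Before going further, it helps to make the calibration worry concrete. If resampling only changed the class prior the model saw during training (a rough approximation for SMOTE, which synthesizes new minority points rather than merely reweighting), the standard odds-rescaling for prior shift tells us what the honest probability would be. A minimal sketch under exactly that assumption; the helper name is hypothetical, not from any particular library:

```python
def correct_for_resampling(p_s: float, pi_train: float, pi_true: float) -> float:
    """Map a probability estimated under a rebalanced class prior (pi_train)
    back to the original prior (pi_true) via an odds rescaling.

    Hypothetical helper: assumes resampling changed only the class prior,
    which holds only approximately for SMOTE.
    """
    odds = p_s / (1.0 - p_s)  # odds under the rebalanced training prior
    shift = (pi_true / (1.0 - pi_true)) / (pi_train / (1.0 - pi_train))
    odds_corrected = odds * shift  # restore the true prior odds
    return odds_corrected / (1.0 + odds_corrected)

# Example: classes rebalanced to 50/50 for training, true fraud rate of 1%.
print(correct_for_resampling(p_s=0.9, pi_train=0.5, pi_true=0.01))  # ~0.083
```

The takeaway: a “confident” 0.9 from a model trained on 50/50 SMOTE’d data corresponds to roughly 8% once a 1% true base rate is restored. In practice you would recalibrate on untouched held-out data (e.g., Platt scaling or isotonic regression) rather than trust this closed form, since SMOTE violates the prior-shift-only assumption.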

November 19, 2025 · 11 min · Pablo Olivares