Using binary predictions vs probability scores in roc_auc_score
If you have ever competed in a binary classification Kaggle competition that uses roc_auc_score as the evaluation metric, you may have faced this conundrum: when you submit binary predictions, the roc_auc_score calculated on the public test set is lower than when you submit probability scores. At first this doesn’t seem to make any sense, but there is a reason why it happens. Let me explain.
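As a minimal sketch of the comparison, the snippet below scores the same model twice, once with hard labels and once with probabilities. The synthetic dataset from make_classification and the LogisticRegression model are assumptions chosen for illustration, not the competition data; only the names y_pred_bin and y_pred_proba come from this post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a competition dataset (an assumption).
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred_proba = model.predict_proba(X_test)[:, 1]  # probability of class 1
y_pred_bin = model.predict(X_test)                # hard 0/1 labels

auc_bin = roc_auc_score(y_test, y_pred_bin)
auc_proba = roc_auc_score(y_test, y_pred_proba)
print(f"roc_auc_score with binary predictions: {auc_bin:.4f}")
print(f"roc_auc_score with probability scores: {auc_proba:.4f}")
```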

The above code block shows the difference in roc_auc_score calculated using binary predictions (y_pred_bin) and probability scores (y_pred_proba).
Plotting the ROC curves for the two scenarios shows that the area under the ROC curve for binary predictions is clearly smaller than the area under the ROC curve for probability scores. Here is the ROC curve plotting code:
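One possible way to draw the two curves is sketched below with matplotlib. The synthetic data, the model, and the helper name plot_roc_curves are assumptions made for illustration; roc_curve and auc are the scikit-learn functions used to get the points and the area.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic data and a simple model, used only for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred_proba = model.predict_proba(X_test)[:, 1]
y_pred_bin = model.predict(X_test)

# One (fpr, tpr) pair per threshold for each kind of prediction.
fpr_bin, tpr_bin, _ = roc_curve(y_test, y_pred_bin)
fpr_proba, tpr_proba, _ = roc_curve(y_test, y_pred_proba)


def plot_roc_curves():
    """Hypothetical helper: overlay both ROC curves on one set of axes."""
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.plot(fpr_proba, tpr_proba,
            label=f"probability scores (AUC = {auc(fpr_proba, tpr_proba):.3f})")
    ax.plot(fpr_bin, tpr_bin, marker="o",
            label=f"binary predictions (AUC = {auc(fpr_bin, tpr_bin):.3f})")
    ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="chance")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend(loc="lower right")
    return fig


fig = plot_roc_curves()
# In a notebook the figure renders inline; in a script, call plt.show()
# or fig.savefig("roc_curves.png") to see it.
```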


So that explains why roc_auc_score for probability scores is higher.
Now comes the question: why is the area under one curve smaller than the other? The reason is as follows:
- When we give roc_auc_score binary predictions, we get only 3 points on our curve: (0, 0), (fpr, tpr), (1, 1). For binary predictions (0, 1), the threshold values returned by the sklearn.metrics.roc_curve method are [2, 1, 0] (in scikit-learn 1.3 and later the first value is np.inf instead of 2). y_pred_bin has two distinct values, 0 and 1, and roc_curve prepends one extra threshold, max(y_pred_bin) + 1 in older versions, so that the curve starts at (0, 0). With just three threshold values, the ROC curve is drawn using only three points.
- When we give roc_auc_score probability predictions, we get more than 3 points on our curve. The threshold values returned by roc_curve are the distinct probability predictions plus the extra leading threshold (max(y_pred_proba) + 1 in older versions, np.inf in scikit-learn 1.3 and later). Thus you can have up to as many threshold values as you have distinct probability predictions + 1; with the default drop_intermediate=True, roc_curve may drop some thresholds that do not change the shape of the curve. For each threshold value you get a false positive rate (fpr) and a true positive rate (tpr).
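The two cases above can be checked on a tiny hand-made example. The eight labels and scores below are made up for illustration, and the 0.5 cut-off used to produce y_pred_bin is an assumption. The first threshold printed depends on the installed scikit-learn version, as noted above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up labels and scores, small enough to verify by hand.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred_proba = np.array([0.1, 0.3, 0.4, 0.6, 0.35, 0.7, 0.8, 0.9])
y_pred_bin = (y_pred_proba >= 0.5).astype(int)  # assumed 0.5 cut-off

# Binary predictions: only three thresholds, hence three curve points.
fpr_b, tpr_b, thr_b = roc_curve(y_true, y_pred_bin)
print("binary thresholds:", thr_b)
print("binary points:", list(zip(fpr_b, tpr_b)))

# Probability scores: one threshold per distinct score, plus the extra one.
# drop_intermediate=False keeps every threshold so the count is exact.
fpr_p, tpr_p, thr_p = roc_curve(y_true, y_pred_proba, drop_intermediate=False)
print("number of proba thresholds:", len(thr_p))

# The area under the three-point curve is two trapezoids, which
# simplifies to (tpr + 1 - fpr) / 2 for the single middle point.
auc_from_point = (tpr_b[1] + 1 - fpr_b[1]) / 2

print(roc_auc_score(y_true, y_pred_bin))    # 0.75
print(roc_auc_score(y_true, y_pred_proba))  # 0.875
```

With hard labels the score collapses to a single operating point, so all the ranking information among the samples on each side of the cut-off is lost.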
So the next time you see a jump in your roc_auc_score when you switch from predicted labels to probability scores, don’t be surprised.
Reference: https://www.kaggle.com/competitions/autismdiagnosis/discussion/324427