Using binary predictions vs probability scores in ROC_AUC_SCORE

Anupam Yadav
2 min read · May 18, 2022


If you have ever competed in a binary classification Kaggle competition that uses roc_auc_score as the evaluation metric, you might have faced this conundrum: when you use binary predictions in your submission, the roc_auc_score calculated on the public test set is lower than when you use probability scores. At first it doesn’t seem to make any sense, but there is a reason why this happens. Let me explain.

auc score calculation
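
A minimal sketch of this comparison, assuming a small made-up set of labels and scores (the values are illustrative, not taken from any competition) and an assumed 0.5 cutoff for turning probabilities into labels:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy ground truth and predicted probabilities (illustrative values only).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred_proba = np.array([0.10, 0.30, 0.45, 0.60, 0.40, 0.55, 0.80, 0.90])

# Hard labels obtained by thresholding the probabilities at 0.5 (assumed cutoff).
y_pred_bin = (y_pred_proba >= 0.5).astype(int)

auc_bin = roc_auc_score(y_true, y_pred_bin)
auc_proba = roc_auc_score(y_true, y_pred_proba)

print(f"AUC with binary predictions:  {auc_bin:.4f}")    # 0.7500
print(f"AUC with probability scores: {auc_proba:.4f}")  # 0.8125
```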

The above code block shows the difference in roc_auc_score calculated using binary predictions (y_pred_bin) and probability scores (y_pred_proba).

Plotting the ROC curves for the two scenarios shows that the area under the curve built from binary predictions is clearly smaller than the area under the curve built from probability scores. Here is the ROC curve plotting code:

roc curve plotting
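
One way such a plot could be produced, sketched with the same kind of made-up toy data (the values, the 0.5 cutoff, and the output file name roc_comparison.png are all assumptions for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

# Toy ground truth and predicted probabilities (illustrative values only).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred_proba = np.array([0.10, 0.30, 0.45, 0.60, 0.40, 0.55, 0.80, 0.90])
y_pred_bin = (y_pred_proba >= 0.5).astype(int)  # assumed 0.5 cutoff

# One (fpr, tpr) point per threshold that roc_curve keeps.
fpr_bin, tpr_bin, _ = roc_curve(y_true, y_pred_bin)
fpr_proba, tpr_proba, _ = roc_curve(y_true, y_pred_proba)

fig, ax = plt.subplots(figsize=(6, 6))
ax.plot(fpr_bin, tpr_bin, marker="o",
        label=f"binary predictions (AUC = {auc(fpr_bin, tpr_bin):.3f})")
ax.plot(fpr_proba, tpr_proba, marker="o",
        label=f"probability scores (AUC = {auc(fpr_proba, tpr_proba):.3f})")
ax.plot([0, 1], [0, 1], linestyle="--", color="gray", label="chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("ROC curve: binary predictions vs probability scores")
ax.legend(loc="lower right")

# Save to a file; in an interactive session plt.show() works as well.
fig.savefig("roc_comparison.png")
```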

So that explains why roc_auc_score for probability scores is higher.

Now comes the question: why is the area under one curve smaller than the other? The reason is as follows:

  1. When we give roc_auc_score binary predictions, we get only 3 points on our curve: (0, 0), (fpr, tpr), (1, 1). For binary predictions (0, 1), the threshold values returned by sklearn.metrics.roc_curve are [2, 1, 0]: y_pred_bin has two distinct values, 0 and 1, and roc_curve prepends max(y_pred_bin) + 1 so that the curve starts at (0, 0). (scikit-learn 1.3 and later prepend np.inf instead, giving [inf, 1, 0]; the resulting points are the same.) With just three threshold values, the ROC curve is drawn using only three points.
  2. When we give roc_auc_score probability predictions, we usually get more than 3 points on our curve. The candidate thresholds are the distinct probability predictions plus the extra leading threshold, so you can have up to as many threshold values as you have distinct probability predictions + 1; by default roc_curve drops some suboptimal thresholds that would not appear on a plotted curve (drop_intermediate=True). Each threshold value gives a false positive rate (fpr) and a true positive rate (tpr), so the curve follows the model's ranking of the samples instead of cutting straight across from (0, 0) to (1, 1) through a single point.
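
The threshold sweep described in the two points above can be sketched in plain Python. This is a simplified illustration of the idea, not scikit-learn's actual implementation; the helper names roc_points and trapezoid_auc and the toy data are hypothetical, and no intermediate thresholds are dropped.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) pairs: the (0, 0) start plus one point per distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(y_score), reverse=True):
        # Everything scored at or above the threshold is predicted positive.
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the (fpr, tpr) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (illustrative values only), binarized at an assumed 0.5 cutoff.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred_proba = [0.10, 0.30, 0.45, 0.60, 0.40, 0.55, 0.80, 0.90]
y_pred_bin = [1 if p >= 0.5 else 0 for p in y_pred_proba]

points_bin = roc_points(y_true, y_pred_bin)      # 2 distinct values -> 3 points
points_proba = roc_points(y_true, y_pred_proba)  # 8 distinct values -> 9 points

print(len(points_bin), trapezoid_auc(points_bin))      # 3 0.75
print(len(points_proba), trapezoid_auc(points_proba))  # 9 0.8125
```

With only one interior point, the area for binary predictions reduces to (1 + tpr - fpr) / 2, here (1 + 0.75 - 0.25) / 2 = 0.75, so everything the model knows about how confidently it ranks the samples is lost.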

So the next time you see a jump in your roc_auc_score when you switch from predicted labels to probability scores, don’t be surprised.

Reference: https://www.kaggle.com/competitions/autismdiagnosis/discussion/324427
