The AUC-ROC Curve: Decoding Classifier Performance and Discriminatory Power

SharkYun
4 min read · Aug 14, 2023
  • Intro
  • What is the AUC-ROC curve
  • How the AUC-ROC curve works
  • How to interpret the AUC-ROC curve
  • Why the AUC-ROC curve is powerful (with case study)

Intro

In the previous article, I covered the confusion matrix and the performance metrics derived from it. In this article, we will look at how those metrics relate to the AUC-ROC curve.

What is the AUC-ROC curve

The ROC (Receiver Operating Characteristic) curve is a graph of the true positive rate (TPR, also called sensitivity) against the false positive rate (FPR) as the classification threshold varies. The AUC (Area Under the Curve) is the area under that curve, and the two together are usually referred to as the AUC-ROC curve.

Let's take a look at what an AUC-ROC curve normally looks like:

[Figure: AUC-ROC curve, Demo I. Credit: SharkYun]

How the AUC-ROC curve works

First of all, let's briefly talk about how a model makes a prediction. In the binary case it predicts 0 or 1, and it does so by generating a probability for each class: P(x=0) and P(x=1), which sum to 1. In most cases the threshold is 0.5, so whichever of P(x=0) or P(x=1) is greater than 0.5 becomes the prediction. Put another way, the model predicts 1 when P(x=1) exceeds the threshold and 0 otherwise, and moving the threshold changes how many instances are predicted positive.
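To make the thresholding step concrete, here is a minimal sketch. The function name `predict_labels` and the probability values are made up for illustration, and the sketch assumes the common convention that a score equal to the threshold counts as positive.

```python
def predict_labels(positive_probs, threshold=0.5):
    """Turn P(x=1) scores into hard 0/1 predictions."""
    # Assumption: a score equal to the threshold is treated as positive.
    return [1 if p >= threshold else 0 for p in positive_probs]


probs = [0.10, 0.40, 0.55, 0.80]  # made-up P(x=1) values

print(predict_labels(probs))       # [0, 0, 1, 1] with the default threshold 0.5
print(predict_labels(probs, 0.3))  # [0, 1, 1, 1] once the threshold is lowered
```

Lowering the threshold can only add predicted positives, which is why TPR and FPR both move in the same direction as the threshold changes.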

TPR = TP / (TP+FN); FPR = FP / (FP+TN)
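As a quick worked example of these two formulas, the sketch below counts the confusion-matrix cells by hand for a tiny made-up set of labels and predictions. The helper name `tpr_fpr` is hypothetical, and returning 0.0 when a denominator is zero is an assumption of this sketch, not a standard rule.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) computed from 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Assumption: report 0.0 if a class is missing entirely.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


y_true = [1, 1, 1, 0, 0, 0, 0, 0]  # 3 positives, 5 negatives (made up)
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]  # TP=2, FN=1, FP=1, TN=4

tpr, fpr = tpr_fpr(y_true, y_pred)
print(round(tpr, 3), round(fpr, 3))  # 0.667 0.2
```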

  • The curve illustrates how well a model distinguishes between positive and negative instances across different thresholds.
  • AUC represents the overall performance of the model: higher AUC indicates better discrimination between classes.

Therefore, the ROC curve plots the true positive rate against the false positive rate at different thresholds from 0.0 to 1.0, which shows how well the model discriminates between the two classes across the whole range rather than at a single cut-off.
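The threshold sweep can be sketched in a few lines of plain Python. This is an illustrative sketch, not a reference implementation: the function name `roc_points` and the toy data are made up, and it assumes every distinct score is used as a threshold (plus one threshold above all scores, so the curve starts at (0, 0)).

```python
def roc_points(y_true, scores):
    """Return the (FPR, TPR) points obtained by sweeping the threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Start above every score (nothing predicted positive), then lower the
    # threshold to each distinct score in turn.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for thr in thresholds:
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


y_true = [0, 0, 1, 1]            # made-up labels
scores = [0.1, 0.4, 0.35, 0.8]   # made-up P(x=1) values

for fpr, tpr in roc_points(y_true, scores):
    print(fpr, tpr)
```

Plotting these points with FPR on the x-axis and TPR on the y-axis, and joining them in order, gives the ROC curve for this toy model.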

How to interpret the AUC-ROC curve

The AUC is the area under the ROC curve: a single value that quantifies the overall performance of the classifier.

The AUC ranges from 0 to 1, where:

  • AUC < 0.5 implies worse-than-random performance: the model tends to rank negatives above positives, so inverting its predictions would do better.
  • AUC = 0.5 implies that the model’s performance is no better than random guessing.
  • AUC > 0.5 and < 1 implies better-than-random performance, where higher values indicate better discrimination between classes.
  • AUC = 1 implies that the model is a perfect classifier, meaning it achieves a TPR of 1 for all FPR values. This suggests that some threshold separates the two classes completely, without any errors.

A model whose ROC curve lies closer to the top-left corner is generally a better model.
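These reference values can be reproduced with a small sketch that builds the ROC points and integrates them with the trapezoidal rule. The function name `roc_auc` and the three score lists are made up for illustration, and the sketch assumes both classes are present in the labels.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via the trapezoidal rule."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos  # assumption: both classes are present
    points = []
    for thr in [float("inf")] + sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # one trapezoid per segment
    return area


y_true = [0, 0, 0, 1, 1, 1]
perfect = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]    # every positive outranks every negative
uninformative = [0.5] * 6                   # same score for everything
inverted = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]   # every negative outranks every positive

print(round(roc_auc(y_true, perfect), 3))        # 1.0
print(round(roc_auc(y_true, uninformative), 3))  # 0.5
print(round(roc_auc(y_true, inverted), 3))       # 0.0
```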

[Figure: AUC-ROC curve, Demo II. Credit: SharkYun]

Why the AUC-ROC curve is powerful

Let me reuse the example from last time: 3% of the data is positive and the remaining 97% is negative, and we have a classifier that predicts negative for every instance, which gives 97% accuracy. Let's try to plot this classifier on the ROC curve:

Since the classifier always predicts negative, it gives every instance the same score, so thresholding can only produce two outcomes. Either everything is predicted negative, which is the point (0, 0) with a false positive rate (FPR) of 0 and a true positive rate (TPR) of 0, or everything is predicted positive, which is the point (1, 1). The ROC curve is the straight line joining these two points, which is the diagonal, so the AUC is 0.5.

Interpretation: An AUC of 0.5 implies that the model’s performance is equivalent to random guessing. Since the model never ranks a positive instance above a negative one, its ability to distinguish between the two classes is nonexistent, despite the 97% accuracy.
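The case study can be checked numerically. The sketch below uses the pairwise-ranking view of AUC (the fraction of positive-negative pairs in which the positive gets the higher score, with ties counted as half), which gives the same value as the area under the ROC curve. The function name `pairwise_auc` and the 100-sample dataset are assumptions made for illustration.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1] * 3 + [0] * 97  # 3% positive, 97% negative
y_pred = [0] * 100           # the classifier always predicts negative

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
auc = pairwise_auc(y_true, y_pred)

print(accuracy)  # 0.97
print(auc)       # 0.5
```

Accuracy rewards the classifier for the class imbalance, while the AUC of 0.5 reports that it cannot tell the classes apart at all.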

Conclusion

While precision, recall, and F1 score are threshold-specific, the AUC-ROC curve considers multiple thresholds simultaneously. This property makes it a valuable tool for evaluating models, especially when dealing with imbalanced datasets or assessing the overall discriminatory ability of the classifier.

