ROC and AUC — How to Evaluate Machine Learning Models in No Time

rabbi khan
8 min read · Dec 8, 2020

Photo by Priscilla Du Preez on Unsplash

Model selection should be easy. And it is — if you know how to calculate and interpret ROC curves and AUC scores. That’s what you’ll learn in this article — in 10 minutes if you’re coding along. In 5 if you aren’t.

After reading, you’ll know:

  • What ROC and AUC are
  • How to use ROC and AUC in Python

ROC and AUC demystified

You can use ROC (Receiver Operating Characteristic) curves to evaluate different thresholds for binary classification problems. In a nutshell, an ROC curve summarizes the confusion matrix at every threshold: each point on the curve is the false positive rate and true positive rate that one threshold produces.

But what are thresholds?

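Most classification models can output prediction probabilities, not just class labels. By default, if the predicted probability of the positive class is greater than 0.5, the instance is classified as positive. Here, 0.5 is the decision threshold. You can adjust it to trade false positives for false negatives: raising it gives fewer false positives but potentially more false negatives, and lowering it does the opposite.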
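As a minimal sketch of what a decision threshold does, the snippet below turns a handful of made-up positive-class probabilities into labels at two thresholds. The helper `apply_threshold` is a hypothetical name; it follows the "greater than" rule described above.

```python
import numpy as np

# Made-up predicted probabilities for the positive class
probs = np.array([0.10, 0.40, 0.55, 0.70, 0.95])

def apply_threshold(probabilities, threshold=0.5):
    """Label an instance positive (1) when its probability exceeds the threshold."""
    return (probabilities > threshold).astype(int)

print(apply_threshold(probs))       # [0 0 1 1 1]  default threshold of 0.5
print(apply_threshold(probs, 0.6))  # [0 0 0 1 1]  stricter threshold, fewer positives
```

Every threshold you try produces a different set of labels, and therefore a different confusion matrix.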

The ROC curve shows the false positive rate (FPR) on the X-axis. This metric tells you the proportion of negative instances incorrectly classified as positive (read: COVID-negative people classified as COVID-positive).

On the Y-axis, it shows the true positive rate (TPR). This metric is also called recall or sensitivity, so keep that in mind. It tells you the proportion of positive instances that were correctly classified (read: COVID-positive people classified as COVID-positive).

Refer to the following image for a refresher on the confusion matrix and TPR/FPR calculation:

Image 1 — Confusion matrix and TPR/FPR calculation (image by author)
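The same TPR/FPR arithmetic can be sketched in a few lines of Python. The labels and predictions below are made up and `tpr_fpr` is a hypothetical helper; the only library call is scikit-learn's `confusion_matrix`, which for 0/1 labels yields the counts in the order tn, fp, fn, tp once flattened with `.ravel()`.

```python
from sklearn.metrics import confusion_matrix

def tpr_fpr(tp, fn, fp, tn):
    """Return (TPR, FPR) from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # recall/sensitivity: share of actual positives predicted positive
    fpr = fp / (fp + tn)  # share of actual negatives wrongly predicted positive
    return tpr, fpr

# Made-up true labels and thresholded predictions
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

# For binary 0/1 labels, the flattened matrix is ordered tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr, fpr = tpr_fpr(tp=tp, fn=fn, fp=fp, tn=tn)
print(tpr, fpr)  # 0.75 0.25
```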

Great, but what is AUC?

AUC represents the area under the ROC curve. The higher the AUC, the better the model separates the positive and negative classes. Ideally, the ROC curve should extend to the top left corner, and the AUC score would be 1 in that scenario.
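To make the "area" concrete, here is a from-scratch sketch that sweeps every distinct score as a threshold, records the (FPR, TPR) point for each, and integrates the resulting curve with the trapezoidal rule. The helper names are hypothetical, and the sketch counts a score equal to the threshold as positive (the convention scikit-learn's `roc_curve` uses), which differs slightly from the strict "greater than" rule above. The result is cross-checked against scikit-learn's `roc_auc_score` on a tiny made-up dataset.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def roc_points(y_true, scores):
    """Return FPR and TPR arrays, one point per threshold, from strictest to loosest."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    # Start at +inf (nothing predicted positive), then every distinct score, highest first
    thresholds = np.concatenate([[np.inf], np.unique(scores)[::-1]])
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    fpr, tpr = [], []
    for t in thresholds:
        predicted_pos = scores >= t
        tpr.append(np.sum(predicted_pos & (y_true == 1)) / n_pos)
        fpr.append(np.sum(predicted_pos & (y_true == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)

def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Tiny made-up example: two negatives (0) and two positives (1)
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr = roc_points(y_true, scores)
print(auc_trapezoid(fpr, tpr))        # 0.75
print(roc_auc_score(y_true, scores))  # 0.75
```

The value also matches the pairwise reading of AUC: it is the probability that a randomly chosen positive instance scores higher than a randomly chosen negative one. Here three of the four positive/negative pairs are ranked correctly, hence 0.75.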

Let’s go over a couple of examples. Below you’ll see random data drawn from normal distributions, one per class (positive and negative). The means and variances differ so that each class has its own center.

For a great model, the distributions are entirely separated:

Image 2 — A model with AUC = 1 (image by author)

You can see that this yields an AUC score of 1, indicating that there is a threshold at which the model classifies every instance correctly.

Can AUC be 0? Yes, and it means the model is reversing the classes. In other words, it’s predicting positive instances as negative and vice versa. Take a look at the image below:

Image 3 — A model with AUC = 0 (image by author)

Can you think of a quick way of turning a model with an AUC of 0 into one with an AUC of 1? Let me know in the comment section below.

Finally, there’s a scenario where AUC is 0.5. It means the model is no better than random guessing. Just think about it: you ask a model whether someone is positive or negative, and it tells you, well, maybe it’s positive, maybe it’s negative (50:50). That’s useless for binary classification tasks.

Here’s what the ROC curve looks like when AUC is 0.5:

Image 4 — A model with AUC = 0.5 (image by author)
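The three scenarios can be reproduced with simulated scores. The sketch below draws scores for each class from normal distributions and scores them with scikit-learn's `roc_auc_score`; the means, spreads, sample size, and seed are arbitrary choices, not the values behind the images, and `simulate_auc` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility

def simulate_auc(neg_mean, pos_mean, neg_sd=1.0, pos_sd=1.5, n=500):
    """Draw n scores per class from normal distributions and return the AUC."""
    neg_scores = rng.normal(neg_mean, neg_sd, n)
    pos_scores = rng.normal(pos_mean, pos_sd, n)
    scores = np.concatenate([neg_scores, pos_scores])
    y_true = np.concatenate([np.zeros(n), np.ones(n)])  # 0 = negative, 1 = positive
    return roc_auc_score(y_true, scores)

print(simulate_auc(neg_mean=0, pos_mean=15))  # classes fully separated: 1.0
print(simulate_auc(neg_mean=15, pos_mean=0))  # classes reversed: 0.0
print(simulate_auc(neg_mean=0, pos_mean=0))   # same center: close to 0.5
```

Swapping the class means flips the ranking, which is why the second call lands at 0, while identical centers leave the model with nothing to separate.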

Now you know the theory. Let’s connect it with practice next.

Using ROC and AUC in Python

You’ll use the White wine quality dataset for the practical part. Here’s how to load it with Python:
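A minimal loading sketch, assuming you use the UCI Machine Learning Repository copy of the data, `winequality-white.csv`, which is semicolon-separated; other copies (for example, on Kaggle) may use commas, so adjust `sep` accordingly. The file path and the `load_wine` helper are assumptions, and the in-memory sample only mimics two of the file's columns so the snippet runs without the download.

```python
import io
import pandas as pd

def load_wine(path_or_buffer, sep=";"):
    """Read the white wine quality CSV; the UCI copy separates fields with ';'."""
    return pd.read_csv(path_or_buffer, sep=sep)

# With the real file (the path is an assumption):
# df = load_wine("winequality-white.csv")
# print(df.head())

# In-memory stand-in mimicking two of the file's columns, so the snippet runs as-is
sample = io.StringIO('"alcohol";"quality"\n9.4;6\n12.8;7\n')
df = load_wine(sample)
print(df.head())
```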

The first couple of rows look like this:

Image 5 — White wine dataset head (image by author)

Initially, this is not a binary classification dataset, but you can convert it to one. Let’s say the wine is Good if the quality is 7 or above, and Bad otherwise:
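One way to do the conversion, sketched under the assumption that the target column is called `quality` (as in the UCI file) and that the new labels are stored as the strings `'good'` and `'bad'`. The helper name `make_binary_target` is hypothetical.

```python
import pandas as pd

def make_binary_target(df, column="quality", cutoff=7):
    """Return a copy of df where `column` is 'good' if >= cutoff, else 'bad'."""
    out = df.copy()
    out[column] = ["good" if q >= cutoff else "bad" for q in out[column]]
    return out

# Made-up rows standing in for the wine data
demo = pd.DataFrame({"alcohol": [9.4, 12.8, 11.0], "quality": [6, 7, 8]})
print(make_binary_target(demo)["quality"].tolist())  # ['bad', 'good', 'good']
```

Working on a copy keeps the original numeric `quality` column available if you want to try a different cutoff later.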

There’s your binary classification dataset. Let’s visualize the counts of good and bad wines next. Here’s the code:
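A possible version of the counting-and-plotting step, using pandas `value_counts` and a Matplotlib bar chart. The helper name, figure size, and the `'good'`/`'bad'` labels are assumptions, and the `Agg` backend line only exists so the sketch runs without a display (drop it in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

def plot_class_counts(labels):
    """Count each class label and draw the counts as a bar chart."""
    counts = pd.Series(labels).value_counts()
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(counts.index.astype(str), counts.values)
    ax.set_title("Class distribution of the target variable")
    ax.set_ylabel("Count")
    return counts, fig

# Made-up labels; with the wine data you would pass the binarized quality column
counts, fig = plot_class_counts(["bad", "bad", "good", "bad"])
print(counts.to_dict())  # {'bad': 3, 'good': 1}
```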

And here’s the chart:

Image 6 — Class distribution of the target variable (image by author)

And there’s nothing more to do with regard to preparation. You can make a train/test split next:
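A sketch of the split with scikit-learn's `train_test_split`. The 80/20 ratio, the seed, and stratifying on the target are assumptions rather than the author's settings; stratification is a common choice here because good wines are the minority class. The ten-row frame is a made-up stand-in and `split_data` is a hypothetical helper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def split_data(df, target="quality", test_size=0.2, seed=42):
    """Split features and target into train/test sets, preserving class proportions."""
    X = df.drop(columns=target)
    y = df[target]
    # stratify=y keeps the good/bad ratio similar in both splits
    return train_test_split(X, y, test_size=test_size, random_state=seed, stratify=y)

demo = pd.DataFrame({
    "alcohol": [9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5],
    "quality": ["bad"] * 5 + ["good"] * 5,
})
X_train, X_test, y_train, y_test = split_data(demo)
print(len(X_train), len(X_test))  # 8 2
```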

Great! The snippet below shows you how to train logistic regression, decision tree, random forests, and extreme gradient boosting models. It also shows you how to grab probabilities for the positive class. It will come in handy later:
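The training step might look like the sketch below. To stay self-contained it uses synthetic data from `make_classification` instead of the wine data, and it substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost so nothing beyond scikit-learn is required; if the `xgboost` package is installed, `xgboost.XGBClassifier` can be swapped in. Hyperparameters are defaults plus fixed seeds, not the author's choices, and `fit_and_get_positive_probs` is a hypothetical helper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine data: 11 numeric features, roughly 80/20 class imbalance
X, y = make_classification(n_samples=1000, n_features=11, weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    # Stand-in for XGBoost; xgboost.XGBClassifier could be used here if installed
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

def fit_and_get_positive_probs(models, X_train, y_train, X_test, positive_label=1):
    """Fit each model and return its predicted probabilities for the positive class."""
    probs = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # predict_proba columns follow model.classes_, so look up the positive column
        pos_col = list(model.classes_).index(positive_label)
        probs[name] = model.predict_proba(X_test)[:, pos_col]
    return probs

probs = fit_and_get_positive_probs(models, X_train, y_train, X_test)
```

Looking the column up through `classes_` avoids assuming the positive class sits in column 1; with string labels you would pass `positive_label="good"`.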

You can visualize the ROC curves and calculate the AUC now. The only requirement is to remap the Good and Bad class names to 1 and 0, respectively.
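With pandas, the remap is a single `map` call, sketched here on made-up labels and assuming the classes are stored as the strings `'good'` and `'bad'`:

```python
import pandas as pd

# Hypothetical test labels stored as strings
y_test = pd.Series(["good", "bad", "bad", "good"])
# Map the positive class to 1 and the negative class to 0
y_test_binary = y_test.map({"good": 1, "bad": 0})
print(y_test_binary.tolist())  # [1, 0, 0, 1]
```

Any label missing from the dictionary becomes NaN, so a quick `y_test_binary.isna().any()` check catches typos in the class names.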

The following code snippet visualizes the ROC curve for the four trained models and shows their AUC score on the legend:
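A sketch of such a plotting function, built on scikit-learn's `roc_curve` and `roc_auc_score`. The function name, styling, and the two made-up models in the usage example are assumptions; in practice you would pass the remapped test labels and the dictionary of positive-class probabilities. The dashed diagonal is the AUC = 0.5 baseline, and the `Agg` line only makes the sketch run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def plot_roc_curves(y_true, probs_by_model):
    """Plot one ROC curve per model, with its AUC in the legend; return the AUCs."""
    aucs = {}
    fig, ax = plt.subplots(figsize=(7, 6))
    for name, probs in probs_by_model.items():
        fpr, tpr, _ = roc_curve(y_true, probs)
        aucs[name] = roc_auc_score(y_true, probs)
        ax.plot(fpr, tpr, label=f"{name} (AUC = {aucs[name]:.2f})")
    # Diagonal of a model that ranks no better than chance
    ax.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Baseline (AUC = 0.5)")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.set_title("ROC curves")
    ax.legend(loc="lower right")
    return aucs, fig

# Made-up labels and scores for two hypothetical models
y_true = np.array([0, 0, 1, 1])
aucs, fig = plot_roc_curves(y_true, {
    "Model A": np.array([0.1, 0.4, 0.35, 0.8]),
    "Model B": np.array([0.2, 0.3, 0.6, 0.9]),
})
print(aucs)
```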

Here’s the corresponding visualization:

Image 7 — ROC curves for different machine learning models (image by author)

No perfect models here, but all of them are far above the baseline (a useless model with an AUC of 0.5). The random forest algorithm is the best, with an AUC score of 0.93. That’s impressive given how little preparation and feature engineering we did.


In a nutshell, you can use ROC curves and AUC scores to choose the best machine learning model for your dataset. Image 7 shows you how easy it is to interpret the ROC curves, even when there are multiple curves on the same chart.

If you need a completely automated solution, look only at the AUC and select the model with the highest score.
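Automated selection boils down to taking the maximum over a dictionary of AUC scores. The helper name and the scores below are hypothetical, not results from this article:

```python
def best_model(aucs):
    """Return the name of the model with the highest AUC (the first one wins on ties)."""
    return max(aucs, key=aucs.get)

# Hypothetical AUC scores
aucs = {"Model A": 0.81, "Model B": 0.93, "Model C": 0.88}
print(best_model(aucs))  # Model B
```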

What’s your approach to model selection? Let me know in the comment section.
