Loss Function Explained For Noobs (How Models Know They Are Wrong)

# Introduction

I know that when beginners start learning machine learning, things seem easy at first. You follow a tutorial that asks you to load a dataset, train a model, and then you see something like this: loss = "mse" or criterion = nn.CrossEntropyLoss().

And just like that, the tutorial starts talking about equations, gradients, optimization, and Greek letters. If you have ever nodded along without really understanding what a loss function does, you are not alone. Loss functions are often explained backward. Most tutorials start with the formula when they should start with the idea. This article is part of my noob series, where I will make things easier for you to understand. So, let’s get started.

# What Is a Loss Function?

A loss function is how a machine learning model knows how wrong it is. That is literally the whole concept. The model makes a prediction. The loss function compares that prediction with the correct answer. Then it gives the model a number that says, “This is how bad your mistake was.”

A high loss means the model was very wrong.

A low loss means the model was close.

During training, the model keeps adjusting itself to make the loss smaller.

That is how learning happens. If you have ever played darts, the idea is very similar. You throw the dart, and to improve you need feedback: was the throw slightly off, far away, too high, or too far left? Without that feedback, you cannot improve. The bullseye is the correct answer and the dart is the prediction. The loss function measures how far from the bullseye the dart landed, and that distance becomes the model’s feedback signal. Here’s how it would look if you prefer a visualization.

Distance from the center matters: a dart that lands just off the bullseye is not the same as one that misses by a wide margin. Similarly, for models, just knowing that the answer is wrong is not enough. The model needs to know how badly it failed in order to improve.
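As a rough sketch of the dart analogy, the snippet below treats the loss as the straight-line distance between the dart and the bullseye. The coordinates and the `dart_loss` name are made up for illustration; real loss functions compare predictions with correct answers rather than positions on a board.

```python
import math

def dart_loss(dart, bullseye=(0.0, 0.0)):
    """Distance between where the dart landed and the bullseye."""
    return math.hypot(dart[0] - bullseye[0], dart[1] - bullseye[1])

close_throw = dart_loss((0.3, 0.4))   # landed near the center
wild_throw = dart_loss((6.0, 8.0))    # landed far away

# The wild throw gets a much larger number, so it signals a bigger mistake.
print(close_throw, wild_throw)
```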

Now that we have an understanding of what a loss function is and why we need it, let’s look at some of the common loss functions used in machine learning.

# Mean Squared Error

Mean squared error (MSE) is the most common loss when the model predicts numbers such as house prices, temperatures, or delivery times. The idea is very simple, and the name spells out the three steps:

Error: For each prediction, take the gap between the guess and the truth.
Squared: Multiply each gap by itself.
Mean: Average all those squared gaps.

You can write it in Python like this:

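def mean_squared_error(predictions, actuals):
    """Average of the squared gaps between predictions and actual values."""
    # Error, then squared: the gap for each pair, multiplied by itself.
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    # Mean: average the squared gaps.
    return sum(squared_errors) / len(squared_errors)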

Now, I know that taking the errors and then averaging over the predictions makes sense intuitively, but understanding why we square them can be confusing. This is done for two reasons:

Squaring makes every error positive. An error of +3 and an error of -3 are equally bad, and squaring turns both into 9, so they stop cancelling each other out.
Squaring punishes big mistakes far more harshly than small ones. This is useful in many cases. For example, if you are predicting house prices, being wrong by \$200,000 should cost far more than being wrong by \$1,000.
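Both reasons can be checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and the numbers are the +3 and -3 errors and the 1,000 and 200,000 dollar misses mentioned above.

```python
errors = [3, -3]

# Averaging the raw errors lets them cancel out and hides the mistakes.
mean_raw = sum(errors) / len(errors)

# Squaring first makes both errors count.
mean_squared = sum(e ** 2 for e in errors) / len(errors)

# House-price misses, in dollars.
small_miss = 1_000
big_miss = 200_000

# The big miss is 200 times larger, but its squared penalty is 40,000 times larger.
error_ratio = big_miss / small_miss
penalty_ratio = big_miss ** 2 / small_miss ** 2

print(mean_raw, mean_squared)        # 0.0 9.0
print(error_ratio, penalty_ratio)    # 200.0 40000.0
```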

# Mean Absolute Error

Another common loss function is mean absolute error (MAE). MAE also measures the gap between predictions and actual values, but it does not square the error. Instead, it simply takes the absolute value.

Here is the same idea as a Python function:

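def mean_absolute_error(predictions, actuals):
    """Average of the absolute gaps between predictions and actual values."""
    # abs() drops the sign, so +3 and -3 both count as 3.
    absolute_errors = [abs(p - a) for p, a in zip(predictions, actuals)]
    # Mean: average the absolute gaps.
    return sum(absolute_errors) / len(absolute_errors)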

So MAE still punishes large errors, but only in proportion to their size, not as harshly as MSE does.

An error of 10 costs 10 and an error of 20 costs 20. Under MSE, the same two errors would cost 100 and 400.
If your data naturally has some outliers and you do not want your model to overreact to them, MAE is a good choice.

Let me show a quick graph that compares the MSE and MAE curves.
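The same comparison can also be sketched in numbers. The delivery times below are made up for illustration, and the last actual value is a deliberate outlier; both loss functions are redefined so the snippet runs on its own.

```python
def mean_squared_error(predictions, actuals):
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)

def mean_absolute_error(predictions, actuals):
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)

# Hypothetical delivery times in minutes; the last actual value is an outlier.
actuals     = [30, 32, 28, 31, 90]
predictions = [31, 30, 29, 30, 35]

# Losses on the first four "normal" examples only.
mse_normal = mean_squared_error(predictions[:4], actuals[:4])
mae_normal = mean_absolute_error(predictions[:4], actuals[:4])

# Losses once the outlier is included.
mse_all = mean_squared_error(predictions, actuals)
mae_all = mean_absolute_error(predictions, actuals)

print("MSE:", mse_normal, "->", mse_all)
print("MAE:", mae_normal, "->", mae_all)
```

One outlier pushes MSE from 1.75 to about 606, while MAE only moves from 1.25 to 12. That is the overreaction the MAE section describes.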

# Cross-Entropy Loss

So far, we have talked about predicting numbers. But many machine learning problems are about predicting categories.

Is this email spam or not?

Is this a picture of a cat, dog, or fish?

Is a certain transaction fraudulent or not?

For classification tasks, models usually output probabilities like:

Dog: 70%
Cat: 20%
Fish: 10%

If the image really is a dog, that is a good prediction. But if it is actually a cat, the model gave the correct answer only a 20% chance, and it needs to be penalized for that.

So, the intuition is:

Correct and confident — low loss
Correct but unsure — medium loss
Wrong and confident — high loss

This is why cross-entropy is so widely used for classification. It does not just care about whether the model was right. It also cares about how confident the model was.
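Here is a minimal sketch of that intuition for a single example, using the dog, cat, and fish probabilities from above. For one example with one correct class, cross-entropy is the negative natural log of the probability the model gave to that class. The `cross_entropy` helper is written for illustration and is not a library function; framework versions such as `nn.CrossEntropyLoss` work on raw model scores rather than ready-made probabilities.

```python
import math

def cross_entropy(probabilities, correct_label):
    """Loss for one example: -log of the probability given to the correct class."""
    return -math.log(probabilities[correct_label])

prediction = {"dog": 0.70, "cat": 0.20, "fish": 0.10}

loss_if_dog = cross_entropy(prediction, "dog")     # about 0.36: correct and fairly confident
loss_if_cat = cross_entropy(prediction, "cat")     # about 1.61: the right answer got only 20%
loss_if_fish = cross_entropy(prediction, "fish")   # about 2.30: the right answer got only 10%

print(loss_if_dog, loss_if_cat, loss_if_fish)
```

The less probability the model gave to the correct answer, the faster the loss grows.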

# Loss vs. Accuracy

Now that we have gone through different loss functions, I also want to clarify the difference between loss and accuracy. They are not the same thing.

Accuracy tells you how many predictions were correct.

But loss tells you how bad the model’s mistakes were.

If you have two models — Model A and Model B — and both get 90 out of 100 predictions correct, they will have the same accuracy. But one model may be very confident on the right answers and only slightly wrong on the incorrect ones, while the other may be barely correct on many examples and extremely confident when wrong.

In that case, the accuracy would be the same, but the loss would be different.
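The two-model comparison can be sketched with made-up numbers. To keep it short, this version uses 10 yes/no predictions instead of 100, and each list holds the probability the model gave to the correct class. The helper names are illustrative, and average cross-entropy is assumed as the loss.

```python
import math

# Probability each model gave to the correct class (hypothetical values).
model_a = [0.95] * 9 + [0.45]   # confident when right, only slightly wrong on its miss
model_b = [0.55] * 9 + [0.02]   # barely right, extremely confident on its miss

def accuracy(correct_class_probs):
    # With two classes, a prediction is correct when the true class gets more than 50%.
    return sum(p > 0.5 for p in correct_class_probs) / len(correct_class_probs)

def average_cross_entropy(correct_class_probs):
    return sum(-math.log(p) for p in correct_class_probs) / len(correct_class_probs)

acc_a, acc_b = accuracy(model_a), accuracy(model_b)
loss_a, loss_b = average_cross_entropy(model_a), average_cross_entropy(model_b)

print("accuracy:", acc_a, acc_b)   # both 0.9
print("loss:", loss_a, loss_b)     # about 0.13 versus about 0.93
```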

# The Training Loop

Once the model has a loss number, it can improve. The training loop looks like this:

The model makes predictions.
The loss function measures the mistakes.
The optimizer updates the model.
The model tries again.
The loss hopefully gets smaller.
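The steps above can be sketched in plain Python. Everything here is an assumption chosen to keep the example tiny: the model is a single weight `w` that predicts `w * x`, the loss is MSE, the "optimizer" is basic gradient descent written by hand, and the data and learning rate are made up.

```python
# Toy data that follows y = 2 * x, so the ideal weight is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0                # the model's single adjustable number, starting from a bad guess
learning_rate = 0.01
loss_history = []

for step in range(200):
    # 1. The model makes predictions.
    predictions = [w * x for x in xs]

    # 2. The loss function measures the mistakes (MSE).
    loss = sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(xs)
    loss_history.append(loss)

    # 3. The optimizer updates the model: nudge w in the direction that lowers the loss.
    gradient = sum(2 * (p - y) * x for p, y, x in zip(predictions, ys, xs)) / len(xs)
    w -= learning_rate * gradient

    # 4. The loop repeats, so the model tries again with the new w.

print("learned w:", round(w, 3))
print("first loss:", loss_history[0], "last loss:", loss_history[-1])
```

The loss starts at 30.0 and shrinks toward zero as `w` approaches 2. Real libraries do the same thing with many more weights and compute the gradient for you.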

When training a model, we also plot the loss over time. In the beginning, the model makes many mistakes and is poor at making predictions, so the loss is high. But as training progresses, the loss decreases and the model gets better at making predictions.

A healthy training curve often looks like this:

High loss at the start → sharp drop → gradual flattening

You can see this shape in the figure below.

The flattening is normal. It means the model has learned the easy patterns and is now making smaller improvements. But if the training loss goes down while the validation loss starts going up, that can be a warning sign of overfitting — which means the model may be memorizing the training data instead of learning patterns that generalize.
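One simple way to spot that warning sign is to compare the two curves epoch by epoch. The loss values and the `first_overfitting_epoch` helper below are invented for illustration; in practice a single uptick can be noise, so people usually wait a few epochs before acting on it.

```python
# Made-up loss values per epoch, for illustration only.
train_loss = [0.90, 0.60, 0.45, 0.36, 0.30, 0.25, 0.21, 0.18]
val_loss   = [0.95, 0.68, 0.55, 0.50, 0.49, 0.52, 0.57, 0.63]

def first_overfitting_epoch(train_loss, val_loss):
    """Return the first epoch where training loss fell but validation loss rose, or None."""
    for epoch in range(1, len(train_loss)):
        train_improved = train_loss[epoch] < train_loss[epoch - 1]
        val_got_worse = val_loss[epoch] > val_loss[epoch - 1]
        if train_improved and val_got_worse:
            return epoch
    return None

# Epochs are counted from 0 here, so 5 means the sixth epoch.
warning_epoch = first_overfitting_epoch(train_loss, val_loss)
print(warning_epoch)
```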

# Final Thoughts

A loss function is the model’s mistake score.

It tells the model how wrong its predictions are, and it gives training a clear goal: make that number smaller.

Once you understand loss functions, many other machine learning ideas become easier to grasp — including gradient descent, backpropagation, optimization, overfitting, and evaluation metrics.

You do not need to start with scary equations. Start with the idea:

The model guesses.
The loss function scores the guess.
The model updates itself to reduce the score.

That is the heart of machine learning.

Loss is how a model knows it is wrong.

Training is how it learns to be less wrong.

This brings us to the end of this article. We will continue to cover some interesting concepts throughout our noob series.

Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.
