# Introduction
Everyone focuses on solving the problem, but almost no one tests the solution. Sometimes, a perfectly working script can break with just one new row of data or a slight change in the logic.
In this article, we will solve a Tesla interview question in Python and show how unit tests, version control, and continuous integration turn a fragile script into a reliable solution in three steps. We will start with the interview question and end with automated testing using GitHub Actions.
We will go through these three steps to make a data solution production-ready.
First, we will solve a real interview question from Tesla. Next, we will add unit tests to ensure the solution stays reliable over time. Finally, we will use GitHub Actions to automate testing and version control.
# Solving A Real Interview Question From Tesla
**New Products**
Calculate the net change in the number of products launched by companies in 2020 compared to 2019. Your output should include the company names and the net difference.
(Net difference = Number of products launched in 2020 – The number launched in 2019.)
In this interview question from Tesla, you are asked to measure product growth across two years.
The task is to return each company’s name along with the difference in product count between 2020 and 2019.
## Understanding The Dataset
Let us first look at the dataset we are working with. Here are the column names.
| Column Name | Data Type |
|---|---|
| year | int64 |
| company_name | object |
| product_name | object |
Let us preview the dataset.
| year | company_name | product_name |
|---|---|---|
| 2019 | Toyota | Avalon |
| 2019 | Toyota | Camry |
| 2020 | Toyota | Corolla |
| 2019 | Honda | Accord |
| 2019 | Honda | Passport |
This dataset contains three columns: year, company_name, and product_name. Each row represents a car model released by a company in a given year.
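If you want to follow along locally, you can recreate the preview rows yourself. (This reconstruction is only illustrative; on the interview platform, the `car_launches` DataFrame is provided for you, and the full table has more rows than this preview.)

```python
import pandas as pd

# Recreate the five preview rows as a DataFrame named car_launches,
# matching the schema above (year, company_name, product_name)
car_launches = pd.DataFrame({
    'year': [2019, 2019, 2020, 2019, 2019],
    'company_name': ['Toyota', 'Toyota', 'Toyota', 'Honda', 'Honda'],
    'product_name': ['Avalon', 'Camry', 'Corolla', 'Accord', 'Passport']
})
```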
## Writing The Python Solution
We will use basic pandas operations to group, compare, and calculate the net product change per company. The solution first splits the data into subsets for 2019 and 2020, then merges them by company name and counts the number of unique products launched each year.
```python
import pandas as pd

# Split the data into one subset per year
df_2020 = car_launches[car_launches['year'].astype(str) == '2020']
df_2019 = car_launches[car_launches['year'].astype(str) == '2019']

# Outer merge keeps companies that launched in only one of the two years
df = pd.merge(df_2020, df_2019, how='outer',
              on='company_name', suffixes=['_2020', '_2019']).fillna(0)
```
The final output subtracts 2019 counts from 2020 to get the net difference. Here is the entire code.
```python
import pandas as pd

# Split the data into one subset per year
df_2020 = car_launches[car_launches['year'].astype(str) == '2020']
df_2019 = car_launches[car_launches['year'].astype(str) == '2019']

# Outer merge keeps companies that launched in only one of the two years
df = pd.merge(df_2020, df_2019, how='outer',
              on='company_name', suffixes=['_2020', '_2019']).fillna(0)

# Drop rows where the same product appears in both years
df = df[df['product_name_2020'] != df['product_name_2019']]

# Count unique products launched per company per year
df = df.groupby('company_name').agg(
    {'product_name_2020': 'nunique', 'product_name_2019': 'nunique'}
).reset_index()

# Net difference = 2020 launches minus 2019 launches
df['net_new_products'] = df['product_name_2020'] - df['product_name_2019']
result = df[['company_name', 'net_new_products']]
```
## Viewing The Expected Output
Here is the expected output.
| company_name | net_new_products |
|---|---|
| Chevrolet | 2 |
| Ford | -1 |
| Honda | -3 |
| Jeep | 1 |
| Toyota | -1 |
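Note that this output reflects the full dataset on the platform, which includes more companies and rows than the five-row preview shown earlier.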
# Making The Solution Reliable With Unit Tests
Solving a data problem once does not mean it will keep working. A new row or a logic tweak can silently break your script. For instance, imagine you accidentally rename a column in your code, changing this line:
```python
df['net_new_products'] = df['product_name_2020'] - df['product_name_2019']
```
to this:
```python
df['new_products'] = df['product_name_2020'] - df['product_name_2019']
```
The logic still runs, but your output (and your tests) will suddenly fail because the expected column name no longer matches. Unit tests fix that: they check that the same input still gives the same output, every time. If something breaks, the test fails and shows exactly where. We will do this in three steps, from turning the interview solution into a function to writing a test that checks the output against what we expect.
## Turning The Script Into A Reusable Function
Before writing tests, we need to make our solution reusable and easy to test. Converting it into a function allows us to run it with different datasets and verify the output automatically, without having to rewrite the same code every time. We changed the original code into a function that accepts a DataFrame and returns a result. Here is the code.
```python
import pandas as pd

def calculate_net_new_products(car_launches):
    # Split the data into one subset per year
    df_2020 = car_launches[car_launches['year'].astype(str) == '2020']
    df_2019 = car_launches[car_launches['year'].astype(str) == '2019']

    # Outer merge keeps companies that launched in only one of the two years
    df = pd.merge(df_2020, df_2019, how='outer',
                  on='company_name', suffixes=['_2020', '_2019']).fillna(0)

    # Drop rows where the same product appears in both years
    df = df[df['product_name_2020'] != df['product_name_2019']]

    # Count unique products launched per company per year
    df = df.groupby('company_name').agg({
        'product_name_2020': 'nunique',
        'product_name_2019': 'nunique'
    }).reset_index()

    df['net_new_products'] = df['product_name_2020'] - df['product_name_2019']
    return df[['company_name', 'net_new_products']]
```
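As a quick sanity check before writing formal tests, you can call the function on a small made-up sample. (This is hypothetical data, chosen so that each company has launches in both years, and it assumes the function defined above is in scope.)

```python
import pandas as pd

# Hypothetical sample: each company launches products in both years
sample = pd.DataFrame({
    'year': [2019, 2020, 2020, 2019, 2019, 2020],
    'company_name': ['Toyota', 'Toyota', 'Toyota', 'Honda', 'Honda', 'Honda'],
    'product_name': ['Camry', 'Corolla', 'Yaris', 'Accord', 'Passport', 'Pilot']
})

print(calculate_net_new_products(sample))
#   company_name  net_new_products
# 0        Honda                -1
# 1       Toyota                 1
```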
## Defining Test Data And Expected Output
Before running any tests, we need to know what “correct” looks like. Defining the expected output gives us a clear benchmark to compare our function’s results against. So, we will build a small test input and clearly define what the correct output should be.
```python
import pandas as pd

# Sample test data
test_data = pd.DataFrame({
    'year': [2019, 2019, 2020, 2020],
    'company_name': ['Toyota', 'Toyota', 'Toyota', 'Toyota'],
    'product_name': ['Camry', 'Avalon', 'Corolla', 'Yaris']
})

# Expected output
expected_output = pd.DataFrame({
    'company_name': ['Toyota'],
    'net_new_products': [0]  # 2 launched in 2020 - 2 in 2019
})
```
## Writing And Running Unit Tests
The following test code checks whether your function returns exactly what you expect. If not, the test fails and tells you why, down to the last row or column. The test below uses the function from the previous step (`calculate_net_new_products()`) and the expected output we defined.
```python
import unittest
import pandas as pd

# These imports assume the project layout described in the next section:
# the function lives in solution.py, the test data in expected_output.py
from solution import calculate_net_new_products
from expected_output import test_data, expected_output

class TestProductDifference(unittest.TestCase):
    def test_net_new_products(self):
        result = calculate_net_new_products(test_data)
        # Sort both frames so row order cannot cause a false failure
        result = result.sort_values('company_name').reset_index(drop=True)
        expected = expected_output.sort_values('company_name').reset_index(drop=True)
        pd.testing.assert_frame_equal(result, expected)

if __name__ == '__main__':
    unittest.main()
```
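You can run this test locally with `python -m unittest test_solution.py`, or let unittest find every test file with `python -m unittest discover`; that discover command is exactly what our CI will run in the next step, and a passing run ends with `OK`.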
# Automating Tests With Continuous Integration
Writing tests is a good start, but only if they actually run. You could run them manually after every change, but that does not scale: it is easy to forget, and team members may use different setups. Continuous Integration (CI) solves this by running tests automatically whenever code changes are pushed to the repository.
GitHub Actions is a free CI tool that runs your tests automatically on every push, keeping your solution reliable even when the code, data, or logic changes. Here is how to apply CI with GitHub Actions.
## Organizing Your Project Files
To apply CI to an interview question, you first need to push your solution to a GitHub repository.
Then, set up the following files:
- `solution.py`: the interview question solution from Step 2.1
- `expected_output.py`: defines the test input and expected output from Step 2.2
- `test_solution.py`: unit tests using `unittest` from Step 2.3
- `requirements.txt`: dependencies (e.g., pandas)
- `.github/workflows/test.yml`: GitHub Actions workflow file
- `data/car_launches.csv`: the input dataset used by the solution
## Understanding The Repository Layout
The repository is organized this way so GitHub Actions can find everything it needs without extra setup. It keeps things simple, consistent, and easy for both you and others to work with.
```
my-query-solution/
├── data/
│   └── car_launches.csv
├── solution.py
├── expected_output.py
├── test_solution.py
├── requirements.txt
└── .github/
    └── workflows/
        └── test.yml
```
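For this project, `requirements.txt` can be a single line containing `pandas`, since that is the only third-party dependency the solution and tests import.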
## Creating A GitHub Actions Workflow
Now that you have all the files, the last one you need is test.yml. This file tells GitHub Actions how to run your tests automatically when code changes.
First, we name the workflow and tell GitHub when to run it.
```yaml
name: Run Unit Tests

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
```
This means the tests will run every time someone pushes code to the main branch or opens a pull request against it. Next, we create a job that defines what will happen inside the workflow.
```yaml
jobs:
  test:
    runs-on: ubuntu-latest
```
The job runs on GitHub’s Ubuntu environment, which gives you a clean setup each time. Now we add steps inside that job. The first one checks out your repository so GitHub Actions can access your code.
```yaml
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
```
Then we set up Python and choose the version we want to use.
```yaml
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
```
After that, we install all the dependencies listed in requirements.txt.
```yaml
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
```
Finally, we run all unit tests in the project.
```yaml
      - name: Run unit tests
        run: python -m unittest discover
```
This last step runs your tests automatically and shows any errors if something breaks. Here is the full file for reference:
```yaml
name: Run Unit Tests

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Run unit tests
        run: python -m unittest discover
```
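Commit this file to `.github/workflows/test.yml`. GitHub detects workflow files in that folder automatically, so the tests will run on your very next push.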
## Reviewing Test Results In GitHub Actions
Once you have pushed all the files to your GitHub repository, open the Actions tab. If everything ran successfully, you will see a green checkmark next to the latest workflow run.
Click into the run (here, “Update test.yml”) to see what actually happened. You will get a full breakdown, from setting up Python to running the tests. If all tests pass, each step shows a check mark, confirming that your code behaves as intended at every stage, based on the tests you defined.
In our run, the unit test completed in just 1 second, and the entire CI process finished in 17 seconds, verifying everything from setup to test execution.
## When A Small Change Breaks The Test
Not every change will pass the tests. Say you accidentally rename a column in solution.py and push the change to GitHub, for example:
```python
# Original (works fine)
df['net_new_products'] = df['product_name_2020'] - df['product_name_2019']

# Accidental change
df['new_products'] = df['product_name_2020'] - df['product_name_2019']
```
Let us now check the test results in the Actions tab. We have an error; clicking it shows the details. The unit tests did not pass, so click “Run unit tests” to see the full error message.
As you can see, our tests found the issue with a `KeyError: 'net_new_products'`, because the column name in the function no longer matches what the test expects.
That is how you keep your code under constant check. If you or someone on your team makes a mistake, the tests act as your safety net.
# Using Version Control To Track And Test Changes
Versioning helps you track every change you make, whether it is in your logic, your tests, or your dataset. Say you want to try a new way to group the data. Instead of editing the main script directly, create a new branch:
```bash
git checkout -b refactor-grouping
```
Here is what is next:
- Make your changes, commit them, and run the tests.
- If all tests pass, meaning the code works as expected, merge it.
- If not, revert the branch without affecting the main code.
That is the power of version control: every change is tracked, testable, and reversible.
# Final Thoughts
Most people stop after getting the right answer. But real-world data work demands more than that: it rewards those who build solutions that hold up over time, not just once.
With versioning, unit tests, and a simple CI setup, even a one-off interview question becomes a reliable, reusable part of your portfolio.
Nate Rosidi is a data scientist and works in product strategy. He is also an adjunct professor teaching analytics, and the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.
