# Introduction
Everyone focuses on solving the problem, but almost no one tests the solution. Sometimes, a perfectly working script can break with just one new row of data or a slight change in the logic.
In this article, we will solve a Tesla interview question in Python and show how unit tests, version control, and continuous integration turn a fragile script into a reliable solution in three steps. We will start with the interview question and end with automated testing using GitHub Actions.
We will go through these three steps to make a data solution production-ready.
First, we will solve a real interview question from Tesla. Next, we will add unit tests to ensure the solution stays reliable over time. Finally, we will use GitHub Actions to automate testing and version control.
# Solving A Real Interview Question From Tesla
**New Products**
Calculate the net change in the number of products launched by companies in 2020 compared to 2019. Your output should include the company names and the net difference.
(Net difference = Number of products launched in 2020 – The number launched in 2019.)
In this interview question from Tesla, you are asked to measure product growth across two years.
The task is to return each company’s name along with the difference in product count between 2020 and 2019.
## Understanding The Dataset
Let us first look at the dataset we are working with. Here are the column names.
| Column Name | Data Type |
|---|---|
| year | int64 |
| company_name | object |
| product_name | object |
Let us preview the dataset.
| year | company_name | product_name |
|---|---|---|
| 2019 | Toyota | Avalon |
| 2019 | Toyota | Camry |
| 2020 | Toyota | Corolla |
| 2019 | Honda | Accord |
| 2019 | Honda | Passport |
This dataset contains three columns: year, company_name, and product_name. Each row represents a car model released by a company in a given year.
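If you want to follow along locally, you can recreate the preview rows yourself. (This reconstruction is only illustrative; on the interview platform, the `car_launches` DataFrame is provided for you, and the full table has more rows than this preview.)

```python
import pandas as pd

# Recreate the five preview rows as a DataFrame named car_launches,
# matching the schema above (year, company_name, product_name)
car_launches = pd.DataFrame({
    'year': [2019, 2019, 2020, 2019, 2019],
    'company_name': ['Toyota', 'Toyota', 'Toyota', 'Honda', 'Honda'],
    'product_name': ['Avalon', 'Camry', 'Corolla', 'Accord', 'Passport']
})
```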
## Writing The Python Solution
We will use basic pandas operations to group, compare, and calculate the net product change per company. The solution first splits the data into subsets for 2019 and 2020, then merges them by company name and counts the number of unique products launched each year.
```python
import pandas as pd

# Split the data into one subset per year
df_2020 = car_launches[car_launches['year'].astype(str) == '2020']
df_2019 = car_launches[car_launches['year'].astype(str) == '2019']

# Outer merge keeps companies that launched in only one of the two years
df = pd.merge(df_2020, df_2019, how='outer',
              on='company_name', suffixes=['_2020', '_2019']).fillna(0)
```
The final output subtracts 2019 counts from 2020 to get the net difference. Here is the entire code.
```python
import pandas as pd

# Split the data into one subset per year
df_2020 = car_launches[car_launches['year'].astype(str) == '2020']
df_2019 = car_launches[car_launches['year'].astype(str) == '2019']

# Outer merge keeps companies that launched in only one of the two years
df = pd.merge(df_2020, df_2019, how='outer',
              on='company_name', suffixes=['_2020', '_2019']).fillna(0)

# Drop rows where the same product appears in both years
df = df[df['product_name_2020'] != df['product_name_2019']]

# Count unique products launched per company per year
df = df.groupby('company_name').agg(
    {'product_name_2020': 'nunique', 'product_name_2019': 'nunique'}
).reset_index()

# Net difference = 2020 launches minus 2019 launches
df['net_new_products'] = df['product_name_2020'] - df['product_name_2019']
result = df[['company_name', 'net_new_products']]
```
## Viewing The Expected Output
Here is the expected output.
| company_name | net_new_products |
|---|---|
| Chevrolet | 2 |
| Ford | -1 |
| Honda | -3 |
| Jeep | 1 |
| Toyota | -1 |
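Note that this output reflects the full dataset on the platform, which includes more companies and rows than the five-row preview shown earlier.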
# Making The Solution Reliable With Unit Tests
Solving a data problem once does not mean it will keep working. A new row or a logic tweak can silently break your script. For instance, imagine you accidentally rename a column in your code, changing this line:
```python
df['net_new_products'] = df['product_name_2020'] - df['product_name_2019']
```
to this:
```python
df['new_products'] = df['product_name_2020'] - df['product_name_2019']
```
The logic still runs, but your output (and your tests) will suddenly fail because the expected column name no longer matches. Unit tests fix that: they check that the same input still gives the same output, every time. If something breaks, the test fails and shows exactly where. We will do this in three steps, from turning the interview solution into a function to writing a test that checks the output against what we expect.
## Turning The Script Into A Reusable Function
Before writing tests, we need to make our solution reusable and easy to test. Converting it into a function allows us to run it with different datasets and verify the output automatically, without having to rewrite the same code every time. We changed the original code into a function that accepts a DataFrame and returns a result. Here is the code.
```python
import pandas as pd

def calculate_net_new_products(car_launches):
    # Split the data into one subset per year
    df_2020 = car_launches[car_launches['year'].astype(str) == '2020']
    df_2019 = car_launches[car_launches['year'].astype(str) == '2019']

    # Outer merge keeps companies that launched in only one of the two years
    df = pd.merge(df_2020, df_2019, how='outer',
                  on='company_name', suffixes=['_2020', '_2019']).fillna(0)

    # Drop rows where the same product appears in both years
    df = df[df['product_name_2020'] != df['product_name_2019']]

    # Count unique products launched per company per year
    df = df.groupby('company_name').agg({
        'product_name_2020': 'nunique',
        'product_name_2019': 'nunique'
    }).reset_index()

    df['net_new_products'] = df['product_name_2020'] - df['product_name_2019']
    return df[['company_name', 'net_new_products']]
```
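As a quick sanity check before writing formal tests, you can call the function on a small made-up sample. (This is hypothetical data, chosen so that each company has launches in both years, and it assumes the function defined above is in scope.)

```python
import pandas as pd

# Hypothetical sample: each company launches products in both years
sample = pd.DataFrame({
    'year': [2019, 2020, 2020, 2019, 2019, 2020],
    'company_name': ['Toyota', 'Toyota', 'Toyota', 'Honda', 'Honda', 'Honda'],
    'product_name': ['Camry', 'Corolla', 'Yaris', 'Accord', 'Passport', 'Pilot']
})

print(calculate_net_new_products(sample))
#   company_name  net_new_products
# 0        Honda                -1
# 1       Toyota                 1
```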
## Defining Test Data And Expected Output
Before running any tests, we need to know what “correct” looks like. Defining the expected output gives us a clear benchmark to compare our function’s results against. So, we will build a small test input and clearly define what the correct output should be.
```python
import pandas as pd

# Sample test data
test_data = pd.DataFrame({
    'year': [2019, 2019, 2020, 2020],
    'company_name': ['Toyota', 'Toyota', 'Toyota', 'Toyota'],
    'product_name': ['Camry', 'Avalon', 'Corolla', 'Yaris']
})

# Expected output
expected_output = pd.DataFrame({
    'company_name': ['Toyota'],
    'net_new_products': [0]  # 2 launched in 2020 - 2 in 2019
})
```
## Writing And Running Unit Tests
The following test code checks whether your function returns exactly what you expect. If not, the test fails and tells you why, down to the last row or column. The test below uses the function from the previous step (`calculate_net_new_products()`) and the expected output we defined.
```python
import unittest
import pandas as pd

# These imports assume the project layout described in the next section:
# the function lives in solution.py, the test data in expected_output.py
from solution import calculate_net_new_products
from expected_output import test_data, expected_output

class TestProductDifference(unittest.TestCase):
    def test_net_new_products(self):
        result = calculate_net_new_products(test_data)
        # Sort both frames so row order cannot cause a false failure
        result = result.sort_values('company_name').reset_index(drop=True)
        expected = expected_output.sort_values('company_name').reset_index(drop=True)
        pd.testing.assert_frame_equal(result, expected)

if __name__ == '__main__':
    unittest.main()
```
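You can run this test locally with `python -m unittest test_solution.py`, or let unittest find every test file with `python -m unittest discover`; that discover command is exactly what our CI will run in the next step, and a passing run ends with `OK`.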
# Automating Tests With Continuous Integration
Writing tests is a good start, but only if they actually run. You could run them manually after every change, but that does not scale: it is easy to forget, and team members may use different setups. Continuous Integration (CI) solves this by running tests automatically whenever code changes are pushed to the repository.
GitHub Actions is a free CI tool that runs your tests automatically on every push, keeping your solution reliable even when the code, data, or logic changes. Here is how to apply CI with GitHub Actions.
## Organizing Your Project Files
To apply CI to an interview question, you first need to push your solution to a GitHub repository.
Then, set up the following files:
- `solution.py`: the interview question solution from Step 2.1
- `expected_output.py`: defines the test input and expected output from Step 2.2
- `test_solution.py`: unit tests using `unittest` from Step 2.3
- `requirements.txt`: dependencies (e.g., pandas)
- `.github/workflows/test.yml`: GitHub Actions workflow file
- `data/car_launches.csv`: the input dataset used by the solution
## Understanding The Repository Layout
The repository is organized this way so GitHub Actions can find everything it needs without extra setup. It keeps things simple, consistent, and easy for both you and others to work with.
```
my-query-solution/
├── data/
│   └── car_launches.csv
├── solution.py
├── expected_output.py
├── test_solution.py
├── requirements.txt
└── .github/
    └── workflows/
        └── test.yml
```
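For this project, `requirements.txt` can be a single line containing `pandas`, since that is the only third-party dependency the solution and tests import.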
## Creating A GitHub Actions Workflow
Now that you have all the files, the last one you need is test.yml. This file tells GitHub Actions how to run your tests automatically when code changes.
First, we name the workflow and tell GitHub when to run it.
```yaml
name: Run Unit Tests

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
```
This means the tests will run every time someone pushes code to the main branch or opens a pull request against it. Next, we create a job that defines what will happen inside the workflow.
```yaml
jobs:
  test:
    runs-on: ubuntu-latest
```
The job runs on GitHub’s Ubuntu environment, which gives you a clean setup each time. Now we add steps inside that job. The first one checks out your repository so GitHub Actions can access your code.
```yaml
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
```
Then we set up Python and choose the version we want to use.
```yaml
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
```
After that, we install all the dependencies listed in requirements.txt.
```yaml
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
```
Finally, we run all unit tests in the project.
```yaml
      - name: Run unit tests
        run: python -m unittest discover
```
This last step runs your tests automatically and shows any errors if something breaks. Here is the full file for reference:
```yaml
name: Run Unit Tests

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Run unit tests
        run: python -m unittest discover
```
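Commit this file to `.github/workflows/test.yml`. GitHub detects workflow files in that folder automatically, so the tests will run on your very next push.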
## Reviewing Test Results In GitHub Actions
Once you have pushed all the files to your GitHub repository, open the Actions tab. If everything ran successfully, you will see a green checkmark next to the latest workflow run.
Click into the run (here, “Update test.yml”) to see what actually happened. You will get a full breakdown, from setting up Python to running the tests. If all tests pass, each step shows a check mark, confirming that your code behaves as intended at every stage, based on the tests you defined.
In our run, the unit test completed in just 1 second, and the entire CI process finished in 17 seconds, verifying everything from setup to test execution.
## When A Small Change Breaks The Test
Not every change will pass the tests. Say you accidentally rename a column in solution.py and push the change to GitHub, for example:
```python
# Original (works fine)
df['net_new_products'] = df['product_name_2020'] - df['product_name_2019']

# Accidental change
df['new_products'] = df['product_name_2020'] - df['product_name_2019']
```
Let us now check the test results in the Actions tab. We have an error; clicking it shows the details. The unit tests did not pass, so click “Run unit tests” to see the full error message.
As you can see, our tests found the issue with a `KeyError: 'net_new_products'`, because the column name in the function no longer matches what the test expects.
That is how you keep your code under constant check. If you or someone on your team makes a mistake, the tests act as your safety net.
# Using Version Control To Track And Test Changes
Versioning helps you track every change you make, whether it is in your logic, your tests, or your dataset. Say you want to try a new way to group the data. Instead of editing the main script directly, create a new branch:
```bash
git checkout -b refactor-grouping
```
Here is what is next:
- Make your changes, commit them, and run the tests.
- If all tests pass, meaning the code works as expected, merge it.
- If not, revert the branch without affecting the main code.
That is the power of version control: every change is tracked, testable, and reversible.
# Final Thoughts
Most people stop after getting the right answer. But real-world data work demands more than that: it rewards those who build solutions that hold up over time, not just once.
With versioning, unit tests, and a simple CI setup, even a one-off interview question becomes a reliable, reusable part of your portfolio.
Nate Rosidi is a data scientist and works in product strategy. He is also an adjunct professor teaching analytics, and the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.
