    Auditing Model Bias with Balanced Datasets with Mimesis

    By gvfx00@gmail.com | May 25, 2026 | 5 Mins Read
    # Introduction

     
    Building machine learning solutions, whether well-established classifiers or large language models (LLMs), carries a risk: a model can silently absorb the prejudices present in its historical training data. In a high-stakes or data-sensitive scenario, how can we audit a model for bias without exposing real-world information?

    This hands-on article guides you through training a simple “loan approval” classifier on biased data. We then use Mimesis, an open-source library for generating fake data, to help build a balanced, counterfactual dataset: “fake” applicants with identical financial backgrounds but different demographic characteristics. Comparing the model’s predictions across these applicants shows whether it discriminates against certain groups.

     

    # Step-by-Step Guide

     
    Start by installing the Mimesis library if you have not used it before or are working in a cloud notebook environment like Colab (prefix the command with ! inside a notebook cell):

    pip install mimesis

    Before auditing a model, we need one. In this example, we synthetically generate a dataset of 1,000 bank customers with just two features: gender (categorical) and income (numerical). The data is intentionally constructed so that gender unfairly influences the binary outcome, loan approval: men are always approved, whereas women are approved only when their income exceeds 80,000.

    The process to create this clearly biased dataset and train a decision tree classifier on it is shown below:

    import pandas as pd
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    
    # 1. Simulating biased historical data (1000 instances)
    np.random.seed(42)
    n_train = 1000
    genders = np.random.choice(['Male', 'Female'], n_train)
    incomes = np.random.randint(30000, 120000, n_train)
    
    approvals = []
    for gender, income in zip(genders, incomes):
        if gender == 'Male':
            # Historically, males are approved
            approvals.append(1)
        else:
            # Only females with high income are approved
            approvals.append(1 if income > 80000 else 0)
    
    train_df = pd.DataFrame({'Gender': genders, 'Income': incomes, 'Approved': approvals})
    
    # Converting categories to numbers for the machine learning model
    train_df['Gender_Code'] = train_df['Gender'].map({'Male': 1, 'Female': 0})
    
    # 2. Training a Decision Tree classifier
    model = DecisionTreeClassifier(max_depth=3)
    model.fit(train_df[['Gender_Code', 'Income']], train_df['Approved'])
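Before auditing the model from the outside, it can help to confirm what the tree actually learned. The sketch below rebuilds the same biased data with a vectorized version of the labeling rule, prints the approval rate per gender, and dumps the learned splits with scikit-learn's `export_text`. The `random_state=42` argument and the `approval_rates` and `rules` variable names are additions for this illustration, not part of the original walkthrough.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Rebuild the biased training data with the same seed and rule as above
np.random.seed(42)
n_train = 1000
genders = np.random.choice(['Male', 'Female'], n_train)
incomes = np.random.randint(30000, 120000, n_train)

# Males are always approved; females only when income exceeds 80,000
approved = np.where((genders == 'Male') | (incomes > 80000), 1, 0)

train_df = pd.DataFrame({'Gender': genders, 'Income': incomes, 'Approved': approved})
train_df['Gender_Code'] = train_df['Gender'].map({'Male': 1, 'Female': 0})

# Approval rate per gender in the training labels
approval_rates = train_df.groupby('Gender')['Approved'].mean()
print(approval_rates)

# Same classifier as above, with a fixed random_state for reproducibility
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(train_df[['Gender_Code', 'Income']], train_df['Approved'])

# Human-readable view of the learned splits
rules = export_text(model, feature_names=['Gender_Code', 'Income'])
print(rules)
```

If `Gender_Code` appears among the printed splits, the model is using gender directly, which is exactly what the counterfactual audit that follows is designed to expose without looking inside the model.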

     

    The next step shows Mimesis in action. We use its Generic class to generate a small set of test subjects: three base financial profiles, each with a random UUID (universally unique identifier) and a moderate income between 40,000 and 70,000. Note that these profiles do not include gender yet:

    from mimesis import Generic
    
    generic = Generic('en')
    
    # Generating 3 base financial profiles
    base_profiles = []
    for _ in range(3):
        profile = {
            'Applicant_ID': generic.cryptographic.uuid(),
            'Income': generic.random.randint(40000, 70000) # Moderate income
        }
        base_profiles.append(profile)

     

    For example, the three newly created profiles may look something like:

    [{'Applicant_ID': '1f1721e1-19af-4bd1-8488-6abf01404ef9', 'Income': 44815},
     {'Applicant_ID': '5c862597-7f55-43f4-9d6e-ac9cc0b9083e', 'Income': 47436},
     {'Applicant_ID': '3479d4cf-0d9b-4f06-9c43-1c3b7e787830', 'Income': 58194}]

     

    Let’s finish building our counterfactual set of examples, the core of the auditing process. For each of the three base profiles, we create two cloned instances: one male and one female. Within each pair, the applicant ID and income are identical, so gender is the only difference: any difference in how the trained decision tree treats the two clones can only be attributed to gender.

    counterfactual_data = []
    
    for profile in base_profiles:
        # Version A: Male Counterfactual
        counterfactual_data.append({
            'Applicant_ID': profile['Applicant_ID'], 
            'Gender': 'Male', 
            'Gender_Code': 1, 
            'Income': profile['Income']
        })
        
        # Version B: Female Counterfactual
        counterfactual_data.append({
            'Applicant_ID': profile['Applicant_ID'], 
            'Gender': 'Female', 
            'Gender_Code': 0, 
            'Income': profile['Income']
        })
    
    audit_df = pd.DataFrame(counterfactual_data)

     

    This is what the three pairs of customers may look like:

                                   Applicant_ID  Gender  Gender_Code  Income
    0  1f1721e1-19af-4bd1-8488-6abf01404ef9    Male            1   44815
    1  1f1721e1-19af-4bd1-8488-6abf01404ef9  Female            0   44815
    2  5c862597-7f55-43f4-9d6e-ac9cc0b9083e    Male            1   47436
    3  5c862597-7f55-43f4-9d6e-ac9cc0b9083e  Female            0   47436
    4  3479d4cf-0d9b-4f06-9c43-1c3b7e787830    Male            1   58194
    5  3479d4cf-0d9b-4f06-9c43-1c3b7e787830  Female            0   58194

     

    A key point: Mimesis supplied the synthetic base profiles, and cloning each one gave us matched loan applicants with identical income but different genders. Because the protected attribute is the only thing that varies within a pair, its effect on the model’s decision is isolated.
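The cloning step generalizes beyond two hard-coded gender values. As a minimal sketch, the hypothetical helper `make_counterfactuals` below produces one copy of each base profile per value of any protected attribute. To keep the example self-contained, the base profiles are built with the standard library's `uuid` and `random` modules as stand-ins for the Mimesis calls used above.

```python
import random
import uuid

def make_counterfactuals(base_profiles, attribute, values):
    """Return one copy of each base profile per value of the protected attribute."""
    rows = []
    for profile in base_profiles:
        for value in values:
            row = dict(profile)     # copy, so the base profile is not mutated
            row[attribute] = value  # the only field that differs within a group
            rows.append(row)
    return rows

# Stand-in base profiles (the walkthrough generates these with Mimesis)
rng = random.Random(0)
base_profiles = [
    {'Applicant_ID': str(uuid.uuid4()), 'Income': rng.randint(40000, 70000)}
    for _ in range(3)
]

variants = make_counterfactuals(base_profiles, 'Gender', ['Male', 'Female'])
for row in variants:
    print(row)
```

Passing a longer list of values, or a different attribute name, yields matched groups instead of pairs, with every other field held constant.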

    Now it’s time to probe the model and see what it reveals.

    # Asking the model to predict approval for our counterfactuals
    audit_df['Predicted_Approval'] = model.predict(audit_df[['Gender_Code', 'Income']])
    
    # Formatting the output for readability (1 = Approved, 0 = Denied)
    audit_df['Predicted_Approval'] = audit_df['Predicted_Approval'].map({1: 'Approved', 0: 'Denied'})
    
    print("\n--- Model Audit Results ---")
    print(audit_df[['Applicant_ID', 'Gender', 'Income', 'Predicted_Approval']].sort_values('Applicant_ID'))

     

    The decision-making results yielded by our model could not be clearer:

    --- Model Audit Results ---
                               Applicant_ID  Gender  Income Predicted_Approval
    0  1f1721e1-19af-4bd1-8488-6abf01404ef9    Male   44815           Approved
    1  1f1721e1-19af-4bd1-8488-6abf01404ef9  Female   44815             Denied
    4  3479d4cf-0d9b-4f06-9c43-1c3b7e787830    Male   58194           Approved
    5  3479d4cf-0d9b-4f06-9c43-1c3b7e787830  Female   58194             Denied
    2  5c862597-7f55-43f4-9d6e-ac9cc0b9083e    Male   47436           Approved
    3  5c862597-7f55-43f4-9d6e-ac9cc0b9083e  Female   47436             Denied

     

    Notice that for the exact same Applicant_ID and Income, every male clone is approved for the loan, while every female clone at these moderate income levels is denied. Because the synthetic profiles hold all other variables constant, the audit isolates and exposes the model’s discriminatory decision-making.
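With more than a handful of pairs, reading the table by eye stops scaling. One way to summarize the audit is a counterfactual flip rate: the share of matched pairs whose two clones receive different decisions. The sketch below computes it with pandas from the results shown above. The `Flipped` column and the `flip_rate` name are illustrative choices, not an established API.

```python
import pandas as pd

# Audit results copied from the output above
audit_df = pd.DataFrame({
    'Applicant_ID': [
        '1f1721e1-19af-4bd1-8488-6abf01404ef9', '1f1721e1-19af-4bd1-8488-6abf01404ef9',
        '5c862597-7f55-43f4-9d6e-ac9cc0b9083e', '5c862597-7f55-43f4-9d6e-ac9cc0b9083e',
        '3479d4cf-0d9b-4f06-9c43-1c3b7e787830', '3479d4cf-0d9b-4f06-9c43-1c3b7e787830',
    ],
    'Gender': ['Male', 'Female'] * 3,
    'Income': [44815, 44815, 47436, 47436, 58194, 58194],
    'Predicted_Approval': ['Approved', 'Denied'] * 3,
})

# One row per applicant, one column per gender
pairs = audit_df.pivot(index='Applicant_ID', columns='Gender', values='Predicted_Approval')

# A pair "flips" when its two clones receive different decisions
pairs['Flipped'] = pairs['Male'] != pairs['Female']
flip_rate = pairs['Flipped'].mean()

print(pairs)
print(f"Counterfactual flip rate: {flip_rate:.0%}")
```

For the three pairs above, every decision flips, so the rate is 100%. A model that ignores gender would score 0% on the same pairs.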

     

    # Wrapping Up

     
    In this hands-on article, we have shown how Mimesis can be used to generate balanced, counterfactual examples, free of privacy or sensitive-data constraints, that help audit a model’s behavior and reveal whether it is biased. If your model turns out to be biased, next steps may include:

    • Augmenting your training data with more balanced profiles to correct historical skewness or bias.
    • Depending on the model type, using model re-weighting strategies.
    • Utilizing open-source fairness toolkits, such as AI Fairness 360, which are helpful for bias mitigation in machine learning pipelines.
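As a rough sketch of the first option, one possible reading of "more balanced profiles" is counterfactual augmentation: add a gender-flipped copy of every training row, keeping income and label unchanged, so that gender carries no information about the outcome. The code below is an illustration under that assumption, not a prescribed fix. It regenerates biased data with the same labeling rule (using NumPy's `default_rng`, so the rows differ from the walkthrough's), and the `flip_rate` helper is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Biased training data following the same rule as in the guide
rng = np.random.default_rng(42)
n_train = 1000
gender_code = rng.integers(0, 2, n_train)        # 1 = Male, 0 = Female
income = rng.integers(30000, 120000, n_train)
approved = np.where((gender_code == 1) | (income > 80000), 1, 0)
train_df = pd.DataFrame({'Gender_Code': gender_code, 'Income': income, 'Approved': approved})

# Augmentation: a gender-flipped copy of every row, same income and label
flipped = train_df.copy()
flipped['Gender_Code'] = 1 - flipped['Gender_Code']
augmented_df = pd.concat([train_df, flipped], ignore_index=True)

features = ['Gender_Code', 'Income']
biased_model = DecisionTreeClassifier(max_depth=3, random_state=42)
biased_model.fit(train_df[features], train_df['Approved'])
augmented_model = DecisionTreeClassifier(max_depth=3, random_state=42)
augmented_model.fit(augmented_df[features], augmented_df['Approved'])

def flip_rate(model, incomes):
    """Share of matched male/female pairs that receive different predictions."""
    male = pd.DataFrame({'Gender_Code': 1, 'Income': incomes})
    female = pd.DataFrame({'Gender_Code': 0, 'Income': incomes})
    differs = model.predict(male[features]) != model.predict(female[features])
    return float(np.mean(differs))

# Re-run the counterfactual audit on moderate incomes for both models
audit_incomes = np.arange(40000, 70001, 5000)
biased_rate = flip_rate(biased_model, audit_incomes)
augmented_rate = flip_rate(augmented_model, audit_incomes)
print(f"Flip rate before augmentation: {biased_rate:.0%}")
print(f"Flip rate after augmentation:  {augmented_rate:.0%}")
```

Note the trade-off: the flipped copies give applicants below 80,000 conflicting labels, so the retrained model stops separating by gender but its decisions in that income range reflect the mix of labels rather than a sound lending policy. In practice the historical labels themselves would also need review.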

     
     

    Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning & LLMs. He trains and guides others in harnessing AI in the real world.
