    Anonymizing Production Data for Data Science with Mimesis

    By gvfx00@gmail.com | May 20, 2026 | 5 Mins Read
    # Introduction

     
    Production data is usually subject to strict privacy and compliance constraints. For this reason, anonymizing it is a critical step in most real-world data science projects that lead to a data-driven product, service, or solution.

    Mimesis is an open-source Python library for generating realistic fake data quickly. It runs locally and is free to use, which makes it easy to add to a data pipeline. This article shows how to use the library to anonymize sensitive production data, through a step-by-step example you can try in your IDE or a notebook environment.

     

    # Step-by-Step Procedure

     
    If you are new to Mimesis, you may need to install it in your Python environment with a command like:

    pip install mimesis

    Remember to add ! at the beginning of the pip command if you are working in Google Colab or a similar notebook environment.

    Now we are ready to start! We will consider a scenario revolving around a software product’s tier-based subscription system. For simplicity, we will synthetically generate a toy dataset containing data about customers and their subscription type. There is highly sensitive data in some of the dataset variables, as you can observe below:

    import pandas as pd
    
    # Creation of a mock "production" customer dataset
    production_data = {
        'user_id': [101, 102, 103, 104],
        'real_name': ['Alice Smith', 'Bob Jones', 'Charlie Brown', 'Diana Prince'],
        'email': ['alice.smith@corp.com', 'bjones@startup.io', 'cbrown@domain.org', 'diana@amazon.com'],
        'phone': ['555-0100', '555-0101', '555-0102', '555-0103'],
        'subscription_tier': ['Premium', 'Basic', 'Basic', 'Enterprise']
    }
    
    df = pd.DataFrame(production_data)
    print("--- Original Sensitive Data ---")
    print(df.head())

     

    While subscription tiers are not necessarily sensitive data in our example, user names, emails, and phone numbers are. With Mimesis, we start by initializing a provider: a class that generates fake data for a given category. Since our records describe people, we import and use the Person class, a provider that, given a locale such as English and a random seed, generates fake substitutes for real, sensitive personal data:

    from mimesis import Person
    from mimesis.locales import Locale
    
    # Initializing a Person provider for English locales
    person = Person(locale=Locale.EN, seed=42)

     

    From this point onwards, anonymizing personally identifiable information (PII) is quite simple. All it takes is replacing the sensitive columns we specify with freshly generated values from the Person provider. For each column, we call the matching Mimesis method once per row and assign the resulting list back to the DataFrame:

    # 1. Replacing real names with fake, realistic names
    df['real_name'] = [person.full_name() for _ in range(len(df))]
    
    # 2. Replacing real emails with fake ones
    df['email'] = [person.email() for _ in range(len(df))]
    
    # 3. Replacing real phone numbers
    df['phone'] = [person.telephone() for _ in range(len(df))]
    
    # 4. Renaming the column to reflect that it is no longer the real name
    df.rename(columns={'real_name': 'anon_name'}, inplace=True)

     

    Notice above how the Person class provides dedicated methods for generating full names, emails, and telephone numbers, among others. In addition, the name column is renamed to reflect that the names in the updated dataset are no longer real but anonymized.

    We now verify the results by looking at the transformed DataFrame. The sensitive PII fields have been overwritten with legitimate-looking synthetic data, while the dataset structure and the columns needed for downstream analyses, such as user_id and subscription_tier, remain intact.

    print("\n--- Anonymized Data for Data Science Analyses ---")
    print(df.head())

     

    Output:

    --- Anonymized Data for Data Science Analyses ---
       user_id         anon_name                    email            phone  \
    0      101    Anthony Reilly    archived1911@duck.com     +13312271333   
    1      102           Kai Day    suspect2087@yahoo.com  +1-205-759-3586   
    2      103  Cleveland Osborn     urgent1912@yahoo.com     +13691067988   
    3      104       Zack Holder  johnson1881@example.com  +1-574-481-3676   
    
      subscription_tier  
    0           Premium  
    1             Basic  
    2             Basic  
    3        Enterprise  

     

    Fantastic! With a few simple steps, we have anonymized several sensitive fields typically found in production datasets, at no cost, since Mimesis is open-source.

    To conclude, here are some best practices and observations about the anonymization process we just covered:

    • We replaced the columns directly in the DataFrame. Depending on your context, consider whether this is the right approach, or whether you may want to store the new information in a separate DataFrame if there is a risk of losing the original data.
    • Mimesis generates values of the expected type and general shape for each field, although the exact format can vary, as the phone numbers in the output above show.
    • Seeding makes the generated values repeat across runs, which supports reproducibility.

     

    # Wrapping Up

     
    In this article, we have shown how to use Mimesis, a Python library for fake data generation, to transform a sensitive production dataset into a version that can be used for further analysis without exposing real people's PII.
     
     

    Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning & LLMs. He trains and guides others in harnessing AI in the real world.
