    Docker for Python & Data Projects: A Beginner’s Guide

By Bala Priya C | April 17, 2026 | 12 Mins Read



    Image by Author

     

    Table of Contents

    • # Introduction
    • # Prerequisites
    • # Containerizing a Python Script with Pinned Dependencies
        • // Structuring the Project
        • // Writing the Script
        • // Pinning Dependencies
        • // Defining the Dockerfile
        • // Building and Running
    • # Serving a Machine Learning Model with FastAPI
        • // Structuring the Project
        • // Writing the App
        • // Defining the Dockerfile
        • // Building and Running
    • # Building a Multi-Service Pipeline with Docker Compose
        • // Structuring the Project
        • // Defining the Compose File
        • // Writing the Loader Script
        • // Starting Everything
    • # Scheduling Jobs with a Cron Container
        • // Structuring the Project
        • // Writing the Fetch Script
        • // Defining the Crontab
        • // Defining the Dockerfile
        • // Building and Running
    • # Wrapping Up

    # Introduction

     
    Python and data projects have a dependency problem. Between Python versions, virtual environments, system-level packages, and operating system differences, getting someone else’s code to run on your machine can sometimes take longer than understanding the code itself.

    Docker solves this by packaging your code and its entire environment — Python version, dependencies, system libraries — into a single artifact called an image. From the image you can start containers that run identically on your laptop, your teammate’s machine, and a cloud server. You stop debugging environments and start shipping work.

    In this article, you’ll learn Docker through practical examples with a focus on data projects: containerizing a script, serving a machine learning model with FastAPI, wiring up a multi-service pipeline with Docker Compose, and scheduling a job with a cron container.

     

    # Prerequisites

     
    Before working through the examples, you’ll need:

    • Docker and Docker Compose installed for your operating system. Follow the official installation guide for your platform.
    • Familiarity with the command line and Python.
    • Familiarity with writing a Dockerfile, building an image, and running a container from that image.

    You don’t need deep Docker knowledge to follow along; each example explains what’s happening as it goes.

     

    # Containerizing a Python Script with Pinned Dependencies

     
    Let’s start with the most common use case: you have a Python script and a requirements.txt, and you want it to run reliably anywhere.

    We’ll build a data cleaning script that reads a raw sales CSV file, removes duplicates, fills in missing values, and writes a cleaned version to disk.

     

    // Structuring the Project

    The project is organized as follows:

    data-cleaner/
    ├── Dockerfile
    ├── requirements.txt
    ├── clean_data.py
    └── data/
        └── raw_sales.csv

     

    // Writing the Script

    Here’s the data cleaning script that uses Pandas to do the heavy lifting:

    # clean_data.py
    import pandas as pd
    import os
    
    INPUT_PATH = "data/raw_sales.csv"
    OUTPUT_PATH = "data/cleaned_sales.csv"
    
    print("Reading data...")
    df = pd.read_csv(INPUT_PATH)
    print(f"Rows before cleaning: {len(df)}")
    
    # Drop duplicate rows
    df = df.drop_duplicates()
    
    # Fill missing numeric values with column median
    for col in df.select_dtypes(include="number").columns:
        df[col] = df[col].fillna(df[col].median())
    
    # Fill missing text values with 'Unknown'
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].fillna('Unknown')
    
    print(f"Rows after cleaning: {len(df)}")
    df.to_csv(OUTPUT_PATH, index=False)
    print(f"Cleaned file saved to {OUTPUT_PATH}")
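
To see the two fill rules in action, here’s a tiny self-contained example (toy data, not part of the project above) applying the same logic to a three-row frame:

```python
import pandas as pd

# Toy frame with one missing numeric value and one missing text value
df = pd.DataFrame({
    "units": [10.0, None, 30.0],
    "region": ["North", None, "South"],
})

# Numeric columns get the column median (here: median of 10 and 30 is 20)
for col in df.select_dtypes(include="number").columns:
    df[col] = df[col].fillna(df[col].median())

# Text columns get the literal string 'Unknown'
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].fillna("Unknown")

print(df)
# units: 10.0, 20.0, 30.0 / region: North, Unknown, South
```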

     

    // Pinning Dependencies

    Pinning exact versions is important. Without it, pip install pandas might install different versions on different machines. Pinned versions guarantee everyone gets the same behavior. You can define the exact versions in the requirements.txt file like so:

    pandas==2.2.0
    openpyxl==3.1.2
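
If you want to verify that an environment actually matches these pins, a small stdlib-only sketch can parse the file and compare it against what’s installed. The helper names here are made up for illustration, not part of the project:

```python
from importlib.metadata import version, PackageNotFoundError

def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip()
    return pins

def check_pins(pins):
    """Return (package, pinned, installed-or-None) tuples for any mismatch."""
    mismatches = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches

pins = parse_pins("pandas==2.2.0\nopenpyxl==3.1.2")
for name, pinned, installed in check_pins(pins):
    print(f"{name}: pinned {pinned}, installed {installed or 'missing'}")
```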

     

    // Defining the Dockerfile

    This Dockerfile builds a minimal, cache-friendly image for the cleaning script:

    # Use a slim Python 3.11 base image
    FROM python:3.11-slim
    
    # Set the working directory inside the container
    WORKDIR /app
    
    # Copy and install dependencies first (for layer caching)
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    
    # Copy the script into the container
    COPY clean_data.py .
    
    # Default command to run when the container starts
    CMD ["python", "clean_data.py"]

     
    There are a few things worth explaining here. We use python:3.11-slim instead of the full Python image because it’s significantly smaller and strips out packages you don’t need.

    We copy requirements.txt before copying the rest of the code, and this is intentional. Docker builds images in layers and caches each one. If you change only clean_data.py, Docker won’t reinstall your dependencies on the next build; it reuses the cached pip layer and jumps straight to copying the updated script. That small ordering decision can save minutes of rebuild time.

     

    // Building and Running

    With the image built, you can run the container and mount your local data folder:

    # Build the image and tag it
    docker build -t data-cleaner .
    
    # Run it, mounting your local data/ folder into the container
    docker run --rm -v $(pwd)/data:/app/data data-cleaner

     
    The -v $(pwd)/data:/app/data flag mounts your local data/ folder into the container at /app/data. This is how the script reads your CSV and how the cleaned output gets written back to your machine. Nothing is baked into the image, and the data stays on your filesystem.

    The --rm flag automatically removes the container after it finishes. Since this is a one-off script, there’s no reason to keep a stopped container lying around.

     

    # Serving a Machine Learning Model with FastAPI

     
    You’ve trained a model and you want to make it available over HTTP so other services can send data and get predictions back. FastAPI works great for this: it’s fast, lightweight, and handles input validation with Pydantic.

     

    // Structuring the Project

    The project separates the model artifact from the application code:

    ml-api/
    ├── Dockerfile
    ├── requirements.txt
    ├── app.py
    └── model.pkl

     

    // Writing the App

    The following app loads the model once at startup and exposes a /predict endpoint:

    # app.py
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel
    import pickle
    import numpy as np
    
    app = FastAPI(title="Sales Forecast API")
    
    # Load the model once at startup
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)
    
    class PredictRequest(BaseModel):
        region: str
        month: int
        marketing_spend: float
        units_in_stock: int
    
    class PredictResponse(BaseModel):
        region: str
        predicted_revenue: float
    
    @app.get("/health")
    def health():
        return {"status": "ok"}
    
    @app.post("/predict", response_model=PredictResponse)
    def predict(request: PredictRequest):
        try:
            features = [[
                request.month,
                request.marketing_spend,
                request.units_in_stock
            ]]
            prediction = model.predict(features)
            return PredictResponse(
                region=request.region,
                predicted_revenue=round(float(prediction[0]), 2)
            )
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))

     
    The PredictRequest class does the input validation for you. If someone sends a request with a missing field or a string where a number is expected, FastAPI rejects it with a clear error message before your model code even runs. The model is loaded once at startup — not on every request — which keeps response times fast.

    The /health endpoint is a small but important addition: Docker, load balancers, and cloud platforms use it to check whether your service is actually up and ready.
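
For a quick programmatic check, here’s a hypothetical client sketch using only the standard library. It assumes the container from the Building and Running step below is up on localhost:8000:

```python
import json
from urllib import request

def build_payload(region, month, marketing_spend, units_in_stock):
    # Assemble the JSON body in the shape PredictRequest expects
    return json.dumps({
        "region": region,
        "month": month,
        "marketing_spend": marketing_spend,
        "units_in_stock": units_in_stock,
    }).encode("utf-8")

def predict(payload, url="http://localhost:8000/predict"):
    # POST the payload and decode the PredictResponse JSON
    req = request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())

payload = build_payload("North", 3, 5000.0, 320)
# predict(payload) returns a dict with "region" and "predicted_revenue"
```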

     

    // Defining the Dockerfile

    This Dockerfile bakes the model directly into the image so the container is fully self-contained:

    FROM python:3.11-slim
    
    WORKDIR /app
    
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    
    # Copy the model and the app together
    COPY model.pkl .
    COPY app.py .
    
    EXPOSE 8000
    
    CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

     
    The model.pkl is baked into the image at build time. This means the container is completely self-contained, and you don’t need to mount anything when you run it. The --host 0.0.0.0 flag tells Uvicorn to listen on all network interfaces inside the container, not just localhost. Without this, you won’t be able to reach the API from outside the container.

     

    // Building and Running

    Build the image and start the API server:

    docker build -t ml-api .
    docker run --rm -p 8000:8000 ml-api

     
    Test it with curl:

    curl -X POST http://localhost:8000/predict \
      -H "Content-Type: application/json" \
      -d '{"region": "North", "month": 3, "marketing_spend": 5000.0, "units_in_stock": 320}'

     

    # Building a Multi-Service Pipeline with Docker Compose

     
    Real data projects rarely involve just one process. You might need a database, a script that loads data into it, and a dashboard that reads from it — all running together.

    Docker Compose lets you define and run multiple containers as a single application. Each service has its own container, but they all share a private network so they can talk to each other.

     

    // Structuring the Project

    The pipeline splits each service into its own subdirectory:

    pipeline/
    ├── docker-compose.yml
    ├── loader/
    │   ├── Dockerfile
    │   ├── requirements.txt
    │   ├── load_data.py
    │   └── sales_data.csv
    └── dashboard/
        ├── Dockerfile
        ├── requirements.txt
        └── app.py

     

    // Defining the Compose File

    This Compose file declares all three services and wires them together with a health check and a shared DATABASE_URL environment variable:

    # docker-compose.yml
    version: "3.9"
    
    services:
    
      db:
        image: postgres:15
        environment:
          POSTGRES_USER: admin
          POSTGRES_PASSWORD: secret
          POSTGRES_DB: analytics
        volumes:
          - pgdata:/var/lib/postgresql/data
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -U admin -d analytics"]
          interval: 5s
          retries: 5
    
      loader:
        build: ./loader
        depends_on:
          db:
            condition: service_healthy
        environment:
          DATABASE_URL: postgresql://admin:secret@db:5432/analytics
    
      dashboard:
        build: ./dashboard
        depends_on:
          db:
            condition: service_healthy
        ports:
          - "8501:8501"
        environment:
          DATABASE_URL: postgresql://admin:secret@db:5432/analytics
    
    volumes:
      pgdata:

     

    // Writing the Loader Script

    This script waits briefly for the database, then loads a CSV into the sales table using SQLAlchemy:

    # loader/load_data.py
    import pandas as pd
    from sqlalchemy import create_engine
    import os
    import time
    
    DATABASE_URL = os.environ["DATABASE_URL"]
    
    # Give the DB a moment to be fully ready
    time.sleep(3)
    
    engine = create_engine(DATABASE_URL)
    
    df = pd.read_csv("sales_data.csv")
    df.to_sql("sales", engine, if_exists="replace", index=False)
    
    print(f"Loaded {len(df)} rows into the sales table.")

     
    Let’s take a closer look at the Compose file. Each service runs in its own container, but they’re all on the same Docker-managed network, so they can reach each other using the service name as a hostname. The loader connects to db:5432 — and not localhost — because db is the service name, and Docker handles the DNS resolution automatically.

    The healthcheck on the PostgreSQL service is important. depends_on alone only waits for the container to start, not for PostgreSQL to be ready to accept connections. The healthcheck uses pg_isready to confirm the database is actually up before the loader tries to connect. The pgdata volume persists the database between runs; stopping and restarting the pipeline won’t wipe your data.
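
One related note: the loader’s fixed time.sleep(3) is a blunt instrument. Since the healthcheck already gates startup it’s usually fine, but a retry loop is more robust. Here’s a sketch; the wait_for helper is hypothetical, not part of the loader above:

```python
import time

def wait_for(connect, attempts=10, delay=2.0):
    # Call connect() until it succeeds or attempts run out
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as exc:
            last_error = exc
            print(f"Attempt {attempt} failed: {exc}; retrying in {delay}s")
            time.sleep(delay)
    raise RuntimeError(f"service never became ready: {last_error}")

# In load_data.py you could replace the sleep with something like:
#   conn = wait_for(lambda: create_engine(DATABASE_URL).connect())
```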

     

    // Starting Everything

    Bring up all services with a single command:

    docker compose up --build

     
    To stop everything, run:

    docker compose down

     
    This stops and removes the containers but keeps the pgdata volume; add the --volumes flag to remove it as well.

    # Scheduling Jobs with a Cron Container

     
    Sometimes you need a script to run on a schedule. Maybe it fetches data from an API every hour and writes it to a database or a file. You don’t want to set up a full orchestration system like Airflow for something this simple. A cron container does the job cleanly.

     

    // Structuring the Project

    The project includes a crontab file alongside the script and Dockerfile:

    data-fetcher/
    ├── Dockerfile
    ├── requirements.txt
    ├── fetch_data.py
    └── crontab

     

    // Writing the Fetch Script

    This script uses Requests to hit an API endpoint and saves the results as a timestamped CSV:

    # fetch_data.py
    import requests
    import pandas as pd
    from datetime import datetime
    import os
    
    API_URL = "https://api.example.com/sales/latest"
    OUTPUT_DIR = "/app/output"
    
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    
    print(f"[{datetime.now()}] Fetching data...")
    
    response = requests.get(API_URL, timeout=10)
    response.raise_for_status()
    
    data = response.json()
    df = pd.DataFrame(data["records"])
    
    timestamp = datetime.now().strftime("%Y%m%d_%H%M")
    output_path = f"{OUTPUT_DIR}/sales_{timestamp}.csv"
    df.to_csv(output_path, index=False)
    
    print(f"[{datetime.now()}] Saved {len(df)} records to {output_path}")
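
As a quick illustration of the filename format, here’s the same strftime pattern applied to a fixed datetime so the output is deterministic:

```python
from datetime import datetime

# Same format string as fetch_data.py, applied to a fixed datetime
ts = datetime(2026, 4, 17, 9, 5).strftime("%Y%m%d_%H%M")
print(f"sales_{ts}.csv")  # → sales_20260417_0905.csv
```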

     

    // Defining the Crontab

    The crontab schedules the script to run every hour and redirects all output to a log file:

    # Run every hour, on the hour
    0 * * * * /usr/local/bin/python /app/fetch_data.py >> /var/log/fetch.log 2>&1

     
    The full path /usr/local/bin/python matters: cron runs jobs with a minimal PATH that doesn’t include /usr/local/bin, where the slim Python image installs the interpreter, so a bare python would fail with “command not found.” The >> /var/log/fetch.log 2>&1 part redirects both standard output and error output to a log file. This is how you inspect what happened after the fact.

     

    // Defining the Dockerfile

    This Dockerfile installs cron, registers the schedule, and keeps it running in the foreground:

    FROM python:3.11-slim
    
    # Install cron
    RUN apt-get update && apt-get install -y cron && rm -rf /var/lib/apt/lists/*
    
    WORKDIR /app
    
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    
    COPY fetch_data.py .
    COPY crontab /etc/cron.d/fetch-job
    
    # Set correct permissions and register the crontab
    RUN chmod 0644 /etc/cron.d/fetch-job && crontab /etc/cron.d/fetch-job
    
    # cron -f runs cron in the foreground, which is required for Docker
    CMD ["cron", "-f"]

     
    The cron -f flag is important here. Docker keeps a container alive as long as its main process is running. If cron ran in the background (its default), the main process would exit immediately and Docker would stop the container. The -f flag keeps cron running in the foreground so the container stays alive.

     

    // Building and Running

    Build the image and start the container in detached mode:

    docker build -t data-fetcher .
    docker run -d --name fetcher -v $(pwd)/output:/app/output data-fetcher

     
    Check the logs any time:

    docker exec fetcher cat /var/log/fetch.log

     
    The output folder is mounted from your local machine, so the CSV files land on your filesystem even though the script runs inside the container.

     

    # Wrapping Up

     
    I hope you found this Docker article helpful. Docker doesn’t have to be complicated. Start with the first example, swap in your own script and dependencies, and get comfortable with the build-run cycle. Once you’ve done that, the other patterns follow naturally. Docker is a good fit when:

    • You need reproducible environments across machines or team members
    • You’re sharing scripts or models that have specific dependency requirements
    • You’re building multi-service systems that need to run together reliably
    • You want to deploy anywhere without setup friction

    That said, you don’t always need to use Docker for all of your Python work. It’s probably overkill when:

    • You’re doing quick, exploratory analysis only for yourself
    • Your script has no external dependencies beyond the standard library
    • You’re early in a project and your requirements are changing rapidly

    If you’re interested in going further, check out 5 Simple Steps to Mastering Docker for Data Science.

    Happy coding!
     
     

    Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.



