    Build ChatGPT Clone with Andrej Karpathy’s nanochat 

October 16, 2025


What if you could build a functional ChatGPT-like AI for $100? Andrej Karpathy’s new nanochat shows you exactly how. Launched on October 13, 2025, Karpathy’s nanochat project is an open-source LLM written in roughly 8,000 lines of PyTorch code. It gives you a straightforward roadmap for training a language model from scratch and building your own private AI in a matter of hours. In this article, we will look at the newly released nanochat and walk through setting it up for training, step by step.

    Table of Contents

    • What is nanochat?
      • Key Features
    • Performance
    • Prerequisites and Setup
      • Hardware Requirements
      • Software
      • Initial Steps
    • Guide for Training Your Own ChatGPT Clone
      • Step 1: Environment Preparation
      • Step 2: Data and Tokenizer Setup
      • Step 3: Pretraining
      • Step 4: Midtraining and SFT
      • Step 5: Optional RL 
      • Step 6: Inference and UI
      • Step 7: Review Results
    • Customizing and Scaling
      • Tuning Options
      • Scaling Up
      • Personalization Challenges
    • Conclusion
    • Frequently Asked Questions

    What is nanochat?

The nanochat repository provides a full-stack pipeline for training a minimal ChatGPT clone, taking care of everything from tokenization to the final web user interface. The system is a successor to the earlier nanoGPT and adds key stages such as supervised fine-tuning (SFT), reinforcement learning (RL), and enhanced inference.

    Key Features

The project has a number of significant components. It incorporates a new tokenizer built in Rust for high performance. The training pipeline uses high-quality data such as FineWeb-EDU for pretraining, and specialized datasets such as SmolTalk and GSM8K for post-training fine-tuning. For safety, any code the model executes runs inside a Python sandbox.

The project is also budget-friendly. The basic “speedrun” model costs around $100 and trains in about four hours. You can also build a more capable model for approximately $1,000 with roughly 42 hours of training.

    Performance

Performance improves with training time.

    • 4 hours: The quick run gives you a simple conversational model. It can compose simple poems or describe concepts such as Rayleigh scattering.

    Example input–response pairs (Source: X)

    The following summary metrics were produced by the $100, four-hour speedrun.

    nanochat performance metrics (Source: X)
    • 12 hours: The model begins to surpass GPT-2 on the CORE benchmark.
    • 24 hours: It reaches decent scores, such as 40% on MMLU and 70% on ARC-Easy.

    The primary educational aim of the nanochat project is to provide an easy, hackable baseline. This makes it a great resource for students, researchers, and AI hobbyists. 

    Prerequisites and Setup

Before you start, you need to get your hardware and software ready. This is easy to do with the right tools.

    Hardware Requirements

The project is best run on an 8xH100 GPU node. These are available from providers such as Lambda GPU Cloud for about $24 an hour. You can also use a single GPU with gradient accumulation, but expect it to be roughly eight times slower.
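As a rough sketch, and assuming the training script compensates with gradient accumulation when it sees fewer GPUs (the article only states that a single-GPU run is possible and about eight times slower), a single-GPU run would just change the process count in the torchrun command used later in Step 3:

    # hedged sketch: same pretraining command as Step 3, but on one GPU
    # (expect roughly 8x the wall-clock time of the 8-GPU run)
    torchrun --standalone --nproc_per_node=1 -m scripts.base_train -- --depth=20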

    Software

You’ll need a standard Python environment along with PyTorch. The project uses the uv package manager to handle dependencies, and you will need Git installed to clone the repository. Optionally, you can add Weights & Biases for logging your training runs.

    Initial Steps

    Cloning the official repository comes first:  

git clone git@github.com:karpathy/nanochat.git
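If you don’t have SSH keys set up with GitHub, cloning over HTTPS works just as well; this is simply the standard HTTPS form of the same repository URL:

    # alternative: clone over HTTPS instead of SSH
    git clone https://github.com/karpathy/nanochat.git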

Second, change into the project directory, i.e., nanochat, and install the dependencies.

    cd nanochat 

Lastly, spin up your cloud GPU instance and connect to it to start training.
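As a minimal sketch, connecting usually just means SSHing into the rented node; the ubuntu username below is the typical default on Lambda instances, and <instance-ip> is a placeholder for your node’s public IP:

    # hedged example: connect to your cloud GPU node over SSH
    ssh ubuntu@<instance-ip>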

    Guide for Training Your Own ChatGPT Clone

    What follows is a step-by-step guide to training your very first model. Paying close attention to these steps will yield a working LLM. The official walkthrough in the repository contains more information. 

    Step 1: Environment Preparation

First, boot your 8xH100 node. Once it is up, install the uv package manager using the supplied script. It is a good idea to keep long-running jobs inside a screen session so that training continues even if you disconnect (see the sketch after the commands below).

    # install uv (if not already installed) 
    command -v uv &> /dev/null || curl -LsSf https://astral.sh/uv/install.sh | sh 
    # create a .venv local virtual environment (if it doesn't exist) 
    [ -d ".venv" ] || uv venv 
    # install the repo dependencies 
    uv sync 
    # activate venv so that `python` uses the project's venv instead of system python 
    source .venv/bin/activate 
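For reference, a minimal screen workflow looks like this (the session name nanochat is arbitrary):

    # start a named screen session for long-running jobs
    screen -S nanochat
    # ...run the training commands inside it, then detach with Ctrl-A followed by D...
    # later, reattach to check on progress
    screen -r nanochat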

    Step 2: Data and Tokenizer Setup

    First, we need to install Rust/Cargo so that we can compile our custom Rust tokenizer. 

    # Install Rust / Cargo 
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y 
    source "$HOME/.cargo/env" 
    # Build the rustbpe Tokenizer 
    uv run maturin develop --release --manifest-path rustbpe/Cargo.toml 

The pretraining data is simply the text of a large number of webpages; for this part, we will use the FineWeb-EDU dataset. Karpathy recommends using the following version, which you can download with the command below.

    https://huggingface.co/datasets/karpathy/fineweb-edu-100b-shuffle

    python -m nanochat.dataset -n 240 

Once the data is downloaded, train the Rust tokenizer on a large corpus of text. The script is designed to be fast, and the trained tokenizer should achieve a compression ratio of approximately 4.8 to 1.

    python -m scripts.tok_train --max_chars=2000000000 
    python -m scripts.tok_eval 

    Step 3: Pretraining

Now, download the evaluation data bundle, which contains the test datasets used to measure the model’s performance.

    curl -L -o eval_bundle.zip https://karpathy-public.s3.us-west-2.amazonaws.com/eval_bundle.zip 
    unzip -q eval_bundle.zip 
    rm eval_bundle.zip 
    mv eval_bundle "$HOME/.cache/nanochat" 

Also, set up wandb to get nice plots during training. uv already installed wandb for us above, but you still have to create an account and log in with:

    wandb login 

Now you can launch the main pretraining script. Execute it with the torchrun command to leverage all eight GPUs. This process trains the model on basic language patterns from the FineWeb-EDU corpus and takes around two to three hours for the speedrun. It is the core stage of training a language model.

    torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=20 

We’re initiating training on 8 GPUs using the scripts/base_train.py script. The model is a 20-layer Transformer. Each GPU handles 32 sequences of 2,048 tokens per forward and backward pass, giving a total of 8 × 32 × 2,048 = 524,288 (≈0.5M) tokens processed per optimization step.
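As a quick sanity check of that arithmetic (8 GPUs × 32 sequences × 2,048 tokens):

    python -c "print(8 * 32 * 2048)"   # prints 524288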

If Weights & Biases (wandb) is configured, you can add the --run=speedrun flag to assign a run name and enable logging.
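For example, the flag can simply be appended to the pretraining command shown above (a sketch; the run name speedrun is arbitrary):

    torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=20 --run=speedrun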

When training begins, the script starts printing per-step progress to the terminal (and to wandb, if configured).

    Step 4: Midtraining and SFT

Once pretraining finishes, you proceed to midtraining. Midtraining uses the SmolTalk dataset to give the model more conversational ability. After that, you will run supervised fine-tuning (SFT) on data such as GSM8K, which helps the model learn to follow instructions and solve problems.

We can start the midtraining as follows; this run takes only about 8 minutes, far shorter than the roughly 3-hour pretraining.

    torchrun --standalone --nproc_per_node=8 -m scripts.mid_train 

After midtraining comes the fine-tuning stage. This phase involves another round of fine-tuning on conversational data, but with a focus on selecting only the highest-quality, best-curated examples. It is also the stage where safety-oriented adjustments are made, such as training the model to refuse sensitive or restricted queries appropriately. This again runs for only about 7 minutes.

    torchrun --standalone --nproc_per_node=8 -m scripts.chat_sft 

    Step 5: Optional RL 

The nanochat open-source LLM also has preliminary reinforcement learning support. You can run a technique known as GRPO on the GSM8K dataset. This step is optional and may take another hour. Note that Karpathy has said the RL support is still in its infancy.

    torchrun --standalone --nproc_per_node=8 -m scripts.chat_rl 

    Step 6: Inference and UI

    With training finished, you can now run the inference script. This enables you to talk to your model using a web UI or command-line interface. Try running it with some examples like “Why is the sky blue?” to experience your creation. 

python -m scripts.chat_cli    (for the command-line interface)

    OR

    python -m scripts.chat_web    (for the web UI)

The chat_web script serves the Engine using FastAPI. Make sure to access it at the right address: on Lambda, for example, use the public IP of the node you’re on followed by the port, e.g., http://209.20.xxx.xxx:8000/.

    Step 7: Review Results

Now, test the model through the web interface at the URL where nanochat is being served.

Example input–response pairs (Source: X)

Lastly, look at the report.md in the repository. It contains important metrics for your model, such as its CORE score and GSM8K accuracy. The base speedrun costs approximately $92.40 for a bit less than four hours of training.

nanochat performance report (Source: X)

Note: I have taken the code and steps from Andrej Karpathy’s nanochat GitHub, where you can find the full documentation. What I showcased above is a simpler and shorter version.

    Customizing and Scaling

The speedrun is an excellent starting point. From there, you can customize the model further, which is one of the most significant advantages of Karpathy’s nanochat release.

    Tuning Options

You can tweak the depth of the model to improve performance. With the --depth=26 flag, for example, you step into the more powerful $300 tier. You can also try other datasets or adjust the training hyperparameters.
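As a sketch, that just means changing the depth flag in the pretraining command from Step 3 (the other arguments stay as in the speedrun; you may also need to reduce the per-GPU batch size to fit memory, as noted in the next subsection):

    # hedged example: a deeper, 26-layer model for the roughly $300 tier
    torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=26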

    Scaling Up

The repository also details a $1,000 tier. This entails an extended training run of approximately 41.6 hours and yields a model with better coherence and higher benchmark scores. If you run into VRAM constraints, try lowering the --device_batch_size setting.
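For example (a sketch only; the walkthrough’s default per-GPU batch size is 32, and 16 here is a hypothetical value chosen just to illustrate the flag):

    # hedged example: halve the per-GPU batch size to reduce VRAM usage
    torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=26 --device_batch_size=16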

    Personalization Challenges

You can also fine-tune the model on personal data, but Karpathy advises against this, as it tends to produce “slop.” A better way to use personal data is retrieval-augmented generation (RAG) through tools such as NotebookLM.

    Conclusion

The nanochat project empowers both researchers and beginners. It offers a cheap and simple way to train your own open-source LLM. With a limited budget and a free weekend, you can go from setup to deployment. Use this tutorial to train your own ChatGPT clone, check out the nanochat repository, and participate in the community forum to help out. Your journey to train a language model begins here.

    Frequently Asked Questions

    Q1. What is nanochat?  

    A. Nanochat is an open-source PyTorch initiative by Andrej Karpathy. It provides an end-to-end pipeline to train a ChatGPT-style LLM from scratch cheaply. 

    Q2. How expensive is training a nanochat model?  

A. It costs about $100 and takes around four hours to train a basic model. More powerful models can be trained on budgets of $300 to $1,000 with longer training runs.

    Q3. What hardware do I need for nanochat?  

    A. The suggested configuration is an 8xH100 GPU node, and you can lease this from cloud providers. It is possible to use a single GPU, but it will be much slower. 


    Harsh Mishra

    Harsh Mishra is an AI/ML Engineer who spends more time talking to Large Language Models than actual humans. Passionate about GenAI, NLP, and making machines smarter (so they don’t replace him just yet). When not optimizing models, he’s probably optimizing his coffee intake. 🚀☕
