
    Run a Real Time Speech to Speech AI Model Locally

By gvfx00@gmail.com | March 12, 2026 | 6 Mins Read



    Image by Author

     

Table of Contents

    • # Introduction
    • # Using PersonaPlex Locally: A Step-by-Step Guide
        • // Step 1: Accepting the Model Terms and Generating a Token
        • // Step 2: Installing the Linux Dependency
        • // Step 3: Building PersonaPlex from Source
        • // Step 4: Starting the WebUI Server
        • // Step 5: Talking to PersonaPlex in the Browser
    • # Concluding Remarks

    # Introduction 

     
    Before we start anything, I want you to watch this video:


    [Video: real-time conversation demo with PersonaPlex]

     

    Isn’t this amazing? You can now run a fully local model that you can talk to on your own machine, and it works out of the box. It feels like talking to a real person because the system can listen and speak at the same time, just like a natural conversation.

    This is not the usual “you speak then it waits then it replies” pattern. PersonaPlex is a real-time speech-to-speech conversational AI that handles interruptions, overlaps, and natural conversation cues like “uh-huh” or “right” while you are talking.

    PersonaPlex is designed to be full duplex so it can listen and generate speech simultaneously without forcing the user to pause first. This makes conversations feel much more fluid and human-like compared to traditional voice assistants.

    In this tutorial, we will learn how to set up the Linux environment, install PersonaPlex locally, and then start the PersonaPlex web server so you can interact with the AI in your browser in real time.

     

    # Using PersonaPlex Locally: A Step-by-Step Guide

     
    In this section, we will walk through installing PersonaPlex on Linux, launching the real-time WebUI, and talking to a full-duplex speech-to-speech AI model running locally on our own machine.

     

    // Step 1: Accepting the Model Terms and Generating a Token

    Before you can download and run PersonaPlex, you must accept the usage terms for the model on Hugging Face. The speech-to-speech model PersonaPlex-7B-v1 from NVIDIA is gated, which means you cannot access the weights until you agree to the license conditions on the model page.

    Go to the PersonaPlex model page on Hugging Face and log in. You will see a notice saying that you need to agree to share your contact information and accept the license terms to access the files. Review the NVIDIA Open Model License and accept the conditions to unlock the repository.

    Once access is granted, create a Hugging Face access token:

    1. Go to Settings → Access Tokens
    2. Create a new token with Read permission
    3. Copy the generated token

    Then export it in your terminal:

    export HF_TOKEN="YOUR_HF_TOKEN"

     

    This token allows your local machine to authenticate and download the PersonaPlex model.
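Before moving on, you can sanity-check that the token is actually set in your shell environment. Here is a minimal sketch; the `hf_` prefix check reflects the current Hugging Face user-token format, which could change:

```python
import os

def hf_token_present() -> bool:
    """Return True if HF_TOKEN is set and looks like a Hugging Face token."""
    token = os.environ.get("HF_TOKEN", "")
    # Current Hugging Face user access tokens start with "hf_".
    return token.startswith("hf_") and len(token) > 10

if __name__ == "__main__":
    print("HF_TOKEN looks valid" if hf_token_present() else "HF_TOKEN missing or malformed")
```

Run this in the same shell where you exported the token; a False result usually means the `export` happened in a different session.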

     

    // Step 2: Installing the Linux Dependency

    Before installing PersonaPlex, you need to install the Opus audio codec development library. PersonaPlex relies on Opus for handling real-time audio encoding and decoding, so this dependency must be available on your system.

    On Ubuntu or Debian-based systems, run:

    sudo apt update
    sudo apt install -y libopus-dev

     

    // Step 3: Building PersonaPlex from Source

    Now we’ll clone the PersonaPlex repository and install the required Moshi package from source.

    Clone the official NVIDIA repository:

    git clone https://github.com/NVIDIA/personaplex.git
    cd personaplex

     

    Once inside the project directory, install Moshi:
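Assuming the repository ships Moshi as a local Python package (the `moshi/` subdirectory path here is an assumption; check the project README for the exact command), a typical from-source install would look like:

```shell
# Assumed layout: the Moshi package lives in a "moshi/" subdirectory of the repo.
# Editable install (-e) so local changes to the source take effect immediately.
pip install -e moshi/
```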

     

    This will compile and install the PersonaPlex components along with all required dependencies, including PyTorch, CUDA libraries, NCCL, and audio tooling.

    You should see packages like torch, nvidia-cublas-cu12, nvidia-cudnn-cu12, sentencepiece, and moshi-personaplex being installed successfully.

    Tip: run the install inside a virtual environment to keep these dependencies isolated from your system Python.

     

    // Step 4: Starting the WebUI Server

    Before launching the server, install the faster Hugging Face downloader:
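The downloader in question is presumably `hf_transfer`, Hugging Face's Rust-based high-throughput download backend (an assumption; substitute whatever the project README recommends):

```shell
# Install the high-speed download backend.
pip install hf_transfer
# Tell huggingface_hub to route downloads through hf_transfer.
export HF_HUB_ENABLE_HF_TRANSFER=1
```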

     

    Now start the PersonaPlex real-time server:

    python -m moshi.server --host 0.0.0.0 --port 8998

     

    The first run will download the full PersonaPlex model, which is approximately 16.7 GB. This may take some time depending on your internet speed.
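As a rough guide, you can estimate the wait from your bandwidth. At 100 Mbps (just an example figure), the ~16.7 GB checkpoint takes around 22 minutes:

```python
def download_eta_minutes(size_gb: float, mbps: float) -> float:
    """Estimate download time in minutes for a file of size_gb gigabytes
    over a link of mbps megabits per second (decimal units, no overhead)."""
    size_megabits = size_gb * 1000 * 8  # GB -> megabits
    return size_megabits / mbps / 60

print(round(download_eta_minutes(16.7, 100), 1))  # ≈ 22.3 minutes
```

Real-world times will be longer once protocol overhead and server throttling are factored in.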

     


     

    After the download completes, the model will load into memory and the server will start.


     

    // Step 5: Talking to PersonaPlex in the Browser

    Now that the server is running, it is time to actually talk to PersonaPlex.

    If you are running this on your local machine, copy and paste this link into your browser: http://localhost:8998.

    This will load the WebUI interface in your browser.
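If the server is running on a remote machine instead, one common approach is an SSH tunnel, so the WebUI is still reachable at localhost (the hostname and username here are placeholders):

```shell
# Forward local port 8998 to port 8998 on the remote host,
# then open http://localhost:8998 in your local browser.
ssh -L 8998:localhost:8998 user@remote-server
```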

    Once the page opens:

    1. Select a voice
    2. Click Connect
    3. Allow microphone permissions
    4. Start speaking

    The interface includes conversation templates. For this demo, we selected the Astronaut (fun) template to make the interaction more playful. You can also create your own template by editing the initial system prompt text. This allows you to fully customize the personality and behavior of the AI.

    For voice selection, we switched from the default and chose Natural F3 just to try something different.

     


     

    And honestly, it feels surprisingly natural.

    You can interrupt it while it is speaking.

    You can ask follow-up questions.

    You can change topics mid-sentence.

    It handles conversational flow smoothly and responds intelligently in real time. I even tested it by simulating a bank customer service call, and the experience felt realistic.

     


     

    PersonaPlex includes multiple voice presets:

    • Natural (female): NATF0, NATF1, NATF2, NATF3
    • Natural (male): NATM0, NATM1, NATM2, NATM3
    • Variety (female): VARF0, VARF1, VARF2, VARF3, VARF4
    • Variety (male): VARM0, VARM1, VARM2, VARM3, VARM4 

    You can experiment with different voices to match the personality you want. Some feel more conversational, others more expressive.

     

    # Concluding Remarks

     
    After going through this entire setup and actually talking to PersonaPlex in real time, one thing becomes very clear.

    This feels different.

    We are used to chat-based AI. You type. It responds. You wait your turn. It feels transactional.

    Speech-to-speech changes that dynamic completely.

    With PersonaPlex running locally, you are not waiting for your turn anymore. You can interrupt it. You can change direction mid-sentence. You can ask follow-up questions naturally. The conversation flows. It feels closer to how humans actually talk.

    And that is why I genuinely believe the future of AI is speech-to-speech.

    But even that is only half the story.

    The real shift will happen when these real-time conversational systems are deeply connected to agents and tools. Imagine speaking to your AI and saying: “Book me a ticket for Friday morning.” “Check the stock price and place the trade.” “Write that email and send it.” “Schedule the meeting.” “Pull the report.”

    Not switching tabs. Not copying and pasting. Not typing commands.

    Just talking.

    PersonaPlex already solves one of the hardest problems, which is natural, full-duplex conversation. The next layer is execution. Once speech-to-speech systems are connected to APIs, automation tools, browsers, trading platforms, and productivity apps, they stop being assistants and start becoming operators.

    In short, it becomes something like OpenClaw on steroids.

    A system that does not just talk like a human, but acts on your behalf in real time.
     
     

    Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in technology management and a bachelor’s degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.

