Close Menu

    Subscribe to Updates

    Get the latest news from tastytech.

    What's Hot

    All Star Fox games that the new Star Fox game is technically a remake of

    May 7, 2026

    Our Land review – superb doc on the right to roam

    May 7, 2026

    Mercedes-Maybach Boss: Buyers Want V12 Engines

    May 7, 2026
    Facebook X (Twitter) Instagram
    Facebook X (Twitter) Instagram
    tastytech.intastytech.in
    Subscribe
    • AI News & Trends
    • Tech News
    • AI Tools
    • Business & Startups
    • Guides & Tutorials
    • Tech Reviews
    • Automobiles
    • Gaming
    • movies
    tastytech.intastytech.in
    Home»Tech Reviews»AI models can acquire backdoors from surprisingly few malicious documents
    AI models can acquire backdoors from surprisingly few malicious documents
    Tech Reviews

    AI models can acquire backdoors from surprisingly few malicious documents

    gvfx00@gmail.comBy gvfx00@gmail.comOctober 10, 2025No Comments3 Mins Read
    Share
    Facebook Twitter LinkedIn Pinterest Email



    Fine-tuning experiments with 100,000 clean samples versus 1,000 clean samples showed similar attack success rates when the number of malicious examples stayed constant. For GPT-3.5-turbo, between 50 and 90 malicious samples achieved over 80 percent attack success across dataset sizes spanning two orders of magnitude.

    Table of Contents

    Toggle
    • Limitations
      • Related posts:
    • Amazon will invest up to $25 billion in Anthropic in a broad deal
    • Attackers prompted Gemini over 100,000 times while trying to clone it, Google says
    • The 20 Best Advent Calendars for Christmas 2025

    Limitations

    While it may seem alarming at first that LLMs can be compromised in this way, the findings apply only to the specific scenarios tested by the researchers and come with important caveats.

    “It remains unclear how far this trend will hold as we keep scaling up models,” Anthropic wrote in its blog post. “It is also unclear if the same dynamics we observed here will hold for more complex behaviors, such as backdooring code or bypassing safety guardrails.”

    The study tested only models up to 13 billion parameters, while the most capable commercial models contain hundreds of billions of parameters. The research also focused exclusively on simple backdoor behaviors rather than the sophisticated attacks that would pose the greatest security risks in real-world deployments.

    Also, the backdoors can be largely fixed by the safety training companies already do. After installing a backdoor with 250 bad examples, the researchers found that training the model with just 50–100 “good” examples (showing it how to ignore the trigger) made the backdoor much weaker. With 2,000 good examples, the backdoor basically disappeared. Since real AI companies use extensive safety training with millions of examples, these simple backdoors might not survive in actual products like ChatGPT or Claude.

    The researchers also note that while creating 250 malicious documents is easy, the harder problem for attackers is actually getting those documents into training datasets. Major AI companies curate their training data and filter content, making it difficult to guarantee that specific malicious documents will be included. An attacker who could guarantee that one malicious webpage gets included in training data could always make that page larger to include more examples, but accessing curated datasets in the first place remains the primary barrier.

    Despite these limitations, the researchers argue that their findings should change security practices. The work shows that defenders need strategies that work even when small fixed numbers of malicious examples exist rather than assuming they only need to worry about percentage-based contamination.

    “Our results suggest that injecting backdoors through data poisoning may be easier for large models than previously believed as the number of poisons required does not scale up with model size,” the researchers wrote, “highlighting the need for more research on defences to mitigate this risk in future models.”

    Related posts:

    YouTube TV is giving subscribers a $20 credit as consolation for the Disney blackout

    These 5 affordable Black Friday coffee machine deals prove you don't need to pay a premium for baris...

    RedMagic's Thinner Gaming Phone Gets a 7,000-mAh Battery and a Cooling Fan

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Previous ArticleGoogle släpper Computer Use – AI:n som kan klicka och surfa åt dig
    Next Article Top 5 ways to make better AI with less data — Dan Rose AI
    gvfx00@gmail.com
    • Website

    Related Posts

    Tech Reviews

    A Star Fox Remake Is Heading To Switch 2 On June 25

    May 6, 2026
    Tech Reviews

    World Cup 2026: Peacock Adds New Streaming Features for Spanish-Language Coverage

    May 6, 2026
    Tech Reviews

    12+ compact accessories that’ll turn your phone into the perfect video and photo tool for travel — from $10 / £10

    May 6, 2026
    Add A Comment
    Leave A Reply Cancel Reply

    Top Posts

    Black Swans in Artificial Intelligence — Dan Rose AI

    October 2, 2025140 Views

    We let ChatGPT judge impossible superhero debates — here’s how it ruled

    December 31, 202571 Views

    Every Clue That Tony Stark Was Always Doctor Doom

    October 20, 202568 Views
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram

    Subscribe to Updates

    Get the latest tech news from tastytech.

    About Us
    About Us

    TastyTech.in brings you the latest AI, tech news, cybersecurity tips, and gadget insights all in one place. Stay informed, stay secure, and stay ahead with us!

    Most Popular

    Black Swans in Artificial Intelligence — Dan Rose AI

    October 2, 2025140 Views

    We let ChatGPT judge impossible superhero debates — here’s how it ruled

    December 31, 202571 Views

    Every Clue That Tony Stark Was Always Doctor Doom

    October 20, 202568 Views

    Subscribe to Updates

    Get the latest news from tastytech.

    Facebook X (Twitter) Instagram Pinterest
    • Homepage
    • About Us
    • Contact Us
    • Privacy Policy
    © 2026 TastyTech. Designed by TastyTech.

    Type above and press Enter to search. Press Esc to cancel.

    Ad Blocker Enabled!
    Ad Blocker Enabled!
    Our website is made possible by displaying online advertisements to our visitors. Please support us by disabling your Ad Blocker.