    AI models can acquire backdoors from surprisingly few malicious documents

    By gvfx00@gmail.com, October 10, 2025

    Fine-tuning experiments with 100,000 clean samples versus 1,000 clean samples showed similar attack success rates when the number of malicious examples stayed constant. For GPT-3.5-turbo, between 50 and 90 malicious samples achieved over 80 percent attack success across dataset sizes spanning two orders of magnitude.
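To make the arithmetic concrete, here is a minimal sketch of what a constant number of malicious samples means as a share of the fine-tuning set. The function name `poison_fraction` is illustrative, and the figure of 90 is simply the upper end of the range quoted above, not a value taken from the researchers' code.

```python
def poison_fraction(n_poison: int, n_clean: int) -> float:
    """Share of the combined fine-tuning set that is malicious."""
    return n_poison / (n_clean + n_poison)


# Assumption: 90 malicious samples, the top of the 50-90 range quoted above.
N_POISON = 90

# The two clean-set sizes mentioned in the article.
for n_clean in (1_000, 100_000):
    share = poison_fraction(N_POISON, n_clean)
    print(f"{n_clean:>7,} clean + {N_POISON} malicious -> {share:.3%} of the set")
```

The malicious share falls from roughly 8.3 percent to roughly 0.09 percent between the two cases, yet the reported attack success stayed similar, which is why the count rather than the proportion appears to be what matters.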

    Limitations

    While it may seem alarming at first that LLMs can be compromised in this way, the findings apply only to the specific scenarios tested by the researchers and come with important caveats.

    “It remains unclear how far this trend will hold as we keep scaling up models,” Anthropic wrote in its blog post. “It is also unclear if the same dynamics we observed here will hold for more complex behaviors, such as backdooring code or bypassing safety guardrails.”

    The study tested only models up to 13 billion parameters, while the most capable commercial models contain hundreds of billions of parameters. The research also focused exclusively on simple backdoor behaviors rather than the sophisticated attacks that would pose the greatest security risks in real-world deployments.

    The backdoors can also be largely undone by the safety training that companies already perform. After installing a backdoor with 250 bad examples, the researchers found that training the model with just 50–100 “good” examples (showing it how to ignore the trigger) made the backdoor much weaker. With 2,000 good examples, the backdoor essentially disappeared. Since real AI companies use extensive safety training with millions of examples, these simple backdoors might not survive in actual products like ChatGPT or Claude.

    The researchers also note that while creating 250 malicious documents is easy, the harder problem for attackers is actually getting those documents into training datasets. Major AI companies curate their training data and filter content, making it difficult to guarantee that specific malicious documents will be included. An attacker who could guarantee that one malicious webpage gets included in training data could always make that page larger to include more examples, but accessing curated datasets in the first place remains the primary barrier.

    Despite these limitations, the researchers argue that their findings should change security practices. The work shows that defenders need strategies that work even when small fixed numbers of malicious examples exist rather than assuming they only need to worry about percentage-based contamination.
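The difference between the two mindsets can be sketched in a few lines. This is an illustration only: it assumes some upstream filter has already counted suspect documents, and the thresholds, the corpus size of 10 million, and the function names are all hypothetical rather than anything the researchers describe.

```python
def fraction_alert(n_suspect: int, n_total: int, max_fraction: float = 0.001) -> bool:
    """Percentage-based check: alert only if suspects exceed a share of the corpus."""
    return n_suspect / n_total > max_fraction


def count_alert(n_suspect: int, max_count: int = 100) -> bool:
    """Absolute-count check: alert once suspects exceed a fixed number."""
    return n_suspect > max_count


# Hypothetical scenario: 250 suspect documents in a 10-million-document corpus.
n_suspect, n_total = 250, 10_000_000

print("fraction-based alert:", fraction_alert(n_suspect, n_total))
print("count-based alert:   ", count_alert(n_suspect))
```

In this scenario the 250 documents make up 0.0025 percent of the corpus, so the percentage-based check stays silent while the count-based check fires. Real defenses would need far more than a threshold, but the sketch shows why a purely proportional rule can miss a small, fixed-size attack.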

    “Our results suggest that injecting backdoors through data poisoning may be easier for large models than previously believed as the number of poisons required does not scale up with model size,” the researchers wrote, “highlighting the need for more research on defences to mitigate this risk in future models.”
