Close Menu

    Subscribe to Updates

    Get the latest news from tastytech.

    What's Hot

    Celebrate Apple’s 50th birthday with these deals on Watches and AirPods

    April 2, 2026

    Super Mario Galaxy’s Charlie Day Lists Luigi Mangione As 2nd Favorite Luigi

    April 2, 2026

    PlayStation Plus Makes Iconic Trilogy Free For Subscribers

    April 2, 2026
    Facebook X (Twitter) Instagram
    Facebook X (Twitter) Instagram
    tastytech.intastytech.in
    Subscribe
    • AI News & Trends
    • Tech News
    • AI Tools
    • Business & Startups
    • Guides & Tutorials
    • Tech Reviews
    • Automobiles
    • Gaming
    • movies
    tastytech.intastytech.in
    Home»AI News & Trends»Study Shows ChatGPT and Gemini Still Trickable Despite Safety Training
    Study Shows ChatGPT and Gemini Still Trickable Despite Safety Training
    AI News & Trends

    Study Shows ChatGPT and Gemini Still Trickable Despite Safety Training

    gvfx00@gmail.comBy gvfx00@gmail.comDecember 1, 2025No Comments3 Mins Read
    Share
    Facebook Twitter LinkedIn Pinterest Email


    Worries over A.I. safety flared anew this week as new research found that the most popular chatbots from tech giants including OpenAI’s ChatGPT and Google’s Gemini can still be led into giving restricted or harmful responses much more frequently than their developers would like.

    The models could be prodded to produce forbidden outputs 62% of the time with some ingeniously written verse, according to a study published in International Business Times.

    It’s funny that something as innocuous as verse – a form of self-expression we might associate with love letters, Shakespeare or perhaps high-school cringe – ends up doing double duty for security exploits.

    However, the researchers responsible for the experiment said stylistic framing is a mechanism that enables them to circumvent predictable protections.

    Their result mirrors previous warnings from people like the members of the Center for AI Safety, who have been sounding off about unpredictable model behavior in high-risk ways.

    A similar problem reared itself late last year when Anthropic’s Claude model proved capable of answering camouflaged biological-threat prompts embedded in fictional stories.

    At that time, MIT Technology Review described researchers’ concern about “sleeper prompts,” instructions buried within seemingly innocuous text.

    This week’s results take that worry a step further: if playfulness with language alone – something as casual as rhyme – can slip around filters, what does it say about broader intelligence alignment work?

    The authors suggest that safety controls often observe shallow surface cues rather than deeper intentionality correspondence.

    And really, that reflects the kinds of discussions a lot of developers have been having off-the-record for several months.

    You may remember that OpenAI and Google, which are engaged in a game of fast-follow AI, have taken pains to highlight improved safety.

    In fact, both OpenAI’s Security Report and Google’s DeepMind blog have asserted that guardrails today are stronger than ever.

    Nevertheless, the results in the study appear to indicate there’s a disparity between lab benchmarks and real-world probing.

    And for an added bit of dramatic flourish – perhaps even poetic justice – the researchers didn’t use some of the common “jailbreak” techniques that get tossed around forum boards.

    They just recast narrow questions in poetic language, like you were requesting poisonous guidance achieved through a rhyming metaphor.

    No threats, no trickery, no doomsday code. Just…poetry. That strange lack of fit between intentions and style may be precisely what trips these systems up.

    The obvious question is what this all means for regulation, of course. Governments are already creeping toward rules for AI, and the EU’s AI Act directly addresses high-risk model behavior.

    Lawmakers will not find it difficult to pick up on this study as proof positive that companies are still not doing enough.

    Some believe the answer is better “adversarial training.” Others call for independent Red-team organizations, while a few-particularly academic researchers-hold that transparency around model internals will ensure long-term robustness.

    Anecdotally, having seen a few of these experiments in different labs by now, I’m tending toward some combination of all three.

    If A.I. is going to be a bigger part of society, it needs to be able to handle more than simple, by-the-book questions.

    Whether rhyme-based exploits go on to become a new trend in AI testing or just another amusing footnote in the annals of safety research, this work serves as a timely reminder that even our most advanced systems rely on imperfect guardrails that can themselves evolve over time.

    Sometimes those cracks appear only when someone thinks to ask a dangerous question as a poet might.

    Table of Contents

    Toggle
      • Related posts:
    • Building Year-Round Retail Intelligence from Seasonal Insights
    • Social Platforms Now Labeling AI Content
    • How AI Gave Whitney Houston a 21st-Century Encore

    Related posts:

    Infatuated AI Image Generator Pricing & Features Overview

    Anne AI Roleplay Chatbot App Review: Pricing Structure and Main Capabilities

    Nvidia börjar sälja DGX Spark en AI-dator för $3999 denna vecka

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Previous ArticleAI business reality – what enterprise leaders need to know
    Next Article The BMW Z4 Is Ending — And It Might Be the Last True BMW
    gvfx00@gmail.com
    • Website

    Related Posts

    AI News & Trends

    Evaluating the ethics of autonomous systems | MIT News

    April 2, 2026
    AI News & Trends

    Silicon Dreams Meet Real-World Rules: The AI Gold Rush Hits Its First Wall

    April 1, 2026
    AI News & Trends

    Preview tool helps makers visualize 3D-printed objects | MIT News

    April 1, 2026
    Add A Comment
    Leave A Reply Cancel Reply

    Top Posts

    Black Swans in Artificial Intelligence — Dan Rose AI

    October 2, 2025137 Views

    BMW Will Put eFuel In Cars Made In Germany From 2028

    October 14, 202511 Views

    Best Sonic Lego Deals – Dr. Eggman’s Drillster Gets Big Price Cut

    December 16, 20259 Views
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram

    Subscribe to Updates

    Get the latest tech news from tastytech.

    About Us
    About Us

    TastyTech.in brings you the latest AI, tech news, cybersecurity tips, and gadget insights all in one place. Stay informed, stay secure, and stay ahead with us!

    Most Popular

    Black Swans in Artificial Intelligence — Dan Rose AI

    October 2, 2025137 Views

    BMW Will Put eFuel In Cars Made In Germany From 2028

    October 14, 202511 Views

    Best Sonic Lego Deals – Dr. Eggman’s Drillster Gets Big Price Cut

    December 16, 20259 Views

    Subscribe to Updates

    Get the latest news from tastytech.

    Facebook X (Twitter) Instagram Pinterest
    • Homepage
    • About Us
    • Contact Us
    • Privacy Policy
    © 2026 TastyTech. Designed by TastyTech.

    Type above and press Enter to search. Press Esc to cancel.

    Ad Blocker Enabled!
    Ad Blocker Enabled!
    Our website is made possible by displaying online advertisements to our visitors. Please support us by disabling your Ad Blocker.