AI tools drop their safety measures as conversations go on, increasing the risk of harmful responses, a new report has found. A few quick prompts can dismantle most of the protections built into artificial intelligence systems, according to the findings.
Cisco Exposes Weaknesses Across Major AI Models
Cisco tested large language models from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft to see how easily they could be pushed into releasing unsafe or illegal content. Researchers ran 499 “multi-turn attacks”, in which users ask a series of linked questions to trick the safety filters. Each conversation included five to ten exchanges.
The team measured how often the chatbots produced damaging information, such as stolen data or misinformation. When asked a series of questions, the chatbots gave malicious answers in 64 per cent of cases, compared with 13 per cent when asked only once. Success rates varied widely, from 26 per cent for Google’s Gemma to 93 per cent for Mistral’s Large Instruct.
Cisco said the results show that multi-turn attacks could be used to spread harmful content or to give hackers access to confidential business data. AI models often lose track of their safety training over longer conversations, the company said, allowing attackers to gradually refine their prompts and slip past safeguards.
Open Models Shift Safety Burden to Users
Mistral, Meta, Google, OpenAI, and Microsoft all offer open-weight language models, meaning their underlying parameters are publicly available. Cisco reported that these open systems usually ship with lighter built-in safety layers so that users can modify them freely. That approach shifts responsibility for maintaining safety onto whoever customizes the models.
Cisco added that Google, OpenAI, Meta, and Microsoft have tried to reduce malicious fine-tuning of their systems. Still, critics continue to blame AI firms for weak safety barriers that criminals can easily exploit.
In August, US company Anthropic admitted that criminals used its Claude model to steal personal data and demand ransoms exceeding $500,000 (€433,000).
