AI models are still easy targets for manipulation and attacks, especially if you ask them nicely.
A new report from the UK's new AI Safety Institute found that four of the largest publicly available large language models (LLMs) were extremely vulnerable to jailbreaking, the process of tricking an AI model into ignoring safeguards that limit harmful responses.
"LLM developers fine-tune models to be safe for public use by training them to avoid illegal, toxic, or explicit outputs," the Institute wrote. "However, researchers have found that these safeguards can often be overcome with relatively simple attacks. As an illustrative example, a user may instruct the system to start its response with words that suggest compliance with the harmful request, such as 'Sure, I’m happy to help.'"
Researchers used prompts in line with industry-standard benchmark testing, but found that some AI models didn't even need jailbreaking in order to produce out-of-line responses. When specific jailbreaking attacks were used, every model complied at least once out of every five attempts. Overall, three of the models provided responses to misleading prompts nearly 100 percent of the time.
"All tested LLMs remain highly vulnerable to basic jailbreaks," the Institute concluded. "Some will even provide harmful outputs without dedicated attempts to circumvent safeguards."
The investigation also assessed the capabilities of LLM agents, or AI models used to perform specific tasks, to conduct basic cyber attack techniques. Several LLMs were able to complete what the Institute labeled "high school level" hacking problems, but few could perform more complex "university level" actions.
The study does not reveal which LLMs were tested.
Last week, CNBC reported OpenAI was disbanding its in-house safety team tasked with exploring the long-term risks of artificial intelligence, known as the Superalignment team. The intended four-year initiative was announced just last year, with the AI giant committing to using 20 percent of its computing power to "aligning" AI advancement with human goals.
"Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems," OpenAI wrote at the time. "But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction."
The company has faced a surge of attention following the May departure of OpenAI co-founder Ilya Sutskever and the public resignation of its safety lead, Jan Leike, who said he had reached a "breaking point" over OpenAI's AGI safety priorities. Sutskever and Leike led the Superalignment team.
On May 18, OpenAI CEO Sam Altman and president and co-founder Greg Brockman responded to the resignations and growing public concern, writing, "We have been putting in place the foundations needed for safe deployment of increasingly capable systems. Figuring out how to make a new technology safe for the first time isn't easy."