OpenAI, Google, Meta and more companies put their large language models to the test on the weekend of August 12 at the DEF CON hacker conference in Las Vegas. The result is a new corpus of information shared with the White House Office of Science and Technology Policy and the Congressional AI Caucus. The Generative Red Team Challenge organized by AI Village, SeedAI and Humane Intelligence gives a clearer picture than ever before of how generative AI can be misused and what methods might need to be put in place to secure it.
On August 29, the challenge organizers announced the winners of the contest: Cody “cody3” Ho, a student at Stanford University; Alex Gray of Berkeley, California; and Kumar, who goes by the username “energy-ultracode” and preferred not to publish a last name, from Seattle. The contest was scored by a panel of independent judges. The three winners each received one NVIDIA RTX A6000 GPU.
This challenge was the largest event of its kind and one that will allow many students to get in on the ground floor of cutting-edge hacking.
What is the Generative Red Team Challenge?
The Generative Red Team Challenge asked hackers to force generative AI to do exactly what it isn’t supposed to do: provide personal or dangerous information. Challenges included finding credit card information and learning how to stalk someone.
A group of 2,244 hackers participated, each taking a 50-minute slot to try to hack a large language model chosen at random from a pre-established selection. The models under test were built by Anthropic, Cohere, Google, Hugging Face, Meta, NVIDIA, OpenAI and Stability AI. Scale AI developed the testing and evaluation system.
Over the course of the event, participants working on secured Google Chromebooks sent 164,208 messages in 17,469 conversations across 21 types of tests. The 21 challenges included getting the LLMs to create discriminatory statements, fail at math problems, make up fake landmarks, or create false information about a political event or political figure.
“The diverse issues with these models will not be resolved until more people know how to red team and assess them,” said Sven Cattell, the founder of AI Village, in a press release. “Bug bounties, live hacking events and other standard community engagements in security can be modified for machine learning model-based systems.”
Making generative AI work for everyone’s benefit
“Black Tech Street led more than 60 Black and Brown residents of historic Greenwood [Tulsa, Oklahoma] to DEF CON as a first step in establishing the blueprint for equitable, responsible, and accessible AI for all humans,” said Tyrance Billingsley II, founder and executive director of innovation economy development organization Black Tech Street, in a press release. “AI will be the most impactful technology that humans have ever created, and Black Tech Street is focused on ensuring that this technology is a tool for remedying systemic social, political and economic inequities rather than exacerbating them.”
“AI holds incredible promise, but all Americans – across ages and backgrounds – need a say on what it means for their communities’ rights, success, and safety,” said Austin Carson, founder of SeedAI and co-organizer of the GRT Challenge, in the same press release.
Generative Red Team Challenge could influence AI security policy
This challenge could have a direct impact on the White House’s Office of Science and Technology Policy, where director Arati Prabhakar is working on an executive order informed by the event’s results.
The AI Village team will use the results of the challenge to make a presentation to the United Nations in September, according to Rumman Chowdhury, co-founder of the AI policy and consulting firm Humane Intelligence and one of the organizers of the AI Village, who spoke to Axios.
That presentation will be part of a trend of continuing cooperation between industry and government on AI safety. Another example is DARPA’s AI Cyber Challenge, announced during the Black Hat 2023 conference, which invites participants to create AI-driven tools to solve AI security problems.
What vulnerabilities are LLMs likely to have?
Before DEF CON kicked off, AI Village consultant Gavin Klondike previewed seven vulnerabilities that someone trying to breach a system through an LLM would probably find:
- Prompt injection.
- Modifying the LLM parameters.
- Inputting sensitive information that winds up on a third-party site.
- The LLM being unable to filter sensitive information.
- Output leading to unintended code execution.
- Server-side output feeding directly back into the LLM.
- The LLM lacking guardrails around sensitive information.
“LLMs are unique in that we should not only consider the input from users as untrusted, but the output of LLMs as untrusted,” he pointed out in a blog post. Enterprises can use this list of vulnerabilities to watch for potential problems.
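One way to picture treating model output as untrusted is to handle it strictly as data and validate it before the application acts on it. The Python sketch below is illustrative only and is not taken from Klondike's post: the JSON shape, the `ALLOWED_ACTIONS` allowlist and the `parse_model_action` helper are all hypothetical names chosen for this example.

```python
import json

# Hypothetical allowlist: the only actions this example application accepts.
ALLOWED_ACTIONS = {"search_docs", "summarize"}
MAX_ARG_LENGTH = 500


def parse_model_action(raw_output: str) -> dict:
    """Validate model output as data; never pass it to eval(), exec() or a shell."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc

    if not isinstance(payload, dict):
        raise ValueError("model output must be a JSON object")

    action = payload.get("action")
    argument = payload.get("argument")

    # Reject anything outside the allowlist, including non-string values.
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action {action!r} is not allowed")
    if not isinstance(argument, str) or len(argument) > MAX_ARG_LENGTH:
        raise ValueError("argument must be a short string")

    # Return only the fields that were checked, dropping anything extra.
    return {"action": action, "argument": argument}
```

In this sketch, a response asking for an action outside the allowlist raises `ValueError` instead of being executed, which is the same default-deny posture commonly applied to user input.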
In addition, “there’s been a bit of debate around what’s considered a vulnerability and what’s considered a feature of how LLMs operate,” Klondike said.
These features might look like bugs if a security researcher were assessing a different kind of system, he said. For example, the external endpoint could be an attack vector from either direction — a user could input malicious commands or an LLM could return code that executes in an unsecured fashion. Conversations must be stored in order for the AI to refer back to previous input, which could endanger a user’s privacy.
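Because stored conversations can contain whatever a user typed, one possible mitigation is to redact obviously sensitive values before a message is saved. The sketch below is an assumption for illustration, not something Klondike or the challenge organizers described: it uses a regular expression plus the Luhn checksum to mask card-like numbers, and `redact_card_numbers` is a hypothetical helper name.

```python
import re

# Matches 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact_card_numbers(text: str) -> str:
    """Mask card-like numbers before a conversation is written to storage."""

    def replace(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        # Only mask sequences that pass the checksum, to limit false positives.
        return "[REDACTED CARD]" if luhn_valid(digits) else match.group()

    return CARD_PATTERN.sub(replace, text)
```

A pattern-based filter like this catches only one narrow class of sensitive data, so it would complement, not replace, limits on what the system stores in the first place.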
AI hallucinations, or falsehoods, don’t count as a vulnerability, Klondike pointed out. Although hallucinations are factually incorrect, they aren’t dangerous to the system itself.
How to prevent LLM vulnerabilities
Although LLMs are still being explored, research organizations and regulators are moving quickly to create safety guidelines around them.
Daniel Rohrer, NVIDIA vice president of software security, was on-site at DEF CON and noted that the participating hackers talked about the LLMs as if each brand had a distinct personality. Anthropomorphizing aside, the model an organization chooses does matter, he said in an interview with TechRepublic.
“Choosing the right model for the right task is extremely important,” he said. For example, ChatGPT potentially brings with it some of the more questionable content found on the internet; however, if you’re working on a data science project that involves analyzing questionable content, an LLM system that can look for it might be a valuable tool.
Enterprises will likely want a more tailored system that uses only relevant information. “You have to design for the point of the system and application you’re trying to achieve,” Rohrer said.
Other common suggestions for how to secure an LLM system for enterprise use include:
- Limit an LLM’s access to sensitive data.
- Educate users on what data the LLM gathers and where that data is stored, including whether it is used for training.
- Treat the LLM as if it were a user, with its own authentication/authorization controls on access to proprietary information.
- Use the software available to keep AI on task, such as NVIDIA’s NeMo Guardrails or Colang, the language used to build NeMo Guardrails.
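The suggestion to treat the LLM as if it were a user can be sketched as giving the model its own identity and checking that identity against an access policy on every data request. The Python below is a minimal illustration under stated assumptions: the `Principal` class, the role names, the document IDs and the `fetch_document` helper are all hypothetical, and a real deployment would rely on its existing identity and access management system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    """An identity that can request data: a person or, here, the LLM itself."""
    name: str
    roles: frozenset


# Hypothetical policy: which roles may read which documents.
DOCUMENT_ACL = {
    "public-faq": frozenset({"employee", "llm-assistant"}),
    "payroll-2023": frozenset({"hr"}),
}

# Hypothetical document store used only for this example.
DOCUMENTS = {
    "public-faq": "Office hours are 9 to 5.",
    "payroll-2023": "Confidential salary data.",
}

# The LLM gets its own narrowly scoped role, not a human user's permissions.
LLM_ASSISTANT = Principal(name="support-bot", roles=frozenset({"llm-assistant"}))


def fetch_document(principal: Principal, doc_id: str) -> str:
    """Return a document only if the principal holds an allowed role."""
    allowed_roles = DOCUMENT_ACL.get(doc_id, frozenset())  # unknown IDs are denied
    if principal.roles.isdisjoint(allowed_roles):
        raise PermissionError(f"{principal.name} may not read {doc_id}")
    return DOCUMENTS[doc_id]
```

With this setup, `fetch_document(LLM_ASSISTANT, "public-faq")` returns the FAQ text, while a request for `"payroll-2023"` raises `PermissionError` no matter what a prompt tells the model to do, because the check happens outside the model.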
Finally, don’t skip the basics, Rohrer said. “For many who are deploying LLM systems, there are a lot of security practices that exist today under the cloud and cloud-based security that can be immediately applied to LLMs that in some cases have been skipped in the race to get to LLM deployment. Don’t skip those steps. We all know how to do cloud. Take those fundamental precautions to insulate your LLM systems, and you’ll go a long way to meeting a number of the usual challenges.”
Note: This article was updated to reflect the DEF CON challenge’s winners and the number of participants.