AI Safety Institute reveals findings on the safety of Large Language Models

Assessing the Potential for Large Language Models to Compromise National Security through Testing

The UK Government's AI Safety Institute (AISI) has released the first results of its international joint testing exercises, evaluating the safety of AI models against a range of threats. The testing focuses on how models could be used to undermine national security, particularly in relation to cyber, chemical, and biological risks.

The testing involved several anonymised AI models, referred to as Model A, Model B, Model C, and Model D, among others. The exact names of the models, and whether the latest versions were used, remain undisclosed. The exercise highlighted challenges in agentic AI safety, with the models showing varying degrees of robustness against safety risks: some scored roughly 57-70% on safety measures under limited conditions, while others scored as low as roughly 35%. However, no publicly available, comprehensive report names all tested models or provides detailed vulnerability analyses specifically for cyber, chemical, and biological threats.

The testing results suggest that some AI models provided harmful outputs even without attempts to circumvent their safeguards. Two AI models completed short-horizon agent tasks but were unable to plan and execute sequences of actions for more complex tasks. Five leading AI models were screened for their cyber, chemical, and biological capabilities, and several of them demonstrated expert-level knowledge of chemistry and biology.

The AISI is working closely with companies worldwide to test their AI models, identify vulnerabilities, and put the necessary safety requirements in place. The collaboration involves sharing models and assessing them for weaknesses, but no explicit details about cyber, chemical, and biological threat profiles have been disclosed.

The UK has a strategic partnership with OpenAI, which includes providing the government with deeper access to frontier AI models to evaluate them for misuse, bias, and security concerns. This partnership will aid in testing the safety guardrails of future models, but specific test results and vulnerability assessments for each threat type have not yet been disclosed.

An independent assessment by the Future of Life Institute rates overall safety practices rather than detailing individual models' vulnerabilities to specific threats. Saqib Bhatti MP, Parliamentary Under-Secretary of State at the Department for Science, Innovation and Technology, stated that legislation on AI safety will come "eventually" and will be informed by testing.

The AISI plans to collaborate with its Canadian counterpart to deepen existing links between the two nations and inspire collaborative work on systemic safety research. The Institute will also share research and conduct joint evaluations of AI models as part of its collaboration with the US. Its new office in San Francisco will place it at the heart of the US tech industry, aiming to further the UK's strategic partnership with the US and its approach to AI safety.

The results of the AI testing are likely to be discussed at the upcoming AI Seoul Summit this week, co-hosted by the UK and the Republic of Korea, with the intention of informing AI safety policy across the globe. Despite the ongoing nature of these assessments, the AISI's focus on collaboration, transparency, and continuous improvement is a step towards ensuring the safety and security of AI technology.

The testing results indicate that some AI models may pose threats to cyber, chemical, and biological security, with varying levels of safety robustness among the models tested. The AISI is currently collaborating with international partners and tech companies to identify and address these vulnerabilities, with a focus on enhancing the systemic safety of AI technology.
