How we use Lakera Guard to secure our LLMs

From search to organization, rapid advancements in artificial intelligence (AI) have made it easier for Dropbox users to discover and interact with their files. However, these advancements can also introduce new security challenges. Large Language Models (LLMs), integral to some of our most recent intelligent features, are also susceptible to various threats—from data breaches and adversarial attacks to exploitation by malicious actors. While hundreds of millions of users already trust Dropbox to protect their content, ensuring the security and integrity of these models is essential for maintaining that trust.

Last year we evaluated several security solutions to help safeguard our LLM-powered applications and ultimately chose Lakera Guard. With its robust capabilities, Lakera Guard helps us secure and protect user data, and—as outlined in our AI principles—uphold the reliability and trustworthiness of our intelligent features.

Addressing these challenges requires a multifaceted approach, incorporating stringent security protocols, continuous monitoring, and proactive risk management strategies. In this story, we’ll share insights into our approach to securing our LLMs, the criteria we used to evaluate potential solutions, and the key benefits of implementing Lakera's technology.

LLM security comprises many parts. Common problems include reliability, consistency, alignment, and adversarial attacks. However, the scope of the problem we were trying to solve was more customer-centric—specifically, using LLMs to chat about, summarize, transcribe, and retrieve information, in addition to agent and assistant use cases. These features accept untrusted user input, which can lead to moderation issues or prompt injection—a technique used to manipulate models—and the headaches that follow, including undesirable model outputs.

We considered a variety of open source, in-house, and proprietary options before narrowing our criteria to either open source or commercial tools. Whatever we chose, we decided the following requirements were mandatory:

  • We couldn’t call out to a third party. The solution had to be deployable in-house on our existing infrastructure.
  • Low latency. Dropbox is committed to maximizing performance for users across all of its products. We couldn’t add any more latency to LLM-powered features than absolutely necessary, so we worked with the product teams to determine upper latency bounds.
    • Latency for a given context length is also an important sub-problem here. Many options perform well on context lengths of <800 tokens, but drop off significantly at >4,000. Excellent support for long context lengths—the ability for models to process greater amounts of information—was critical, as many customer use cases routinely exceed this number.
  • Confidence scores. An API that gave us extensive control not only over which categories to block, but also the sensitivity of each, was key (e.g., applying different confidence-score thresholds to the jailbreak category so we could meet the diverse needs of our product teams). A sketch of this idea follows the list.
  • Future intelligence and continuous improvement. Since LLM security is a fast-evolving space, we wanted a solution that could also give us actionable insights into attacks and payloads in a rapidly shifting environment.
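
To make the confidence-score requirement concrete, here is a minimal sketch of how per-product sensitivity can be expressed as thresholds over the guard's category scores. The product names, categories, and numbers are hypothetical, not our actual configuration.

    # Hypothetical per-product sensitivity settings, expressed as confidence-score
    # thresholds for each detection category. Values are illustrative only.
    GUARD_THRESHOLDS = {
        "summarization": {"prompt_injection": 0.90, "jailbreak": 0.80},
        "assistant": {"prompt_injection": 0.70, "jailbreak": 0.60},
    }

    def is_blocked(product: str, category_scores: dict[str, float]) -> bool:
        """Block the prompt if any category score meets the product's threshold."""
        thresholds = GUARD_THRESHOLDS[product]
        return any(
            category_scores.get(category, 0.0) >= threshold
            for category, threshold in thresholds.items()
        )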

In fact, given the rapidly shifting environment, our top priority was selecting a solution that gave us enough of a foothold to observe and reorient as needed.

Once we had a short list of open-source and commercial tools that met our criteria, we set up each tool internally for evaluation. For our test suite, we used Garak, an open-source LLM vulnerability scanner customized to run Dropbox-specific security tests. With Garak, we could evaluate the security coverage of each of the potential solutions. This allowed us to conduct a range of tests involving prompt injection, jailbreak, and other security assessments developed by Dropbox. 

We then tested each solution directly against a range of LLMs already in use or under evaluation by our product teams. This enabled us to establish a baseline of each model’s vulnerability. For example, if a security tool blocked 90% of malicious prompts, but the LLM had already mitigated 85% of these vulnerabilities, we measured a net improvement of only 5%.
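
The helper below illustrates that baseline comparison in simplified form. It is not our Garak configuration, and llm_mitigates and guard_blocks are hypothetical stand-ins for the model under test and the security tool being evaluated.

    def net_improvement(prompts, llm_mitigates, guard_blocks) -> float:
        """Fraction of malicious prompts caught by the guard that the bare model
        does not already mitigate on its own.

        llm_mitigates(prompt) -> bool: the model refuses or defuses the prompt by itself.
        guard_blocks(prompt) -> bool: the security tool flags the prompt before the model sees it.
        """
        newly_caught = sum(
            guard_blocks(p) and not llm_mitigates(p) for p in prompts
        )
        return newly_caught / len(prompts)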

Finally, we needed a tool that did not add excessive latency to LLM calls and respected the privacy of customer data (e.g., did not store prompt content or send it outside the Dropbox network). For this, we measured the response time of each security test and also monitored network requests and file changes to detect any potential breaches of user data.
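
For the latency measurements, a simple wall-clock wrapper around each security check was enough to compare tools. The sketch below is illustrative; check_prompt stands in for whichever guard client is under evaluation.

    import time

    def timed_guard_call(check_prompt, prompt: str):
        """Run a single security check and record its wall-clock latency in milliseconds."""
        start = time.perf_counter()
        result = check_prompt(prompt)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        return result, elapsed_ms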

After extensive testing, Lakera Guard emerged as the product that met all our requirements, offering both the lowest latency and the highest security coverage.

How we integrated Lakera Guard

Lakera provides a Docker container that we run as an internal service at Dropbox. This means Lakera Guard is just an RPC call away from any LLM pipeline. Conceptually, the LLM security architecture at Dropbox is designed using LangChain as shown in the figure below. 

A high-level representation of the internal AI/ML security infrastructure at Dropbox

Here, a textual LLM prompt is directed through one or more prompt security chains before hitting the model. We have a security chain that makes requests to the Lakera Guard security API endpoint on our internally hosted Docker container, which responds with confidence scores for prompt injection and jailbreak attacks. Dropbox services can then act on the returned Lakera Guard prompt security categories as appropriate for the application.
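
As a rough sketch of that request/response flow, assume the internally hosted container exposes an HTTP endpoint that accepts the prompt text and returns per-category confidence scores. The hostname, endpoint path, response shape, and threshold below are illustrative rather than Lakera Guard's exact API.

    import requests

    GUARD_URL = "http://lakera-guard.internal:8000/v1/prompt_injection"  # illustrative internal endpoint

    def prompt_is_safe(prompt: str, threshold: float = 0.8) -> bool:
        """Ask the internally hosted guard container for confidence scores and
        decide whether the prompt may be forwarded to the LLM."""
        response = requests.post(GUARD_URL, json={"input": prompt}, timeout=2)
        response.raise_for_status()
        scores = response.json()["results"][0]["category_scores"]  # assumed response shape
        worst = max(scores.get("prompt_injection", 0.0), scores.get("jailbreak", 0.0))
        return worst < threshold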

Prompts that are deemed to be safe are then passed to the LLM—either a third-party model, like GPT-4, or an internally hosted open-source model, like LLaMA 3, depending on the use case—which produces a textual response. The LLM’s response is then passed through our content moderation chains, which analyze the text for potentially harmful topics. The moderation chain calls out to Lakera’s content moderation API endpoint to identify harassing or explicit content that the Dropbox feature or service can withhold from the user as configured.
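
The moderation step can be sketched the same way. The endpoint path, category names, and response shape here are again assumptions for illustration, not the exact moderation API.

    import requests

    MODERATION_URL = "http://lakera-guard.internal:8000/v1/moderation"  # illustrative internal endpoint

    def response_should_be_withheld(text: str, blocked=("harassment", "sexual")) -> bool:
        """Run the LLM's response through content moderation and report whether
        any of the configured categories were flagged."""
        response = requests.post(MODERATION_URL, json={"input": text}, timeout=2)
        response.raise_for_status()
        categories = response.json()["results"][0]["categories"]  # assumed response shape
        return any(categories.get(name, False) for name in blocked)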

Integrating Lakera Guard into our Dropbox infrastructure was a gradual process. We started with one product directly calling the Lakera-provided Docker container. Eventually, we created a custom Dropbox service that can automatically scale up more containers as needed—and can be called via the LLM security layer we built as part of Dropbox’s central machine learning libraries.

What we learned, and what’s next

Throughout this process, several product teams had concerns about latency—especially since many LLM use cases at Dropbox have prompts of more than 8,000 characters. We worked closely with Lakera to minimize added latency as much as possible, and latency for prompts with more than 8,000 characters has since improved 7x on average.

Our belief in Lakera is so strong that we've invested in its continued success and have collaborated with its teams on numerous improvements to Lakera Guard itself. We found novel ways to cause LLM hallucination and collaborated with Lakera to increase efficacy for malicious prompt detection. We also shared internal research—such as our work on repeated token attacks—as well as some interesting false positives.

Finally, by working closely with the machine learning and product teams, we were able to help them meet their requirements while also achieving our security goals. For example, some of the false positives we encountered were a result of poor user input sanitization—or the process of filtering any potentially harmful or unwanted characters—which we were able to pass back to the product teams for improvement. Lakera has also been very interested in understanding our product flows to ensure they’re delivering a product that meets us where we are. 

We’re currently planning to expand our Lakera Guard integration to all products using LLMs at Dropbox. This will involve tuning the detections for each use case and determining other potential causes of false positives or high latencies that can occur with some of the different data structures currently in use.

One of our core commitments at Dropbox is being worthy of our customers' trust. Partnering with Lakera to protect our users and their data is a testament to this promise. There are also many more interesting problems yet to be solved, and we plan to share more about how our approach to LLM security continues to evolve in future posts.

~ ~ ~

If building innovative products, experiences, and infrastructure excites you, come build the future with us! Visit dropbox.com/jobs to see our open roles, and follow @LifeInsideDropbox on Instagram and Facebook to see what it's like to create a more enlightened way of working.
