Dropbox 于2007年5月由麻省理工学院学生德鲁·休斯顿和阿拉什·费道斯创立，时名Evenflow, Inc.，于2009年10月更名为Dropbox，总部位于美国加利福尼亚州旧金山。
How Dropbox re-wrote its core web serving stack for the next decade—sunsetting technical debt accrued over 13 years, and migrating high-traffic surfaces to a future-proofed platform ready for the company’s multi-product evolution.
Dropbox is well known for storing users’ files—but it’s equally important we can retrieve content quickly when our users need it most. For the Retrieval Experiences team, that means building a search experience that is as fast, simple, and powerful as possible. But when we conducted a research study in July 2022, one of the most common complaints was that search was still too slow. If search was faster, these users said, they would be more likely to use Dropbox on a regular basis.
Like many companies, Dropbox has been experimenting with large language models (LLMs) as a potential backend for product and research initiatives. As interest in leveraging LLMs has increased in recent months, the Dropbox Security team has been advising on measures to harden internal Dropbox infrastructure for secure usage in accordance with our AI principles. In particular, we’ve been working to mitigate abuse of potential LLM-powered products and features via user-controlled input.
Injection attacks that manipulate inputs used in LLM queries have been one such focus for Dropbox security engineers. For example, an adversary who is able to modify server-side data can then manipulate the model’s responses to a user query. In another attack path, an abusive user may try to infer information about the application’s instructions in order to circumvent server-side prompt controls for unrestricted access to the underlying model.
As part of this work, we recently observed some unusual behavior with two popular large language models from OpenAI, in which control characters (like backspace) are interpreted as tokens. This can lead to situations where user-controlled input can circumvent system instructions designed to constrain the question and information context. In extreme cases, the models will also hallucinate or respond with an answer to a completely different question.
When was the last time you were about to click a button on a website, only to have the page shift—causing you to click the wrong button instead? Or the last time you rage-quit a page that took too long to load?
These problems are only amplified in applications as rich and interactive as ours. The more front-end code is written to support more complex features, the more bytes are sent to the browser to be parsed and executed, and the worse performance can get.
At Dropbox, we understand how incredibly annoying such experiences can be. Over the past year, our web performance engineering team narrowed some of our performance problems down to an oft-overlooked culprit: the module bundler.
At Dropbox, we store data about how people use our products and services in a Hadoop-based data lake. Various teams rely on the information in this data lake for all kinds of business purposes—for example, analytics, billing, and developing new features—and our job is to make sure that only good quality data reaches the lake.
Our data lake is over 55 petabytes in size, and quality is always a big concern when working with data at this scale. The features we build, the decisions we make, and the financial results we report all hinge on our data being accurate and correct. But with so much data to sift through, quality problems can be incredibly hard to find—if we even know they exist in the first place. It's the data engineering equivalent of looking for a black cat in a dark room.
Like many companies, Dropbox runs experiments that compare two product versions—A and B—against each other to understand what works best for our users. When a company generates revenue from selling advertisements, analyzing these A/B experiments can be done promptly; did a user click on an ad or not? However, at Dropbox we sell subscriptions, which makes analysis more complex. What is the best way to analyze A/B experiments when a user’s experience over several months can affect their decision to subscribe?
For example, let’s say we wanted to measure the effect of a change in how we onboard a new trial user on the first day of their trial. We could pick some metric that is available immediately—such as the number of files uploaded—but this might not be well correlated with user satisfaction. We could wait 90 days to see if the user converts and continues on a paid subscription, but that takes a long time. Is there a metric that is both available immediately and highly correlated with user satisfaction?
We found that, yes, there is a better metric: eXpected Revenue (XR). Using machine learning, we can make a prediction about the probable value of a trial user over a two-year period, measured as XR. This prediction is made a few days after the start of a trial, and it is highly correlated with user satisfaction. With machine learning we can now draw accurate conclusions from A/B experiments in a matter of days instead of months—meaning we can run more experiments every year, giving us more opportunities to make the Dropbox experience even better for our users.
When Magic Pocket adopted SMR drives in 2017, one of the design decisions was to use SSDs as a write-back cache for live writes. The main motivation was that SMR disks have a reputation for being slower for random writes than their PMR counterparts. To compensate, live writes to Magic Pocket were committed to SSDs first and acknowledgements were sent to upstream services immediately. An asynchronous background process would then flush a set of these random writes to SMR disks as sequential writes. Using this approach, Magic Pocket was able to support higher disk densities while maintaining our durability and availability guarantees.
The design worked well for us over the years. Our newer generation storage platforms were able to support disks with greater density (14-20 TB per disk). A single storage host—with more than 100 such data disks and a single SSD—was able to support 1.5-2 PBs of raw data. But as data density increased, we started to hit limits with maximum write throughput per host. This was primarily because all live writes would pass through a single SSD.
We found each host's write throughput was limited by the max write throughput of its SSD. Even the adoption of NVMe-based SSD drives wasn't enough to keep up with Magic Pocket’s scale. While a typical NVMe based SSD can handle up to 15-20 Gbps in write throughput, this was still far lower than the cumulative disk throughput of hundreds of disks on a single one of our hosts.
This bottleneck only became more apparent as the density of our storage hosts increased. While higher density storage hosts meant we needed fewer servers, our throughput remained unchanged—meaning our SSDs had to handle even more writes than before to keep up with Magic Pocket’s needs.
Metadata is crucial for serving user requests. It also takes up a lot of space—and as we’ve grown, so has the amount of metadata we’ve had to store. This isn’t a bad problem to have, but we knew it was only a matter of time before our metadata stack would need an overhaul.
Dropbox operates two large-scale metadata storage systems powered by sharded MySQL. One is the Filesystem which contains metadata related to files and folders. The other is Edgestore, which powers all other internal and external Dropbox services. Both operate at a massive scale. They run on thousands of servers, store petabytes of data on SSDs, and serve tens of millions of queries per second with single-digit millisecond latency.
Dropbox originally used Amazon S3 and the Hadoop Distributed File System (HDFS) as the backbone of its data storage infrastructure. Although we migrated user file data to our internal block storage system Magic Pocket in 2015, Dropbox continued to use S3 and HDFS as a general-purpose store for other internal products and tools. Among these use cases were crash traces, build artifacts, test logs, and image caching.
Using these two legacy systems as generic blob storage caused many pain points—the worst of which was the cost inefficiency of using S3’s API. For instance, crash traces wrote many objects which were rarely accessed unless specifically needed for an investigation, generating a large PUT bill. Caches built against S3 burned pricey GET requests with each cache miss.
Over the past few years, server-side request forgery (SSRF) has received an increasing amount of attention from security researchers. With SSRF, an attacker can retarget a request to internal services and exploit the implicit trust within the network. It often escalates into a critical vulnerability, and in 2021 it was among the top ten web application security risks identified by the Open Web Application Security Project. At Dropbox, it’s the Application Security team’s responsibility to guard against and address SSRF in a scalable manner, so that our engineers can deliver products securely and with as little friction as possible.
Nobody likes waiting for software. Snappy, responsive interfaces make us happy, and research shows there’s a relationship between responsiveness and attention1. But maintaining fast-feeling websites often requires tradeoffs. This might mean diverting resources from the development of new features, paying off technical debt, or other engineering work. The key to justifying such diversions is by connecting the dots between performance and business outcomes—something we can do through measurement.
Over the last year, we’ve been rethinking the way we track page load performance on the web at Dropbox. After identifying a few gaps in our existing metrics, we decided we needed a more objective, user-focused way to define page load performance so that we could more reliably and meaningfully compare experiences across products. We thought a relatively new page load metric called Time To Visually Complete (TTVC) could work well.
There was just one problem: Browsers don’t yet report the moment a page becomes visually complete. If we wanted to adopt TTVC as our new primary performance metric, we would have to fill that gap. So we built a small library to allow us to track TTVC as our users experience it in the real world. That library is @dropbox/ttvc—and we’re excited to be open-sourcing this work!
A good password manager should be able to securely store, sync, and even autofill your username and password when logging into websites and apps. A password manager like…Dropbox Passwords!
When we released Dropbox Passwords in the Summer of 2020, it was important we ensured that a user’s logins would always be available—and up to date—on any device they used. Luckily, Dropbox has some experience here, and we were able to leverage our existing syncing infrastructure to copy a user’s encrypted password info, known as a payload, from one device to another. However, while implementing this crucial component, we encountered an unexpected syncing issue where, sometimes, out-of-date login items would overwrite newer, more recent changes.
Eventually we found a solution that built on prior Dropbox syncing work. But it also involved contemplating the very nature of time itself.
Magic Pocket, the exabyte scale custom infrastructure we built to drive efficiency and performance for all Dropbox products, is an ongoing platform for innovation. We continually look for opportunities to increase storage density, reduce latency, improve reliability, and lower costs. The next step in this evolution is our new deployment of specially configured servers filled to capacity with high-density SMR (Shingled Magnetic Recording) drives.
Dropbox is the first major tech company to adopt SMR technology, and we’re currently adding hundreds of petabytes of new capacity with these high-density servers at a significant cost savings over conventional PMR (Perpendicular Magnetic Recording) drives. Off the shelf, SMR drives have the reputation of being slower to write to than conventional drives. So the challenge has been to benefit from the cost savings of the denser drives without sacrificing performance. After all, our new products support active collaboration between small teams all the way up to the largest enterprise customers. That’s a lot of data to write, and the experience has to be fast.
In HBO’s Silicon Valley, lossless video compression plays a pivotal role for Pied Piper as they struggle to stream HD content at high speed.
Inspired by Pied Piper, we created our own version of their algorithm Pied Piper at Hack Week. In fact, we’ve extended that work and have a bit-exact, lossless media compression algorithm that achieves extremely good results on a wide array of images. (Stay tuned for more on that!)
However, to help our users sync and collaborate faster, we also need to work with a standardized compression format that already ships with most browsers. In that vein, we’ve been working on open source improvements to the Brotli codec, which will make it possible to ship bits to our business customers using 4.4% less of their bandwidth than through gzip.
Over the past four years, we've been working hard on rebuilding our desktop client's sync engine from scratch. The sync engine is the magic behind the Dropbox folder on your desktop computer, and it's one of the oldest and most important pieces of code at Dropbox. We're proud to announce today that we've shipped this new sync engine (codenamed "Nucleus") to all Dropbox users.
Rewriting the sync engine was really hard, and we don’t want to blindly celebrate it, because in many environments it would have been a terrible idea. It turned out that this was an excellent idea for Dropbox but only because we were very thoughtful about how we went about this process. In particular, we’re going to share reflections on how to think about a major software rewrite and highlight the key initiatives that made this project a success, like having a very clean data model.
Dropbox Capture is a new visual communication tool designed to make it easy for teams to asynchronously share their work using screen recordings, video messages, screenshots, or GIFs. There's no formal onboarding required, and you can start sharing your ideas in seconds. In fact, simplicity is key to the Capture experience, and it's a value that also extends down to the development of Capture’s underlying code.