公司：dropbox的相关资料

Meet Chrono, our scalable, consistent, metadata caching solution

Efficiently storing and retrieving metadata has long been a core focus for metadata teams at Dropbox, and in recent years we've made significant improvements to our foundational layer. By building our own incrementally scalable key-value storage, we’ve been able to avoid necessary doubling of our hardware fleet. As we scale, our evolution continues—but there are still challenges we cannot ignore.

At our current rate of growth, we knew there would come a time when the volume of read queries per second (QPS) could not be sustainably served by our existing underlying storage machines. Caching was the obvious solution—but not necessarily an easy one.

dropbox技术

Bringing AI-powered answers and summaries to file previews on the web

Dropbox offers a handful of features that use machine learning to understand content much like a human would. For example, Dropbox can generate summaries and answer questions about files when those…

dropbox技术

Bye Bye Bye...: Evolution of repeated token attacks on ChatGPT models

Building on prior prompt injection research, we recently discovered a new training data extraction vulnerability involving OpenAI’s chat completion models.

dropbox技术

From AI to sustainability, why our latest data centers use 400G networking

Dropbox最近推出了使用400G以太网技术的数据中心架构。他们采用了400G-DAC互连的简化版本的MDF机架和跨数据中心的数据中心互连。通过优化光纤网络，支持从100G到400G的连接，并使用密集波分复用系统，提供了更高的带宽和更低的成本。他们在采用400G技术过程中学到了一些重要的教训，例如对组件进行细致的测试，确保向后兼容性，并制定了应对供应链不稳定因素的备用计划。这些经验将有助于他们在未来的发展中应对新的挑战。

dropbox技术

Is this a date? Using ML to identify date formats in file names

File names play a vital role in facilitating effective communication and organization among teams. Files with cryptic or nonsensical names can quickly lead to chaos—whereas a well-structured naming system can streamline workflows, improve collaboration, and ensure easy retrieval of information. Consistency in naming across different files and file types enables teams to find and share content more efficiently, saving time and reducing frustration.

To make it easier for our users to organize and find their files, Dropbox has an automated feature called naming conventions. With this feature, users can set rules around how files should be named, and files uploaded to a specific folder will automatically be renamed to match the preferred convention. For example, files could be renamed to include a keyword or date.

dropbox技术

How Edison is helping us build a faster, more powerful Dropbox on the web

How Dropbox re-wrote its core web serving stack for the next decade—sunsetting technical debt accrued over 13 years, and migrating high-traffic surfaces to a future-proofed platform ready for the company’s multi-product evolution.

dropbox技术

Investigating the impact of HTTP3 on network latency for search

Dropbox is well known for storing users’ files—but it’s equally important we can retrieve content quickly when our users need it most. For the Retrieval Experiences team, that means building a search experience that is as fast, simple, and powerful as possible. But when we conducted a research study in July 2022, one of the most common complaints was that search was still too slow. If search was faster, these users said, they would be more likely to use Dropbox on a regular basis.

dropbox技术

Dont you (forget NLP): Prompt injection with control characters in ChatGPT

Like many companies, Dropbox has been experimenting with large language models (LLMs) as a potential backend for product and research initiatives. As interest in leveraging LLMs has increased in recent months, the Dropbox Security team has been advising on measures to harden internal Dropbox infrastructure for secure usage in accordance with our AI principles. In particular, we’ve been working to mitigate abuse of potential LLM-powered products and features via user-controlled input.

Injection attacks that manipulate inputs used in LLM queries have been one such focus for Dropbox security engineers. For example, an adversary who is able to modify server-side data can then manipulate the model’s responses to a user query. In another attack path, an abusive user may try to infer information about the application’s instructions in order to circumvent server-side prompt controls for unrestricted access to the underlying model.

As part of this work, we recently observed some unusual behavior with two popular large language models from OpenAI, in which control characters (like backspace) are interpreted as tokens. This can lead to situations where user-controlled input can circumvent system instructions designed to constrain the question and information context. In extreme cases, the models will also hallucinate or respond with an answer to a completely different question.

dropbox技术

How we reduced the size of our JavaScript bundles by 33%

When was the last time you were about to click a button on a website, only to have the page shift—causing you to click the wrong button instead? Or the last time you rage-quit a page that took too long to load?

These problems are only amplified in applications as rich and interactive as ours. The more front-end code is written to support more complex features, the more bytes are sent to the browser to be parsed and executed, and the worse performance can get.

At Dropbox, we understand how incredibly annoying such experiences can be. Over the past year, our web performance engineering team narrowed some of our performance problems down to an oft-overlooked culprit: the module bundler.

dropbox技术

Balancing quality and coverage with our data validation framework

At Dropbox, we store data about how people use our products and services in a Hadoop-based data lake. Various teams rely on the information in this data lake for all kinds of business purposes—for example, analytics, billing, and developing new features—and our job is to make sure that only good quality data reaches the lake.

Our data lake is over 55 petabytes in size, and quality is always a big concern when working with data at this scale. The features we build, the decisions we make, and the financial results we report all hinge on our data being accurate and correct. But with so much data to sift through, quality problems can be incredibly hard to find—if we even know they exist in the first place. It's the data engineering equivalent of looking for a black cat in a dark room.

dropbox技术

Accelerating our A/B experiments with machine learning

Like many companies, Dropbox runs experiments that compare two product versions—A and B—against each other to understand what works best for our users. When a company generates revenue from selling advertisements, analyzing these A/B experiments can be done promptly; did a user click on an ad or not? However, at Dropbox we sell subscriptions, which makes analysis more complex. What is the best way to analyze A/B experiments when a user’s experience over several months can affect their decision to subscribe?

For example, let’s say we wanted to measure the effect of a change in how we onboard a new trial user on the first day of their trial. We could pick some metric that is available immediately—such as the number of files uploaded—but this might not be well correlated with user satisfaction. We could wait 90 days to see if the user converts and continues on a paid subscription, but that takes a long time. Is there a metric that is both available immediately and highly correlated with user satisfaction?

We found that, yes, there is a better metric: eXpected Revenue (XR). Using machine learning, we can make a prediction about the probable value of a trial user over a two-year period, measured as XR. This prediction is made a few days after the start of a trial, and it is highly correlated with user satisfaction. With machine learning we can now draw accurate conclusions from A/B experiments in a matter of days instead of months—meaning we can run more experiments every year, giving us more opportunities to make the Dropbox experience even better for our users.

dropbox技术

Increasing Magic Pocket write throughput by removing our SSD cache disks

When Magic Pocket adopted SMR drives in 2017, one of the design decisions was to use SSDs as a write-back cache for live writes. The main motivation was that SMR disks have a reputation for being slower for random writes than their PMR counterparts. To compensate, live writes to Magic Pocket were committed to SSDs first and acknowledgements were sent to upstream services immediately. An asynchronous background process would then flush a set of these random writes to SMR disks as sequential writes. Using this approach, Magic Pocket was able to support higher disk densities while maintaining our durability and availability guarantees.

The design worked well for us over the years. Our newer generation storage platforms were able to support disks with greater density (14-20 TB per disk). A single storage host—with more than 100 such data disks and a single SSD—was able to support 1.5-2 PBs of raw data. But as data density increased, we started to hit limits with maximum write throughput per host. This was primarily because all live writes would pass through a single SSD.

We found each host's write throughput was limited by the max write throughput of its SSD. Even the adoption of NVMe-based SSD drives wasn't enough to keep up with Magic Pocket’s scale. While a typical NVMe based SSD can handle up to 15-20 Gbps in write throughput, this was still far lower than the cumulative disk throughput of hundreds of disks on a single one of our hosts.

This bottleneck only became more apparent as the density of our storage hosts increased. While higher density storage hosts meant we needed fewer servers, our throughput remained unchanged—meaning our SSDs had to handle even more writes than before to keep up with Magic Pocket’s needs.

dropbox技术

Future-proofing our metadata stack with Panda, a scalable key-value store

Metadata is crucial for serving user requests. It also takes up a lot of space—and as we’ve grown, so has the amount of metadata we’ve had to store. This isn’t a bad problem to have, but we knew it was only a matter of time before our metadata stack would need an overhaul.

Dropbox operates two large-scale metadata storage systems powered by sharded MySQL. One is the Filesystem which contains metadata related to files and folders. The other is Edgestore, which powers all other internal and external Dropbox services. Both operate at a massive scale. They run on thousands of servers, store petabytes of data on SSDs, and serve tens of millions of queries per second with single-digit millisecond latency.

dropbox技术

Everything in its write place: Cloud storage abstraction with Object Store

Dropbox originally used Amazon S3 and the Hadoop Distributed File System (HDFS) as the backbone of its data storage infrastructure. Although we migrated user file data to our internal block storage system Magic Pocket in 2015, Dropbox continued to use S3 and HDFS as a general-purpose store for other internal products and tools. Among these use cases were crash traces, build artifacts, test logs, and image caching.

Using these two legacy systems as generic blob storage caused many pain points—the worst of which was the cost inefficiency of using S3’s API. For instance, crash traces wrote many objects which were rarely accessed unless specifically needed for an investigation, generating a large PUT bill. Caches built against S3 burned pricey GET requests with each cache miss.

dropbox技术

Defending against SSRF attacks (with help from our bug bounty program)

Over the past few years, server-side request forgery (SSRF) has received an increasing amount of attention from security researchers. With SSRF, an attacker can retarget a request to internal services and exploit the implicit trust within the network. It often escalates into a critical vulnerability, and in 2021 it was among the top ten web application security risks identified by the Open Web Application Security Project. At Dropbox, it’s the Application Security team’s responsibility to guard against and address SSRF in a scalable manner, so that our engineers can deliver products securely and with as little friction as possible.

dropbox技术

We’re using TTVC to measure performance on the web—and now you can too

Nobody likes waiting for software. Snappy, responsive interfaces make us happy, and research shows there’s a relationship between responsiveness and attention1. But maintaining fast-feeling websites often requires tradeoffs. This might mean diverting resources from the development of new features, paying off technical debt, or other engineering work. The key to justifying such diversions is by connecting the dots between performance and business outcomes—something we can do through measurement.

Over the last year, we’ve been rethinking the way we track page load performance on the web at Dropbox. After identifying a few gaps in our existing metrics, we decided we needed a more objective, user-focused way to define page load performance so that we could more reliably and meaningfully compare experiences across products. We thought a relatively new page load metric called Time To Visually Complete (TTVC) could work well.

There was just one problem: Browsers don’t yet report the moment a page becomes visually complete. If we wanted to adopt TTVC as our new primary performance metric, we would have to fill that gap. So we built a small library to allow us to track TTVC as our users experience it in the real world. That library is @dropbox/ttvc—and we’re excited to be open-sourcing this work!

dropbox技术