Using Marketplace Marginal Values to Address Interference Bias
Written by Shima Nassiri and Ido Bright
Network Effect
At Lyft, we run various randomized experiments to tackle different measurement needs. User-split experiments account for 90% of our randomized studies due to their higher power and their fit for most use cases. However, they are prone to interference, or network, bias. In a multi-sided marketplace, supply and demand are never perfectly balanced, and one side of the market is congested: if we have oversupply, we can run rider-split experiments without interference concerns. If we are undersupplied, however, interference in a rider-split experiment can severely bias the results. The same goes for driver-split experiments under demand imbalances.
For example, in a supply-constrained situation, not enough drivers are available to serve the demand. As illustrated in Figure 1, in such an environment, if the treatment in an A/B experiment incentivizes higher conversion of riders to complete their intended rides, fewer drivers remain available for the riders in the control group. Hence, the outcomes of the control group are negatively impacted by the treatment through the congested resource (i.e., the drivers), and the impact of the treatment can be overestimated. This is known as interference bias or the network effect. It violates the Stable Unit Treatment Value Assumption (SUTVA), which requires that a unit's outcome be unaffected by the treatment assignment of other units for the results to be unbiased.
It’s important to recognize that interference doesn’t always lead to an overestimation of the treatment effect. In social networks, for instance, treating units can positively influence the outcomes of control units who are friends with treated ones, boosting control outcomes and thus shrinking the measured treatment effect. Similarly, in a retail setting, promoting products in the treatment group can lift sales of complementary products that are often purchased together, some of which sit in the control group; this inflates control outcomes and underestimates the treatment effect. Conversely, for substitutable products, the opposite occurs, and the treatment effect may be overestimated.
Possible Solutions to Interference
Much of the literature on interference focuses on modifying classical experimental designs to mitigate its effects. Cluster randomization is a popular approach. At Amazon, for instance, cluster randomization has been explored to tackle interference among substitutable products. In Section 4 of Cooperider and Nassiri (2023), the authors also address the low power that results from such clustering and discuss how it can be improved through better cluster balancing.
Other designs, such as time-split or region-split experiments, can also be used to address interference. In a time-split experiment (also known as a switchback), all units are exposed to a single treatment at any given time or time-location bucket, which prevents interference. However, this approach can degrade the user experience for user-facing changes: frequently toggling a UI feature that shows the driver more rider information, for example, would be disruptive. Additionally, time-split experiments are inherently suited to scenarios where the focus is on the overall marketplace impact. They are designed to capture short-term marketplace behavior, since users experience different treatments throughout the experiment. Because it’s not possible to include a holdout group in a time-split experiment, such designs are unsuitable for assessing long-term impacts, which limits them to a narrow range of use cases. Experimenters might therefore run a time-split experiment followed by a user-split experiment to leverage the strengths of both: the time-split accurately gauges marketplace-level effects without interference concerns, while the user-split assesses user-level, long-term impacts. However, this combination is costly to implement and can delay decision making by several weeks.
On the other hand, region-split or geo experiments apply a treatment across an entire region or region-time bucket, effectively eliminating interference bias since significant interference across different regions is unlikely. Additionally, they don’t impact user experience. However, region-split experiments often suffer from low statistical power due to smaller effective sample sizes, which limits their large-scale adoption.
Another way to obtain unbiased treatment effect estimates despite interference is to model the interference directly. Interference arises in two types of marketplaces: choice-based (e.g., Airbnb and Amazon) and match-based (e.g., Lyft and DoorDash). In choice-based marketplaces, customers select from multiple options, which makes the interference harder to model. In contrast, match-based marketplaces assign customers to a single option, which simplifies the modeling. At Lyft, we use a Marketplace Marginal Values (MMV) approach to model interference. You can find the theoretical details of this approach in Bright et al. (2024). Essentially, an MMV is the change in the gain (whatever you are optimizing for, e.g., profit or rides) from changing the constrained resource (additional supply or demand) by one unit. This quantity is commonly known as the shadow price in the operations research literature.
Why MMV?
In the paper, the authors present technical proofs demonstrating how marginal values can significantly reduce the bias of the treatment effect estimator. Essentially, as previously mentioned, the primary source of interference bias is competition for limited resources, and marginal values effectively capture this resource contention. Consider the following situations:
As illustrated in Figure 2, when supply is abundant, the marginal value of having rider R1 matches its face value of $6. In a low-supply scenario where resources are limited, however, the driver is allocated to rider R2; rider R1 goes unserved, and both its realized and marginal values are zero. For rider R2, the face value is $10, but its marginal value is only $4: the extra gain from serving R2 ($10) rather than R1 ($6). This demonstrates how the marginal value inherently accounts for resource contention. By aggregating the marginal values across the treatment and control groups and taking the difference, one can derive an unbiased estimator of the average treatment effect, as sketched below.
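As a minimal illustration of the Figure 2 numbers (a sketch assuming interchangeable drivers and a single face value per rider, so the best dispatch simply serves the highest-value riders), marginal values can be computed by brute force, re-solving the dispatch with and without each rider:

```python
def gain(rider_values, n_drivers):
    """Objective of the best dispatch: with interchangeable drivers,
    serve the highest-value riders first."""
    return sum(sorted(rider_values, reverse=True)[:n_drivers])

def marginal_value(rider_values, n_drivers, i):
    """MMV of rider i: the drop in the objective if rider i leaves."""
    without_i = rider_values[:i] + rider_values[i + 1:]
    return gain(rider_values, n_drivers) - gain(without_i, n_drivers)

riders = [6, 10]  # face values of R1 and R2 from Figure 2

# Abundant supply (two drivers): every rider's MMV equals its face value.
print(marginal_value(riders, n_drivers=2, i=0))  # R1 -> 6

# Constrained supply (one driver): the driver serves R2, so R1 adds nothing
# and R2 is only worth the $4 it adds over serving R1 instead.
print(marginal_value(riders, n_drivers=1, i=0))  # R1 -> 0
print(marginal_value(riders, n_drivers=1, i=1))  # R2 -> 10 - 6 = 4
```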
How to compute MMVs?
As previously mentioned, shadow prices in the dispatch optimization problem can be used to obtain the MMVs. The primal dispatch problem can be described as follows:

maximize    Σᵢ Σⱼ πᵢⱼ xᵢⱼ
subject to  Σᵢ xᵢⱼ ≤ 1   for every driver j
            Σⱼ xᵢⱼ ≤ 1   for every rider i
            xᵢⱼ ∈ {0, 1}
Here xᵢⱼ is a variable that takes the value 1 if driver j is matched to rider i, and 0 otherwise, and πᵢⱼ is the score (e.g., profit) of matching driver j to rider i. The first constraint ensures that a driver is matched with at most one ride per matching cycle (more on this later), and the second ensures that a rider is matched with at most one driver. Solving this optimization gives the optimal matching of drivers to riders. Relaxing the integrality constraint to xᵢⱼ ≥ 0 yields a linear relaxation of the problem, whose dual is:

minimize    Σⱼ μⱼ + Σᵢ λᵢ
subject to  μⱼ + λᵢ ≥ πᵢⱼ   for every pair (i, j)
            μⱼ ≥ 0, λᵢ ≥ 0
The dual variable μⱼ is associated with the driver constraint (the first primal constraint), and λᵢ with the rider constraint. This means that for each driver j there is an associated dual variable μⱼ (and likewise a λᵢ for each rider i). More on duality can be found here. To find the MMVs, we generate a dispatch graph for each matching cycle, solve it, and then efficiently compute the incremental values via the duals. Denote the objective function by Π(d, s), where d and s represent the demand and supply, respectively, and assume that the treatment increases demand by e. The global effect of such a treatment can then be expressed as:

Δ = Π(d + e, s) − Π(d, s)
Now, to estimate Δ, we can run a 50/50 rider-split A/B test in which each group serves half the demand: the treatment group serves dₑ = (d + e)/2 and the control group serves d/2. To first order we then have:

Π(dₑ, s) − Π(d/2, s) ≈ ∇_d Π(d/2, s)ᵀ (dₑ − d/2) = ½ (λ*)ᵀ e
The global average treatment effect can then be estimated as:

Δ̂ = 2 ( Σ_{i∈T} λᵢ* − Σ_{i∈C} λᵢ* ) ≈ (λ*)ᵀ e
where λ* is the vector of optimal rider duals, or shadow prices, and the sums run over the riders in the treatment (T) and control (C) groups. This analysis is a first-order Taylor approximation; for more details, see Proposition 5 of Bright et al. (2024). Observe that the difference in the objective function across the treatment and control groups is expressed entirely through the shadow prices. In the paper, the authors further show via simulation that this shadow-price estimator curbs the overestimation of the default estimates from standard A/B tests while also lowering the noise level (refer to Figure 8 in Bright et al. (2024)).
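As a rough sketch of this computation (not Lyft's production dispatch solver), the LP relaxation can be solved and its duals extracted with off-the-shelf tooling; the `treated` and `control` index arrays at the end are hypothetical placeholders:

```python
import numpy as np
from scipy.optimize import linprog

def dispatch_duals(pi):
    """Solve the LP relaxation of the dispatch problem and return the
    rider duals (lambda_i*) and driver duals (mu_j*).

    pi: (n_riders, n_drivers) array of match scores pi_ij.
    """
    n_riders, n_drivers = pi.shape
    c = -pi.ravel()  # linprog minimizes, so negate the scores to maximize

    # Rider constraints: sum_j x_ij <= 1 (each rider gets at most one driver).
    A_riders = np.kron(np.eye(n_riders), np.ones((1, n_drivers)))
    # Driver constraints: sum_i x_ij <= 1 (each driver gets at most one rider).
    A_drivers = np.kron(np.ones((1, n_riders)), np.eye(n_drivers))

    res = linprog(
        c,
        A_ub=np.vstack([A_riders, A_drivers]),
        b_ub=np.ones(n_riders + n_drivers),
        bounds=(0, None),
        method="highs",
    )

    # HiGHS reports duals for the minimization; flip the sign to recover
    # the shadow prices of the original maximization.
    duals = -res.ineqlin.marginals
    return duals[:n_riders], duals[n_riders:]  # (lambda*, mu*)

# Shadow-price estimator for a 50/50 rider split. `treated` and `control`
# are hypothetical index arrays of riders in each experiment group.
# lam, _ = dispatch_duals(pi)
# delta_hat = 2 * (lam[treated].sum() - lam[control].sum())
```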
Matching Cycle
Next, we need to decide how often to solve these optimization problems, essentially determining the length of the matching cycle. If the matching cycle is too short, contention can occur between cycles. For instance, a driver who isn’t matched in Cycle 1 might be available in the next cycle, or a rider choosing the wait-and-save option might wait several cycles before being matched. At Lyft, we use a 1-hour mega cycle to solve the dispatch optimization problem for all eligible riders and drivers within that period. This cycle length helps significantly reduce concerns about contention between cycles.
Secondary Metrics
Finally, if we want to assess the MMV-corrected impact of a treatment on metrics beyond those defined by the dispatch objective function, we can compute the edge, or ride, cost νᵢⱼ for each completed ride. Considering a linear relaxation of the primal problem and applying complementary slackness, we have:

μⱼ + λᵢ = νᵢⱼ   for every matched pair (i, j), i.e., xᵢⱼ* = 1
λᵢ = 0          for every unmatched rider i
μⱼ = 0          for every unmatched driver j
Assuming non-degeneracy, we can then solve this system of equations for the optimal dual values and use them to estimate the average treatment effect in the same way as before.
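A minimal sketch of that recovery step, assuming (as the non-degeneracy condition implies) that the tight edges of the optimal solution form a forest anchored at nodes whose constraints have slack; the function and argument names are illustrative:

```python
from collections import deque

def recover_duals(tight_edges, slack_riders, slack_drivers):
    """Propagate dual values along the tight-edge forest.

    tight_edges:   list of (i, j, nu_ij) tuples where mu_j + lambda_i = nu_ij
                   must hold (e.g., matched pairs).
    slack_riders:  rider indices whose constraint has slack, pinning
                   lambda_i = 0 by complementary slackness.
    slack_drivers: driver indices with slack, pinning mu_j = 0.
    """
    rider_adj, driver_adj = {}, {}
    for i, j, nu in tight_edges:
        rider_adj.setdefault(i, []).append((j, nu))
        driver_adj.setdefault(j, []).append((i, nu))

    lam = {i: 0.0 for i in slack_riders}
    mu = {j: 0.0 for j in slack_drivers}

    # Each tight edge determines the dual on one endpoint from the other,
    # so a breadth-first sweep from the zero-dual anchors fixes them all.
    queue = deque([("rider", i) for i in lam] + [("driver", j) for j in mu])
    while queue:
        side, node = queue.popleft()
        if side == "rider":
            for j, nu in rider_adj.get(node, []):
                if j not in mu:
                    mu[j] = nu - lam[node]
                    queue.append(("driver", j))
        else:
            for i, nu in driver_adj.get(node, []):
                if i not in lam:
                    lam[i] = nu - mu[node]
                    queue.append(("rider", i))
    return lam, mu
```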
Productionalizing MMV in an Experimentation Platform
To implement MMV in an experimentation platform, we solve the matching optimization problem for passenger and driver duals on an hourly basis, as previously described, and store these values in a table. This data is then used to calculate MMV-corrected values for drivers and riders. These metrics are included in experiment reports alongside other metrics, with standard adjustments like CUPED applied to them. Consider, for example, an MMV-corrected metric in a driver-randomized experiment: riders are the congested resource contributing to the interference bias and, as discussed earlier, the overestimation of results. The MMV correction dampens the estimated effect by accounting for contention over the limited resource (in this case, the pool of riders), as in the sketch below.
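As a rough illustration of the reporting step (a minimal sketch; the table and column names are hypothetical, not Lyft's actual schema), the stored hourly duals can be joined to experiment assignments and aggregated per variant:

```python
import pandas as pd

def mmv_corrected_effect(duals: pd.DataFrame, assignments: pd.DataFrame) -> float:
    """Shadow-price estimate of the global treatment effect.

    duals:        one row per unit per matching cycle,
                  columns [unit_id, hour, dual]
    assignments:  experiment assignments, columns [unit_id, variant]
    """
    merged = duals.merge(assignments, on="unit_id")
    totals = merged.groupby("variant")["dual"].sum()
    # Scale the 50/50 split difference up to a global estimate.
    return 2 * (totals["treatment"] - totals["control"])
```

Because the corrected values live at the unit level, standard variance-reduction machinery such as CUPED applies to them unchanged.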
It’s important to note that there are limitations to the use cases for MMV. It cannot be applied when the randomization unit is neither drivers nor passengers. For example, in mapping experiments the route is associated with the ride itself rather than with a specific driver or passenger, making MMV-corrected metrics unsuitable.
MMVs in Practice
Other experimental designs, such as time-splits, region-splits, or a combination of time- and user-splits, often fall short of addressing the majority of experimentation needs: they tend to lack sufficient power, are costly to implement, and can take several weeks to execute. In contrast, the MMV approach can correct the effect sizes of user-split experiments where interference is a concern. This is particularly important in cases with significant network effects, where the change in effect magnitude can be substantial, potentially altering launch decisions under MMV correction.
At Lyft, we’ve had instances where both time- and user-split experiments were conducted for the same initiative to capture both market-level and long-term effects. We compared the MMV-corrected user-split outcomes with the time-split outcomes in three historical cases where a time-split counterpart was available. After applying MMV correction, we observed greater alignment with the time-split results.
Additionally, we conducted a comprehensive backtest across various user-split experiments, comparing MMV-corrected completed rides with the traditional metric. In 10% of these comparisons, the launch decision could change when using MMV-corrected values. These cases were evenly split between false positives (launching based on traditional values when the MMV-corrected values didn't meet the launch criteria) and false negatives (the reverse).
Moreover, when MMV results show a lower magnitude than the naive user-split results, particularly in resource-constrained experiments, we anticipate an average 45% reduction in outcome magnitude based on this numerical study. This decrease occurs because, when calculating marginal values, the contribution of each ride is split between the rider and the driver, correcting for the overestimation of effects caused by interference bias.
Acknowledgements
We would like to thank Anahita Hassanzadeh and Thu Le for helpful discussions and suggestions.
Lyft is hiring! If you’re passionate about experimentation and measurement, visit Lyft Careers to see our openings.