Measuring mobile app performance in production

In the illustration above we see 6 frames: 3 frames are good and 3 frames have freezes. That means that 3 good frames were rendered within 16ms, while the other 3 frames have freezes of different durations. We calculate a freeze duration as the difference between the actual frame duration and the 16ms target frame duration. To calculate the total freeze time, we sum the durations of all freezes that occur on the screen.

The same Freeze Time can come from very different patterns: a single 1000ms freeze, or 100 freezes of 10ms each. Freeze time can also grow without any code change, simply because sessions get longer (e.g. when every item of a scrollable list generates some slow frames, users scrolling more leads to a higher total freeze time).

To catch such situations, we also use two additional metrics:

  • Freeze Count: The total number of slow frames (slower than 16ms) during a screen session. It shows whether the pattern of freezes has changed.
  • Session Duration: The duration of the screen session. It shows whether a change in session duration might explain a change in freeze time.
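As a minimal sketch of these definitions, the three metrics could be computed from per-frame render durations like this (the frame values are hypothetical, using the 16ms target from above):

```python
TARGET_FRAME_MS = 16  # target frame duration at 60 FPS

def screen_session_metrics(frame_durations_ms):
    """Compute Freeze Time, Freeze Count and Session Duration
    for one screen session from per-frame render durations (ms)."""
    # Freeze Time: sum of (actual duration - target) over all slow frames
    freeze_time = sum(d - TARGET_FRAME_MS for d in frame_durations_ms
                      if d > TARGET_FRAME_MS)
    # Freeze Count: number of frames slower than the target
    freeze_count = sum(1 for d in frame_durations_ms if d > TARGET_FRAME_MS)
    # Session Duration: total time the screen was rendering
    session_duration = sum(frame_durations_ms)
    return freeze_time, freeze_count, session_duration

# 3 good frames and 3 frames with freezes of different durations
frames = [16, 16, 16, 50, 116, 216]
print(screen_session_metrics(frames))  # (334, 3, 430)
```

In production the frame durations would come from the platform frame callbacks rather than a plain list, but the aggregation per screen session is the same.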

Rationale for choosing the Freeze Time metric

Both Google and Apple offer metrics for assessing rendering performance. Initially, we adopted a method implemented by Firebase for our rendering performance monitoring, which involved tracking slow frames (>16ms to render) and frozen frames (>700ms). However, we discovered that these metrics did not adequately capture rendering performance degradation.

For instance, consider a scenario where views in a list are already slow, requiring 20ms for rendering. If the rendering time increases to 300ms, the metrics would still report one slow frame per view without any frozen frames, failing to indicate a significant deterioration in rendering time.

Moreover, there is an inconsistency in how performance changes are reflected. A view’s rendering time increasing from 15ms to 20ms is recorded as the same metric change as an increase from 15ms to 300ms, which does not accurately represent the severity of the slowdown.
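To make this concrete, here is a small comparison sketch using the thresholds mentioned above (slow >16ms, frozen >700ms); the frame data is hypothetical:

```python
SLOW_MS, FROZEN_MS, TARGET_MS = 16, 700, 16

def threshold_metrics(frames):
    """Firebase-style counters: slow and frozen frame counts."""
    slow = sum(1 for d in frames if d > SLOW_MS)
    frozen = sum(1 for d in frames if d > FROZEN_MS)
    return slow, frozen

def freeze_time(frames):
    """Total freeze time: excess over the target, summed across frames."""
    return sum(d - TARGET_MS for d in frames if d > TARGET_MS)

before = [20] * 10   # list views already render slowly: 20ms each
after = [300] * 10   # severe regression: 300ms per view

# The threshold counters are identical before and after, hiding the regression
print(threshold_metrics(before), threshold_metrics(after))  # (10, 0) (10, 0)
# Freeze Time, by contrast, grows from 40ms to 2840ms
print(freeze_time(before), freeze_time(after))  # 40 2840
```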

Apple’s “Hang rate” metric, which is calculated as seconds of hang time per hour, appeared to be more in line with what we needed. It resembles our Freeze Time metric but is normalized by dividing the total freeze time by the session duration. However, this normalization caused the metric to become overly sensitive to changes in user behavior.

For instance, if a product feature causes users to spend more time scrolling through a slow list, the Hang rate may show an improvement because the session duration increased, even though the user experience has degraded due to more freezes.
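A sketch of that effect with illustrative numbers (the sessions and freeze values are hypothetical):

```python
def hang_rate(total_freeze_ms, session_duration_ms):
    """Apple-style normalization: seconds of hang time per hour of session time."""
    return total_freeze_ms * 3600 / session_duration_ms

# Before the feature: 2s of freezes over a 60s screen session
before = hang_rate(2_000, 60_000)
# After the feature: users scroll the slow list more,
# so 3s of freezes over a 120s session
after = hang_rate(3_000, 120_000)

# The normalized metric "improves" (120 -> 90 s/hour),
# even though users experienced 50% more total freeze time
print(before, after)  # 120.0 90.0
```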

After encountering various scenarios where the relative metric did not provide a clear picture of performance, we decided to use an absolute metric instead. This allows us to measure rendering performance more accurately, not just for the entire application but for each screen session, without the results being skewed by user behavior or session length.

The absolute metric has certain limitations too. Taking the same example: if a product feature results in users scrolling through a slow list more frequently, the rendering metric worsens even though there has been no technical regression. However, the supplementary Session Duration metric allows us to handle these situations effectively.

The main idea behind this is that we treat any increase in Freeze Time as a negative performance change, regardless of the reason (ideally, the user shouldn’t see any freezes at all). Of course, it is important to react to new performance issues caused by a new feature, but it is equally important to detect an old screen producing more freezes because users have started to interact with it more actively.

PerformanceSuite logo: a mix of a turtle and a swift

With the theory covered and the metrics clearly defined, we can finally implement a working solution for collecting them. We’ve recently open-sourced our performance-tracking libraries for both platforms, which you can find on GitHub.

These lightweight libraries gather the aforementioned metrics and allow you to report them to any analytics system. We are actively working on further improvements and new features (e.g. termination monitoring); however, they are already used in the main Booking app on both platforms. The iOS library is also used in Pulse, our app for property owners, and in the Agoda app from our sister company.

Feel free to try it out, leave feedback, or even better, contribute!
