Detecting Android memory leaks in production

Monitoring mobile performance and resource consumption at Lyft

Pavlo Stavytskyi
Lyft Engineering


Android developers have a number of tools in their arsenal for detecting memory leaks, such as the Android Studio Memory Profiler, LeakCanary, and Perfetto. These tools are useful when analyzing app builds locally. In production, however, the app runs on a wide range of devices under varying circumstances, and it is hard to predict all of the edge cases when profiling a build locally.

At Lyft, we were curious about how our apps behave in production on users’ devices. Thus, we’ve decided to bring observability to various runtime performance metrics and see how it could improve the user experience.

We’ve already published a blog post about CPU usage monitoring and in this story, we will focus on the memory footprint of mobile apps. While the overall concept of monitoring the memory footprint is applicable to both Android and iOS platforms, we will focus on the former for implementation details.

Lyft relies on A/B testing when rolling out new features. When a feature is ready for production, it’s covered by a feature flag and is launched as part of an experiment. This experiment is run for a certain group of users in order to compare metrics against the standard version of the app.

When a large and complex feature is released, it is important to make sure it does not bring any regressions in terms of memory usage. This is especially important if the feature includes native C/C++ code which has a higher chance of introducing memory leaks.

Therefore, we wanted to test the following hypothesis. For each feature experiment, we measure its memory footprint across all users that have access to it (by reporting metrics to analytics at runtime). Then, we compare it to the standard version of the app. If the variant shows larger memory usage values, this is an indicator of a regression or memory leak.

Memory footprint metrics

First, we needed to identify which memory metrics are available on Android; they are not as trivial to collect as one might think. We are interested in the memory usage of the application process, in other words the app's memory footprint.

Android provides various APIs for retrieving memory usage metrics for apps. However, the hardest part is not retrieving the metrics but making sure they are suitable and provide meaningful data.

Since Android Studio has a built-in memory profiler, we decided to use it as a reference point: if our measurements match what the memory profiler reports, we can be confident our data is correct.

One of the primary metrics the Android Studio memory profiler shows is called PSS.

PSS

Proportional set size (PSS) — the amount of private and shared memory used by the app, where shared memory is split evenly among the processes sharing it.

For example, if 3 processes are sharing 3MB, each process gets 1MB in PSS.

Android exposes a Debug API for this data.

Getting MemoryInfo that includes PSS programmatically
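A minimal Kotlin sketch of this call (variable names are ours):

    import android.os.Debug

    // Ask the runtime to fill a MemoryInfo structure for the current process.
    val memoryInfo = Debug.MemoryInfo()
    Debug.getMemoryInfo(memoryInfo)

    // Total PSS for the process, in kB.
    val totalPssKb = memoryInfo.totalPss

    // Key-value summary of the memory components (API 23+),
    // similar to what the Memory Profiler displays.
    val memoryStats: Map<String, String> = memoryInfo.memoryStats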

Debug.MemoryInfo includes PSS as well as the number of components it consists of. The easiest way to view the summary of the data it holds is by calling the getMemoryStats function. It yields key-value pairs with the metrics as shown in the example below.

MemoryInfo summary example
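The keys below are the documented summary.* entries returned by getMemoryStats; the values (in kB) are illustrative placeholders, not real measurements:

    summary.java-heap = 1234
    summary.native-heap = 2345
    summary.code = 1456
    summary.stack = 64
    summary.graphics = 789
    summary.private-other = 456
    summary.system = 321
    summary.total-pss = 6665
    summary.total-swap = 0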

These are the numbers Android Studio normally displays in the Memory Profiler.

In order to get the PSS value without any additional info, it is possible to use the call below.

Getting PSS programmatically
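A one-line Kotlin sketch using the same Debug API (variable name is ours):

    import android.os.Debug

    // Total PSS for the current process, in kB.
    val pssKb: Long = Debug.getPss()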

USS

Unique set size (USS) — the amount of private memory used by the app excluding shared memory.

Its value can be derived from the same memory data we've seen above. In order to get it programmatically, we can again use Debug.MemoryInfo.

Getting USS programmatically
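A minimal Kotlin sketch, assuming USS is taken as the sum of the private dirty and private clean components of Debug.MemoryInfo:

    import android.os.Debug

    val memoryInfo = Debug.MemoryInfo()
    Debug.getMemoryInfo(memoryInfo)

    // USS = private dirty + private clean memory, in kB.
    val ussKb = memoryInfo.totalPrivateDirty + memoryInfo.totalPrivateClean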

What’s wrong with PSS and USS?

PSS and USS are useful but calling Debug.getMemoryInfo and Debug.getPss can take at least 200ms and 100ms respectively.

We need to report memory metrics regularly, so it’s not the best idea to use time-consuming APIs for this.

There is a faster Android API for PSS, but there is a catch. It has a hard sample rate limit of 5 minutes. This means if called more frequently, it returns a cached value from a previous call. This won’t serve our use case well.

Getting PSS using API with a limited sample rate of 5 minutes
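This presumably refers to ActivityManager.getProcessMemoryInfo, which on recent Android versions is rate-limited and returns cached values when called too frequently. A minimal sketch, assuming a `context` is available:

    import android.app.ActivityManager
    import android.content.Context
    import android.os.Process

    val activityManager =
        context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager

    // Returns cached values if called more often than the rate limit allows.
    val memoryInfo =
        activityManager.getProcessMemoryInfo(intArrayOf(Process.myPid()))[0]
    val pssKb = memoryInfo.totalPss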

However, there is a similar metric called RSS.

RSS

Resident set size (RSS) — the amount of private and shared memory used by the app where all shared memory is included.

For example, if three processes are sharing 3MB, each process gets 3MB in RSS.

This means RSS values are pessimistic, as they show more memory than the application actually uses. However, this metric is much faster to retrieve than PSS. Using RSS is acceptable for our purposes: since we are mostly interested in comparing the variants of an A/B experiment against each other, we can trade some precision for speed.

Overall, when comparing these set size metrics at any given point in time, USS always shows the smallest value and RSS the largest: RSS > PSS > USS.

In order to get RSS programmatically, we need to refer to a system file. This file is located at /proc/[pid]/statm, where [pid] is the ID of an application process. android.os.Process.myPid() can be used to get [pid] programmatically.

Reading this file yields something like this:

3693120 27503 18904 1 0 319129 0

Official Linux documentation helps to shed light on the meaning of these numbers. We are only interested in the second value.

  • (2) resident — resident set size, represented in pages.

The default page size on Linux is 4 kB. Therefore, in order to calculate the RSS metric we use the simple formula below.

rssKb = resident * 4
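Put together, a minimal Kotlin sketch of reading RSS this way (assuming the 4 kB page size mentioned above):

    import android.os.Process
    import java.io.File

    // Read the statm file for the current process.
    val statm = File("/proc/${Process.myPid()}/statm").readText()

    // The second field is the resident set size, in pages.
    val residentPages = statm.trim().split(Regex("\\s+"))[1].toLong()

    // Convert pages to kB, assuming 4 kB pages.
    val rssKb = residentPages * 4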

We ended up using RSS as a primary metric for identifying the app’s memory footprint.

However, that is not all. The memory private to the app process consists of many components, one of which is the heap.

In order to increase the precision of measurements, we can additionally report the amount of memory allocated by the app in the JVM and native heaps. This might come in handy when trying to narrow down a regression detected with the RSS metric.

JVM heap

The first metric will help us to identify the amount of memory allocated by the app in the JVM heap.

Getting the amount of memory allocated in the JVM heap programmatically
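A minimal Kotlin sketch using the standard Runtime API:

    // Memory currently allocated on the JVM heap, in bytes, converted to kB.
    val runtime = Runtime.getRuntime()
    val jvmHeapAllocatedKb = (runtime.totalMemory() - runtime.freeMemory()) / 1024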

Native heap

The second metric shows the same but in the native heap. It is especially useful when the app includes custom native C/C++ libraries.

Getting the amount of memory allocated in the native heap programmatically
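A minimal Kotlin sketch using the Debug API:

    import android.os.Debug

    // Memory currently allocated on the native heap, in bytes, converted to kB.
    val nativeHeapAllocatedKb = Debug.getNativeHeapAllocatedSize() / 1024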

Detecting memory leaks

Now that we have identified memory usage metrics, let’s see how to use them to catch regressions.

First, we need to decide when and how often we should report memory metrics to analytics. We have picked two scenarios (a rough sketch of the reporting logic follows the list):

  • Report a snapshot with metrics every time a UI screen closes.
  • Report a snapshot with metrics periodically if a user stays on a single UI screen for a long time. Normally, we use 1-minute intervals but this can be configured remotely.
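As an illustration only, the reporting logic could be shaped like the sketch below; MemoryReporter, reportSnapshot, and the lifecycle hooks are hypothetical names, not Lyft's actual implementation:

    import kotlinx.coroutines.CoroutineScope
    import kotlinx.coroutines.Job
    import kotlinx.coroutines.delay
    import kotlinx.coroutines.isActive
    import kotlinx.coroutines.launch

    class MemoryReporter(
        private val scope: CoroutineScope,
        // 1-minute default; in practice this would be remotely configurable.
        private val intervalMillis: Long = 60_000L,
        // Collects the metrics above (RSS, JVM heap, native heap) and reports them to analytics.
        private val reportSnapshot: () -> Unit,
    ) {
        private var periodicJob: Job? = null

        // Called when a UI screen becomes visible: start periodic reporting.
        fun onScreenShown() {
            periodicJob = scope.launch {
                while (isActive) {
                    delay(intervalMillis)
                    reportSnapshot()
                }
            }
        }

        // Called when a UI screen closes: stop periodic reporting and report a final snapshot.
        fun onScreenClosed() {
            periodicJob?.cancel()
            reportSnapshot()
        }
    }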

Now, let’s see how to interpret the data.

A/B experiments allow us to compare reported values between two app variants:

  • treatment — the group of users that use the app with the new feature enabled.
  • control — the group of users that use the app in a normal state with the new feature disabled.

Example 1 — no regressions

First, as a baseline, we will take a look at a feature which has not introduced any regressions in terms of memory footprint.

This is an example of an experiment that adds a feature to the Lyft Android app for passengers.

Example 1 — an experiment that did not introduce any regressions

As we can see above, the control (green line) and treatment (orange line) variants show essentially the same values. This means the feature has not introduced any regressions and is safe to roll out to production from a memory usage perspective.

Example 2 — regression

Now let’s take a look at a feature that introduced a regression.

This is an example of an experiment that adds a feature to the Lyft Android app for drivers.

Example 2 — an experiment that introduced a memory usage regression

The new feature here has clearly increased the app’s memory footprint at each percentile. This is an indicator of a regression that allowed us to identify a memory leak.

This graph is based on the RSS metric. To narrow down the root cause of the issue, we used similar graphs based on the JVM and native heap allocation metrics.

Example 3 — memory leak at 99th percentile

The final example is the most important as it demonstrates the biggest advantage of this memory monitoring approach.

In the graph below both variants show almost the same values, except for a noticeable difference around the 99th percentile.

Example 3 — an experiment that introduced a memory leak at the 99th percentile

The memory footprint of the treatment variant has significantly increased at the 99th percentile.

This led to identifying a memory leak that occurred in very specific circumstances for a small number of users. However, when it occurred it significantly increased memory usage.

A leak like the one in the second example, which affects all usage of the app, is relatively likely to be caught by local profiling. The third example is a much harder case: the memory leak was tied to an edge case that is easy to miss locally.

Learnings

One of the challenges of implementing such a memory monitoring tool is picking the right metrics that produce valid data. At the same time, it is important that collecting those metrics is not time- or resource-consuming.

Another task is to visualize the data in an effective report. Comparing the average values between control and treatment variants on a daily basis does not produce meaningful data. A better approach is to use a percentile distribution from the very beginning of the experiment as shown in the examples above.

Overall, the longer the experiment runs, the better the data gets. It takes a few days after an experiment launches to start getting meaningful data, and the results also depend on how many users are exposed to the experiment.

This approach is especially useful when a memory leak appears only around the 99th percentile. In that case the leak is caused by a specific edge case that would be much harder to detect with local profiling.

Using tools for local memory leak detection remains important to prevent shipping regressions to users. Reporting performance metrics at runtime, however, shines a light on issues that would be much harder to detect otherwise.

If you're interested in working on tooling and performance at Lyft, take a look at our careers page.
