Measuring the Memory Impact for Hybrid Apps

Memory problems are always challenging to detect and fix in mobile applications, particularly on Android, because of the many hardware profiles, OS versions, and OEM skins. With proper memory reporting and analysis, most issues are caught during the development lifecycle. Yet if your application delivers an entire platform, as the Salesforce Mobile App does, the high degree of customization can produce unanticipated spikes in memory usage. In a hybrid app with integrated web and native layers, these spikes can ripple across the whole stack and lead to unpredictable behavior!

Thankfully, there are several tools available for automated testing across an array of devices, such as Firebase Test Lab and Sauce Labs. A well-designed app recovers gracefully when the operating system triggers a memory warning or frees up resources. In the best-case scenario, memory issues manifest as slightly sluggish behavior, and in the worst case, they result in an application crash. iOS memory issues have been famously categorized in a blog article from Facebook Engineering as Low Memory Warnings (LMW), Foreground Out of Memory (FOOM) crashes, or Background Out of Memory (BOOM) crashes. But what happens when your hybrid app suffers from memory problems without any obvious symptoms? How does one begin to measure them?

Earlier this year, one of our customers reported a strange issue. Many of their users were greeted with an Internal Server Error when accessing a record detail view. After a thorough investigation, we narrowed it down to the offending change list. It was clear that the datasets for these users were so large and complex that they were breaking our background syncing engine.

But what caught us by surprise was that none of our existing instrumentation measured this type of impact. After all, we log all low memory events, and we certainly log crashes, but within the noise of the otherwise benign memory warnings, a critical error was lurking.

As part of our Root Cause Analysis, we dove deeply into how memory events are reported and, more importantly, what types of impacts they have. Numbers such as Available Memory and Total Memory make for a more straightforward quantitative analysis. Still, they can also lead us to believe in a false dichotomy between “app is working as expected” and “app has crashed”. The reality is that the impacts of memory usage are more nuanced, especially when dealing with the single-threaded nature of JavaScript executing in an Android WebView.

We decided to take a fresh approach towards the qualitative analysis of our memory performance. Our analysis included stress testing our frameworks and experimenting with the thread scheduler while carefully examining the OS memory callbacks and stack traces for our entire stack. Ultimately, we were able to break down our hybrid memory issues into four distinct categories, not separated by numbers, but by impact:

1. Low Memory Warnings

Low memory warnings are delivered by the operating system directly to the application when system resources are low. They are easy to capture using the onTrimMemory() callback, which is available on application components such as Application, Activity, and Fragment. The callback reports several severity levels, and the lower ones can be quite noisy, so be judicious in your reporting. For instance, our app takes action only for TRIM_MEMORY_RUNNING_CRITICAL, which signals that the system is about to start killing background processes, and TRIM_MEMORY_COMPLETE, which signals that our own process is among the first to be killed if memory is not recovered soon.

Typically, the impact of these events is limited and can be mitigated by proactively cleaning up resources upon the onTrimMemory() callback or practicing good state management using instance state bundles. In the case of hybrid applications, we also recommend passing the event to the web application for additional resource cleanup.
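The filtering described above can be sketched in plain Java. This is only an illustration under stated assumptions, not our production code: TrimLevels mirrors the values of the ComponentCallbacks2 constants so the example runs off-device, while WebLayerNotifier and TrimMemoryHandler are hypothetical names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-ins mirroring android.content.ComponentCallbacks2 so the sketch compiles off-device. */
final class TrimLevels {
    static final int TRIM_MEMORY_RUNNING_LOW = 10;
    static final int TRIM_MEMORY_RUNNING_CRITICAL = 15;
    static final int TRIM_MEMORY_UI_HIDDEN = 20;
    static final int TRIM_MEMORY_COMPLETE = 80;

    private TrimLevels() {}
}

/** Hypothetical hook that forwards the event to the web application. */
interface WebLayerNotifier {
    void notifyLowMemory(int level);
}

final class TrimMemoryHandler {
    private final WebLayerNotifier notifier;
    private final List<Integer> reportedLevels = new ArrayList<>();

    TrimMemoryHandler(WebLayerNotifier notifier) {
        this.notifier = notifier;
    }

    /** Only the two levels named in the text are treated as actionable. */
    static boolean isActionable(int level) {
        return level == TrimLevels.TRIM_MEMORY_RUNNING_CRITICAL
                || level == TrimLevels.TRIM_MEMORY_COMPLETE;
    }

    /** Intended to be called from a component's onTrimMemory(int level). */
    boolean handle(int level) {
        if (!isActionable(level)) {
            return false; // ignore the noisier, lower-severity levels
        }
        reportedLevels.add(level);       // stand-in for an analytics event
        notifier.notifyLowMemory(level); // let the web app release its own caches
        return true;
    }

    /** Copy of the levels that were reported so far. */
    List<Integer> reportedLevels() {
        return new ArrayList<>(reportedLevels);
    }
}
```

On a device, the notifier could be backed by WebView.evaluateJavascript() dispatching an event that the web application listens for; the shape of that event is left to the app.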

2. OutOfMemoryError on the JavaScript thread

An OutOfMemoryError occurs when the runtime (ART on Android) is unable to allocate enough heap for an object and the garbage collector cannot free up enough space. The impact of an OOM error on a JavaScript thread is the most difficult to detect and measure within a hybrid app, since the root cause could be within the native code, but it manifests on the web application side.

It was this type of impact that led to our customer reports of unexpected server error messages. To mitigate the impact, avoid executing native code synchronously on that thread, for example by moving the work into Kotlin coroutines. Proper error handling with an appropriate message to the user also lessens the impact. Given the unpredictability of OOM errors and the fact that the web application’s JavaScript executes on a single thread, it is always better to place all mitigations on the native side.

Note that this type of impact is not limited to OutOfMemoryError but could also apply to any exception that is thrown on the JavaScript thread.
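One way to keep the mitigation on the native side is to wrap every native entry point that the web layer calls, so that a failure is reported under its own category and returned as a structured error rather than surfacing as a generic message. The following plain-Java sketch is an assumption-laden illustration: BridgeResult, BridgeImpactReporter, GuardedBridge, and the error codes are all hypothetical names, and catching OutOfMemoryError does not guarantee that the process can keep running afterward.

```java
import java.util.function.Supplier;

/** Structured result the web layer can inspect instead of seeing a generic failure. */
final class BridgeResult {
    final boolean ok;
    final String payload;   // set when ok is true
    final String errorCode; // set when ok is false

    private BridgeResult(boolean ok, String payload, String errorCode) {
        this.ok = ok;
        this.payload = payload;
        this.errorCode = errorCode;
    }

    static BridgeResult success(String payload) {
        return new BridgeResult(true, payload, null);
    }

    static BridgeResult failure(String errorCode) {
        return new BridgeResult(false, null, errorCode);
    }
}

/** Hypothetical analytics hook; a real app would send this to its logging backend. */
interface BridgeImpactReporter {
    void report(String category, Throwable cause);
}

final class GuardedBridge {
    private GuardedBridge() {}

    /** Runs native work requested by the web layer and converts failures into results. */
    static BridgeResult call(Supplier<String> nativeWork, BridgeImpactReporter reporter) {
        try {
            return BridgeResult.success(nativeWork.get());
        } catch (OutOfMemoryError e) {
            // Reported separately so it does not drown in ordinary low memory warnings.
            reporter.report("oom_on_bridge_thread", e);
            return BridgeResult.failure("OUT_OF_MEMORY");
        } catch (RuntimeException e) {
            // Other exceptions on the same thread have the same kind of impact.
            reporter.report("exception_on_bridge_thread", e);
            return BridgeResult.failure("NATIVE_ERROR");
        }
    }
}
```

With a result like this, the web application can show a specific message to the user, and a dashboard can chart the two categories independently of crashes and trim events.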

3. WebView render process was terminated

When a hybrid application’s memory is pushed to the limit, the WebView’s render process becomes highly vulnerable to being killed by the system. The good news is that this can be detected with the onRenderProcessGone() callback in the attached WebViewClient (available on API level 26 and later). The bad news is that there is no way to recover from it other than reconstructing the view altogether and reloading the URL. What’s worse is that the impact leaves the user on a white screen, which (to the dismay of many hybrid app developers) is also the end result of any number of other issues.

While this is the least common type of low memory impact for hybrid apps, it can be the most frustrating to diagnose and recover from, so it is imperative to report it with detailed analytics.
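The report-then-rebuild flow can be sketched as follows. The Android types are replaced by hypothetical stand-ins (WebViewHost, RenderImpactReporter, RenderProcessRecovery) so the example runs off-device, and the reload cap is an assumption added here to avoid an endless reload loop, not something the platform requires. On a device, the didCrash flag would come from RenderProcessGoneDetail.didCrash(), and returning false from the real callback lets the system terminate the app.

```java
/** Hypothetical abstraction over the screen that owns the WebView. */
interface WebViewHost {
    void destroyWebView();

    void createWebViewAndLoad(String url);
}

/** Hypothetical analytics hook for this impact category. */
interface RenderImpactReporter {
    void report(String event, boolean didCrash, String lastUrl);
}

final class RenderProcessRecovery {
    private final WebViewHost host;
    private final RenderImpactReporter reporter;
    private final int maxReloads;
    private int reloads;

    RenderProcessRecovery(WebViewHost host, RenderImpactReporter reporter, int maxReloads) {
        this.host = host;
        this.reporter = reporter;
        this.maxReloads = maxReloads;
    }

    /**
     * Intended to be called from WebViewClient.onRenderProcessGone().
     * Returns true when the situation was handled by rebuilding the view.
     */
    boolean onRenderProcessGone(boolean didCrash, String lastUrl) {
        // Always report first, so the event is recorded even if recovery is skipped.
        reporter.report("webview_render_process_gone", didCrash, lastUrl);
        if (reloads >= maxReloads) {
            return false; // give up rather than loop on a page that keeps dying
        }
        reloads++;
        host.destroyWebView();              // the old view cannot be reused
        host.createWebViewAndLoad(lastUrl); // rebuild and reload the same URL
        return true;
    }
}
```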

4. Application Crash

Application crashes due to an OutOfMemoryError have the most obvious impact on the user, but it can be difficult to pin down their causes. The Android vitals section of the Google Play Console does a good job of aggregating the occurrences and frequency of these issues, and third-party analytics providers usually report similar metrics. However, due to the nature of memory allocation in a multi-threaded environment, the stack traces may vary wildly, making it nearly impossible to find the root cause from this data alone.

To make sense of these types of crashes, try to provide as much context within the crash report as possible to uncover patterns. For example, AppCenter allows you to determine if a crash occurred in the previous session. You can report the crash event along with device logs and additional metadata such as OS version, WebView version, and feature activations.
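As a rough illustration of how that metadata helps, the sketch below builds a flat context map for each crash report and then counts reports by a single key, which is one simple way to spot a pattern such as a problematic WebView version. CrashContext, its key names, and countBy are hypothetical and not part of any analytics SDK; the crashedInPreviousSession flag is assumed to come from whatever crash reporting provider the app uses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CrashContext {
    private CrashContext() {}

    /** Collects the metadata mentioned in the text into one flat map. */
    static Map<String, String> build(String osVersion,
                                     String webViewVersion,
                                     List<String> activeFeatures,
                                     boolean crashedInPreviousSession) {
        Map<String, String> context = new LinkedHashMap<>();
        context.put("os_version", osVersion);
        context.put("webview_version", webViewVersion);
        context.put("active_features", String.join(",", activeFeatures));
        context.put("crashed_in_previous_session", String.valueOf(crashedInPreviousSession));
        return context;
    }

    /** Counts reports per value of one key, using "unknown" when the key is missing. */
    static Map<String, Integer> countBy(List<Map<String, String>> reports, String key) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map<String, String> report : reports) {
            counts.merge(report.getOrDefault(key, "unknown"), 1, Integer::sum);
        }
        return counts;
    }
}
```

Grouping by webview_version or by individual feature flags is usually more revealing than grouping by stack trace, since the traces for these crashes vary so widely.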

After finally putting faces on these impacts, we could work backward to uncover the causes, add fixes or mitigations, and define metrics to ensure they are caught earlier. In the case of our mysterious memory muck-up, we found that the root cause was the poor usage of a new hybrid framework, which was duplicating large strings on the JavaScript execution thread and throwing OOM exceptions. After a small refactor to optimize our string handling, we were able to test the fix and release with confidence. We also updated our dashboards with charts to measure the new memory impacts and alerts to let us know ASAP the next time something goes awry.

The two biggest takeaways from this experience were that measuring the impact on user experience is just as important as measuring performance numbers, and that we should never be afraid to deeply examine each layer of a tech stack, especially complicated ones like those found within a hybrid architecture.