Maximizing OkHttp connection reuse

Diego Gómez Olvera
Booking.com Engineering
6 min read · Jun 3, 2020

Debugging a 3rd party library

Introduction

At Booking.com we know that performance is important for our users, and this includes networking. We recently investigated the performance of our Android app’s networking stack and found several areas where we could improve both networking performance and the app itself for all users. In this post we share some tips on how to optimize OkHttp connection reuse, and walk through the process of debugging a 3rd party library.

At Booking.com we use OkHttp, an HTTP client library for Java/JVM clients that’s efficient by default, easy to test, and allows defining common behaviours across network requests by composing Interceptor types.

Problem investigation

Performance bottleneck

We want to know the total time from the moment we are ready to make a network request until the moment the results are ready to use (including request preparation, execution, response handling and parsing). Looking at how long each stage takes to complete highlights which areas can be improved. We used a small logger utility to avoid the impact of profiling tools on runtime (Android Benchmark is better suited for code which is executed very often), and we saw that the most noticeable issue by far was the network request execution, especially the latency (see different executions using Stetho):

Network: 1.51s: Latency 1.32 s - Download 198 ms
Network: 1.43s: Latency 1.26 s - Download 197 ms
Network: 1.24s: Latency 1.16 s - Download 76 ms
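The small logger utility mentioned above is not shown in this post. As a rough sketch of the idea, assuming a hypothetical StageTimer class and made-up stage names, per-stage timing with System.nanoTime could look like this:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: records how long each named stage of a request takes. */
final class StageTimer {
  // Insertion order is kept so the report follows the order the stages ran in.
  private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();

  /** Runs the stage, stores its duration under the given name, and returns its result. */
  <T> T measure(String stage, Supplier<T> work) {
    long start = System.nanoTime();
    try {
      return work.get();
    } finally {
      // Repeated stages with the same name are summed.
      elapsedNanos.merge(stage, System.nanoTime() - start, Long::sum);
    }
  }

  /** Duration of a stage in milliseconds, or -1 if the stage never ran. */
  long millis(String stage) {
    Long nanos = elapsedNanos.get(stage);
    return nanos == null ? -1 : nanos / 1_000_000;
  }

  /** One line per stage, in the form "stage: N ms". */
  String report() {
    StringBuilder sb = new StringBuilder();
    elapsedNanos.forEach((stage, nanos) ->
        sb.append(stage).append(": ").append(nanos / 1_000_000).append(" ms\n"));
    return sb.toString();
  }
}

final class StageTimerDemo {
  public static void main(String[] args) {
    StageTimer timer = new StageTimer();
    // The lambdas are placeholders for real request preparation, execution and parsing.
    String request = timer.measure("prepare", () -> "GET /search");
    String body = timer.measure("execute", () -> request + " -> {}");
    timer.measure("parse", body::length);
    System.out.print(timer.report());
  }
}
```

A helper like this only adds two clock reads per stage, which keeps the measurement overhead small compared with attaching a profiler.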

We noticed a disparity between the times measured on the client and the wall-clock times measured on the backend, so there might be something we can do on the client to close that gap.

Looking into OkHttp request execution

OkHttp has an extension called Logging Interceptor, a mechanism which hooks into a request execution with callbacks and logs information about the request execution. After checking the documentation, let’s see the log using HttpLoggingInterceptor:

1582304879.418 D/OkHttp: <-- 200 OK https://iphone-xml.booking.com/... (1066ms)
1582304879.418 D/OkHttp: Server: nginx
1582304879.418 D/OkHttp: Date: Fri, 21 Feb 2020 17:07:55 GMT
1582304879.418 D/OkHttp: Content-Type: application/json; charset=utf-8
1582304879.419 D/OkHttp: Transfer-Encoding: chunked
1582304879.419 D/OkHttp: Vary: Accept-Encoding
1582304879.419 D/OkHttp: X-XSS-Protection: 1; mode=block
1582304879.440 D/OkHttp: {"review_recommendation":"", ... 66}
1582304879.446 D/OkHttp: <-- END HTTP (107725-byte body)

And using LoggingEventListener:

[0 ms] callStart: Request{method=GET, url=https://iphone-xml.booking.com/json/mobile.searchResults?reas...
[16 ms] connectionAcquired: Connection{iphone-xml.booking.com:443, proxy=DIRECT hostAddress=iphone-xml.booking.com/185.28.222.15:443 cipherSuite=TLS_AES_128_GCM_SHA256 protocol=http/1.1}
[17 ms] requestHeadersStart
[18 ms] requestHeadersEnd
[18 ms] responseHeadersStart
[155 ms] secureConnectEnd: Handshake{tlsVersion=TLS_1_3 cipherSuite=TLS_AES_128_GCM_SHA256 peerCertificates=[CN=secure-iphone-xml.booking.com, O=Booking.com BV, L=Amsterdam, C=NL, CN=DigiCert SHA2 Secure Server CA, O=DigiCert Inc, C=US] localCertificates=[]}
[155 ms] connectEnd: http/1.1
[156 ms] connectionAcquired: Connection{secure-iphone-xml.booking.com:443, proxy=DIRECT hostAddress=secure-iphone-xml.booking.com/185.28.222.26:443 cipherSuite=TLS_AES_128_GCM_SHA256 protocol=http/1.1}
[158 ms] requestHeadersStart
[158 ms] requestHeadersEnd
...
[35 ms] secureConnectEnd: Handshake{tlsVersion=TLS_1_2 cipherSuite=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 peerCertificates=[CN=*.booking.com, O=Booking.com BV, L=Amsterdam, C=NL, CN=DigiCert ECC Secure Server CA, O=DigiCert Inc, C=US] localCertificates=[]}
...

LoggingEventListener gives us some interesting information: it seems that the application is setting up connections repeatedly, using different versions of TLS. OkHttp aims to reduce the number of socket connections by reusing them across HTTP requests, but because this is not always happening, there is a potential performance improvement. Unfortunately, looking at the code that reuses connections, we can see that there is no specific callback for when a new RealConnection is created.
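Without a dedicated callback, one rough way to spot missed reuse is to count the events in the LoggingEventListener output. The sketch below is our own illustration, not part of OkHttp: EventLogCounter is a hypothetical helper, the sample lines are abbreviated, and it assumes that connection setup events such as connectEnd are only logged when a new connection is established.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts event names in LoggingEventListener-style lines. */
final class EventLogCounter {
  // Matches the "[155 ms] connectEnd" prefix used in the log excerpt above.
  private static final Pattern EVENT = Pattern.compile("^\\[(\\d+) ms\\] (\\w+)");

  /** Returns how many times each event name appears; lines without the prefix are ignored. */
  static Map<String, Integer> countEvents(List<String> lines) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String line : lines) {
      Matcher matcher = EVENT.matcher(line);
      if (matcher.find()) {
        counts.merge(matcher.group(2), 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    // Abbreviated sample lines in the same shape as the log above.
    List<String> lines = Arrays.asList(
        "[155 ms] connectEnd: http/1.1",
        "[156 ms] connectionAcquired: Connection{...}",
        "[158 ms] requestHeadersStart",
        "[158 ms] requestHeadersEnd");
    // Assumption: one connectEnd per connectionAcquired hints that nothing was reused.
    System.out.println(countEvents(lines));
  }
}
```

If the number of connectEnd events stays close to the number of connectionAcquired events across many requests to the same host, that is a hint (not proof) that pooled connections are not being picked up.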

Debugging OkHttp

While not having a callback for a particular event makes tracking it more complex, it’s still possible given that the dependency ships with sources (and if not, IntelliJ includes a Java decompiler), meaning that we have full access to the whole code execution stack including all variables and properties.

We can attach the debugger to the point where the connection is created, which lets us see the current state of the method call.

New RealConnection being created

For instance, we can check the Route parameter:

Route properties

And as we want to focus on our mobile application’s endpoints, we can adjust the debugger breakpoint with a condition:

Stop at breakpoint when a condition is met

We know that a socket connection to a host could be reused (but isn’t), so we can go to the method which verifies the condition to reuse RealConnection:

Instance where RealConnection is created

With the debugger we can validate that transmitterAcquirePooledConnection is what makes the condition fail:

Validating assumptions with Evaluate Expression… dialog

Looking inside the method, here’s what it looks like:

boolean transmitterAcquirePooledConnection(Address address, Transmitter transmitter,
    @Nullable List<Route> routes, boolean requireMultiplexed) {
  assert (Thread.holdsLock(this));
  for (RealConnection connection : connections) {
    if (requireMultiplexed && !connection.isMultiplexed()) continue;
    if (!connection.isEligible(address, routes)) continue;
    transmitter.acquireConnectionNoEvents(connection);
    return true;
  }
  return false;
}

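A pooled connection is skipped either when multiplexing is required and the RealConnection is not multiplexed (multiplexing needs HTTP/2, currently not supported in our case), or when isEligible returns false. Looking at isEligible we see: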

boolean isEligible(Address address, @Nullable List<Route> routes) {
  // If this connection is not accepting new exchanges, we're done.
  if (transmitters.size() >= allocationLimit || noNewExchanges) return false;

  // If the non-host fields of the address don't overlap, we're done.
  if (!Internal.instance.equalsNonHost(this.route.address(), address)) return false;

  // If the host exactly matches, we're done: this connection can carry the address.
  if (address.url().host().equals(this.route().address().url().host())) {
    return true; // This connection is a perfect match.
  }

  // At this point we don't have a hostname match. But we still be able to carry the request if
  // our connection coalescing requirements are met. See also:
  // https://hpbn.co/optimizing-application-delivery/#eliminate-domain-sharding
  // https://daniel.haxx.se/blog/2016/08/18/http2-connection-coalescing/

  // 1. This connection must be HTTP/2.
  if (http2Connection == null) return false;
  ...

The condition for connection reuse before HTTP/2 seems clear: allocationLimit is always 1, so a connection can only be reused while it is not carrying another call, and both the non-host fields of the endpoint Address and host() must match. Is Address not matching any of the existing ones? Let’s find out why.
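To make the rule easier to reason about, here is a simplified model of the non-HTTP/2 checks. It is a sketch, not OkHttp code: PooledConnectionModel, its fields and the example.com host are hypothetical, and a plain Object stands in for all non-host Address fields.

```java
import java.util.Objects;

/** Simplified, hypothetical stand-in for a pooled HTTP/1.1 connection (not OkHttp code). */
final class PooledConnectionModel {
  private static final int ALLOCATION_LIMIT = 1; // HTTP/1.1: one call per connection

  private final String host;
  private final int port;
  private final Object sslSocketFactory; // stands in for all non-host Address fields
  private int activeCalls;

  PooledConnectionModel(String host, int port, Object sslSocketFactory) {
    this.host = host;
    this.port = port;
    this.sslSocketFactory = sslSocketFactory;
  }

  void acquire() { activeCalls++; }

  void release() { activeCalls--; }

  /** Applies the same three checks, in the same order, as the excerpt above. */
  boolean isEligible(String requestHost, int requestPort, Object requestFactory) {
    // 1. The connection must not already be carrying a call.
    if (activeCalls >= ALLOCATION_LIMIT) return false;
    // 2. The non-host fields must be equal.
    boolean nonHostFieldsMatch =
        port == requestPort && Objects.equals(sslSocketFactory, requestFactory);
    if (!nonHostFieldsMatch) return false;
    // 3. Without HTTP/2 coalescing, the host must match exactly.
    return host.equals(requestHost);
  }

  public static void main(String[] args) {
    Object sharedFactory = new Object();
    PooledConnectionModel idle = new PooledConnectionModel("example.com", 443, sharedFactory);
    System.out.println(idle.isEligible("example.com", 443, sharedFactory)); // true
    System.out.println(idle.isEligible("example.com", 443, new Object()));  // false
  }
}
```

The second call returns false even though host and port match, because the two factory objects are different instances and Object uses identity equality by default.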

We can look at the pool of existing RealConnection instances:

Active connections in connection pool

And compare the Address of one with the same host to find the root cause of the problem.

Finding RealConnection where equality based on Address fails

Here’s what the comparison method looks like:

boolean equalsNonHost(Address that) {
  return this.dns.equals(that.dns)
      ...
      && Objects.equals(this.sslSocketFactory, that.sslSocketFactory)
      && Objects.equals(this.hostnameVerifier, that.hostnameVerifier)
      && Objects.equals(this.certificatePinner, that.certificatePinner)
      && this.url().port() == that.url().port();
}

Using the debugger it’s possible to see that all properties are equal, except sslSocketFactory:

Same type, different instance

We see that we have a custom SSLSocketFactory type, which is not reused and does not implement equals, impeding effective connection reuse. The Java debugger allows us not only to inspect everything but also to call properties and methods of entities in the current scope.

Problem resolution

We took two measures to maximize connection reuse, which you should also consider if you customize OkHttp:

  • Implement equals in custom SSLSocketFactory types
  • Reuse SSLSocketFactory type instances
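As an illustration of the first measure, here is a minimal sketch of a custom factory with value-based equality. DelegatingSslSocketFactory is a hypothetical name and is not our production class; it assumes the custom behaviour wraps another SSLSocketFactory, so two wrappers around the same delegate can be treated as equal.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import java.util.Objects;
import javax.net.ssl.SSLSocketFactory;

/** Hypothetical custom factory that compares equal when it wraps the same delegate. */
final class DelegatingSslSocketFactory extends SSLSocketFactory {
  private final SSLSocketFactory delegate;

  DelegatingSslSocketFactory(SSLSocketFactory delegate) {
    this.delegate = Objects.requireNonNull(delegate, "delegate");
  }

  @Override public String[] getDefaultCipherSuites() {
    return delegate.getDefaultCipherSuites();
  }

  @Override public String[] getSupportedCipherSuites() {
    return delegate.getSupportedCipherSuites();
  }

  // Any customization of the created sockets would go into the createSocket methods.
  @Override public Socket createSocket() throws IOException {
    return delegate.createSocket();
  }

  @Override public Socket createSocket(Socket socket, String host, int port, boolean autoClose)
      throws IOException {
    return delegate.createSocket(socket, host, port, autoClose);
  }

  @Override public Socket createSocket(String host, int port) throws IOException {
    return delegate.createSocket(host, port);
  }

  @Override public Socket createSocket(String host, int port, InetAddress localHost, int localPort)
      throws IOException {
    return delegate.createSocket(host, port, localHost, localPort);
  }

  @Override public Socket createSocket(InetAddress host, int port) throws IOException {
    return delegate.createSocket(host, port);
  }

  @Override public Socket createSocket(InetAddress address, int port, InetAddress localAddress,
      int localPort) throws IOException {
    return delegate.createSocket(address, port, localAddress, localPort);
  }

  // Equality is based on the wrapped factory instead of object identity.
  @Override public boolean equals(Object other) {
    return other instanceof DelegatingSslSocketFactory
        && delegate.equals(((DelegatingSslSocketFactory) other).delegate);
  }

  @Override public int hashCode() {
    return delegate.hashCode();
  }
}
```

For the second measure, the simplest approach is usually to create the factory once and pass that same instance to every client configuration, so equality holds even if the delegate itself only supports identity comparison.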

If you simply need to use an external Security Provider, just use the OkHttp suggested approach:

Security.insertProviderAt(Conscrypt.newProvider(), 1)

We shipped these changes in an experiment, which showed a reduction in Time To Interactive when a network request is required: results vary from a gain of a few milliseconds up to a 20% improvement, depending on the number of connections required. Once the underlying connection reuse between requests is optimized, we can perform further optimizations, like fine-tuning a custom ConnectionPool, to improve network request execution times even more.

If you made it this far, I hope that you learned something new about the OkHttp network stack and the possibilities of debugging a 3rd party library!
