1 Problem description
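Some of our projects use Keycloak for unified authentication and access control. The backend is Spring Boot, so the usual single sign-on setup is Spring Boot integrated with Keycloak; the official documentation covers how to set it up. This had worked without trouble until one day a customer reported that pages suddenly stopped loading and then recovered on their own after a while. Sometimes the service recovered before we had a chance to locate the problem.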
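Figure: What the frontend showed when the problem occurred
One occurrence lasted several minutes, and our other projects (also Spring Boot integrated with Keycloak) hit the same problem intermittently. Symptoms like this usually mean that some call is taking a long time, so I went into the container and used jstack to dump the thread stacks. The dump is long, so only the relevant part is pasted here: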
"http-nio-8081-exec-9" #61 daemon prio=5 os_prio=0 tid=0x00007efc702c1000 nid=0x4c waiting for monitor entry [0x00007efc778f6000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.keycloak.adapters.rotation.JWKPublicKeyLocator.getPublicKey(JWKPublicKeyLocator.java:60)
- waiting to lock <0x00000003f1888968> (a org.keycloak.adapters.rotation.JWKPublicKeyLocator)
at org.keycloak.adapters.rotation.AdapterTokenVerifier.getPublicKey(AdapterTokenVerifier.java:121)
at org.keycloak.adapters.rotation.AdapterTokenVerifier.createVerifier(AdapterTokenVerifier.java:111)
at org.keycloak.adapters.rotation.AdapterTokenVerifier.verifyToken(AdapterTokenVerifier.java:47)
at org.keycloak.adapters.BearerTokenRequestAuthenticator.authenticateToken(BearerTokenRequestAuthenticator.java:103)
at org.keycloak.adapters.BearerTokenRequestAuthenticator.authenticate(BearerTokenRequestAuthenticator.java:88)
at org.keycloak.adapters.RequestAuthenticator.authenticate(RequestAuthenticator.java:67)
at org.keycloak.adapters.springsecurity.filter.KeycloakAuthenticationProcessingFilter.attemptAuthentication(KeycloakAuthenticationProcessingFilter.java:154)
at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:212)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
at org.keycloak.adapters.springsecurity.filter.KeycloakPreAuthActionsFilter.doFilter(KeycloakPreAuthActionsFilter.java:96)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
at org.springframework.security.web.header.HeaderWriterFilter.doHeadersAfter(HeaderWriterFilter.java:92)
at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:77)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
....
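The thread is BLOCKED waiting for the monitor of JWKPublicKeyLocator, which suggests that the backend's request to Keycloak for the public key had stalled.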
2 Root cause analysis
The first guess was that the call to Keycloak had gone wrong. Line 60 of JWKPublicKeyLocator is in the following code:
// Check if we are allowed to send request
synchronized (this) {
    currentTime = Time.currentTime();
    if (currentTime > lastRequestTime + minTimeBetweenRequests) {
        sendRequest(deployment);
        lastRequestTime = currentTime;
    } else {
        log.debug("Won't send request to realm jwks url. Last request time was " + lastRequestTime);
    }
    return lookupCachedKey(publicKeyCacheTtl, currentTime, kid);
}
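The JWKPublicKeyLocator here is a singleton shared by every thread in the process, so when one thread is stuck inside the synchronized block and cannot leave it, every other thread that reaches the block can only wait. Looking further into sendRequest, that method calls a Keycloak endpoint over HTTP to fetch the public key.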
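The blocking behaviour described above can be reproduced with the JDK alone. The following is a minimal sketch, not Keycloak code: SlowLocator is a hypothetical stand-in for a shared locator whose synchronized method is stuck on a call that never returns, and the second caller ends up in the same BLOCKED state that jstack reported.

```java
import java.util.concurrent.CountDownLatch;

public class MonitorBlockDemo {
    /** Hypothetical stand-in for a shared key locator; not the Keycloak class. */
    static class SlowLocator {
        final CountDownLatch entered = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        String getPublicKey() throws InterruptedException {
            synchronized (this) {
                entered.countDown();   // signal that the monitor is now held
                release.await();       // simulates an HTTP call that does not return
                return "key";
            }
        }
    }

    /** Starts two callers and returns the state observed for the second one. */
    static Thread.State observeSecondCaller() {
        SlowLocator locator = new SlowLocator();
        Runnable call = () -> {
            try {
                locator.getPublicKey();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread first = new Thread(call, "exec-1");
        Thread second = new Thread(call, "exec-2");
        try {
            first.start();
            locator.entered.await();   // the first thread owns the monitor from here on
            second.start();
            // Poll until the second thread is parked on the monitor (bounded wait).
            long deadline = System.currentTimeMillis() + 5000;
            while (second.getState() != Thread.State.BLOCKED
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            Thread.State observed = second.getState();
            locator.release.countDown();   // let the "stuck" call finish
            first.join();
            second.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second caller state: " + observeSecondCaller());
    }
}
```

The first thread is only waiting on a latch here, but to the second thread it makes no difference what the lock holder is doing: as long as the monitor is held, every other caller queues up behind it.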
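That raises two questions:
Why would the call that fetches the public key go so long without returning?
Is a timeout configured, so that the call fails fast when the endpoint does not respond in time?
Our backend configures Keycloak through Spring Boot auto-configuration. The configuration file has three main properties: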
keycloak.realm=realmId
keycloak.resource=clientId
keycloak.auth-server-url=http://127.0.0.1:8180/auth
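These properties are bound automatically by the KeycloakSpringBootProperties configuration class. keycloak.auth-server-url is the base URL used for calls to Keycloak. In the customer's environment it is a domain name rather than an ip:port pair, so each call goes through DNS resolution, load balancing and so on.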
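If the customer's external network is poor and suffers jitter or similar problems, a call made this way can still take a very long time to return. Reading the code shows that the timeouts for this call are left at their defaults. See org.apache.http.client.config.RequestConfig, whose default instance is built with
public static final RequestConfig DEFAULT = (new RequestConfig.Builder()).build();
The Builder's defaults for the three timeout-related fields are: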
private int connectionRequestTimeout = -1;
private int connectTimeout = -1;
private int socketTimeout = -1;
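A negative value means the timeout is undefined and falls back to the system default, which in practice means no timeout at all. So by default this call never times out: once one request stalls and cannot release the lock, no other request can be served until the lock is released.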
3 Solutions
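To summarise, there are two ways to avoid this kind of problem:
Always set a timeout when calling an external interface, so that a single stalled call cannot make the whole service unavailable;
If Keycloak is deployed on the same local network, configure its address as the internal IP rather than a domain name or external address, so that network problems do not leave the call hanging.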
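The first point can be illustrated without any HTTP library. The sketch below uses plain java.net sockets rather than the Keycloak adapter: a local server accepts the TCP connection but never sends a byte, and the client fails fast only because a read timeout was set. The class and method names are made up for this illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    /**
     * Connects to a server that never responds and tries to read one byte.
     * Returns true if the read gave up with SocketTimeoutException.
     */
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The server socket never calls accept() and never writes anything;
        // the OS still completes the TCP handshake through the listen backlog.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()),
                    connectTimeoutMs);
            // Without this line SO_TIMEOUT stays at 0 and read() blocks indefinitely.
            client.setSoTimeout(readTimeoutMs);
            client.getInputStream().read();
            return false;
        } catch (SocketTimeoutException e) {
            return true;   // failed fast instead of hanging
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + readTimesOut(1000, 200));
    }
}
```

Connect and read timeouts are separate settings: the first bounds how long establishing the connection may take, the second bounds how long a read may wait for data on an established connection.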
3.1 Setting timeouts
First, look at the source of the calling code in the Keycloak adapter SDK.
org.keycloak.adapters.rotation.JWKPublicKeyLocator#sendRequest is as follows:
private void sendRequest(KeycloakDeployment deployment) {
    if (log.isTraceEnabled()) {
        log.trace("Going to send request to retrieve new set of realm public keys for client " + deployment.getResourceName());
    }
    HttpGet getMethod = new HttpGet(deployment.getJwksUrl());
    try {
        JSONWebKeySet jwks = HttpAdapterUtils.sendJsonHttpRequest(deployment, getMethod, JSONWebKeySet.class);
        Map<String, PublicKey> publicKeys = JWKSUtils.getKeysForUse(jwks, JWK.Use.SIG);
        if (log.isDebugEnabled()) {
            log.debug("Realm public keys successfully retrieved for client " + deployment.getResourceName() + ". New kids: " + publicKeys.keySet().toString());
        }
        // Update current keys
        currentKeys.clear();
        currentKeys.putAll(publicKeys);
    } catch (HttpClientAdapterException e) {
        log.error("Error when sending request to retrieve realm keys", e);
    }
}
org.keycloak.adapters.HttpAdapterUtils#sendJsonHttpRequest is as follows:
public static <T> T sendJsonHttpRequest(KeycloakDeployment deployment, HttpRequestBase httpRequest, Class<T> clazz) throws HttpClientAdapterException {
    try {
        HttpResponse response = deployment.getClient().execute(httpRequest);
        int status = response.getStatusLine().getStatusCode();
        if (status != 200) {
            close(response);
            throw new HttpClientAdapterException("Unexpected status = " + status);
        }
        HttpEntity entity = response.getEntity();
        if (entity == null) {
            throw new HttpClientAdapterException("There was no entity.");
        }
        InputStream is = entity.getContent();
        try {
            return JsonSerialization.readValue(is, clazz);
        } finally {
            try {
                is.close();
            } catch (IOException ignored) {
            }
        }
    } catch (IOException e) {
        throw new HttpClientAdapterException("IO error", e);
    }
}
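Neither the HttpGet nor deployment.getClient() in this source sets a timeout, so requests to Keycloak run with the default configuration, and the default of -1 means no timeout. The deployment object is not managed by Spring, but it can be reached through another managed object, and once created it is unique for the whole process. Our approach is therefore to get hold of the global deployment object, fetch its client, and change the client's settings.
The code shows that the AdapterDeploymentContext instance is managed by Spring and leads to the deployment instance. The next step is deciding how to intercept the request. The two usual options are a filter or an interceptor; a filter is more convenient here, because the Keycloak SDK already defines many filters and supports custom ones. One example shipped with the SDK is org.keycloak.adapters.springsecurity.filter.KeycloakPreAuthActionsFilter. Using it as a reference, we wrote our own filter: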
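import java.io.IOException;
import javax.annotation.Resource;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.http.client.params.ClientPNames;
import org.apache.http.params.CoreConnectionPNames;
import org.apache.http.params.HttpParams;
import org.keycloak.adapters.AdapterDeploymentContext;
import org.keycloak.adapters.KeycloakDeployment;
import org.keycloak.adapters.spi.HttpFacade;
import org.keycloak.adapters.springsecurity.facade.SimpleHttpFacade;
import org.springframework.stereotype.Component;

@Component
public class ChangeTimeOutFilter implements Filter {
    @Resource
    private AdapterDeploymentContext deploymentContext;
    private volatile boolean deploymentChanged = false;

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        HttpFacade facade = new SimpleHttpFacade((HttpServletRequest) request, (HttpServletResponse) response);
        KeycloakDeployment deployment = deploymentContext.resolveDeployment(facade);
        if (deployment == null) {
            chain.doFilter(request, response);
            return;
        }
        // The deployment is global and unique, so it only needs to be modified once
        if (deploymentChanged) {
            chain.doFilter(request, response);
            return;
        }
        // Set the timeouts (milliseconds): socket read, connect, and connection-manager wait
        HttpParams params = deployment.getClient().getParams();
        params.setIntParameter(CoreConnectionPNames.SO_TIMEOUT, 10000);
        params.setIntParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, 10000);
        params.setLongParameter(ClientPNames.CONN_MANAGER_TIMEOUT, 10000L);
        deploymentChanged = true;
        chain.doFilter(request, response);
    }
}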
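To verify that the timeout takes effect, we deliberately set it very short, for example a few milliseconds, so that the call is certain to time out. After deploying this code, the error log shows: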
Error when sending request to retrieve realm keys
org.keycloak.adapters.HttpClientAdapterException: IO error
at org.keycloak.adapters.HttpAdapterUtils.sendJsonHttpRequest(HttpAdapterUtils.java:57)
at org.keycloak.adapters.rotation.JWKPublicKeyLocator.sendRequest(JWKPublicKeyLocator.java:99)
at org.keycloak.adapters.rotation.JWKPublicKeyLocator.getPublicKey(JWKPublicKeyLocator.java:63)
at org.keycloak.adapters.rotation.AdapterTokenVerifier.getPublicKey(AdapterTokenVerifier.java:121)
at org.keycloak.adapters.rotation.AdapterTokenVerifier.createVerifier(AdapterTokenVerifier.java:111)
at org.keycloak.adapters.rotation.AdapterTokenVerifier.verifyToken(AdapterTokenVerifier.java:47)
at org.keycloak.adapters.BearerTokenRequestAuthenticator.authenticateToken(BearerTokenRequestAuthenticator.java:103)
at org.keycloak.adapters.BearerTokenRequestAuthenticator.authenticate(BearerTokenRequestAuthenticator.java:88)
at org.keycloak.adapters.RequestAuthenticator.authenticate(RequestAuthenticator.java:67)
......
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
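The error above shows that the timeout configuration has taken effect.
3.2 Pointing the Keycloak address at the internal network
The original Keycloak configuration: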
keycloak.realm=atlas
keycloak.resource=atlas-assistant
keycloak.auth-server-url=http://122.122.122.122:8180/auth
keycloak.ssl-required=none
keycloak.public-client=true
keycloak.use-resource-role-mappings=true
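Change keycloak.auth-server-url to the internal address:
keycloak.auth-server-url=http://172.17.0.1:8180/auth
The frontend, however, uses the Keycloak address returned by the backend to redirect the browser to the login page, so that address must be the external one (the browser cannot reach the internal address). We therefore added a new property that holds the external address and is used only for returning to the frontend. Previously keycloak.auth-server-url served both purposes; now the two are split:
environment.keycloak.auth-server-url=http://122.122.122.122:8180/auth
After making the corresponding code changes and deploying to the test environment, the page reported an error: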
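Figure: Login fails after switching to the internal address
Start with the error log (it is logged at DEBUG level rather than ERROR, which I find questionable in the source):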
2022-04-01 16:45:33.069 [http-nio-8081-exec-3] DEBUG o.k.adapters.BearerTokenRequestAuthenticator - Found [1] values in authorization header, selecting the first value for Bearer.
2022-04-01 16:45:33.069 [http-nio-8081-exec-3] DEBUG o.k.adapters.BearerTokenRequestAuthenticator - Verifying access_token
2022-04-01 16:45:33.071 [http-nio-8081-exec-3] DEBUG o.k.adapters.BearerTokenRequestAuthenticator - Failed to verify token
2022-04-01 16:45:33.071 [http-nio-8081-exec-3] DEBUG org.keycloak.adapters.RequestAuthenticator - Bearer FAILED
2022-04-01 16:45:33.071 [http-nio-8081-exec-3] DEBUG o.k.a.s.f.KeycloakAuthenticationProcessingFilter - Auth outcome: FAILED
2022-04-01 16:45:33.071 [http-nio-8081-exec-3] DEBUG o.k.a.s.f.KeycloakAuthenticationProcessingFilter - Authentication request failed: org.keycloak.adapters.springsecurity.KeycloakAuthenticationException: Invalid authorization header, see WWW-Authenticate header for details
org.keycloak.adapters.springsecurity.KeycloakAuthenticationException: Invalid authorization header, see WWW-Authenticate header for details
at org.keycloak.adapters.springsecurity.filter.KeycloakAuthenticationProcessingFilter.attemptAuthentication(KeycloakAuthenticationProcessingFilter.java:162)
at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:212)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
at org.keycloak.adapters.springsecurity.filter.KeycloakPreAuthActionsFilter.doFilter(KeycloakPreAuthActionsFilter.java:96)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:103)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:105)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:103)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:215)
at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:178)
at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:358)
at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:271)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:103)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:103)
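Reading the source eventually showed that a validation step, RealmUrlCheck, was failing.
The two values it compares turned out to be the internal address and the external address. They point to the same place, but the strings differ, so the check fails. Why the two addresses differ is best explained by the interaction between frontend and backend:
In step 1.2, the address the backend returns to the browser must be the external one, so the token issued after the frontend's request carries the external address. The realm URL in the deployment (the realmInfoUrl field) is derived from the configuration (keycloak.auth-server-url) and is the internal address, while the JsonWebToken is parsed from the token the frontend sends to the backend and carries the external address. When the token is verified in step 4.2, the two addresses do not match, the backend treats the token as possibly tampered with, and throws an exception. We considered two ways to solve this: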
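Option 1: extend the original SDK with a custom KeycloakConfigResolver, KeycloakDeployment and so on, which is fairly complex;
Option 2: in the filter above, modify the deployment data and replace the internal address in realmInfoUrl with the external one. The actual requests do not use this field (they use authServerBaseUrl), so calls over the internal network are not affected.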
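Why option 2 is safe can be shown with a small model. This is a simplified sketch and not the Keycloak classes: Deployment is a hypothetical stand-in with only the two fields discussed above, the URL layout under the base address is assumed for illustration, and issuerMatches imitates the comparison that RealmUrlCheck performs. The addresses and realm name are the ones from the configuration shown earlier.

```java
public class RealmUrlCheckSketch {
    static final String INTERNAL = "http://172.17.0.1:8180/auth";
    static final String EXTERNAL = "http://122.122.122.122:8180/auth";

    /** Simplified stand-in for the deployment; not the Keycloak class. */
    static class Deployment {
        final String realm;
        final String authServerBaseUrl; // base for the URLs the backend actually calls
        String realmInfoUrl;            // value compared with the issuer in the token

        Deployment(String authServerBaseUrl, String realm) {
            this.realm = realm;
            this.authServerBaseUrl = authServerBaseUrl;
            this.realmInfoUrl = authServerBaseUrl + "/realms/" + realm;
        }

        /** URL used to fetch the public keys; built from the base URL only. */
        String jwksUrl() {
            return authServerBaseUrl + "/realms/" + realm + "/protocol/openid-connect/certs";
        }
    }

    /** Imitates the realm URL check: plain string equality. */
    static boolean issuerMatches(Deployment deployment, String tokenIssuer) {
        return deployment.realmInfoUrl.equals(tokenIssuer);
    }

    public static void main(String[] args) {
        Deployment deployment = new Deployment(INTERNAL, "atlas");
        // The browser logged in through the external address, so the token carries it.
        String tokenIssuer = EXTERNAL + "/realms/atlas";

        System.out.println("check before: " + issuerMatches(deployment, tokenIssuer));

        // Option 2: rewrite only the value used by the check.
        deployment.realmInfoUrl = deployment.realmInfoUrl.replace(INTERNAL, EXTERNAL);

        System.out.println("check after:  " + issuerMatches(deployment, tokenIssuer));
        System.out.println("keys fetched from: " + deployment.jwksUrl());
    }
}
```

In this model the check passes after the rewrite while the key request still goes to the internal address, which is the property option 2 relies on.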
To save time we went with option 2. The filter became:
@Component
public class ChangeTimeOutFilter implements Filter {
    @Resource
    private AdapterDeploymentContext deploymentContext;
    @Resource
    private KeyCloakConfig keyCloakConfig;
    private static Field realmInfoUrlFd;
    private volatile boolean deploymentChanged = false;

    static {
        try {
            ChangeTimeOutFilter.realmInfoUrlFd = KeycloakDeployment.class.getDeclaredField("realmInfoUrl");
            realmInfoUrlFd.setAccessible(true);
        } catch (Exception ex) {
            // Swallowed: if the field cannot be found, realmInfoUrl is silently left unchanged
        }
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        HttpFacade facade = new SimpleHttpFacade((HttpServletRequest) request, (HttpServletResponse) response);
        KeycloakDeployment deployment = deploymentContext.resolveDeployment(facade);
        if (deployment == null) {
            chain.doFilter(request, response);
            return;
        }
        // The deployment is global and unique, so it only needs to be modified once
        if (deploymentChanged) {
            chain.doFilter(request, response);
            return;
        }
        // Rewrite realmInfoUrl from the internal to the external address so that the check passes
        String realmInfoUrl = deployment.getRealmInfoUrl();
        if (!StringUtils.isBlank(realmInfoUrl)) {
            // replace() matches literally; replaceAll() would treat the dots in the address as regex wildcards
            realmInfoUrl = realmInfoUrl.replace(keyCloakConfig.getInnerUrl(), keyCloakConfig.getAuthUrl());
            try {
                realmInfoUrlFd.set(deployment, realmInfoUrl);
            } catch (Exception ex) {
                // Swallowed: on failure the check keeps comparing against the internal address
            }
        }
        // Set the timeouts (milliseconds): socket read, connect, and connection-manager wait
        HttpParams params = deployment.getClient().getParams();
        params.setIntParameter(CoreConnectionPNames.SO_TIMEOUT, 10000);
        params.setIntParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, 10000);
        params.setLongParameter(ClientPNames.CONN_MANAGER_TIMEOUT, 10000L);
        deploymentChanged = true;
        chain.doFilter(request, response);
    }
}
Changing the field's value through reflection makes the two addresses match, so the check passes.
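For readers less familiar with reflection, the mechanism can be tried in isolation. The sketch below is a minimal, JDK-only illustration: Deployment is a hypothetical class with a private field and no setter, standing in for the real object, and overwriteRealmInfoUrl is a made-up helper name.

```java
import java.lang.reflect.Field;

public class ReflectionFieldDemo {
    /** Hypothetical class with a private field and no setter. */
    static class Deployment {
        private String realmInfoUrl;

        Deployment(String realmInfoUrl) {
            this.realmInfoUrl = realmInfoUrl;
        }

        String getRealmInfoUrl() {
            return realmInfoUrl;
        }
    }

    /** Replaces one address prefix with another inside the private field. */
    static void overwriteRealmInfoUrl(Deployment deployment, String from, String to) {
        try {
            Field field = Deployment.class.getDeclaredField("realmInfoUrl");
            field.setAccessible(true);               // bypass the private modifier
            String current = (String) field.get(deployment);
            // String.replace matches literally, so dots in the address stay dots.
            field.set(deployment, current.replace(from, to));
        } catch (ReflectiveOperationException e) {
            // Fail loudly: a renamed field should not go unnoticed.
            throw new IllegalStateException("could not rewrite realmInfoUrl", e);
        }
    }

    public static void main(String[] args) {
        Deployment deployment = new Deployment("http://172.17.0.1:8180/auth/realms/atlas");
        overwriteRealmInfoUrl(deployment,
                "http://172.17.0.1:8180/auth",
                "http://122.122.122.122:8180/auth");
        System.out.println(deployment.getRealmInfoUrl());
    }
}
```

The trade-off of this technique is that it depends on a private field name, so it can stop working when the library is upgraded; surfacing the failure instead of swallowing it makes that easier to notice.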
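With the fixes above, we have resolved this sudden error. I will keep sharing past pitfalls and lessons learned on the "观远数据技术团队" account; you are welcome to follow along and join the discussion.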