Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark Iceberg REST Catalog refresh token #12363

Open
nqvuong1998 opened this issue Feb 21, 2025 · 13 comments
Open

Spark Iceberg REST Catalog refresh token #12363

nqvuong1998 opened this issue Feb 21, 2025 · 13 comments
Labels
question Further information is requested

Comments

@nqvuong1998
Copy link

nqvuong1998 commented Feb 21, 2025

Query engine

Spark

Question

When I set up Spark to connect to the REST Catalog (Lakekeeper) using Keycloak as the OAuth2 server with the client credentials flow, I configured the token to expire after 5 minutes on Keycloak. Once the 5 minutes elapsed, the REST Catalog returned an "Unauthorized" exception. It appears that Spark's REST Catalog does not refresh the token from the OAuth2 server.

spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.rest.type=rest
spark.sql.catalog.rest.uri=...
spark.sql.catalog.rest.credential={{ .Env.CLIENT_ID }}:{{ .Env.CLIENT_SECRET }}
spark.sql.catalog.rest.warehouse=my_warehouse
spark.sql.catalog.rest.scope=...
spark.sql.catalog.rest.oauth2-server-uri=...
spark.sql.catalog.rest.io-impl=org.apache.iceberg.aws.s3.S3FileIO

A similar setup on Trino works correctly.

@nqvuong1998 nqvuong1998 added the question Further information is requested label Feb 21, 2025
@nqvuong1998
Copy link
Author

nqvuong1998 commented Feb 21, 2025

cc @dimas-b @RussellSpitzer @c-thiel

@nqvuong1998 nqvuong1998 changed the title Spark Icberg REST Catalog refresh token Spark Iceberg REST Catalog refresh token Feb 21, 2025
@c-thiel
Copy link
Contributor

c-thiel commented Feb 21, 2025

@nqvuong1998 which version if Iceberg and Spark are you using?
Spark should refresh those tokens automatically. This is the class that handles the refresh:

public static class AuthSession implements org.apache.iceberg.rest.auth.AuthSession {

@nqvuong1998
Copy link
Author

nqvuong1998 commented Feb 21, 2025

Hi @c-thiel ,
In Keycloak, I set the token expiration to 5 minutes. After that, I configured the catalog in both Spark and Trino. However, after the token expires, Trino continues to work normally, while Spark returns an "Unauthorized" error and Lakekeeper return "Failed to decode token: ExpiredSignature" message.

Note: Trino and Spark used same clientId and clientSecret.

Spark: 3.5.2
Iceberg: 1.7.1
Lakekeeper: 0.6.2
Trino: 470

@c-thiel
Copy link
Contributor

c-thiel commented Feb 22, 2025

@nqvuong1998 I did some further tests and tracked it down. The problem is two-fold:
Firstly we currently always use the token exchange flow to refresh a token instead of falling back to simply re-performing the initial client credential flow. The code is here:

private static OAuthTokenResponse refreshToken(
RESTClient client,
Map<String, String> headers,
String subjectToken,
String subjectTokenType,
String scope,
String oauth2ServerUri,
Map<String, String> optionalOAuthParams) {
Map<String, String> request =
tokenExchangeRequest(
subjectToken,
subjectTokenType,
scope != null ? ImmutableList.of(scope) : ImmutableList.of(),
optionalOAuthParams);

The second half of the problem is that Keycloak didn't implement the token exchange flow according to the RFC. Iceberg did.
Iceberg uses the actor_token and actor_token_type fields to authenticate the exchange while Keycloak expects client_id and client_secret.

I opened an Issue for Keycloak exactly for this some time back: keycloak/keycloak#36203

It seems that trino took the easier route and just re-performs the initial client credential flow.

@adutra do you know if the new Auth Manager would solve this?

@adutra
Copy link
Contributor

adutra commented Feb 22, 2025

Hi @c-thiel and @nqvuong1998!

Yes, the fact that token refreshes are broken when using an external IDP like Keycloak is a known issue. To summarize (and your analyzis was spot on!), there are two issues:

  1. using token exchange to refresh tokens with client_credentials is non-standard;
  2. a malformed token exchange request results in 401.

I posted some explanations here: #12196 (comment).

do you know if the new Auth Manager would solve this?

The AuthManager API can definitely help with this issue, although it won't solve it per se. Let's push to have it merged, and then I can move on and provide a fix for the token refresh problem: #12197

@adutra
Copy link
Contributor

adutra commented Feb 22, 2025

@c-thiel about this part of your comment:

The second half of the problem is that Keycloak didn't implement the token exchange flow according to the RFC. Iceberg did.
Iceberg uses the actor_token and actor_token_type fields to authenticate the exchange while Keycloak expects client_id and client_secret.

So, here, I'm not sure I agree with you. Reading RFC 8693 section 2.1:

Client authentication to the authorization server is done using the normal mechanisms provided by OAuth 2.0. Section 2.3.1 of [RFC6749] defines password-based authentication of the client, however, client authentication is extensible and other mechanisms are possible.

So my understanding is that token exchange is no different from other flows defined in RFC 6749 wrt client authentication. In particular, the actor token is NOT meant to represent the client, but rather:

A security token that represents the identity of the acting party. Typically, this will be the party that is authorized to use the requested security token and act on behalf of the subject.

So, in my opinion, Keycloak is rather correct here. It's Iceberg that is doing a malformed token exchange request.

FYI Keycloak still considers token exchange in preview state. They have been collecting use cases for quite a while now under this issue, which is worth reading:

keycloak/keycloak#26502

@varpa89
Copy link

varpa89 commented Feb 24, 2025

We're experiencing similar problems with token refresh (Iceberg Rest Catalog with Trino and Spark) and Keycloak.

And there is one interesting observation:
In one of the cases when background token refresh is happening there is a flow when RestSessionCatalog uses basic authorization (not bearer) and Keycloak is able to authenticate such request. But unfortunately it fails because Keycloak can't validate a subjuect_token
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java#L605

@adutra
Copy link
Contributor

adutra commented Feb 24, 2025

@varpa89

Yes, in certain cases Keycloak may be able to refresh the token:

Iceberg REST will try 5 times to refresh using bearer token, then will fall back to basic auth.

So in certain cases, it is possible that the refresh eventually works with basic auth, but that is possible only if 1) the IDP supports token exchange and 2) the IDP is correctly configured to take the subject token and issue a new access token for it.

@varpa89
Copy link

varpa89 commented Feb 24, 2025

There is also a potential issue with ddos of IDP from the RestSessionCatalog. When I was debugging refresh token behaviour, I found a strange situation. We don't use session "iceberg.rest-catalog.session" = 'NONE' but in case when we provide credentials "iceberg.rest-catalog.oauth2.credential" = 'admin:password' we still create a session for each request. But key for the session in the cache is a random UUID. So we put a new session for each request and then in background try to refresh a token with retires

Image

@adutra
Copy link
Contributor

adutra commented Feb 24, 2025

@varpa89 I don't know what session=NONE is, is that a Trino/Spark property?

From what I see, your session cache is full of "context" sessions because only context sessions have UUID keys.

Are you creating the catalog client with this constructor?

In any case, you shouldn't end up with one auth session per request, session contexts are meant to be reused.

@adutra
Copy link
Contributor

adutra commented Feb 24, 2025

@varpa89 I think I see where your problem could be coming from: if you are using these constructors:

public RESTCatalog() {
this(
SessionCatalog.SessionContext.createEmpty(),
config -> HTTPClient.builder(config).uri(config.get(CatalogProperties.URI)).build());
}
public RESTCatalog(Function<Map<String, String>, RESTClient> clientBuilder) {
this(SessionCatalog.SessionContext.createEmpty(), clientBuilder);
}

And creating lots of catalog instances, then you would end up with lots of cached auth sessions. That's because SessionCatalog.SessionContext.createEmpty() returns a different UUID key each time.

I wonder if SessionCatalog.SessionContext.createEmpty() shouldn't return a constant UUID 🤔

@varpa89
Copy link

varpa89 commented Feb 24, 2025

@adutra oh I see, sorry for the confusion. iceberg.rest-catalog.session is a Trino's property. And this is the place where they create a new SessionContext with random UUID.
So looks like the problem is outside of iceberg but in TrinoRestCatalog

@adutra
Copy link
Contributor

adutra commented Feb 24, 2025

So looks like the problem is outside of iceberg but in TrinoRestCatalog

Yes it seems so; I would suggest using something deterministic to create the session context ID for the NONE case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants