feat(auth): implement SigV4 authentication for REST catalog #616
plusplusjiajia wants to merge 11 commits into apache:main from
Force-pushed from bfc98f7 to 9ddbdbd.
Force-pushed from 915c87b to d1c0732.
ICEBERG_PRECHECK(delegate_type != AuthProperties::kAuthTypeSigV4,
                 "Cannot delegate a SigV4 auth manager to another SigV4 auth manager");
Add delegate_type to the error message?
Good idea, done.
   - name: Install dependencies
     shell: bash
-    run: sudo apt-get update && sudo apt-get install -y libcurl4-openssl-dev
+    run: sudo apt-get update && sudo apt-get install -y libcurl4-openssl-dev ninja-build
n00b question: why is ninja-build required here?
Good question — I added a build step in this PR so the linter can see the SigV4 code (needs compile_commands.json from a real build). I used cmake -G Ninja for speed and to be consistent with the other CI workflows, and Ninja is not preinstalled on ubuntu-24.04, hence the extra ninja-build package. Happy to switch to Make if you'd prefer.
If you check https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2404-Readme.md, you will find that Ninja is preinstalled.
Thanks @zhjwpku, you're right: Ninja is preinstalled on the ubuntu-24.04 runner. Dropped ninja-build from the apt install step.
if (session_token_it != properties.end() && !session_token_it->second.empty()) {
  Aws::Auth::AWSCredentials credentials(access_key_it->second.c_str(),
                                        secret_key_it->second.c_str(),
                                        session_token_it->second.c_str());
  return std::make_shared<Aws::Auth::SimpleAWSCredentialsProvider>(credentials);
}
Aws::Auth::AWSCredentials credentials(access_key_it->second.c_str(),
                                      secret_key_it->second.c_str());
return std::make_shared<Aws::Auth::SimpleAWSCredentialsProvider>(credentials);
Nit: you could use a single return if the credentials are created in the conditional statement.
Nice catch, done.
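A minimal sketch of the single-return shape the nit asks for: build the credentials once, add the session token only in the branch, and return from one place. `Credentials` is a hypothetical stand-in for `Aws::Auth::AWSCredentials`, and the `rest.*` key names are assumptions modeled on the Java client, not the PR's actual constants.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for Aws::Auth::AWSCredentials (not the SDK type).
struct Credentials {
  std::string access_key;
  std::string secret_key;
  std::string session_token;  // stays empty when no session token is configured
};

using Properties = std::unordered_map<std::string, std::string>;

// Assumed property names, modeled on the Java client's rest.* keys.
const std::string kAccessKeyId = "rest.access-key-id";
const std::string kSecretAccessKey = "rest.secret-access-key";
const std::string kSessionToken = "rest.session-token";

// Precondition: the caller has already checked that both key properties exist.
// Credentials are built once, optionally extended in the branch, and returned
// from a single place.
std::shared_ptr<Credentials> MakeStaticCredentials(const Properties& properties) {
  Credentials credentials{properties.at(kAccessKeyId), properties.at(kSecretAccessKey), ""};
  auto session_token_it = properties.find(kSessionToken);
  if (session_token_it != properties.end() && !session_token_it->second.empty()) {
    credentials.session_token = session_token_it->second;
  }
  return std::make_shared<Credentials>(std::move(credentials));
}
```

With the real SDK, the same shape should be reachable by constructing `Aws::Auth::AWSCredentials` from the key pair and calling `SetSessionToken` inside the branch.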
auto it = properties.find(AuthProperties::kSigV4SigningName);
if (it != properties.end() && !it->second.empty()) {
  return it->second;
}
if(properties.count(AuthProperties::kSigV4SigningName) > 0) {
// do work
}
might be a little less verbose than
auto it = properties.find(AuthProperties::kSigV4SigningName);
if (it != properties.end() && !it->second.empty()) {
return it->second;
}
Thanks for the suggestion! I chose to keep the !it->second.empty() check on purpose: the intent is for an explicitly empty value (e.g., an env var set to "") to also fall through to the legacy key / default.
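The difference between the two lookups can be sketched as a helper that walks a list of keys and returns the first value that is present and non-empty, else a default. `properties.count(key) > 0` alone would stop at an explicitly empty entry. The helper name and the legacy key below are hypothetical; `"execute-api"` as the default signing name is an assumption based on the Java client.

```cpp
#include <initializer_list>
#include <string>
#include <unordered_map>

using Properties = std::unordered_map<std::string, std::string>;

// Returns the value of the first key that is present AND non-empty,
// otherwise default_value. An explicitly empty value counts as unset.
std::string FirstNonEmpty(const Properties& properties,
                          std::initializer_list<std::string> keys,
                          const std::string& default_value) {
  for (const auto& key : keys) {
    auto it = properties.find(key);
    if (it != properties.end() && !it->second.empty()) {
      return it->second;
    }
  }
  return default_value;
}
```

For example, with `{"rest.signing-name", ""}` and a hypothetical `{"legacy.signing-name", "s3"}`, the lookup falls through to `"s3"` even though `count("rest.signing-name")` is 1.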
    const TableIdentifier& table,
    const std::unordered_map<std::string, std::string>& properties,
    std::shared_ptr<AuthSession> parent) {
  auto* sigv4_parent = dynamic_cast<SigV4AuthSession*>(parent.get());
Use checked_pointer_cast instead.
Done here as well.
Result<std::shared_ptr<AuthSession>> SigV4AuthManager::ContextualSession(
    const std::unordered_map<std::string, std::string>& context,
    std::shared_ptr<AuthSession> parent) {
  auto* sigv4_parent = dynamic_cast<SigV4AuthSession*>(parent.get());
Use checked_pointer_cast.
Thanks, that's much better! Done.
std::string signing_region_;
std::string signing_name_;
std::shared_ptr<Aws::Auth::AWSCredentialsProvider> credentials_provider_;
/// Shared signer instance, matching Java's single Aws4Signer per manager.
I find this comment a bit confusing, especially given that signer_ is a unique pointer that will be destroyed when SigV4AuthSession is destructed.
You're right, sorry about the confusion — the "shared signer" wording was misleading since signer_ is owned per-session via unique_ptr. I've removed the comment.
 std::unordered_map<std::string, std::string> headers;
-ASSERT_THAT(session_result.value()->Authenticate(headers), IsOk());
+ASSERT_THAT(session_result.value()->Authenticate(headers, {}), IsOk());
It feels like Authenticate could accept a default value for the second parameter.
Good point, thanks! Done
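A minimal sketch of the default-argument variant, so existing callers can keep writing `Authenticate(headers)`. `HTTPRequestContext` mirrors the method/url/body fields named in the PR description; `bool` stands in for the project's `Status`, and `StaticTokenSession` is hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the PR's HTTPRequestContext (method, url, body).
struct HTTPRequestContext {
  std::string method;
  std::string url;
  std::string body;
};

using Headers = std::unordered_map<std::string, std::string>;

class AuthSession {
 public:
  virtual ~AuthSession() = default;
  // Default argument: callers without request context can omit it.
  virtual bool Authenticate(Headers& headers, const HTTPRequestContext& context = {}) = 0;
};

class StaticTokenSession : public AuthSession {
 public:
  explicit StaticTokenSession(std::string token) : token_(std::move(token)) {}
  // The override deliberately does not repeat the default argument.
  bool Authenticate(Headers& headers, const HTTPRequestContext& context) override {
    (void)context;  // a token-based session does not need the request details
    headers["Authorization"] = "Bearer " + token_;
    return true;
  }

 private:
  std::string token_;
};
```

One design caveat: C++ resolves default arguments from the static type of the call expression, not the dynamic type. Calling through an `AuthSession&` picks up `= {}`, while calling directly on a `StaticTokenSession` requires both arguments unless the override repeats the same default.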
 /// - IOError: Network or connection errors when reaching auth server
 /// - RestError: HTTP errors from authentication service
-virtual Status Authenticate(std::unordered_map<std::string, std::string>& headers) = 0;
+virtual Status Authenticate(std::unordered_map<std::string, std::string>& headers,
The current design splits the request context into two separate parameters (headers as an in-out parameter, plus HTTPRequestContext as a separate struct).
The Java implementation uses a cleaner "request-in, request-out" pattern, where authenticate() receives the full HTTPRequest and returns a new immutable request with the auth headers. I'd suggest aligning with Java by introducing an HTTPRequest type that encapsulates method, url, headers, and body together, and changing the signature to:
virtual Result Authenticate(const HTTPRequest& request) = 0;
I'm open to this.
Thanks @lishuxu! Agreed: aligning with Java's request-in/request-out pattern is the right call. I'll address this in the current PR by introducing an HTTPRequest type (encapsulating method, url, headers, body) and changing the signature to Result Authenticate(const HTTPRequest& request). Will push an update shortly; PTAL when it's ready.
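A minimal sketch of the request-in/request-out shape discussed above: the input request is left untouched and a decorated copy is returned. `HTTPRequest` follows the fields listed in the review (method, url, headers, body). `BearerSession` is hypothetical, and the real signature would return the project's `Result` type rather than a bare request.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical HTTPRequest with the fields proposed in the review.
struct HTTPRequest {
  std::string method;
  std::string url;
  std::map<std::string, std::string> headers;
  std::string body;
};

class AuthSession {
 public:
  virtual ~AuthSession() = default;
  // Request in, request out: the input is never modified.
  virtual HTTPRequest Authenticate(const HTTPRequest& request) const = 0;
};

class BearerSession : public AuthSession {
 public:
  explicit BearerSession(std::string token) : token_(std::move(token)) {}
  HTTPRequest Authenticate(const HTTPRequest& request) const override {
    HTTPRequest authenticated = request;  // copy, then add auth headers
    authenticated.headers["Authorization"] = "Bearer " + token_;
    return authenticated;
  }

 private:
  std::string token_;
};
```

This shape also composes naturally for SigV4: a wrapping session can pass the request to its delegate first and then sign the request it gets back.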
// ---- SigV4 AWS credential entries ----

/// AWS region for SigV4 signing.
inline static const std::string kSigV4SigningRegion = "rest.signing-region";
We can remove the legacy keys kSigV4Region/kSigV4Service.
// ---- SigV4 AWS credential entries ----

/// AWS region for SigV4 signing.
The names are self-explanatory. I think we can remove these comments to keep the code concise.
inline static const std::string kSigV4DelegateAuthType =
    "rest.auth.sigv4.delegate-auth-type";

// ---- SigV4 AWS credential entries ----
Nit: the // ---- SigV4 AWS credential entries ---- section header is redundant given the // ---- SigV4 entries ---- block above already covers SigV4 config. Merge them into a single section.
}

#ifdef ICEBERG_BUILD_SIGV4
Result<std::unique_ptr<AuthManager>> MakeSigV4AuthManager(
MakeSigV4AuthManager is implemented directly in auth_managers.cc, while all other factory functions (MakeNoopAuthManager, MakeBasicAuthManager, MakeOAuth2Manager) are defined in their own translation units and only declared in auth_manager_internal.h. Suggest moving the implementation to sigv4_auth_manager.cc for consistency.
Good catch! Done.
#include "iceberg/catalog/rest/auth/auth_manager_internal.h"
#ifdef ICEBERG_BUILD_SIGV4
#  include "iceberg/catalog/rest/auth/sigv4_auth_manager.h"
Nit: auth_properties.h should come before the #ifdef ICEBERG_BUILD_SIGV4 block to maintain alphabetical include order.
Thanks for catching it!
}

{
  std::lock_guard<std::mutex> lock(signing_mutex_);
The mutex guards signer_->SignRequest() because AWSAuthV4Signer::SignRequest reportedly mutates internal signer state. However, signer_ is per-session (not shared across sessions), so the mutex only matters if the same SigV4AuthSession instance is called concurrently from multiple threads.
In contrast, Java's RESTSigV4AuthManager shares a single Aws4Signer across all sessions — if the Java signer were stateful, it would need synchronization there. It's worth confirming whether AWSAuthV4Signer::SignRequest actually mutates this or just uses local state — if the latter, the mutex can be removed entirely.
@lishuxu Good call — I checked the aws-sdk-cpp source (1.11.x, AWSAuthV4Signer) and you're right: for the symmetric SigV4 path we use, SignRequest does not mutate this, so the mutex is unnecessary. I've dropped it.
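The reasoning behind dropping the mutex can be illustrated with a minimal sketch. If signing is a `const` operation that touches only locals and immutable members, concurrent calls on one instance need no lock. `StatelessSigner` and its placeholder hash are hypothetical: this is neither SigV4 nor the AWS SDK API.

```cpp
#include <functional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical signer: Sign() is const and only reads an immutable member,
// so concurrent calls on one instance do not race. Placeholder hash, NOT SigV4.
class StatelessSigner {
 public:
  explicit StatelessSigner(std::string secret) : secret_(std::move(secret)) {}

  std::string Sign(const std::string& canonical_request) const {
    std::string to_sign = secret_ + "\n" + canonical_request;  // local state only
    return std::to_string(std::hash<std::string>{}(to_sign));
  }

 private:
  const std::string secret_;  // never mutated after construction
};

// Signs the same request from several threads; returns true if all results match.
bool SignConcurrently(const StatelessSigner& signer, const std::string& request, int threads) {
  std::vector<std::string> results(threads);
  std::vector<std::thread> workers;
  for (int i = 0; i < threads; ++i) {
    // Each thread writes only its own slot in results.
    workers.emplace_back([&signer, &request, &results, i] { results[i] = signer.Sign(request); });
  }
  for (auto& worker : workers) worker.join();
  for (const auto& r : results) {
    if (r != results.front()) return false;
  }
  return true;
}
```

A passing run does not prove thread safety. The argument rests on the signing path not mutating shared state, which is what the aws-sdk-cpp source check above establishes for the real signer.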
  return std::make_shared<Aws::Auth::DefaultAWSCredentialsProviderChain>();
}

std::string SigV4AuthManager::ResolveSigningRegion(
ResolveSigningRegion manually reads AWS_REGION / AWS_DEFAULT_REGION and falls back to "us-east-1". Java delegates to DefaultAwsRegionProviderChain which also covers ~/.aws/config, EC2/ECS instance metadata. The AWS C++ SDK has an equivalent Aws::Config::EC2InstanceProfileConfigLoader and Aws::Environment::GetEnv. Consider using the SDK's built-in region resolution instead of reimplementing a subset of it.
@lishuxu Good point. Switched to: return {Aws::Client::ClientConfiguration().region.c_str()};
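A minimal sketch of the resulting resolution order, assuming the explicit `rest.signing-region` property still wins and only the fallback moves to the SDK. The SDK default (`Aws::Client::ClientConfiguration().region` in the PR) is injected as a callable here so the sketch builds without the AWS SDK.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

using Properties = std::unordered_map<std::string, std::string>;

// Explicit, non-empty property first; otherwise defer to the injected
// SDK default (in the PR: Aws::Client::ClientConfiguration().region).
std::string ResolveSigningRegion(const Properties& properties,
                                 const std::function<std::string()>& sdk_default_region) {
  auto it = properties.find("rest.signing-region");
  if (it != properties.end() && !it->second.empty()) {
    return it->second;
  }
  return sdk_default_region();
}
```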
Status SigV4AuthManager::Close() { return delegate_->Close(); }

Result<std::shared_ptr<Aws::Auth::AWSCredentialsProvider>>
SigV4AuthManager::MakeCredentialsProvider(
Java's AwsProperties.restCredentialsProvider() supports loading a custom AwsCredentialsProvider via a class-name property, while the C++ version only supports static credentials and the default chain. This is a known gap; it's worth a // TODO comment for future extensibility.
Result<std::shared_ptr<AuthSession>> SigV4AuthManager::ContextualSession(
    const std::unordered_map<std::string, std::string>& context,
    std::shared_ptr<AuthSession> parent) {
  auto sigv4_parent = internal::checked_pointer_cast<SigV4AuthSession>(std::move(parent));
checked_pointer_cast in ContextualSession/TableSession compiles to static_pointer_cast in Release builds — a wrong type silently causes UB.
You're right. Updated.
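The comment above notes that checked_pointer_cast only checks in debug builds. One way to keep the check in Release builds is `std::dynamic_pointer_cast` plus an explicit null check. This is a minimal sketch, not necessarily what the PR was updated to. The session types are hypothetical, and the string out-parameter stands in for the project's `Result`/`Status` error path.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical session hierarchy used only for this sketch.
struct AuthSession { virtual ~AuthSession() = default; };
struct SigV4Session : AuthSession {};
struct OtherSession : AuthSession {};

// Checked in every build type: dynamic_pointer_cast yields nullptr on a type
// mismatch (or a null parent), which becomes an error instead of undefined behavior.
std::shared_ptr<SigV4Session> ToSigV4Session(std::shared_ptr<AuthSession> parent,
                                             std::string* error) {
  auto sigv4 = std::dynamic_pointer_cast<SigV4Session>(std::move(parent));
  if (!sigv4) {
    *error = "parent session is not a SigV4 session";
  }
  return sigv4;
}
```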
if (!first) url += "&";
auto ek = EncodeString(k);
auto ev = EncodeString(v);
url += (ek ? *ek : k) + "=" + (ev ? *ev : v);
AppendQueryString silently falls back to the raw key/value when EncodeString fails. If encoding fails, the URL passed to Authenticate would differ from what the server receives, causing signature verification to fail. Consider propagating the error instead:
ICEBERG_ASSIGN_OR_RAISE(auto ek, EncodeString(k));
ICEBERG_ASSIGN_OR_RAISE(auto ev, EncodeString(v));
url += ek + "=" + ev;
@lishuxu Good catch. Changed AppendQueryString to return Result<std::string>.
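A minimal sketch of the error-propagating version, with `std::optional<std::string>` standing in for `Result<std::string>`. The encoder is hypothetical: it percent-encodes everything outside the RFC 3986 unreserved set, and it "fails" on an embedded NUL byte purely so the sketch has an error path to propagate. Parameters are sorted through a `std::map`, as in the PR's snippet, for a deterministic order.

```cpp
#include <cctype>
#include <cstdio>
#include <map>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical encoder; the NUL-byte failure exists only to exercise errors.
std::optional<std::string> EncodeString(const std::string& in) {
  std::string out;
  for (unsigned char c : in) {
    if (c == '\0') return std::nullopt;
    if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
      out += static_cast<char>(c);
    } else {
      char buf[4];  // "%XX" plus the terminating NUL
      std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
      out += buf;
    }
  }
  return out;
}

// Fails as a whole if any key or value cannot be encoded, so the URL that is
// signed can never silently differ from the URL that is sent.
std::optional<std::string> AppendQueryString(
    const std::string& base_url,
    const std::unordered_map<std::string, std::string>& params) {
  if (params.empty()) return base_url;
  std::map<std::string, std::string> sorted(params.begin(), params.end());
  std::string url = base_url + "?";
  bool first = true;
  for (const auto& [k, v] : sorted) {
    auto ek = EncodeString(k);
    auto ev = EncodeString(v);
    if (!ek || !ev) return std::nullopt;  // propagate instead of using raw text
    if (!first) url += "&";
    url += *ek + "=" + *ev;
    first = false;
  }
  return url;
}
```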
wgtmac left a comment
Thanks for adding this! I have only completed the architectural review so far and haven't fully reviewed the SigV4 manager yet. I have some preliminary questions:
- Should we also stay compatible with the legacy rest.sigv4-enabled=true config (and others) when creating an auth manager?
- How is this tested end to end? Any chance of adding an integration test?
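For the first question, a minimal sketch of how a legacy flag could be mapped onto the auth type. The property names (`rest.auth.type`, `rest.sigv4-enabled`) and the `sigv4` value are assumptions modeled on the Java client. The precedence (legacy flag first) and the exact `"true"` match are also assumptions made for brevity.

```cpp
#include <string>
#include <unordered_map>

using Properties = std::unordered_map<std::string, std::string>;

// Assumed names and values, modeled on the Java client.
const std::string kAuthType = "rest.auth.type";
const std::string kLegacySigV4Enabled = "rest.sigv4-enabled";
const std::string kAuthTypeSigV4 = "sigv4";

// Legacy rest.sigv4-enabled=true maps to SigV4; otherwise use the explicit
// auth type, or the given default when none is set.
std::string ResolveAuthType(const Properties& properties, const std::string& default_type) {
  auto legacy = properties.find(kLegacySigV4Enabled);
  if (legacy != properties.end() && legacy->second == "true") {  // exact match for brevity
    return kAuthTypeSigV4;
  }
  auto explicit_type = properties.find(kAuthType);
  if (explicit_type != properties.end() && !explicit_type->second.empty()) {
    return explicit_type->second;
  }
  return default_type;
}
```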
function(resolve_aws_sdk_dependency)
  find_package(AWSSDK REQUIRED COMPONENTS core)
  list(APPEND ICEBERG_SYSTEM_DEPENDENCIES AWSSDK)
Here it records only AWSSDK for installed-package dependency discovery, while src/iceberg/catalog/rest/CMakeLists.txt exports aws-cpp-sdk-core in the REST install interface. The generated iceberg-config.cmake can only call find_dependency(AWSSDK) without COMPONENTS core, but AWS SDK’s CMake config loads component packages from AWSSDK_FIND_COMPONENTS. A downstream installed SigV4 build can therefore fail to find/link AWS core unless it happens to be on the default linker path.
I'd suggest special-casing find_dependency(AWSSDK COMPONENTS core) in iceberg-config.cmake.in, or otherwise exporting the AWS SDK dependency in a component-aware way.
#ifdef ICEBERG_BUILD_SIGV4
/// \brief Create a SigV4 authentication manager with a delegate.
Result<std::unique_ptr<AuthManager>> MakeSigV4AuthManager(
Where is the definition? BTW, we don't need to use the ICEBERG_BUILD_SIGV4 macro everywhere. MakeSigV4AuthManager can return Unsupported internally, depending on this macro.
    {AuthProperties::kAuthTypeBasic, MakeBasicAuthManager},
    {AuthProperties::kAuthTypeOAuth2, MakeOAuth2Manager},
};
#ifdef ICEBERG_BUILD_SIGV4
Ditto, we don't need to use this macro here.
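A minimal sketch of the pattern suggested in these two comments. The factory is always declared and always registered, and only its body depends on `ICEBERG_BUILD_SIGV4`, so neither the header nor the registry needs an `#ifdef`. `ManagerResult` is a hypothetical stand-in for `Result<std::unique_ptr<AuthManager>>`, and the manager types are stubs.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

struct AuthManager { virtual ~AuthManager() = default; };
struct SigV4AuthManager : AuthManager {};  // stub for the sketch

// Hypothetical stand-in for Result<std::unique_ptr<AuthManager>>.
struct ManagerResult {
  std::unique_ptr<AuthManager> value;
  std::string error;  // set when value is null
};

// Always declared and registered; only the body depends on the build flag.
ManagerResult MakeSigV4AuthManager() {
#ifdef ICEBERG_BUILD_SIGV4
  return {std::make_unique<SigV4AuthManager>(), ""};
#else
  return {nullptr, "Unsupported: SigV4 support was not built (ICEBERG_BUILD_SIGV4 is off)"};
#endif
}

using Factory = std::function<ManagerResult()>;

// The registry lists SigV4 unconditionally.
const std::unordered_map<std::string, Factory>& Registry() {
  static const std::unordered_map<std::string, Factory> registry = {
      {"sigv4", MakeSigV4AuthManager},
  };
  return registry;
}
```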
 mkdir build && cd build
-cmake .. -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
+cmake .. -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
+  -DICEBERG_BUILD_SIGV4=ON \
Why do we need changes in this file? Is it because of unrecognized headers from sigv4_auth_manager.cc? Does disabling ICEBERG_BUILD_SIGV4 help in this case? I am wondering whether we can add a dedicated CI workflow for AWS-related features like S3 and SigV4.
option(ICEBERG_BUILD_BUNDLE "Build the battery included library" ON)
option(ICEBERG_BUILD_REST "Build rest catalog client" ON)
option(ICEBERG_BUILD_REST_INTEGRATION_TESTS "Build rest catalog integration tests" OFF)
option(ICEBERG_BUILD_SIGV4 "Build SigV4 authentication support (requires AWS SDK)" OFF)
Please rebase on the latest main branch so we can see the option ICEBERG_S3. I think we should follow the same pattern to name it ICEBERG_SIGV4.
cpr_params.Add({key, val});
if (params.empty()) return base_url;
std::map<std::string, std::string> sorted(params.begin(), params.end());
std::string url = base_url + "?";
Here we assume that base_url never contains &, which is true for the REST catalog use case, but HttpClient is an exported class, so it is worth adding a comment to prevent misuse.
SigV4AuthSession(
    std::shared_ptr<AuthSession> delegate, std::string signing_region,
    std::string signing_name,
    std::shared_ptr<Aws::Auth::AWSCredentialsProvider> credentials_provider,
Generally, it is not good practice to expose AWS SDK internals like this. We could consider renaming it to sigv4_auth_manager_internal.h so it is no longer installed. This is also related to my other comment about installed headers.
namespace {

/// \brief Ensures AWS SDK is initialized exactly once per process.
/// ShutdownAPI is intentionally never called (leak-by-design) to avoid
Is this the recommended approach?
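For reference, the "exactly once per process" part of the snippet can be expressed with `std::call_once`, which also makes concurrent first use safe. This is a minimal sketch. `InitSdkForReal` is a hypothetical counter standing in for `Aws::InitAPI`, and the sketch deliberately registers no shutdown, mirroring the leak-by-design choice being questioned. The SDK's basic usage examples pair `InitAPI` with `ShutdownAPI`, so whether skipping the latter is acceptable remains the open question.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for Aws::InitAPI so the sketch runs without the SDK.
std::atomic<int> g_init_calls{0};
void InitSdkForReal() { ++g_init_calls; }

// Runs initialization exactly once, even if several threads race on first use.
// No matching shutdown is registered (the leak-by-design choice in the snippet).
void EnsureSdkInitialized() {
  static std::once_flag once;
  std::call_once(once, [] { InitSdkForReal(); });
}
```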
// ---------- Tests ported from Java TestRESTSigV4AuthSession ----------

// Java: authenticateWithoutBody
Let's remove comments like this one and the ones below.
    auto delegate_session,
    delegate_->TableSession(table, properties, sigv4_parent->delegate()));

auto merged = MergeProperties(sigv4_parent->effective_properties(), properties);
Quick question: is it intentional that table sessions inherit the parent session's effective SigV4 properties, including contextual overrides?
This seems slightly different from Java's current RESTSigV4AuthManager, where tableSession merges table properties with catalogProperties. The C++ precedence of catalog < context < table looks reasonable to me, but if this is deliberate, could we document it or keep the test that makes this behavior explicit?
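A minimal sketch that makes the two precedence orders in this question concrete. `MergeProperties` here is a hypothetical re-implementation in which later layers override earlier ones, not the PR's function. Chaining it gives catalog < context < table, whereas merging table properties over catalog properties alone (the Java behavior described above) drops the contextual override.

```cpp
#include <string>
#include <unordered_map>

using Properties = std::unordered_map<std::string, std::string>;

// Later layer wins: every key in overrides replaces the same key in base.
Properties MergeProperties(const Properties& base, const Properties& overrides) {
  Properties merged = base;
  for (const auto& [key, value] : overrides) {
    merged[key] = value;
  }
  return merged;
}
```

For example, with a catalog region of `us-east-1`, a context region of `eu-west-1`, and a table that sets only the signing name, the chained merge yields `eu-west-1`, while the table-over-catalog merge yields `us-east-1`.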
Implement AWS SigV4 authentication for the REST catalog client, following Java's RESTSigV4AuthManager and RESTSigV4AuthSession.
- AuthSession::Authenticate() with HTTPRequestContext (method, url, body) for SigV4 request signing
- SigV4AuthSession: delegate-first auth → relocate conflicting Authorization header → sign with AWS SDK
- SigV4AuthManager: wraps delegate AuthManager (default OAuth2), resolves credentials from properties or default chain
- SignerChecksumParams output: empty body → hex EMPTY_BODY_SHA256; non-empty body → Base64(SHA256(body))
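The checksum rule in the last bullet can be sketched as follows. An empty body yields the hex SHA-256 of the empty string, and a non-empty body yields the standard (RFC 4648) Base64 of its SHA-256 digest. The SHA-256 function is injected as an assumption so the sketch stays self-contained; it is not tied to any particular SDK API.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hex SHA-256 of the empty string.
const std::string kEmptyBodySha256 =
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855";

// Standard Base64 (RFC 4648) with '=' padding.
std::string Base64Encode(const std::vector<std::uint8_t>& bytes) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  std::size_t i = 0;
  for (; i + 2 < bytes.size(); i += 3) {  // full 3-byte groups
    std::uint32_t n = (bytes[i] << 16) | (bytes[i + 1] << 8) | bytes[i + 2];
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += kAlphabet[n & 63];
  }
  std::size_t rest = bytes.size() - i;
  if (rest == 1) {
    std::uint32_t n = bytes[i] << 16;
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += "==";
  } else if (rest == 2) {
    std::uint32_t n = (bytes[i] << 16) | (bytes[i + 1] << 8);
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += '=';
  }
  return out;
}

// Empty body -> hex constant; otherwise Base64(SHA256(body)) via the injected hash.
std::string BodyChecksum(
    const std::string& body,
    const std::function<std::vector<std::uint8_t>(const std::string&)>& sha256) {
  if (body.empty()) return kEmptyBodySha256;
  return Base64Encode(sha256(body));
}
```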