[V2] Add AuthMetadataService + wire external OAuth2 auth into runs service #6998
Open
Restructure auth config to match flyteadmin's shape with support for self-hosted and external authorization server modes. The self mode builds OAuth2 metadata from relative URLs based on AuthorizedURIs, while the external mode fetches metadata from .well-known/oauth-authorization-server with retry logic, HTTP proxy support, and token endpoint proxy rewriting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kevin Su <pingsutw@apache.org>
Enable external-mode OAuth2 authentication for the v2 runs service so JWT bearer tokens and auth cookies are validated at the HTTP boundary and the standard OIDC browser login flow is served from the same binary.

Go changes:
- runs/setup.go: new setupAuth() builds ResourceServer + AuthContext, registers /login /callback /logout and the OIDC metadata redirect, chains the new HTTP auth middleware with existing middleware, and replaces the buggy duplicate AuthMetadataService mount with a single real-or-stub branch.
- runs/service/auth/http_middleware.go: bearer/cookie validator with a public-path allowlist (/healthz, /readyz, /healthcheck, /login, /callback, /logout, /.well-known/, /flyteidl2.auth.AuthMetadataService/).
- runs/service/auth/auth_context.go: NewAuthContext takes an oidcClientSecret and populates oauth2.Config.ClientSecret; RedirectURL is computed as an absolute URL from the first authorizedUri via a new computeOIDCRedirectURL helper.
- runs/service/auth/config/config.go: add go:generate enumer directives for AuthorizationServerType and SameSite.
- Generated enumer files for AuthorizationServerType and SameSite.
- Unit tests: http_middleware, computeOIDCRedirectURL, enumer round trips, cookie helpers, token helpers, config defaults.

Helm chart (charts/flyte-binary):
- templates/configmap.yaml: render a new 004-auth.yaml from configuration.auth.* (including externalAuthServer, authorizedUris, userAuth.openId, thirdPartyConfig.flyteClient, and runs.security.useAuth) when auth.enabled.
- templates/_helpers.tpl: runServiceAuthSecretName honors a new configuration.auth.runServiceAuthSecretRef override so deployments can reuse an existing admin-auth secret instead of re-rendering.
- templates/run-service-auth-secret.yaml: skip rendering when the override is set to avoid Helm ownership conflicts.
- templates/deployment.yaml: fix run-service-auth-secret include-path typo; guard its checksum with the override; keep the existing extraInlineSecretRefs projection loop.
- templates/ingress/http.yaml: new ingress.minimalPaths flag that omits /oauth2, /.well-known, /me, /config, /v1/*, /api, and /console paths so they can fall through to an adjacent Flyte deployment sharing the same ALB ingress group.
- values.yaml: defaults for configuration.auth.externalAuthServer.*, configuration.auth.runServiceAuthSecretRef, and ingress.minimalPaths (false, preserving existing behavior).

Signed-off-by: Kevin Su <pingsutw@apache.org>
The unified Flyte binary uses connect-rpc clients that talk to their own mux via http://localhost:<port> (e.g. RunService calls ActionsService.CreateAction). Those calls have no Authorization header because they're in-process, and the new external auth middleware was rejecting them with 401 — so run creation silently failed end-to-end. Bypass auth when req.RemoteAddr is a loopback address (127.0.0.0/8 or ::1). External traffic from the ALB never has a loopback remote addr, so this doesn't widen the attack surface. Add table-driven isLoopbackRequest tests and middleware tests for both IPv4 and IPv6 loopback and a non-loopback pod IP. Signed-off-by: Kevin Su <pingsutw@apache.org>
…alRunService Task pods running flytekit call ActionsService.CreateAction (and InternalRunService) via the flyte2-grpc ClusterIP service to enqueue subsequent actions. Those calls arrive at the pod with the task pod's IP as RemoteAddr — not loopback — so the loopback bypass does not catch them, and the external auth middleware was returning 401, which flytekit reported as "Failed to launch action: Unauthorized" and task execution failed. Add /flyteidl2.actions.ActionsService/ and /flyteidl2.workflow.InternalRunService/ to the public-path allowlist so in-cluster traffic to these services passes without credentials. Remove the same paths from the ingress grpcPaths helper so they are not exposed via the external ALB — they remain reachable only through the ClusterIP service inside the cluster, matching v1's propeller -> flyteadmin pattern. Update the table-driven IsPublicPath test and swap the loopback / non-loopback test path to RunService so the assertion still exercises the gate rather than the new public path. Signed-off-by: Kevin Su <pingsutw@apache.org>
The external auth server's .well-known/oauth-authorization-server
response uses camelCase JSON keys (authorizationEndpoint, tokenEndpoint,
jwksUri) per the proto3 JSON specification. The v2 proto struct has
snake_case JSON tags (authorization_endpoint, token_endpoint, jwks_uri).
json.Unmarshal only matched the issuer field (same case), silently
dropping all other fields. This caused GetOAuth2Metadata to return
only {"issuer":"..."}, breaking CLI auth bootstrap — the client could
not discover the token or authorization endpoints.
Switch unmarshalResp from json.Unmarshal to protojson.Unmarshal, which
accepts both camelCase and snake_case input per the protobuf spec.
Signed-off-by: Kevin Su <pingsutw@apache.org>
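The failure mode is easy to reproduce with `encoding/json` and a stand-in struct whose tags use snake_case. The `metadata` type and the sample body are hypothetical. The actual fix calls `protojson.Unmarshal` on the generated proto message, which is not reproduced here because it needs the generated flyteidl2 types.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// metadata is a hypothetical stand-in for a struct whose JSON tags use
// snake_case, like the v2 proto struct described above.
type metadata struct {
	Issuer                string `json:"issuer"`
	AuthorizationEndpoint string `json:"authorization_endpoint"`
	TokenEndpoint         string `json:"token_endpoint"`
	JwksURI               string `json:"jwks_uri"`
}

// camelCaseBody mimics a proto3-JSON response with lowerCamelCase keys.
var camelCaseBody = []byte(`{
	"issuer": "https://auth.example.com",
	"authorizationEndpoint": "https://auth.example.com/oauth2/authorize",
	"tokenEndpoint": "https://auth.example.com/oauth2/token",
	"jwksUri": "https://auth.example.com/oauth2/jwks"
}`)

// decodeMetadata uses plain encoding/json: keys are matched to tags
// case-insensitively, but "tokenEndpoint" never equals "token_endpoint",
// so unmatched keys are silently ignored.
func decodeMetadata(body []byte) (metadata, error) {
	var md metadata
	err := json.Unmarshal(body, &md)
	return md, err
}

// strictDecodeMetadata rejects keys that match no struct field.
func strictDecodeMetadata(body []byte) (metadata, error) {
	var md metadata
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.DisallowUnknownFields()
	err := dec.Decode(&md)
	return md, err
}

func main() {
	md, err := decodeMetadata(camelCaseBody)
	fmt.Printf("lenient: err=%v issuer=%q token=%q\n", err, md.Issuer, md.TokenEndpoint)

	_, err = strictDecodeMetadata(camelCaseBody)
	fmt.Println("strict:", err)
}
```

`json.Unmarshal` returns no error here, yet every camelCase field stays empty. `Decoder.DisallowUnknownFields` turns that silent drop into an error. This is a useful guard whenever a hand-written struct has to match an external JSON contract.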
Architecture
The v2 runs service acts as an OAuth2 resource server. It validates JWTs issued by a configured external OAuth2 authorization server and runs the standard OIDC code flow against an OpenID Connect IdP (e.g. Okta) for browser users. v2 does not mint tokens itself: `authCfg.AppAuth.AuthServerType` is `External`, and `externalAuthServer.baseUrl` points at the issuer whose `.well-known/oauth-authorization-server` and `jwks_uri` v2 reads at boot.

```mermaid
graph TB
    Client[UI / flyte-sdk / task pods]
    ALB["Ingress / load balancer"]
    IDP["OIDC IdP (Okta)<br/>signin.example.com<br/>client id 0oak...5d6"]
    AUTHSRV["External OAuth2 authorization server<br/>.well-known/oauth-authorization-server<br/>jwks_uri"]
    MW["runs GetAuthenticationHTTPInterceptor<br/>bearer + cookie + loopback gate"]
    OIDC["OIDC browser handlers<br/>/login /callback /logout<br/>.well-known/openid-configuration"]
    RPC["v2 connect-rpc handlers<br/>RunService ProjectService TaskService<br/>TriggerService TranslatorService<br/>AuthMetadataService IdentityService ..."]
    INT["intra-cluster connect-rpc<br/>ActionsService InternalRunService<br/>(ClusterIP only, allowlisted)"]
    COOKIE["CookieManager<br/>AES-256 + HMAC-SHA256<br/>cookie_hash_key cookie_block_key"]
    S1["auth secret<br/>cookie_hash_key / cookie_block_key"]
    S2["OIDC client secret<br/>oidc_client_secret"]

    Client --> ALB
    ALB -->|"/login /callback /logout /healthcheck<br/>/flyteidl2.*/*"| MW
    ALB -.->|"/v2/* static SPA"| MW
    Client -.->|"task-pod ClusterIP calls"| INT
    MW --> OIDC
    MW --> RPC
    OIDC <-->|"authorization code flow"| IDP
    OIDC --> COOKIE
    MW -.->|"boots JWKS"| AUTHSRV
    MW -.->|"validate bearer token signature"| AUTHSRV
    COOKIE -.->|"reads"| S1
    OIDC -.->|"reads"| S2
```

Component data flow
| Component | File | Reads | Produces |
|---|---|---|---|
| `setupAuth()` bootstrap | `runs/setup.go` | `/etc/secrets/cookie_hash_key`, `/etc/secrets/cookie_block_key`, `/etc/secrets/oidc_client_secret`, `authCfg.AppAuth.ExternalAuthServer.BaseURL` | `*AuthenticationContext`, registers 4 HTTP handlers + connect-rpc services, sets `sc.Middleware` |
| `ResourceServer` | `runs/service/auth/authzserver/resource_server.go` | `.well-known/oauth-authorization-server`, then `jwks_uri`, then JWKS | `*IdentityContext` |
| `GetAuthenticationHTTPInterceptor` | `runs/service/auth/http_middleware.go` | `req.URL.Path`, `req.Header.Get("Authorization")`, request cookies, `req.RemoteAddr` | Injects `IdentityContext` into `ctx`, writes `401`, or bypasses auth for loopback and public paths |
| `IsPublicPath` | | `req.URL.Path` | |
| `isLoopbackRequest` | | `req.RemoteAddr` | |
| `IdentityContextFromRequest` | `runs/service/auth/handlers.go` | `Authorization: Bearer ...` header then `ResourceServer.ValidateAccessToken`, or cookies then `CookieManager.RetrieveTokenValues` then `IdentityContextFromIDToken` | `*IdentityContext` or error |
| `CookieManager` | `runs/service/auth/cookie_manager.go` | `cookie_hash_key` + `cookie_block_key` (base64, at startup); `flyte_at`/`flyte_rt`/`flyte_idt`/`flyte_user_info` cookies (per request) | `/login` + `/callback`, decrypted payloads on middleware pass |
| `GetLoginHandler` | `runs/service/auth/handlers.go` | `redirect_url` query param | `flyte_csrf_state` cookie; 307 to `oauth2Config.AuthCodeURL(state)` (on `/login`) |
| `GetCallbackHandler` | `runs/service/auth/handlers.go` | `code` + `state` query params, `flyte_csrf_state` cookie, `oidc_client_secret` | `flyte_at`/`flyte_rt`/`flyte_idt`/`flyte_user_info` cookies, 307 to post-login redirect |
| `computeOIDCRedirectURL` | `runs/service/auth/auth_context.go` | `cfg.AuthorizedURIs[0]` | `https://<host>/callback`, computed at `NewAuthContext` time and baked into `oauth2.Config.RedirectURL` |
| `authzserver.NewAuthMetadataService` | `runs/service/auth/authzserver/metadata_provider.go` | `authCfg.AppAuth.ExternalAuthServer.BaseURL` + `.MetadataEndpointURL` | `/flyteidl2.auth.AuthMetadataService/GetOAuth2Metadata` and `GetPublicClientConfig` via connect-rpc |
| `configmap.yaml` renders `004-auth.yaml` | `charts/flyte-binary/templates/configmap.yaml` | `configuration.auth.*` values | `auth:` block in the runs ConfigMap |
| `ingress/http.yaml` `minimalPaths` | `charts/flyte-binary/templates/ingress/http.yaml` | `.Values.ingress.minimalPaths` | |

Flow 1: API call with a bearer token (machine client, the common path)
```mermaid
sequenceDiagram
    autonumber
    participant C as flyte-sdk
    participant ALB as Ingress
    participant AS as External auth server
    participant V2 as runs service
    participant MW as HTTP auth middleware
    participant H as connect-rpc handler
    Note over C,AS: One-time bootstrap
    C->>AS: GET oauth authorization server metadata
    AS-->>C: issuer, token endpoint, jwks uri
    C->>AS: POST client credentials grant
    AS-->>C: signed access token
    Note over V2,AS: Boot-time JWKS cache on the runs service
    V2->>AS: GET metadata and jwks
    AS-->>V2: RSA public keys cached in memory
    Note over C,H: API call with bearer token
    C->>ALB: POST RunService CreateRun with Authorization Bearer
    ALB->>V2: forward to runs service
    V2->>MW: enter sc Middleware chain
    Note right of MW: CORS then auth then mux
    MW->>MW: IsPublicPath returns false
    MW->>MW: isLoopbackRequest returns false
    MW->>MW: IdentityContextFromRequest
    MW->>MW: ResourceServer ValidateAccessToken
    Note right of MW: verify signature against cached JWKS,<br/>check aud iss exp, build IdentityContext
    MW->>H: next ServeHTTP with IdentityContext
    H->>H: read identity, authorize, execute
    H-->>C: connect-rpc response
```

Flow 2: Browser OIDC login via /login

```mermaid
sequenceDiagram
    autonumber
    participant B as Browser
    participant ALB as Ingress
    participant V2 as runs service
    participant OK as OIDC IdP
    participant CM as CookieManager
    B->>ALB: GET login with redirect url query
    ALB->>V2: forward
    V2->>V2: GetLoginHandler generates CSRF token
    V2->>B: Set-Cookie flyte_csrf_state (HttpOnly, Secure)
    V2->>B: 307 redirect to IdP authorize endpoint
    Note right of V2: client_id plus absolute redirect_uri,<br/>response_type code, scope openid profile,<br/>state set to hashed CSRF cookie value
    B->>OK: follow redirect
    OK->>B: login form, SSO, user authenticates
    OK->>B: 302 redirect to callback with code and state
    B->>ALB: GET callback with code and state, plus flyte_csrf_state cookie
    ALB->>V2: forward
    V2->>V2: VerifyCsrfCookie, hash of cookie equals state
    V2->>OK: POST token endpoint with authorization code
    Note right of V2: Basic auth sends client_id and oidc_client_secret
    OK-->>V2: access token, refresh token, id token
    V2->>OK: GET userinfo endpoint with access token
    OK-->>V2: sub, email, name
    V2->>CM: SetTokenCookies and SetUserInfoCookie
    Note right of CM: cookies encrypted with cookie_hash_key<br/>and cookie_block_key from /etc/secrets
    CM-->>V2: encrypted cookie values
    V2->>B: Set-Cookie flyte_at, flyte_rt, flyte_idt, flyte_user_info
    V2->>B: 307 redirect to original redirect url
    Note over B,V2: Subsequent requests reuse these cookies via IdentityContextFromRequest<br/>and CookieManager RetrieveTokenValues.
```

What v2 owns and what it defers
- `ResourceServer`: caches JWKS at boot, verifies per request
- `/login` + `/callback`: `redirect_uri`, encrypted cookies
- `CookieManager`: `/etc/secrets`
- `.well-known/` path: `flyteidl2.auth.AuthMetadataService` is a connect-rpc surface for its own clients, not the bare well-known URL
- Intra-cluster services (`ActionsService`, `InternalRunService`)

Where trust comes from
- `ResourceServer` fetches the issuer's metadata and JWKS at boot, caches the RSA public key, and verifies every incoming JWT. Tokens whose `aud` does not match `authCfg.AppAuth.ExternalAuthServer.AllowedAudience` are rejected.
- `CookieManager` encrypts session cookies with AES-256 + HMAC-SHA256 using `cookie_hash_key` and `cookie_block_key` loaded from `/etc/secrets` at startup. Cookies are `HttpOnly`, `Secure`, and scoped to the deployment hostname.
- `/callback` exchanges the authorization code with the IdP using the confidential client secret in `/etc/secrets/oidc_client_secret`. The redirect URI is the absolute URL computed from the first `authorizedUris` entry.
- `ActionsService` and `InternalRunService` are reachable only via the ClusterIP service (never exposed via the external ingress) and are allowlisted in the auth middleware so in-cluster task pods can enqueue actions without carrying credentials.

Tracking issue
Why are the changes needed?
Two related goals, implemented in sequence on this branch:
- AuthMetadataService parity with upstream. The previous `runs/service/auth_metadata.go` was a static-config implementation that could not serve OAuth2 metadata for an external authorization server. To match the upstream auth shape, the runs service needs two modes: a self auth server mode, which builds metadata from relative URLs, and an external auth server mode, which fetches metadata from `.well-known/oauth-authorization-server` with retry logic.
- Actually turn authentication on in the runs service binary. Even with a working AuthMetadataService, nothing in `runs.Setup()` built an `AuthenticationContext`, registered the OIDC browser handlers, or enforced bearer/cookie validation on API calls. The auth package existed but was entirely unwired. This PR makes `cfg.Security.UseAuth = true` actually gate every non-public endpoint and redirect browser users through the standard OIDC flow.

What changes were proposed in this pull request?
Go: `runs/service/auth_metadata.go` rewrite
- `runs/config/config.go`: restructured auth config: `AuthConfig` with `AuthorizedURIs`, `GrpcAuthorizationHeader`, `AppAuth`, `HTTPProxyURL`, `TokenEndpointProxyConfig`; `OAuth2Options` with `AuthServerType` (`Self`/`External`), `SelfAuthServer`, `ExternalAuthServer`, `ThirdParty`; `ExternalAuthorizationServer` with retry config; `FlyteClientConfig` and `TokenEndpointProxyConfig` sub-structs.
- `runs/service/auth_metadata.go`: two code paths. Self mode builds OAuth2 metadata with relative URLs (`/oauth2/token`, `/oauth2/authorize`, `/oauth2/jwks`) based on the first `AuthorizedURI`. External mode fetches metadata from `.well-known/oauth-authorization-server` with retry, HTTP proxy support, and optional token endpoint proxy rewriting.

Go: wire external auth into the runs binary
- `runs/setup.go`: new `setupAuth()` helper inside the `if cfg.Security.UseAuth` block:
  - Registers `AuthMetadataService` (via `authzserver.NewAuthMetadataService`) and `IdentityService`.
  - Reads `cookie_hash_key`, `cookie_block_key`, and `oidc_client_secret` from `/etc/secrets/`.
  - Builds a `ResourceServer` via `authzserver.NewOAuth2ResourceServer`, falling back to the first `authorizedUri` when `externalAuthServer.baseUrl` is empty.
  - Builds the `AuthenticationContext` via `authservice.NewAuthContext`.
  - Registers `/login`, `/callback`, `/logout`, and `/.well-known/openid-configuration` via `authservice.RegisterHandlers`.
  - Chains the new HTTP auth middleware into `sc.Middleware` (CORS stays outermost).
  - Replaces the duplicate `authconnect.NewAuthMetadataServiceHandler` mount (real + stub on the same mux path) with a single real-or-stub branch; the duplicate registration would have panicked the pod as soon as `UseAuth=true`.

Go: new `runs/service/auth/http_middleware.go`
- `GetAuthenticationHTTPInterceptor(h *AuthHandlerConfig) func(http.Handler) http.Handler` validates a bearer token or auth cookies on every request via the existing `IdentityContextFromRequest`, injects `IdentityContext` into the request context on success, and returns 401 on failure.
- Public-path allowlist: `/healthz`, `/readyz`, `/healthcheck`, `/login`, `/callback`, `/logout`, `/.well-known/`, `/flyteidl2.auth.AuthMetadataService/`, `/flyteidl2.actions.ActionsService/`, and `/flyteidl2.workflow.InternalRunService/`. The last two are reachable only via the ClusterIP service, since they are excluded from the external ingress.
- `isLoopbackRequest(req)` bypasses auth when the request originated from the loopback interface. This is required because the unified binary makes intra-process connect-rpc calls (e.g. RunService → ActionsService) to its own mux via `http://localhost:<port>`, and those calls have no Authorization header.
- Honors `cfg.DisableForHTTP` as a global bypass.

Go: `runs/service/auth/auth_context.go` fixes
- `NewAuthContext` now accepts an `oidcClientSecret` parameter and sets it on `oauth2.Config.ClientSecret`. Without this, the `/callback` code exchange with a confidential OIDC client (e.g. Okta) fails with `invalid_client`.
- A new `computeOIDCRedirectURL(cfg)` helper derives an absolute callback URL from `cfg.AuthorizedURIs[0] + "/callback"`, replacing the previous relative string `"callback"` that IdPs rejected.

Go: generated enumer files
- `//go:generate enumer` directives added to `runs/service/auth/config/config.go`.
- Generated `authorizationservertype_enumer.go` and `samesite_enumer.go` for JSON marshal/unmarshal of the enum types.

Go: unit tests
- `runs/service/auth/http_middleware_test.go`: table-driven `IsPublicPath`, public-path bypass, `DisableForHTTP` bypass, unauthenticated → 401, IPv4 loopback bypass, IPv6 loopback bypass, non-loopback still blocks, ActionsService reachable from pod IP without auth, `isLoopbackRequest` truth table.
- `runs/service/auth/auth_context_test.go`: `computeOIDCRedirectURL` cases (no authorizedUris, simple host, trailing slash, multiple URIs, host with path prefix).
- `runs/service/auth/config/config_test.go`: enumer JSON round-trips, invalid values, `ThirdPartyConfigOptions.IsEmpty`, `MustParseURL`, `DefaultConfig` sanity.
- `runs/service/auth/cookie_test.go`: CSRF hash, CSRF token generation, `NewSecureCookie` round-trip, wrong-key decode, `VerifyCsrfCookie` happy/mismatch/missing/empty-state, `NewRedirectCookie`, `GetAuthFlowEndRedirect` query/cookie/fallback paths.
- `runs/service/auth/token_test.go`: `NewOAuthTokenFromRaw`, `ExtractTokensFromOauthToken` happy + nil + missing-id-token, `bearerTokenFromMD`/`idTokenFromMD` happy + wrong scheme + blank + no metadata.
- `runs/service/auth_metadata_test.go`: 13 cases covering self mode, self-mode custom issuer, self-mode no authorizedUris, external-mode happy path, external-mode custom metadataUrl, external-mode token-proxy rewriting, external-mode missing base URL, and retry paths.

Helm chart: `charts/flyte-binary`
- `templates/configmap.yaml`: render a new `004-auth.yaml` key from `configuration.auth.*` when `configuration.auth.enabled` is set. It includes `auth.appAuth.externalAuthServer`, `auth.appAuth.thirdPartyConfig.flyteClient`, `auth.authorizedUris`, `auth.userAuth.openId`, and `runs.security.useAuth: true`.
- `templates/_helpers.tpl`: `flyte-binary.configuration.auth.runServiceAuthSecretName` now honors a new `configuration.auth.runServiceAuthSecretRef` override so deployments can reuse an existing admin-auth secret without fighting Helm ownership. The `grpcPaths` helper no longer emits `InternalRunService` or `ActionsService`; those are intra-cluster only and the middleware allowlist handles them.
- `templates/run-service-auth-secret.yaml`: skip rendering entirely when the override is set.
- `templates/deployment.yaml`: fix a latent include-path typo (`/runservice-auth-secret.yaml` → `/run-service-auth-secret.yaml`) that would have broken `helm upgrade` as soon as `auth.enabled: true` was set. Guard the checksum annotation with the override. Keep the `extraInlineSecretRefs` projection loop for mounting additional existing config secrets.
- `templates/ingress/http.yaml`: new `ingress.minimalPaths` flag. When `true`, it omits `/.well-known`, `/.well-known/*`, `/me`, `/config`, `/config/*`, `/oauth2`, `/oauth2/*`, `/api`, `/api/*`, `/console`, `/console/*` from the HTTP ingress so these paths can be served by a different deployment sharing the same ALB group.
- `values.yaml`: new default fields:
  - `configuration.auth.externalAuthServer.{baseUrl, metadataUrl, allowedAudience}`
  - `configuration.auth.runServiceAuthSecretRef: ""`
  - `ingress.minimalPaths: false` (preserving existing behavior)

How was this patch tested?
Unit tests
All new and existing tests pass across `runs/service/auth`, `runs/service/auth/authzserver`, `runs/service/auth/config`, `runs/config`, and `runs/service`.

End-to-end on the live development cluster
Deployed via `helm upgrade flyte charts/flyte-binary -f charts/flyte-binary/values-union.yaml -n flyte` and verified:
- `GET /login` (external): `redirect_uri=/callback`, correct `client_id`, `scope=openid profile`
- `POST /flyteidl2.project.ProjectService/ListProjects` (external, no token)
- `POST /flyteidl2.actions.ActionsService/CreateAction` (in-cluster from task pod IP)
- `POST /flyteidl2.workflow.RunService/CreateRun` via loopback (intra-process)
- `POST /flyteidl2.workflow.RunService/CreateRun` from non-loopback pod IP without token
- `/etc/secrets/cookie_hash_key`, `cookie_block_key`, `oidc_client_secret`, `claim_symmetric_key`, `token_rsa_key.pem`: all mounted, no crash

Labels
`AuthMetadataService` mux registration panic; chart `run-service-auth-secret.yaml` include-path typo; relative OIDC redirect URL rejected by confidential clients; intra-process `http.DefaultClient` calls to own mux blocked by auth; task-pod `ActionsService` enqueue calls returning `Unauthorized`

Check all the applicable boxes
main