
[V2] Add AuthMetadataService + wire external OAuth2 auth into runs service #6998

Open
pingsutw wants to merge 16 commits into v2 from revamp-auth-metadata-service


@pingsutw pingsutw commented Mar 9, 2026

Architecture

The v2 runs service acts as an OAuth2 resource server. It validates JWTs issued by a configured external OAuth2 authorization server and runs the standard OIDC code flow against an OpenID Connect IdP (e.g. Okta) for browser users. v2 does not mint tokens itself: authCfg.AppAuth.AuthServerType is External, and externalAuthServer.baseUrl points at the issuer whose .well-known/oauth-authorization-server metadata and jwks_uri v2 reads at boot.

graph TB
    Client[UI / flyte-sdk / task pods]
    ALB["Ingress / load balancer"]
    IDP["OIDC IdP (Okta)<br/>signin.example.com<br/>client id 0oak...5d6"]
    AUTHSRV["External OAuth2 authorization server<br/>.well-known/oauth-authorization-server<br/>jwks_uri"]
    MW["runs GetAuthenticationHTTPInterceptor<br/>bearer + cookie + loopback gate"]
    OIDC["OIDC browser handlers<br/>/login /callback /logout<br/>.well-known/openid-configuration"]
    RPC["v2 connect-rpc handlers<br/>RunService ProjectService TaskService<br/>TriggerService TranslatorService<br/>AuthMetadataService IdentityService ..."]
    INT["intra-cluster connect-rpc<br/>ActionsService InternalRunService<br/>(ClusterIP only, allowlisted)"]
    COOKIE["CookieManager<br/>AES-256 + HMAC-SHA256<br/>cookie_hash_key cookie_block_key"]
    S1["auth secret<br/>cookie_hash_key / cookie_block_key"]
    S2["OIDC client secret<br/>oidc_client_secret"]

    Client --> ALB
    ALB -->|"/login /callback /logout /healthcheck<br/>/flyteidl2.*/*"| MW
    ALB -.->|"/v2/* static SPA"| MW
    Client -.->|"task-pod ClusterIP calls"| INT
    MW --> OIDC
    MW --> RPC
    OIDC <-->|"authorization code flow"| IDP
    OIDC --> COOKIE
    MW -.->|"boots JWKS"| AUTHSRV
    MW -.->|"validate bearer token signature"| AUTHSRV
    COOKIE -.->|"reads"| S1
    OIDC -.->|"reads"| S2

Component data flow

| Component | File(s) | Reads | Writes / Emits | When |
| --- | --- | --- | --- | --- |
| setupAuth() bootstrap | runs/setup.go | /etc/secrets/cookie_hash_key, /etc/secrets/cookie_block_key, /etc/secrets/oidc_client_secret, authCfg.AppAuth.ExternalAuthServer.BaseURL | *AuthenticationContext, registers 4 HTTP handlers + connect-rpc services, sets sc.Middleware | Once at pod start |
| ResourceServer | runs/service/auth/authzserver/resource_server.go | configured issuer .well-known/oauth-authorization-server then jwks_uri then JWKS | verifies JWT signature, parses claims, returns *IdentityContext | Once at boot (JWKS fetch), then per-request (verify) |
| GetAuthenticationHTTPInterceptor | runs/service/auth/http_middleware.go | req.URL.Path, req.Header.Get("Authorization"), request cookies, req.RemoteAddr | injects IdentityContext into ctx, writes 401, or bypasses auth for loopback and public paths | Every non-public request |
| IsPublicPath | same file | req.URL.Path | skips middleware for health probes, OIDC flow paths, AuthMetadataService, ActionsService, InternalRunService | Every request |
| isLoopbackRequest | same file | req.RemoteAddr | bypasses middleware for intra-process connect-rpc calls from the unified binary to its own mux | Every non-public request |
| IdentityContextFromRequest | runs/service/auth/handlers.go | Authorization: Bearer ... header then ResourceServer.ValidateAccessToken, or cookies then CookieManager.RetrieveTokenValues then IdentityContextFromIDToken | *IdentityContext or error | Called by middleware |
| CookieManager | runs/service/auth/cookie_manager.go | cookie_hash_key + cookie_block_key (base64, at startup); flyte_at / flyte_rt / flyte_idt / flyte_user_info cookies (per request) | encrypted cookies on /login+/callback, decrypted payloads on middleware pass | Startup + per request |
| GetLoginHandler | runs/service/auth/handlers.go | redirect_url query param | sets flyte_csrf_state cookie; 307 to oauth2Config.AuthCodeURL(state) | When client hits /login |
| GetCallbackHandler | runs/service/auth/handlers.go | code + state query params, flyte_csrf_state cookie, oidc_client_secret | exchanges code at IdP, fetches user info, sets flyte_at/flyte_rt/flyte_idt/flyte_user_info cookies, 307 to post-login redirect | When IdP redirects back |
| computeOIDCRedirectURL | runs/service/auth/auth_context.go | cfg.AuthorizedURIs[0] | returns absolute URL like https://<host>/callback | At NewAuthContext time, baked into oauth2.Config.RedirectURL |
| authzserver.NewAuthMetadataService | runs/service/auth/authzserver/metadata_provider.go | authCfg.AppAuth.ExternalAuthServer.BaseURL + .MetadataEndpointURL | serves /flyteidl2.auth.AuthMetadataService/GetOAuth2Metadata and GetPublicClientConfig via connect-rpc | Client discovery calls |
| Helm configmap.yaml renders 004-auth.yaml | charts/flyte-binary/templates/configmap.yaml | configuration.auth.* values | renders auth: block into the runs ConfigMap | Once per helm upgrade |
| Helm ingress/http.yaml minimalPaths | charts/flyte-binary/templates/ingress/http.yaml | .Values.ingress.minimalPaths | emits ingress rules that claim only paths the runs service actually serves | Once per helm upgrade |

Flow 1 — API call with a bearer token (machine client, the common path)

sequenceDiagram
    autonumber
    participant C as flyte-sdk
    participant ALB as Ingress
    participant AS as External auth server
    participant V2 as runs service
    participant MW as HTTP auth middleware
    participant H as connect-rpc handler

    Note over C,AS: One-time bootstrap
    C->>AS: GET oauth authorization server metadata
    AS-->>C: issuer, token endpoint, jwks uri
    C->>AS: POST client credentials grant
    AS-->>C: signed access token

    Note over V2,AS: Boot-time JWKS cache on the runs service
    V2->>AS: GET metadata and jwks
    AS-->>V2: RSA public keys cached in memory

    Note over C,H: API call with bearer token
    C->>ALB: POST RunService CreateRun with Authorization Bearer
    ALB->>V2: forward to runs service
    V2->>MW: enter sc Middleware chain
    Note right of MW: CORS then auth then mux
    MW->>MW: IsPublicPath returns false
    MW->>MW: isLoopbackRequest returns false
    MW->>MW: IdentityContextFromRequest
    MW->>MW: ResourceServer ValidateAccessToken
    Note right of MW: verify signature against cached JWKS,<br/>check aud iss exp, build IdentityContext
    MW->>H: next ServeHTTP with IdentityContext
    H->>H: read identity, authorize, execute
    H-->>C: connect-rpc response

Flow 2 — Browser OIDC login via /login

sequenceDiagram
    autonumber
    participant B as Browser
    participant ALB as Ingress
    participant V2 as runs service
    participant OK as OIDC IdP
    participant CM as CookieManager

    B->>ALB: GET login with redirect url query
    ALB->>V2: forward
    V2->>V2: GetLoginHandler generates CSRF token
    V2->>B: Set-Cookie flyte_csrf_state (HttpOnly, Secure)
    V2->>B: 307 redirect to IdP authorize endpoint
    Note right of V2: client_id plus absolute redirect_uri,<br/>response_type code, scope openid profile,<br/>state set to hashed CSRF cookie value

    B->>OK: follow redirect
    OK->>B: login form, SSO, user authenticates
    OK->>B: 302 redirect to callback with code and state

    B->>ALB: GET callback with code and state, plus flyte_csrf_state cookie
    ALB->>V2: forward
    V2->>V2: VerifyCsrfCookie, hash of cookie equals state
    V2->>OK: POST token endpoint with authorization code
    Note right of V2: Basic auth sends client_id and oidc_client_secret
    OK-->>V2: access token, refresh token, id token
    V2->>OK: GET userinfo endpoint with access token
    OK-->>V2: sub, email, name

    V2->>CM: SetTokenCookies and SetUserInfoCookie
    Note right of CM: cookies encrypted with cookie_hash_key<br/>and cookie_block_key from /etc/secrets
    CM-->>V2: encrypted cookie values
    V2->>B: Set-Cookie flyte_at, flyte_rt, flyte_idt, flyte_user_info
    V2->>B: 307 redirect to original redirect url

    Note over B,V2: Subsequent requests reuse these cookies via IdentityContextFromRequest<br/>and CookieManager RetrieveTokenValues.
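The CSRF step in Flow 2 (state is the hash of the flyte_csrf_state cookie, and /callback checks that they still match) can be sketched as follows. The helper names and the choice of SHA-256 with hex encoding are assumptions made for illustration; the PR does not state which hash the real VerifyCsrfCookie uses.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// newCSRFToken returns a random token to store in the CSRF cookie.
func newCSRFToken() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// hashCSRFToken derives the OAuth2 state value from the cookie value.
// SHA-256 is an assumption of this sketch.
func hashCSRFToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// verifyState reports whether the state returned by the IdP matches the
// hash of the cookie the browser sent back. Empty inputs never match.
func verifyState(cookieValue, state string) bool {
	if cookieValue == "" || state == "" {
		return false
	}
	want := hashCSRFToken(cookieValue)
	return subtle.ConstantTimeCompare([]byte(want), []byte(state)) == 1
}

func main() {
	token, err := newCSRFToken()
	if err != nil {
		panic(err)
	}
	state := hashCSRFToken(token)
	fmt.Println(verifyState(token, state))      // matching pair
	fmt.Println(verifyState(token, "tampered")) // mismatched state
}
```

Sending only the hash in the state parameter means the raw cookie value never appears in the redirect URL.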

What v2 owns and what it defers

| Responsibility | Owner in this PR | Notes |
| --- | --- | --- |
| Verifying JWTs on API calls | runs middleware | ResourceServer caches JWKS at boot, verifies per request |
| Running the OIDC code flow for browsers | runs /login + /callback | Confidential client, absolute redirect_uri, encrypted cookies |
| Running the cookie encrypt/decrypt layer | CookieManager | AES-256 + HMAC-SHA256, keys loaded from /etc/secrets |
| Serving OAuth2 metadata discovery on the bare .well-known/ path | deferred to the external authorization server | The runs service's own flyteidl2.auth.AuthMetadataService is a connect-rpc surface for its own clients, not the bare well-known URL |
| Minting access / refresh / ID tokens | deferred to the external authorization server | The runs service does not host a self-hosted OAuth2 server in this PR |
| Enforcing auth on intra-cluster services (ActionsService, InternalRunService) | bypassed via public-path allowlist, excluded from external ingress | Task pods reach these through the ClusterIP service without credentials, matching the same-namespace trust model task SDKs already rely on |

Where trust comes from

  • JWT signature trust: ResourceServer fetches the issuer's metadata and JWKS at boot, caches the RSA public key, and verifies every incoming JWT. Tokens whose aud does not match authCfg.AppAuth.ExternalAuthServer.AllowedAudience are rejected.
  • Cookie trust: CookieManager encrypts session cookies with AES-256 + HMAC-SHA256 using cookie_hash_key and cookie_block_key loaded from /etc/secrets at startup. Cookies are HttpOnly, Secure, and scoped to the deployment hostname.
  • OIDC client trust: /callback exchanges the authorization code with the IdP using the confidential client secret in /etc/secrets/oidc_client_secret. The redirect URI is the absolute URL computed from the first authorizedUris entry.
  • Intra-cluster trust: ActionsService and InternalRunService are reachable only via the ClusterIP service (never exposed via the external ingress) and are allowlisted in the auth middleware so in-cluster task pods can enqueue actions without carrying credentials.
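To make the cookie trust bullet concrete, here is a rough encrypt-then-MAC sketch using only the Go standard library. It is not the CookieManager implementation: the `seal` and `open` helpers, the AES-CTR mode, and the payload layout (IV, ciphertext, MAC) are assumptions chosen to illustrate how a 32-byte block key and a hash key could combine into AES-256 + HMAC-SHA256.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
)

// seal encrypts plaintext with AES-CTR, then appends an HMAC-SHA256 tag
// computed over IV || ciphertext. A 32-byte blockKey selects AES-256.
func seal(blockKey, hashKey, plaintext []byte) (string, error) {
	block, err := aes.NewCipher(blockKey)
	if err != nil {
		return "", err
	}
	iv := make([]byte, aes.BlockSize)
	if _, err := rand.Read(iv); err != nil {
		return "", err
	}
	ct := make([]byte, len(plaintext))
	cipher.NewCTR(block, iv).XORKeyStream(ct, plaintext)
	payload := append(iv, ct...)
	mac := hmac.New(sha256.New, hashKey)
	mac.Write(payload)
	return base64.URLEncoding.EncodeToString(mac.Sum(payload)), nil
}

// open verifies the tag before decrypting and rejects tampered values.
func open(blockKey, hashKey []byte, value string) ([]byte, error) {
	raw, err := base64.URLEncoding.DecodeString(value)
	if err != nil {
		return nil, err
	}
	if len(raw) < aes.BlockSize+sha256.Size {
		return nil, errors.New("cookie value too short")
	}
	payload, tag := raw[:len(raw)-sha256.Size], raw[len(raw)-sha256.Size:]
	mac := hmac.New(sha256.New, hashKey)
	mac.Write(payload)
	if !hmac.Equal(mac.Sum(nil), tag) {
		return nil, errors.New("cookie signature mismatch")
	}
	block, err := aes.NewCipher(blockKey)
	if err != nil {
		return nil, err
	}
	iv, ct := payload[:aes.BlockSize], payload[aes.BlockSize:]
	pt := make([]byte, len(ct))
	cipher.NewCTR(block, iv).XORKeyStream(pt, ct)
	return pt, nil
}

func main() {
	blockKey := []byte("0123456789abcdef0123456789abcdef") // 32 bytes, demo only
	hashKey := []byte("demo-hash-key-not-for-production")
	v, err := seal(blockKey, hashKey, []byte("access-token"))
	if err != nil {
		panic(err)
	}
	pt, err := open(blockKey, hashKey, v)
	fmt.Println(string(pt), err)
	_, err = open(blockKey, []byte("wrong-key"), v)
	fmt.Println(err)
}
```

Checking the MAC before decrypting means a cookie signed with a different hash key is rejected without touching the ciphertext.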

Tracking issue

Why are the changes needed?

Two related goals, implemented in sequence on this branch:

  1. AuthMetadataService parity with upstream. The previous runs/service/auth_metadata.go was a static-config implementation that could not serve metadata from an external authorization server. The runs service needs two modes to match the upstream auth shape: self auth server (builds metadata from relative URLs) and external auth server (fetches from .well-known/oauth-authorization-server with retry logic).

  2. Actually turn authentication on in the runs service binary. Even with a working AuthMetadataService, nothing in runs.Setup() built an AuthenticationContext, registered the OIDC browser handlers, or enforced bearer/cookie validation on API calls. The auth package existed but was entirely unwired. This PR makes cfg.Security.UseAuth = true actually gate every non-public endpoint and kick browser users through the standard OIDC flow.

What changes were proposed in this pull request?

Go — runs/service/auth_metadata.go rewrite

  • runs/config/config.go — restructured auth config: AuthConfig with AuthorizedURIs, GrpcAuthorizationHeader, AppAuth, HTTPProxyURL, TokenEndpointProxyConfig; OAuth2Options with AuthServerType (Self/External), SelfAuthServer, ExternalAuthServer, ThirdParty; ExternalAuthorizationServer with retry config; FlyteClientConfig and TokenEndpointProxyConfig sub-structs.
  • runs/service/auth_metadata.go — two code paths: Self mode builds OAuth2 metadata with relative URLs (/oauth2/token, /oauth2/authorize, /oauth2/jwks) based on first AuthorizedURI; External mode fetches metadata from .well-known/oauth-authorization-server with retry, HTTP proxy support, and optional token endpoint proxy rewriting.
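As an illustration of the Self-mode path, the sketch below resolves the three relative endpoints against the first authorized URI. The `oauth2Metadata` struct and `buildSelfMetadata` are hypothetical stand-ins for the generated proto type and the real builder, and the hostname is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// oauth2Metadata is a hypothetical stand-in for the generated response type.
type oauth2Metadata struct {
	Issuer                string
	AuthorizationEndpoint string
	TokenEndpoint         string
	JwksURI               string
}

// buildSelfMetadata resolves the relative OAuth2 paths against the first
// authorized URI. Because the paths are absolute-path references, any path
// prefix on the base URI is replaced rather than extended.
func buildSelfMetadata(authorizedURIs []string) (oauth2Metadata, error) {
	if len(authorizedURIs) == 0 {
		return oauth2Metadata{}, errors.New("no authorizedUris configured")
	}
	base, err := url.Parse(authorizedURIs[0])
	if err != nil {
		return oauth2Metadata{}, err
	}
	resolve := func(path string) string {
		return base.ResolveReference(&url.URL{Path: path}).String()
	}
	return oauth2Metadata{
		Issuer:                base.String(),
		AuthorizationEndpoint: resolve("/oauth2/authorize"),
		TokenEndpoint:         resolve("/oauth2/token"),
		JwksURI:               resolve("/oauth2/jwks"),
	}, nil
}

func main() {
	m, err := buildSelfMetadata([]string{"https://flyte.example.com"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", m)
}
```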

Go — wire external auth into the runs binary

  • runs/setup.go — new setupAuth() helper inside the if cfg.Security.UseAuth block:
    1. Mounts the real AuthMetadataService (via authzserver.NewAuthMetadataService) and IdentityService.
    2. Loads cookie_hash_key, cookie_block_key, and oidc_client_secret from /etc/secrets/.
    3. Builds a ResourceServer via authzserver.NewOAuth2ResourceServer (with fallback to the first authorizedUri when externalAuthServer.baseUrl is empty).
    4. Builds an AuthenticationContext via authservice.NewAuthContext.
    5. Registers /login, /callback, /logout, and /.well-known/openid-configuration via authservice.RegisterHandlers.
    6. Chains a new HTTP auth middleware with any pre-existing sc.Middleware (CORS stays outermost).
  • Also fixes a latent bug where authconnect.NewAuthMetadataServiceHandler was mounted twice (real + stub) on the same mux path — the duplicate registration would have panicked the pod the moment UseAuth=true.

Go — new: runs/service/auth/http_middleware.go

  • GetAuthenticationHTTPInterceptor(h *AuthHandlerConfig) func(http.Handler) http.Handler validates a bearer token or auth cookies on every request via the existing IdentityContextFromRequest, injects IdentityContext into the request context on success, and returns 401 on failure.
  • A public-path allowlist bypasses the middleware for /healthz, /readyz, /healthcheck, /login, /callback, /logout, /.well-known/, /flyteidl2.auth.AuthMetadataService/, /flyteidl2.actions.ActionsService/, and /flyteidl2.workflow.InternalRunService/ (the last two are reachable only via the ClusterIP service since they are excluded from the external ingress).
  • isLoopbackRequest(req) bypasses auth when the request originated from the loopback interface — this is required because the unified binary makes intra-process connect-rpc calls (e.g. RunService → ActionsService) to its own mux via http://localhost:<port>, and those calls have no Authorization header.
  • Respects cfg.DisableForHTTP as a global bypass.
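A condensed sketch of the gate described above, assuming exact matching for plain paths and prefix matching for entries ending in a slash (the real IsPublicPath matching rule is not spelled out in this description). The `authMiddleware`, `isPublicPath`, `isLoopback`, and `status` names are hypothetical, the `authenticate` callback stands in for IdentityContextFromRequest, and injecting IdentityContext into the request context is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

// publicPaths mirrors the allowlist in the PR description.
var publicPaths = []string{
	"/healthz", "/readyz", "/healthcheck", "/login", "/callback", "/logout",
	"/.well-known/", "/flyteidl2.auth.AuthMetadataService/",
	"/flyteidl2.actions.ActionsService/", "/flyteidl2.workflow.InternalRunService/",
}

// isPublicPath treats entries ending in "/" as prefixes and the rest as
// exact paths. This matching rule is an assumption of the sketch.
func isPublicPath(p string) bool {
	for _, entry := range publicPaths {
		if strings.HasSuffix(entry, "/") {
			if strings.HasPrefix(p, entry) {
				return true
			}
		} else if p == entry {
			return true
		}
	}
	return false
}

// isLoopback reports whether remoteAddr is in 127.0.0.0/8 or is ::1.
func isLoopback(remoteAddr string) bool {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// authMiddleware bypasses auth for public paths and loopback callers and
// returns 401 when authenticate fails.
func authMiddleware(authenticate func(*http.Request) error) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if isPublicPath(r.URL.Path) || isLoopback(r.RemoteAddr) {
				next.ServeHTTP(w, r)
				return
			}
			if err := authenticate(r); err != nil {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// status runs one credential-less request through the gate.
func status(path, remoteAddr string) int {
	deny := func(*http.Request) error { return errors.New("no credentials") }
	h := authMiddleware(deny)(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	req := httptest.NewRequest(http.MethodPost, path, nil)
	req.RemoteAddr = remoteAddr
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(status("/flyteidl2.workflow.RunService/CreateRun", "10.0.0.7:4000"))
	fmt.Println(status("/flyteidl2.workflow.RunService/CreateRun", "127.0.0.1:4000"))
	fmt.Println(status("/flyteidl2.actions.ActionsService/CreateAction", "10.0.0.7:4000"))
}
```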

Go — runs/service/auth/auth_context.go fixes

  • NewAuthContext now accepts an oidcClientSecret parameter and sets it on oauth2.Config.ClientSecret. Without this, the /callback code exchange with a confidential OIDC client (e.g. Okta) fails with invalid_client.
  • New computeOIDCRedirectURL(cfg) helper derives an absolute callback URL from cfg.AuthorizedURIs[0] + "/callback", replacing the previous relative string "callback" that IdPs rejected.
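The redirect URL derivation can be approximated as below. `oidcRedirectURL` is a hypothetical stand-in for computeOIDCRedirectURL; in particular, returning an error for an empty authorizedUris list is an assumption, since the description does not say how the real helper handles that case.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// oidcRedirectURL derives an absolute callback URL from the first
// authorized URI, trimming any trailing slash before appending /callback.
func oidcRedirectURL(authorizedURIs []string) (string, error) {
	if len(authorizedURIs) == 0 {
		// Assumption: the real helper may handle this case differently.
		return "", errors.New("authorizedUris is empty")
	}
	return strings.TrimRight(authorizedURIs[0], "/") + "/callback", nil
}

func main() {
	for _, uris := range [][]string{
		{"https://flyte.example.com"},
		{"https://flyte.example.com/"},
		{"https://flyte.example.com/prefix", "https://other.example.com"},
	} {
		u, err := oidcRedirectURL(uris)
		fmt.Println(u, err)
	}
}
```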

Go — generated enumer files

  • //go:generate enumer directives added to runs/service/auth/config/config.go.
  • Generated authorizationservertype_enumer.go and samesite_enumer.go for JSON marshal/unmarshal of the enum types.

Go — unit tests

  • runs/service/auth/http_middleware_test.go — table-driven IsPublicPath, public-path bypass, DisableForHTTP bypass, unauthenticated → 401, IPv4 loopback bypass, IPv6 loopback bypass, non-loopback still blocks, ActionsService reachable from pod IP without auth, isLoopbackRequest truth table.
  • runs/service/auth/auth_context_test.go: computeOIDCRedirectURL cases (no authorizedUris, simple host, trailing slash, multiple uris, host with path prefix).
  • runs/service/auth/config/config_test.go — enumer JSON round-trips, invalid values, ThirdPartyConfigOptions.IsEmpty, MustParseURL, DefaultConfig sanity.
  • runs/service/auth/cookie_test.go — CSRF hash, CSRF token generation, NewSecureCookie round-trip, wrong-key decode, VerifyCsrfCookie happy/mismatch/missing/empty-state, NewRedirectCookie, GetAuthFlowEndRedirect query/cookie/fallback paths.
  • runs/service/auth/token_test.go: NewOAuthTokenFromRaw, ExtractTokensFromOauthToken happy + nil + missing-id-token, bearerTokenFromMD / idTokenFromMD happy + wrong scheme + blank + no metadata.
  • runs/service/auth_metadata_test.go — 13 cases covering self mode, self-mode custom issuer, self-mode no authorizedUris, external-mode happy path, external-mode custom metadataUrl, external-mode token-proxy rewriting, external-mode missing base URL, retry paths.

Helm chart — charts/flyte-binary

  • templates/configmap.yaml — render a new 004-auth.yaml key from configuration.auth.* when configuration.auth.enabled. Includes auth.appAuth.externalAuthServer, auth.appAuth.thirdPartyConfig.flyteClient, auth.authorizedUris, auth.userAuth.openId, and runs.security.useAuth: true.
  • templates/_helpers.tpl: flyte-binary.configuration.auth.runServiceAuthSecretName now honors a new configuration.auth.runServiceAuthSecretRef override so deployments can reuse an existing admin-auth secret without fighting Helm ownership. The grpcPaths helper no longer emits InternalRunService or ActionsService; those are intra-cluster only and the middleware allowlist handles them.
  • templates/run-service-auth-secret.yaml — skip rendering entirely when the override is set.
  • templates/deployment.yaml — fix a latent include-path typo (/runservice-auth-secret.yaml → /run-service-auth-secret.yaml) that would have broken helm upgrade the moment auth.enabled: true. Guard the checksum annotation with the override. Keep the extraInlineSecretRefs projection loop for mounting additional existing config secrets.
  • templates/ingress/http.yaml — new ingress.minimalPaths flag. When true, omits /.well-known, /.well-known/*, /me, /config, /config/*, /oauth2, /oauth2/*, /api, /api/*, /console, /console/* from the HTTP ingress so these paths can be served by a different deployment sharing the same ALB group.
  • values.yaml — new default fields:
    • configuration.auth.externalAuthServer.{baseUrl, metadataUrl, allowedAudience}
    • configuration.auth.runServiceAuthSecretRef: ""
    • ingress.minimalPaths: false (preserving existing behavior)

How was this patch tested?

Unit tests

go test ./runs/service/auth/... ./runs/config/... ./runs/service/...

All new and existing tests pass across runs/service/auth, runs/service/auth/authzserver, runs/service/auth/config, runs/config, and runs/service.

End-to-end on the live development cluster

Deployed via helm upgrade flyte charts/flyte-binary -f charts/flyte-binary/values-union.yaml -n flyte and verified:

| Probe | Expected | Actual |
| --- | --- | --- |
| GET /login (external) | 307 → IdP with absolute redirect_uri=/callback, correct client_id, scope=openid profile | ✅ 307 with exact URL |
| POST /flyteidl2.project.ProjectService/ListProjects (external, no token) | 401 from middleware | ✅ 401 |
| POST /flyteidl2.actions.ActionsService/CreateAction (in-cluster from task pod IP) | allowlisted, passes through middleware | ✅ reaches handler |
| POST /flyteidl2.workflow.RunService/CreateRun via loopback (intra-process) | loopback bypass, passes through middleware | ✅ reaches handler |
| POST /flyteidl2.workflow.RunService/CreateRun from non-loopback pod IP without token | 401 | ✅ 401 |
| Pod startup | /etc/secrets/cookie_hash_key, cookie_block_key, oidc_client_secret, claim_symmetric_key, token_rsa_key.pem all mounted, no crash | ✅ 1/1 Running |

Labels

  • added: external-mode OAuth2 resource server, OIDC browser handlers, bearer/cookie HTTP middleware, loopback + in-cluster allowlists
  • fixed: duplicate AuthMetadataService mux registration panic; chart run-service-auth-secret.yaml include-path typo; relative OIDC redirect URL rejected by confidential clients; intra-process http.DefaultClient calls to own mux blocked by auth; task-pod ActionsService enqueue calls returning Unauthorized

Check all the applicable boxes

  • main
    • Flyte 2 #6583
      • [V2] Add AuthMetadataService + wire external OAuth2 auth into runs service 👈

Restructure auth config to match flyteadmin's shape with support for
self-hosted and external authorization server modes. The self mode builds
OAuth2 metadata from relative URLs based on AuthorizedURIs, while the
external mode fetches metadata from .well-known/oauth-authorization-server
with retry logic, HTTP proxy support, and token endpoint proxy rewriting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kevin Su <pingsutw@apache.org>
@github-actions github-actions bot mentioned this pull request Mar 9, 2026
3 tasks
@pingsutw pingsutw changed the title from "Revamp AuthMetadataService to match flyteadmin implementation" to "[WIP] Add AuthMetadataService" Mar 9, 2026
@pingsutw pingsutw marked this pull request as draft March 9, 2026 21:46
@pingsutw pingsutw self-assigned this Mar 9, 2026
@pingsutw pingsutw added the flyte2 label Mar 9, 2026
@pingsutw pingsutw added this to the V2 GA milestone Mar 10, 2026
pingsutw added 11 commits March 10, 2026 23:59
Signed-off-by: Kevin Su <pingsutw@apache.org>
Enable external-mode OAuth2 authentication for the v2 runs service so
JWT bearer tokens and auth cookies are validated at the HTTP boundary
and the standard OIDC browser login flow is served from the same
binary.

Go changes:
- runs/setup.go: new setupAuth() builds ResourceServer + AuthContext,
  registers /login /callback /logout and the OIDC metadata redirect,
  chains the new HTTP auth middleware with existing middleware, and
  replaces the buggy duplicate AuthMetadataService mount with a single
  real-or-stub branch.
- runs/service/auth/http_middleware.go: bearer/cookie validator with
  a public-path allowlist (/healthz, /readyz, /healthcheck, /login,
  /callback, /logout, /.well-known/, /flyteidl2.auth.AuthMetadataService/).
- runs/service/auth/auth_context.go: NewAuthContext takes an
  oidcClientSecret and populates oauth2.Config.ClientSecret; RedirectURL
  is computed as an absolute URL from the first authorizedUri via a
  new computeOIDCRedirectURL helper.
- runs/service/auth/config/config.go: add go:generate enumer directives
  for AuthorizationServerType and SameSite.
- Generated enumer files for AuthorizationServerType and SameSite.
- Unit tests: http_middleware, computeOIDCRedirectURL, enumer round
  trips, cookie helpers, token helpers, config defaults.

Helm chart (charts/flyte-binary):
- templates/configmap.yaml: render a new 004-auth.yaml from
  configuration.auth.* (including externalAuthServer, authorizedUris,
  userAuth.openId, thirdPartyConfig.flyteClient, and
  runs.security.useAuth) when auth.enabled.
- templates/_helpers.tpl: runServiceAuthSecretName honors a new
  configuration.auth.runServiceAuthSecretRef override so deployments
  can reuse an existing admin-auth secret instead of re-rendering.
- templates/run-service-auth-secret.yaml: skip rendering when the
  override is set to avoid Helm ownership conflicts.
- templates/deployment.yaml: fix run-service-auth-secret include-path
  typo; guard its checksum with the override; keep the existing
  extraInlineSecretRefs projection loop.
- templates/ingress/http.yaml: new ingress.minimalPaths flag that
  omits /oauth2, /.well-known, /me, /config, /v1/*, /api, and
  /console paths so they can fall through to an adjacent Flyte
  deployment sharing the same ALB ingress group.
- values.yaml: defaults for configuration.auth.externalAuthServer.*,
  configuration.auth.runServiceAuthSecretRef, and
  ingress.minimalPaths (false, preserving existing behavior).

Signed-off-by: Kevin Su <pingsutw@apache.org>
@pingsutw pingsutw changed the title from "[WIP] Add AuthMetadataService" to "[V2] Add AuthMetadataService + wire external OAuth2 auth into runs service" Apr 11, 2026
The unified Flyte binary uses connect-rpc clients that talk to their
own mux via http://localhost:<port> (e.g. RunService calls
ActionsService.CreateAction). Those calls have no Authorization header
because they're in-process, and the new external auth middleware was
rejecting them with 401 — so run creation silently failed end-to-end.

Bypass auth when req.RemoteAddr is a loopback address (127.0.0.0/8 or
::1). External traffic from the ALB never has a loopback remote addr,
so this doesn't widen the attack surface.

Add table-driven isLoopbackRequest tests and middleware tests for both
IPv4 and IPv6 loopback and a non-loopback pod IP.

Signed-off-by: Kevin Su <pingsutw@apache.org>
…alRunService

Task pods running flytekit call ActionsService.CreateAction (and
InternalRunService) via the flyte2-grpc ClusterIP service to enqueue
subsequent actions. Those calls arrive at the pod with the task pod's
IP as RemoteAddr — not loopback — so the loopback bypass does not
catch them, and the external auth middleware was returning 401, which
flytekit reported as "Failed to launch action: Unauthorized" and task
execution failed.

Add /flyteidl2.actions.ActionsService/ and
/flyteidl2.workflow.InternalRunService/ to the public-path allowlist
so in-cluster traffic to these services passes without credentials.
Remove the same paths from the ingress grpcPaths helper so they are
not exposed via the external ALB — they remain reachable only through
the ClusterIP service inside the cluster, matching v1's propeller ->
flyteadmin pattern.

Update the table-driven IsPublicPath test and swap the loopback /
non-loopback test path to RunService so the assertion still exercises
the gate rather than the new public path.

Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
The external auth server's .well-known/oauth-authorization-server
response uses camelCase JSON keys (authorizationEndpoint, tokenEndpoint,
jwksUri) per the proto3 JSON specification. The v2 proto struct has
snake_case JSON tags (authorization_endpoint, token_endpoint, jwks_uri).

json.Unmarshal only matched the issuer field (same case), silently
dropping all other fields. This caused GetOAuth2Metadata to return
only {"issuer":"..."}, breaking CLI auth bootstrap — the client could
not discover the token or authorization endpoints.

Switch unmarshalResp from json.Unmarshal to protojson.Unmarshal, which
accepts both camelCase and snake_case input per the protobuf spec.

Signed-off-by: Kevin Su <pingsutw@apache.org>
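The failure mode in the commit above can be reproduced with the standard library alone. The `metadata` struct below is a hypothetical stand-in for the generated proto struct with snake_case JSON tags, and the URLs are placeholders. The actual fix, protojson.Unmarshal, is not shown because it needs the generated message type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metadata mimics a struct with snake_case JSON tags.
type metadata struct {
	Issuer                string `json:"issuer"`
	AuthorizationEndpoint string `json:"authorization_endpoint"`
	TokenEndpoint         string `json:"token_endpoint"`
	JwksURI               string `json:"jwks_uri"`
}

// decode uses encoding/json, which matches keys case-insensitively but does
// not treat tokenEndpoint and token_endpoint as the same key.
func decode(body string) (metadata, error) {
	var m metadata
	err := json.Unmarshal([]byte(body), &m)
	return m, err
}

func main() {
	// camelCase keys, as a proto3 JSON producer would emit them.
	body := `{"issuer":"https://issuer.example.com",` +
		`"authorizationEndpoint":"https://issuer.example.com/authorize",` +
		`"tokenEndpoint":"https://issuer.example.com/token",` +
		`"jwksUri":"https://issuer.example.com/keys"}`
	m, err := decode(body)
	if err != nil {
		panic(err)
	}
	// Only Issuer is populated; the other fields are dropped without error.
	fmt.Printf("%+v\n", m)
}
```

Because unknown keys are ignored by default, the decode succeeds and the missing endpoints only surface later, when a client tries to use them.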
@pingsutw pingsutw marked this pull request as ready for review April 16, 2026 07:04