fix: add gateway-level default-deny AuthPolicy to local-deploy#866
fix: add gateway-level default-deny AuthPolicy to local-deploy#866yossiovadia wants to merge 2 commits into
Conversation
Adds a gateway-default-auth AuthPolicy that blocks all model inference routes without a per-route AuthPolicy override. Routes with MaaSAuthPolicy (created by MaaS controller) override this default and allow authenticated access. Without this, models deployed via the new inference.opendatahub.io CRDs are completely unprotected — requests bypass Authorino entirely. The production MaaS deployment already has this policy (deployment/base/maas-controller/policies/gateway-default-auth.yaml) but local-deploy was missing it.
|
@coderabbitai review |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: yossiovadia The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
✅ Actions performedReview triggered.
|
|
Caution Review failedFailed to post review comments 📝 WalkthroughWalkthroughA default-deny AuthPolicy deployment block was added to test/e2e/scripts/local-deploy.sh. The script now creates an AuthPolicy named Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Security & Code Quality Issues
Only actionable issues listed. 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/scripts/local-deploy.sh (1)
1545-1552:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSmoke test now masks the regression this PR fixes (CWE-754).
With
gateway-default-authin place, an unauthenticatedGET /v1/modelsMUST return401. The current smoke test still treats200as success (line 1548-1549), so if the AuthPolicy fails to attach (CRD version drift, Kuadrant operator not ready, gateway reconcile race, etc.) the script will print "Gateway routing working (200 OK)" and exit 0 — silently shipping a wide-open gateway. That's exactly the failure mode you're trying to prevent.Tighten the assertion to fail on anything other than
401(or403).🔒 Proposed fix
# Test gateway routing (should get 401 — auth is working) HTTP_CODE=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 http://localhost:19090/v1/models 2>/dev/null || echo "000") -if [[ "$HTTP_CODE" == "401" ]]; then - ok "Gateway routing + auth working (401 Unauthorized — expected without API key)" -elif [[ "$HTTP_CODE" == "200" ]]; then - ok "Gateway routing working (200 OK)" -else - warn "Gateway returned HTTP ${HTTP_CODE} (may need a moment to stabilize)" -fi +if [[ "$HTTP_CODE" == "401" || "$HTTP_CODE" == "403" ]]; then + ok "Gateway default-deny enforced (HTTP ${HTTP_CODE} without API key)" +else + fail "Gateway default-deny NOT enforced: got HTTP ${HTTP_CODE} (expected 401/403). gateway-default-auth may not be attached." + exit 1 +fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/scripts/local-deploy.sh` around lines 1545 - 1552, The smoke test's HTTP check (assigning HTTP_CODE from the curl call) currently treats 200 as success; update the conditional logic that inspects HTTP_CODE (the block using the ok/warn helpers) so that only 401 or 403 are treated as success (call ok with a message like "Gateway routing + auth working (401/403 expected without API key)"); for any other code (including 200) fail the script (e.g., call the existing fail helper or exit non-zero) and emit a clear error message indicating unexpected open gateway. Ensure you change the branches that reference HTTP_CODE and the ok/warn behavior so 200 no longer counts as success.
🧹 Nitpick comments (3)
test/e2e/scripts/local-deploy.sh (3)
1485-1514: ⚡ Quick winExistence-check skips drift correction.
if kubectl get authpolicy gateway-default-auth ... ; then okmeans any subsequent edit to the policy spec in this script (or any local mutation) won't be re-applied on rerun.kubectl applyis already declarative/idempotent — drop the guard so re-running the script converges to the desired spec. Same anti-pattern is present elsewhere in this file, but at least don't propagate it for a security-critical resource.♻️ Proposed fix
-if kubectl get authpolicy gateway-default-auth -n "$GATEWAY_NAMESPACE" &>/dev/null; then - ok "Gateway default-deny AuthPolicy already deployed" -else - echo " Deploying gateway default-deny AuthPolicy..." - kubectl apply -f - <<EOF +echo " Applying gateway default-deny AuthPolicy..." +kubectl apply -f - <<EOF apiVersion: kuadrant.io/v1 kind: AuthPolicy metadata: name: gateway-default-auth namespace: ${GATEWAY_NAMESPACE} spec: targetRef: group: gateway.networking.k8s.io kind: Gateway name: maas-default-gateway defaults: rules: authentication: {} authorization: deny-unconfigured-models: metrics: false priority: 0 patternMatching: patterns: - operator: eq selector: "context.request.http.method" value: "__deny_unconfigured_models__" EOF - ok "Gateway default-deny AuthPolicy deployed" -fi +ok "Gateway default-deny AuthPolicy applied"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/scripts/local-deploy.sh` around lines 1485 - 1514, The current existence-check around kubectl get for the AuthPolicy named gateway-default-auth prevents drift correction; remove the if/else guard and always run the kubectl apply -f - <<EOF block so the AuthPolicy (gateway-default-auth) in GATEWAY_NAMESPACE is declaratively reconciled on every script run; keep the same YAML payload and ok "Gateway default-deny AuthPolicy deployed" message after apply to preserve feedback.
1500-1511: 💤 Low valueReplace fragile magic-string deny rule with documented Kuadrant default-deny idiom.
The
request.http.method == "__deny_unconfigured_models__"pattern is brittle; if Kuadrant normalizes selector semantics or someone removes the placeholder, the rule silently breaks. Kuadrant 1.3.1 (your deployed version) documents a standard idiom: definespec.defaults.rules.authorization.deny-all.opa.rego: "allow = false"in a gateway-level AuthPolicy. This is explicit, intention-revealing, and maintained by Kuadrant upstream.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/scripts/local-deploy.sh` around lines 1500 - 1511, Replace the brittle magic-string rule under defaults.rules.authorization.deny-unconfigured-models (the patternMatching selector "context.request.http.method" == "__deny_unconfigured_models__") with Kuadrant's documented deny-all idiom: remove the deny-unconfigured-models rule and instead add a gateway-level AuthPolicy default entry spec.defaults.rules.authorization.deny-all.opa.rego with the body "allow = false"; update the defaults/auth policy where the symbols defaults, rules, authentication, authorization, and deny-unconfigured-models appear so the policy uses deny-all OPA rego rather than the fragile selector-based pattern.
1512-1514: ⚡ Quick winAdd explicit wait for AuthPolicy Accepted and Enforced conditions instead of generic sleep.
The script deploys the Gateway default-deny AuthPolicy but immediately proceeds with a generic 20-second sleep before smoke tests, risking false positives if Kuadrant's async policy enforcement hasn't completed. Add explicit condition waits:
Proposed fix
ok "Gateway default-deny AuthPolicy deployed" fi + +# Wait for Kuadrant to accept and enforce the policy before traffic tests. +kubectl wait --for=condition=Accepted authpolicy/gateway-default-auth \ + -n "$GATEWAY_NAMESPACE" --timeout=60s || \ + warn "gateway-default-auth not Accepted within 60s" +kubectl wait --for=condition=Enforced authpolicy/gateway-default-auth \ + -n "$GATEWAY_NAMESPACE" --timeout=60s || \ + warn "gateway-default-auth not Enforced within 60s"Kuadrant's AuthPolicy status conditions (Accepted and Enforced) provide deterministic readiness signals as defined in RFC 0004. Replace the generic sleep with these specific checks to fail fast if policy enforcement fails.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/scripts/local-deploy.sh` around lines 1512 - 1514, Replace the generic sleep after deploying the Gateway default-deny AuthPolicy with explicit readiness checks that wait for the AuthPolicy's status.conditions "Accepted" and "Enforced" to become True; locate the AuthPolicy apply/creation step in the script (the resource name or variable used to create the Gateway default-deny AuthPolicy) and replace the "sleep 20" with deterministic checks (e.g., kubectl wait or a short polling loop that uses kubectl get <AuthPolicyName> -o json and inspects status.conditions for types Accepted and Enforced == True), timeout quickly (e.g., 60s) and exit non‑zero if either condition is not satisfied so the smoke tests run only after the policy is accepted and enforced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@test/e2e/scripts/local-deploy.sh`:
- Around line 1545-1552: The smoke test's HTTP check (assigning HTTP_CODE from
the curl call) currently treats 200 as success; update the conditional logic
that inspects HTTP_CODE (the block using the ok/warn helpers) so that only 401
or 403 are treated as success (call ok with a message like "Gateway routing +
auth working (401/403 expected without API key)"); for any other code (including
200) fail the script (e.g., call the existing fail helper or exit non-zero) and
emit a clear error message indicating unexpected open gateway. Ensure you change
the branches that reference HTTP_CODE and the ok/warn behavior so 200 no longer
counts as success.
---
Nitpick comments:
In `@test/e2e/scripts/local-deploy.sh`:
- Around line 1485-1514: The current existence-check around kubectl get for the
AuthPolicy named gateway-default-auth prevents drift correction; remove the
if/else guard and always run the kubectl apply -f - <<EOF block so the
AuthPolicy (gateway-default-auth) in GATEWAY_NAMESPACE is declaratively
reconciled on every script run; keep the same YAML payload and ok "Gateway
default-deny AuthPolicy deployed" message after apply to preserve feedback.
- Around line 1500-1511: Replace the brittle magic-string rule under
defaults.rules.authorization.deny-unconfigured-models (the patternMatching
selector "context.request.http.method" == "__deny_unconfigured_models__") with
Kuadrant's documented deny-all idiom: remove the deny-unconfigured-models rule
and instead add a gateway-level AuthPolicy default entry
spec.defaults.rules.authorization.deny-all.opa.rego with the body "allow =
false"; update the defaults/auth policy where the symbols defaults, rules,
authentication, authorization, and deny-unconfigured-models appear so the policy
uses deny-all OPA rego rather than the fragile selector-based pattern.
- Around line 1512-1514: Replace the generic sleep after deploying the Gateway
default-deny AuthPolicy with explicit readiness checks that wait for the
AuthPolicy's status.conditions "Accepted" and "Enforced" to become True; locate
the AuthPolicy apply/creation step in the script (the resource name or variable
used to create the Gateway default-deny AuthPolicy) and replace the "sleep 20"
with deterministic checks (e.g., kubectl wait or a short polling loop that uses
kubectl get <AuthPolicyName> -o json and inspects status.conditions for types
Accepted and Enforced == True), timeout quickly (e.g., 60s) and exit non‑zero if
either condition is not satisfied so the smoke tests run only after the policy
is accepted and enforced.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 35495ae9-a5c2-4e39-9876-1bc05572520e
📒 Files selected for processing (1)
test/e2e/scripts/local-deploy.sh
…tighten smoke test - Drop existence guard: always kubectl apply (idempotent, ensures drift correction) - Wait for AuthPolicy Accepted condition before proceeding (replaces blind sleep) - Smoke test: 200 without auth is now a warning, not success
|
@yossiovadia: The following test has Failed: OCI Artifact Browser URLInspecting Test Artifacts ManuallyTo inspect your test artifacts manually, follow these steps:
mkdir -p oras-artifacts
cd oras-artifacts
oras pull quay.io/opendatahub/odh-ci-artifacts:maas-group-test-2fwxp |
Summary
gateway-default-authAuthPolicy tolocal-deploy.sh— blocks all model inference routes that don't have a per-route AuthPolicyWhy
The production MaaS deployment already has this policy (
deployment/base/maas-controller/policies/gateway-default-auth.yaml), but local-deploy was missing it. Without it, models deployed via the newinference.opendatahub.ioCRDs are completely unprotected — requests bypass Authorino entirely, and any request (no auth, invalid auth) returns HTTP 200.Found during testing of ExternalProvider/ExternalModel CRDs (ai-gateway-payload-processing#183). The default-deny is the same pattern used in production — routes with
MaaSAuthPolicyoverride the defaults and allow authenticated access.What changed
test/e2e/scripts/local-deploy.sh— added a block after subscription/auth policy deployment that applies the gateway-default-auth AuthPolicy (idempotent, skips if already exists).Test plan
Summary by CodeRabbit