Fix: Allow DHCP traffic on unmanaged VLAN bridges #1307

Open
blackdragoon26 wants to merge 5 commits into containers:main from blackdragoon26:fix-vlan-dhcp

Conversation

@blackdragoon26 blackdragoon26 commented Aug 11, 2025

This PR is being reworked in response to review feedback and now depends on:

nispor/mozim#79

The previous implementation here introduced a dedicated per-container DHCP
worker/runtime after entering the target network namespace. Maintainer feedback
was that this was the wrong layer and would not scale well.

The new approach is to move the namespace-sensitive socket binding into mozim,
so Netavark can keep its normal async/task model while still sending DHCP
traffic from the correct target namespace and interface context for unmanaged
bridge VLAN setups.

Once the mozim change is reviewed, this PR will be updated to:

  • remove the namespace-bound DHCP worker/runtime approach
  • use mozim netns-aware socket creation
  • keep the DHCP proxy in the host namespace
  • preserve correct DHCP behavior for unmanaged bridge VLAN networks

Related dependency PR:
nispor/mozim#79

sourcery-ai Bot commented Aug 11, 2025

Reviewer's Guide

This PR introduces a netlink-based helper to allow DHCP broadcast packets on VLAN-tagged subinterfaces and integrates its invocation into unmanaged bridge setup when a VLAN ID is specified.

Sequence diagram for DHCP allow rule on VLAN during unmanaged bridge setup

sequenceDiagram
    participant BridgeSetup
    participant NetlinkHelper
    participant Netlink
    BridgeSetup->>NetlinkHelper: Call allow_dhcp_on_vlan(bridge_name, vlan_id)
    NetlinkHelper->>Netlink: Create netlink connection
    NetlinkHelper->>Netlink: Lookup VLAN interface by name
    alt VLAN interface found
        NetlinkHelper->>Netlink: (Would) configure VLAN filtering to allow DHCP
    else VLAN interface not found or error
        NetlinkHelper->>BridgeSetup: Log warning
    end

Class diagram for new allow_dhcp_on_vlan helper and Bridge integration

classDiagram
    class Bridge {
        +create_interfaces(...)
    }
    class NetlinkHelper {
        +allow_dhcp_on_vlan(bridge_name: str, vlan_id: u16)
    }
    Bridge --> NetlinkHelper : uses
    NetlinkHelper : +allow_dhcp_on_vlan(bridge_name, vlan_id)

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Add helper to configure DHCP allow rules on VLAN subinterfaces | Define allow_dhcp_on_vlan function; Establish a netlink connection and spawn its task; Lookup VLAN interface by name and log status; Provide info/warn logs based on lookup outcome; Return Result<(), Error> | src/network/netlink.rs |
| Invoke DHCP-allow helper for unmanaged bridges with VLAN IDs | Log configuration intent for VLAN-filtered unmanaged bridge; Call allow_dhcp_on_vlan with bridge name and VLAN ID; Wrap invocation in error context | src/network/bridge.rs |

Signed-off-by: blackdragoon26 <sankalp.jha9643@gmail.com>
@packit-as-a-service

Ephemeral COPR build failed. @containers/packit-build please check.

@sourcery-ai sourcery-ai Bot left a comment

Hey @blackdragoon26 - I've reviewed your changes - here's some feedback:

  • allow_dhcp_on_vlan currently spins up its own Tokio runtime and netlink connection per call—consider making it async and reusing the existing runtime/handle to avoid nested runtimes.
  • The allow_dhcp_on_vlan function has a TODO for actually configuring the VLAN filter—either implement that logic now or remove the placeholder to prevent silent misconfigurations.
  • In the unmanaged bridge setup you still hit the Err(err) return even after calling allow_dhcp_on_vlan, so the control flow never proceeds; adjust the early return so VLAN handling completes as intended.

## Individual Comments

### Comment 1
<location> `src/network/netlink.rs:637` </location>
<code_context>
+    tokio::spawn(connection);
+
+    // Lookup VLAN interface by name
+    let mut rt = tokio::runtime::Runtime::new().unwrap();
+    let mut links = rt.block_on(
+        handle
+            .link()
</code_context>

<issue_to_address>
Using a new Tokio runtime inside an async context can lead to resource issues.

Instead of creating a new runtime, pass an existing handle or refactor to prevent nested runtimes and potential thread pool contention.
</issue_to_address>

### Comment 2
<location> `src/network/netlink.rs:652` </location>
<code_context>
+                "Found VLAN interface {}, would configure bridge VLAN filtering to allow DHCP (UDP 67/68)",
+                vlan_iface
+            );
+            // TODO: implement actual VLAN filtering adjustment here
+        }
+        Ok(None) => {
</code_context>

<issue_to_address>
Missing implementation for VLAN filtering adjustment.

If this is meant to be a stub, clarify its status or return an error to indicate the feature is not yet implemented.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
            info!(
                "Found VLAN interface {}, would configure bridge VLAN filtering to allow DHCP (UDP 67/68)",
                vlan_iface
            );
            // TODO: implement actual VLAN filtering adjustment here
=======
            info!(
                "Found VLAN interface {}, would configure bridge VLAN filtering to allow DHCP (UDP 67/68)",
                vlan_iface
            );
            warn!(
                "VLAN filtering adjustment for interface {} is not yet implemented.",
                vlan_iface
            );
            return Err(anyhow::anyhow!(
                "VLAN filtering adjustment for interface {} is not yet implemented.",
                vlan_iface
            ));
>>>>>>> REPLACE

</suggested_fix>



@openshift-ci
Contributor

openshift-ci Bot commented Aug 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: blackdragoon26, sourcery-ai[bot]
Once this PR has been reviewed and has the lgtm label, please assign luap99 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@blackdragoon26
Author

Hi maintainers, there has been a revamp of DHCP lease handling for unmanaged bridge VLANs.

This is a major rework of the initial approach to resolve issue #1294.
Previously, DHCP requests were sourced from the host-side interface, causing
unmanaged bridge VLAN configurations to route DHCP traffic on the wrong VLAN.
This update moves all DHCP traffic generation directly into the target network
namespace.

Key changes:

  • Replace async renewal tasks with a dedicated per-lease DHCP worker.
  • The worker uses setns() to enter the target namespace, starts a Tokio
    runtime, and binds the DHCP client to container_iface instead of host_iface.
  • Ensure renewals and rebinds utilize this same namespace-bound worker.
  • Implement clean worker shutdown during teardown (replacing task aborts).
  • Remove previous VLAN helper experiments, restoring the standard bridge flow.
  • Update netavark dhcp-proxy test helper to validate the namespace transition.

@blackdragoon26 blackdragoon26 force-pushed the fix-vlan-dhcp branch 3 times, most recently from 89e0021 to e25dbed on April 15, 2026 05:04
Signed-off-by: blackdragoon26 <sankalp.jha9643@gmail.com>
@blackdragoon26
Author

@sourcery-ai review

Signed-off-by: blackdragoon26 <sankalp.jha9643@gmail.com>

# Conflicts:
#	src/dhcp_proxy/dhcp_service.rs
@blackdragoon26
Author

Hi @Luap99 and @flouthoc, could you please review this PR?

Member

@Luap99 Luap99 left a comment

Sorry, I do not have time to properly review this. Overall, the approach here feels wrong: creating a new tokio runtime per container does not scale at all, and I am not confident in the way the netns switching is done.

I would much prefer that we add this via support in the mozim lib, so that the raw socket is bound in the netns there only when we actually open the socket, and we do not have to do all of this here.

@blackdragoon26
Author

> Sorry, I do not have time to properly review this. Overall, the approach here feels wrong: creating a new tokio runtime per container does not scale at all, and I am not confident in the way the netns switching is done.
>
> I would much prefer that we add this via support in the mozim lib, so that the raw socket is bound in the netns there only when we actually open the socket, and we do not have to do all of this here.

Yeah, that makes sense, thanks for that.
I agree the current per-container runtime/thread plus Netavark-side setns() is too heavy and puts namespace switching in the wrong layer.

I’ll rework this by moving the namespace-specific raw socket creation into mozim: add an option on DhcpV4Config to create/bind the DHCPv4 raw socket inside the target netns, with the namespace switch scoped only to socket creation and restored immediately afterward.

Then I will update this Netavark PR to remove the custom DHCP worker/runtime changes and just pass the target netns/container iface information into mozim.
I will link the mozim PR first, then refresh this PR once that support is available.

@blackdragoon26
Author

Per the feedback about keeping the namespace-sensitive socket work in mozim, I split that out first here:

nispor/mozim#79

That PR adds scoped netns handling only around DHCP socket open/bind, and restores the original netns before returning to async code. I have the Netavark side reworked locally to use the normal task model again, and I’ll update this PR once the mozim direction is reviewed.
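
The scoping described above (switch namespace, open the socket, restore before returning) can be sketched with a drop guard. This is only an illustration of the pattern, not the code in nispor/mozim#79: the `NetnsSwitch` trait, `ScopedNetns`, `open_in_netns`, and `RecordingSwitch` are hypothetical names, and the trait stands in for a real `setns(2)` wrapper so that the sketch runs without privileges.

```rust
use std::cell::RefCell;

/// Hypothetical abstraction over "move the calling thread into namespace `ns`".
/// A real implementation would wrap setns(2); this is an assumption of the sketch.
trait NetnsSwitch {
    fn enter(&self, ns: &str) -> Result<(), String>;
}

/// Guard that switches back to the original namespace when it is dropped.
struct ScopedNetns<'a, S: NetnsSwitch + 'a> {
    switcher: &'a S,
    original: String,
}

impl<'a, S: NetnsSwitch + 'a> ScopedNetns<'a, S> {
    fn enter(switcher: &'a S, original: &str, target: &str) -> Result<Self, String> {
        switcher.enter(target)?;
        Ok(ScopedNetns { switcher, original: original.to_string() })
    }
}

impl<'a, S: NetnsSwitch + 'a> Drop for ScopedNetns<'a, S> {
    fn drop(&mut self) {
        // Best effort: Drop cannot return an error to the caller.
        let _ = self.switcher.enter(&self.original);
    }
}

/// Runs `open_socket` inside `target` and restores `original` before returning,
/// whether or not opening the socket succeeded.
fn open_in_netns<S, T, F>(
    switcher: &S,
    original: &str,
    target: &str,
    open_socket: F,
) -> Result<T, String>
where
    S: NetnsSwitch,
    F: FnOnce() -> Result<T, String>,
{
    let _guard = ScopedNetns::enter(switcher, original, target)?;
    open_socket()
    // `_guard` is dropped here, after the closure has produced its result.
}

/// Test double that records every namespace switch instead of performing one.
struct RecordingSwitch {
    log: RefCell<Vec<String>>,
}

impl NetnsSwitch for RecordingSwitch {
    fn enter(&self, ns: &str) -> Result<(), String> {
        self.log.borrow_mut().push(format!("enter {}", ns));
        Ok(())
    }
}
```

Because the guard restores the namespace in `Drop`, the switch stays confined to the synchronous socket-open call and never spans an `.await` point, which is the property the review feedback asked for.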

@cathay4t
Contributor

cathay4t commented May 1, 2026

I have the same concern as @Luap99. I need more detail on what is not working, rather than "this is how I fixed my problem".

@blackdragoon26
Author

Thanks for pushing back on the previous direction. I reworked this to avoid the netns/runtime approach entirely. @cathay4t

The issue I reproduced is that for an unmanaged bridge with vlan=40, Netavark was asking the DHCP proxy to bind to the bridge itself. DHCP traffic sourced from the bridge can use the bridge self/default VLAN, so in a trunk setup it can obtain a lease from the wrong VLAN. In my repro, the container veth was correctly configured for VLAN 40, but DHCP got a lease from the untagged VLAN 20 network.

This update keeps the DHCP proxy/mozim flow unchanged. For unmanaged bridge networks with both vlan=<id> and DHCP, Netavark now uses a bridge VLAN interface for DHCP, e.g. br0.40, creating it if needed and validating it if it already exists. That makes the DHCP socket bind to an interface representing the intended VLAN while keeping renewals in the existing proxy task model.
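
A minimal sketch of the interface selection described above, assuming the `<bridge>.<vlan>` naming shown in the `br0.40` example. The function names, the `ExistingLink` struct, and the specific checks are hypothetical and are not taken from the Netavark change. The two limits used are general facts: usable 802.1Q VLAN IDs are 1 to 4094, and Linux interface names are at most 15 bytes.

```rust
/// Linux interface names are limited to IFNAMSIZ - 1 = 15 bytes.
const MAX_IFNAME_LEN: usize = 15;

/// Builds the VLAN interface name for a bridge, e.g. ("br0", 40) -> "br0.40".
fn vlan_iface_name(bridge: &str, vlan_id: u16) -> Result<String, String> {
    // VLAN IDs 0 and 4095 are reserved by 802.1Q.
    if vlan_id == 0 || vlan_id > 4094 {
        return Err(format!("VLAN ID {} is outside 1..=4094", vlan_id));
    }
    let name = format!("{}.{}", bridge, vlan_id);
    if name.len() > MAX_IFNAME_LEN {
        return Err(format!("interface name {} exceeds {} bytes", name, MAX_IFNAME_LEN));
    }
    Ok(name)
}

/// Picks the interface the DHCP socket should bind to. Only unmanaged bridge
/// networks with both a VLAN ID and DHCP use the VLAN interface.
fn dhcp_bind_iface(
    bridge: &str,
    unmanaged: bool,
    dhcp: bool,
    vlan: Option<u16>,
) -> Result<String, String> {
    match (unmanaged, dhcp, vlan) {
        (true, true, Some(id)) => vlan_iface_name(bridge, id),
        _ => Ok(bridge.to_string()),
    }
}

/// Hypothetical snapshot of an interface that already exists on the host.
struct ExistingLink {
    parent: String,
    vlan_id: Option<u16>,
}

/// One plausible meaning of "validating" a pre-existing interface (an assumption):
/// it must be a VLAN interface on the expected bridge with the expected ID.
fn check_existing(link: &ExistingLink, bridge: &str, vlan_id: u16) -> Result<(), String> {
    if link.vlan_id != Some(vlan_id) {
        return Err(format!("existing interface does not carry VLAN {}", vlan_id));
    }
    if link.parent != bridge {
        return Err(format!("existing interface is not attached to {}", bridge));
    }
    Ok(())
}
```

Rejecting a mismatched pre-existing interface, instead of silently reusing it, avoids binding DHCP to an interface that represents a different VLAN, which is the failure mode described in the repro.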

I also added an integration test that sets up:

  • untagged VLAN 20 DHCP range
  • tagged VLAN 40 DHCP range
  • unmanaged bridge network configured with vlan=40
  • assertion that Netavark receives the VLAN 40 lease/gateway and dnsmasq sees the DHCPACK on srv0.40

Local verification on x86 Linux:

  • cargo check --bins
  • cargo build --bins
  • cargo test --lib
  • cargo clippy --bins -- -D warnings
  • make validate
  • manual repro for missing br0.40, pre-existing br0.40, and teardown

@blackdragoon26
Author

@Luap99, can you please have a look at this PR now?

@blackdragoon26 blackdragoon26 requested a review from Luap99 May 1, 2026 21:43
Signed-off-by: blackdragoon26 <sankalp.jha9643@gmail.com>
@blackdragoon26
Author

> @Luap99, can you please have a look at this PR now?

@Luap99, pinging you about the above, thanks.

@blackdragoon26
Author

Hi @Luap99 and @flouthoc, a gentle reminder about reviewing this PR.

3 participants