Add vendor assessment agent with structured output by aureliensibiril · Pull Request #982 · getprobo/probo

aureliensibiril · 2026-04-02T14:52:02Z

Summary

Add composable Toolset interface, TypedTool[In, Out] factory, and result helpers to pkg/agent for reusable agent tooling
Build multi-agent vendor vetting orchestrator with 16 specialized sub-agents (crawler, security, compliance, data processing, AI risk, incident response, etc.)
Add browser, security, and search toolsets with SSRF protection for agent use
Define structured output types for all sub-agents with JSON schema enforcement via agent.WithOutputType
Wire ResponseFormat into the Anthropic provider (OutputConfig.Format), enabling API-level structured output — previously silently dropped
Add JSON validation in agent-as-tool execution path, returning error tool results on invalid output
Update all sub-agent prompts with explicit JSON output format sections
Wire vendor assessment into the service layer with GraphQL mutation, MCP tool, and CLI command
Remove task priority and rank across schema, UI, MCP, and tests

Changes by area

Agent framework (`pkg/agent`)

Toolset API with WithToolsets(...)
TypedTool[In, Out] with ResultJSON/ResultError(f) helpers
WithOutputType(...) for structured JSON output enforcement
WithThinking(...) for extended thinking budget
Agent-as-tool now validates JSON output before returning to parent
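The agent-as-tool validation step can be pictured roughly as follows. This is a minimal sketch, not the code in pkg/agent: `ToolResult` and `wrapSubAgentOutput` are hypothetical names, and the check only covers well-formedness via `encoding/json`, whereas the PR also relies on provider-level schema enforcement.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ToolResult is a hypothetical stand-in for the framework's tool result type.
type ToolResult struct {
	Content string
	IsError bool
}

// wrapSubAgentOutput (hypothetical) checks that a sub-agent's final message is
// well-formed JSON before handing it back to the parent agent. Invalid output
// becomes an error tool result instead of being passed through silently.
func wrapSubAgentOutput(raw string) ToolResult {
	trimmed := strings.TrimSpace(raw)
	if !json.Valid([]byte(trimmed)) {
		return ToolResult{
			Content: "sub-agent returned invalid JSON output",
			IsError: true,
		}
	}
	return ToolResult{Content: trimmed}
}

func main() {
	// A well-formed object is passed through.
	fmt.Println(wrapSubAgentOutput(`{"risk":"low"}`).IsError)
	// Semicolon-delimited text, as in the flagged prompt examples, is rejected.
	fmt.Println(wrapSubAgentOutput(`risk: low; score: 3`).IsError)
}
```

Returning an error result rather than a Go error keeps the failure visible to the parent agent, which can then retry or report the sub-agent as failed.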

LLM providers (`pkg/llm`)

Anthropic: wire ResponseFormat → OutputConfig.Format (JSON schema enforcement)
OpenAI: already supported (no change)
Both providers support file parts (PDF, CSV)

Vendor assessment (`pkg/agents/vetting`)

16 typed output structs with jsonschema tags in output_types.go
All sub-agents use WithOutputType for schema-enforced JSON responses
Orchestrator coordinates sub-agents with parallel tool calls and progress reporting
Prompts updated with JSON output format examples

Task priority removal

Remove rank field from Task schema and UI
Remove priority-based drag-and-drop reordering logic
Clean up unused imports and dead code

Test plan

make test MODULE=./pkg/agent/... — all tests pass
make test MODULE=./pkg/agents/vetting/... — output type schema tests pass
go vet / go build — clean on all affected packages
Verify vendor assessment end-to-end with run.sh

aureliensibiril · 2026-04-08T10:09:01Z

@cubic-dev-ai review this PR

cubic-dev-ai · 2026-04-08T10:09:12Z

@aureliensibiril I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

11 issues found across 101 files

Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed. We prioritized the most important files first.

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="pkg/agents/vetting/prompts/code_security.txt">

<violation number="1" location="pkg/agents/vetting/prompts/code_security.txt:73">
P2: Example outputs are not valid JSON, conflicting with the enforced JSON schema and likely causing invalid outputs when the model follows the examples.</violation>
</file>

<file name="pkg/agents/vetting/prompts/analyzer.txt">

<violation number="1" location="pkg/agents/vetting/prompts/analyzer.txt:66">
P2: Examples contradict the JSON output requirement; they show semicolon-delimited text instead of valid JSON, which can lead the model to emit invalid JSON and fail schema enforcement.</violation>
</file>

<file name="pkg/agents/vetting/prompts/compliance.txt">

<violation number="1" location="pkg/agents/vetting/prompts/compliance.txt:45">
P2: Examples contradict the JSON‑schema requirement and show non‑JSON output formats, which can lead the model to emit invalid JSON and fail schema validation.</violation>
</file>

<file name="pkg/agent/tools/browser/fetch_robots.go">

<violation number="1" location="pkg/agent/tools/browser/fetch_robots.go:92">
P2: Disallow parsing lowercases the entire line and uses the lowercased remainder as the path, which changes case-sensitive URL paths and can misreport disallowed entries.</violation>
</file>

<file name="pkg/agent/tools/search/diff_documents.go">

<violation number="1" location="pkg/agent/tools/search/diff_documents.go:66">
P2: Oversized documents are incorrectly reported as having no differences, and the tool suppresses the "too large" diagnostic output.</violation>
</file>

<file name="pkg/agent/tools/internal/netcheck/netcheck.go">

<violation number="1" location="pkg/agent/tools/internal/netcheck/netcheck.go:34">
P1: `IsPublicIP` does not block all multicast addresses; it only blocks link-local multicast, allowing other multicast ranges to be treated as public.</violation>
</file>

<file name="pkg/agent/tools/browser/click.go">

<violation number="1" location="pkg/agent/tools/browser/click.go:57">
P1: Click-triggered navigation is not revalidated, allowing bypass of initial domain/URL SSRF checks.</violation>
</file>

<file name="pkg/agents/vetting/prompts/ai_risk.txt">

<violation number="1" location="pkg/agents/vetting/prompts/ai_risk.txt:69">
P2: Examples under `<examples>` are not valid JSON despite the prompt requiring schema-enforced JSON output; the semicolon-separated format can bias the model toward invalid JSON and break strict validation.</violation>
</file>

<file name="pkg/agents/vetting/prompts/incident_response.txt">

<violation number="1" location="pkg/agents/vetting/prompts/incident_response.txt:59">
P2: Examples contradict the JSON output requirement; they use a semicolon-delimited key:value list rather than valid JSON, which can cause the model to emit invalid output for the enforced schema.</violation>
</file>

<file name="pkg/agent/tools/browser/extract_text.go">

<violation number="1" location="pkg/agent/tools/browser/extract_text.go:70">
P2: Text size is capped only after full-page extraction, so huge pages can still cause large transfer/allocation overhead before truncation.</violation>
</file>

<file name="pkg/agents/vetting/assessment.go">

<violation number="1" location="pkg/agents/vetting/assessment.go:196">
P2: Research browser is created without any allowed-domain restriction; browser tool permits any http/https URL when no allowedDomains are set, enabling SSRF-style access to internal endpoints.</violation>
</file>

Three reinforcements on the browser navigation path, all surfaced by cubic code review on PR getprobo#982:

- netcheck.IsPublicIP now rejects the full multicast range (ip.IsMulticast) rather than only link-local multicast, so addresses in 224.0.0.0/4 and 239.0.0.0/8 can no longer slip through the SSRF guard.
- Browser.checkURL now runs netcheck.ValidatePublicURL on every URL, even when no allowed-domain list is set. The research browser in the vendor assessment is intentionally allowed to roam the public web, but it must still refuse URLs that resolve to loopback, private, or link-local IPs.
- ClickElementTool reads the post-click location and feeds it back through Browser.checkURL. A click that triggers navigation to a different host (JS-initiated redirect, malicious <a href>, vendor page hijack) used to extract text from whatever page the browser ended up on; that path could bypass the initial checkURL call and read internal endpoints. The post-click revalidation closes that gap.

Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Three defects flagged by cubic code review on PR getprobo#982:

- fetch_robots_txt lowercased the entire Disallow line before reading the path value, corrupting case-sensitive paths (e.g. /Admin/ reported as /admin/). Match the sitemap handling and read the path off the original-case raw line.
- extract_page_text pulled the full document.body.innerText over the DevTools protocol before truncating on the Go side, so a huge page could burn bandwidth and memory well beyond maxTextLength. Slice the string in JS at 4x maxTextLength code units first (safe upper bound for UTF-16 code units per Go rune) before transferring, then finish the rune-exact truncation in Go.
- diff_documents silently dropped the "documents too large for detailed diff" message when either side exceeded the 5000-line LCS cap, returning HasDifferences=false and an empty UnifiedDiff. Add a tooLarge flag on the internal diffOutput and surface the message via ErrorDetail so the caller can distinguish "no differences" from "too large to compare".

Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>
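The robots.txt fix comes down to matching the directive name case-insensitively while leaving the path untouched. The sketch below shows one way to do that; `parseDisallow` is a hypothetical helper written for illustration and is not taken from fetch_robots.go.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseDisallow (hypothetical) returns the Disallow paths found in a
// robots.txt body. Only the directive name is compared case-insensitively;
// the path is read from the original line so that /Admin/ stays /Admin/.
func parseDisallow(robots string) []string {
	var paths []string
	scanner := bufio.NewScanner(strings.NewReader(robots))
	for scanner.Scan() {
		line := scanner.Text()
		// Drop trailing comments before splitting the directive.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok || !strings.EqualFold(strings.TrimSpace(key), "disallow") {
			continue
		}
		path := strings.TrimSpace(value)
		if path == "" {
			continue // an empty Disallow value blocks nothing
		}
		paths = append(paths, path)
	}
	return paths
}

func main() {
	body := "User-agent: *\nDISALLOW: /Admin/\ndisallow: /private # internal\nDisallow:\n"
	fmt.Println(parseDisallow(body))
}
```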

CodeQL flagged InsecureSkipVerify in check_ssl_certificate on PR getprobo#982. The tool is a cert INSPECTOR: we intentionally connect to servers whose certificates may be expired, self-signed, or otherwise invalid because reporting on that state is the entire purpose of the tool. The handshake's built-in verification is disabled, then the code manually runs x509.Verify on the returned chain and reports the result in the Valid field. No credentials or confidential data are ever sent over the connection. Document the intent inline and add a //nolint:gosec directive so the scanner stops flagging this path.

Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

cubic-dev-ai

1 issue found across 13 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="pkg/agent/tools/browser/click.go">

<violation number="1" location="pkg/agent/tools/browser/click.go:74">
P1: Post-click URL validation is performed only after click-triggered navigation, so SSRF-blocked destinations may still be contacted before rejection.</violation>
</file>


cubic-dev-ai

2 issues found across 6 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="GNUmakefile">

<violation number="1" location="GNUmakefile:131">
P2: `test-e2e` hardcodes `bin/probod-e2e` instead of using `PROBOD_E2E_BIN`, so overriding the variable can run the wrong or stale binary.</violation>

<violation number="2" location="GNUmakefile:214">
P2: The new `bin/probod-e2e` target duplicates `bin/probod` prerequisites, creating dependency-list drift risk and fragile build maintenance.</violation>
</file>

cubic-dev-ai

1 issue found across 37 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="GNUmakefile">

<violation number="1" location="GNUmakefile:143">
P2: `test-e2e` prerequisite was changed to a configurable target name, but no matching rule exists for overridden `PROBOD_E2E_BIN` values, causing Make target resolution failures.</violation>
</file>


cubic-dev-ai · 2026-04-24T06:35:32Z

You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment @cubic-dev-ai review.

Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

The rebase onto main split the evidence-describer config into two: an LLMAgentConfig under agents.evidence-describer and a worker-tuning EvidenceDescriberConfig (Interval, StaleAfter, MaxConcurrency) at the top level. Bootstrap was only populating the LLM half, leaving Interval at zero, which makes probod panic at boot with "non-positive interval for NewTicker" when the evidence description worker starts. Add the missing fields with the same defaults main uses.

Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

aureliensibiril force-pushed the aureliensibiril/vendor-assessment-agent branch 3 times, most recently from bde166f to 3c2639a April 2, 2026 15:53

github-advanced-security AI found potential problems Apr 2, 2026

Comment thread pkg/agent/tools/security/ssl.go Fixed

aureliensibiril force-pushed the aureliensibiril/vendor-assessment-agent branch 2 times, most recently from cd1d947 to 9c38610 April 6, 2026 19:12

aureliensibiril changed the title from "Add composable agent and tool framework" to "Add vendor assessment agent with structured output" Apr 6, 2026

aureliensibiril force-pushed the aureliensibiril/vendor-assessment-agent branch 2 times, most recently from ccba179 to d640d96 April 7, 2026 06:03

gearnode reviewed Apr 7, 2026

Comment thread pkg/agent/agent.go Outdated

aureliensibiril force-pushed the aureliensibiril/vendor-assessment-agent branch 4 times, most recently from 29c89db to a061840 April 8, 2026 09:06

aureliensibiril marked this pull request as ready for review April 8, 2026 10:00

aureliensibiril force-pushed the aureliensibiril/vendor-assessment-agent branch from a061840 to c7ba96e April 8, 2026 10:02

cubic-dev-ai Bot reviewed Apr 8, 2026
