ci(e2e): stabilize MCP flow and cancel stale main runs#4039
Open
yiliang114 wants to merge 5 commits into
Open
Conversation
Contributor
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
419efe5 to
e4959fe
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
This pull request updates the MCP message-flow E2E to validate the specific MCP add tool call/result pairing by tool_use_id, while no longer assuming the MCP call must be the first tool-related event in the assistant stream (to allow tool_search / discovery to occur earlier).
Changes:
- Makes the prompt explicitly require calling the
addtool to reduce “no-tool-call” model variance. - Locates the MCP
addtool_useby name and records itstool_use_id. - Verifies a matching non-error
tool_resultexists for that sametool_use_id, instead of asserting tool ordering.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
e4959fe to
42bb496
Compare
The model consistently picks list_directory over run_shell_command for file-listing prompts. Make the prompt explicit about which tool to use, matching the approach taken for the MCP tool flow test.
Comment on lines
+10
to
+14
| concurrency: | ||
| group: |- | ||
| ${{ github.workflow }}-${{ github.event_name == 'push' && github.ref == 'refs/heads/main' && 'main' || github.run_id }} | ||
| cancel-in-progress: |- | ||
| ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }} |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
main.tool_usemust bemcp__test-math-server__add, but MCP tools are deferred and ToolSearch can legitimately appear first. Separately, consecutive pushes tomaincurrently keep superseded E2E runs alive, wasting runner time.tool_useand matchingtool_result, while the workflow concurrency cancels only stalemainpush E2E runs and leavesfeat/e2e/**plusmerge_groupruns independent.Validation
Use the add tool to calculate 2 + 3. You must call the tool. Just give me the result.tool_searchappears before the MCP call.mainE2E pushes cancel older in-progressmainE2E workflow runs, while branch and merge queue E2E runs remain independent.git diff --check,npm run build, YAML parsing, and actionlint passed locally.[API Error: 无效的api key]before any MCP call.npx tsc --noEmit --project integration-tests/tsconfig.jsonfails on existing unrelated integration-test type errors outside this PR.isError === falsewas too strict for a message-flow test, so the test now validates the target call/result pairing without asserting tool execution success in this specific flow test.576bd8e0andcb7059f5were both in progress concurrently before the concurrency change.Scope / Risk
mainE2E runs may be canceled before completion when a newer main commit arrives; this is intentional because the newer run supersedes the older one.Testing Matrix
Testing matrix notes:
npm run buildpassed locally.npx vitestreached the MCP test but was blocked by local model/API key configuration before any MCP tool call.Linked Issues / Bugs