
Commit bd864cf

Merge pull request #1040 from massgen/dev/v0.1.71
feat: v0.1.71
2 parents 3353fb7 + ba8ebc0 commit bd864cf

44 files changed

Lines changed: 4224 additions & 737 deletions

.github/workflows/tests.yml

Lines changed: 1 addition & 1 deletion
@@ -37,5 +37,5 @@ jobs:
 run: >
 uv run pytest massgen/tests
 -m "not live_api and not docker and not expensive"
--k "not test_timeline_snapshot and not test_final_lock_option and not test_web_quickstart_reasoning_sync and not test_subagent_input_bar_snapshot_matches_main_input"
+-k "not test_timeline_snapshot and not test_final_lock_option and not test_web_quickstart_reasoning_sync and not test_subagent_input_bar_snapshot_matches_main_input and not test_review_modal_snapshot"
 -q --tb=no
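
The `-k` filter in this hunk is a chain of `not <test name>` terms joined by `and`, so each release that deselects another snapshot test makes the line longer. As a minimal sketch of how such an expression could be assembled from a list (the helper `build_k_expression` and the `SKIPPED` list are illustrative assumptions, not part of the MassGen repository):

```python
def build_k_expression(skipped_tests):
    """Join test names into a pytest -k expression that deselects each one."""
    return " and ".join(f"not {name}" for name in skipped_tests)


# Names copied from the workflow line above; the list itself is hypothetical.
SKIPPED = [
    "test_timeline_snapshot",
    "test_final_lock_option",
    "test_web_quickstart_reasoning_sync",
    "test_subagent_input_bar_snapshot_matches_main_input",
    "test_review_modal_snapshot",
]

expression = build_k_expression(SKIPPED)
print(expression)
```

Because pytest matches `-k` terms as substrings of test names, a term such as `not test_review_modal_snapshot` would also deselect any other test whose name contains that string.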

CHANGELOG.md

Lines changed: 25 additions & 2 deletions
@@ -9,14 +9,37 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## Recent Releases
 
+**v0.1.71 (April 1, 2026)** - Trace Memory & Evaluation Polish
+Trace analyzer subagents now launch in the background after each round to write insights from execution traces into memory. Improved evaluation criteria generation and system prompt tuning. Fixes for final injection, eval criteria GPT pre-collab, trace analyzer launch, and trace memory.
+
 **v0.1.70 (March 30, 2026)** - Evaluation Criteria Redesign
 Redesigned three-tier evaluation criteria with anti-pattern definitions and aspiration statements. Improved checklist-gated evaluation with tighter iterative submission cycles. Fast iteration mode, WebUI review modal, and background trace analysis from round 2.
 
 **v0.1.69 (March 27, 2026)** - WebUI Automation & Improved Skill
 WebUI automation now auto-starts without browser interaction — open the URL at any point mid-run to monitor progress. MassGen skill redesign for increased usability and WebUI integration. Quickstart Wizard rework, Workspace Browser expansion, and flexible evaluation criteria field names.
 
-**v0.1.68 (March 25, 2026)** - Checkpoint Mode
-New checkpoint coordination mode with delegator pattern — main agent plans solo then delegates to team via `checkpoint()` tool. LLM API circuit breaker for 429 handling. WebUI checkpoint support. LiteLLM supply chain fix.
+---
+
+## [0.1.71] - 2026-04-01
+
+### Changed
+- **Better Evaluation Criteria**: Improved criteria generation for higher-quality, more opinionated output
+- **System Prompt Tuning**: Adjusted system prompts for better agent performance across coordination rounds
+
+### Fixed
+- **Final Injection Fix**: Corrected injection behavior at the final stage
+- **Eval Criteria GPT Pre-Collab Fix**: Resolved evaluation criteria issues with GPT models during pre-collaboration phase
+- **Execution Trace Analyzer Launch Fix**: Trace analyzer now starts correctly
+- **Trace Memory Fix**: Corrected memory handling in execution traces
+- **Auto Round Memory Fix**: Fixed automatic round handling for memory
+
+### Documentation, Configurations and Resources
+- **Updated Log Analyzer Skill**: Updated `massgen/skills/massgen-log-analyzer/SKILL.md`
+- **Updated Execution Trace Analyzer**: Updated `massgen/subagent_types/execution_trace_analyzer/SUBAGENT.md`
+
+### Technical Details
+- **Major Focus**: Stability and polish for v0.1.70's evaluation criteria system
+- **Contributors**: @ncrispino, @HenryQi and the MassGen team
 
 ---
 
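
The v0.1.71 entry describes trace analyzers that launch in the background after each round and write insights into memory. The following is only a rough sketch of that general pattern using `asyncio`; the function names, the dictionary used as "memory", and the trace format are all assumptions made for illustration and do not reflect MassGen's actual implementation.

```python
import asyncio


async def analyze_trace(round_no, trace, memory):
    """Hypothetical analyzer: summarize one round's trace into shared memory."""
    await asyncio.sleep(0)  # stand-in for real analysis work
    memory[round_no] = f"{len(trace)} trace events analyzed"


async def run_rounds(traces):
    """Walk through rounds in order, launching an analyzer task after each."""
    memory = {}
    background = []
    for round_no, trace in enumerate(traces, start=1):
        # The round itself would execute here; only its trace is modeled.
        task = asyncio.create_task(analyze_trace(round_no, trace, memory))
        background.append(task)  # keep a reference so the task is not lost
    # Wait for outstanding analyzers before the final memory is read.
    await asyncio.gather(*background)
    return memory


result = asyncio.run(run_rounds([["plan", "edit"], ["review"]]))
print(result)
```

The point of the sketch is that the loop does not await each analyzer before moving on, so analysis of one round can overlap with the next round's work.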

CONTRIBUTING.md

Lines changed: 4 additions & 4 deletions
@@ -359,7 +359,7 @@ Create a `.env` file in the `massgen` directory as described in [README](README.
 
 ## 🔧 Development Workflow
 
-> **Important**: Our next version is v0.1.71. If you want to contribute, please contribute to the `dev/v0.1.71` branch (or `main` if dev/v0.1.71 doesn't exist yet).
+> **Important**: Our next version is v0.1.72. If you want to contribute, please contribute to the `dev/v0.1.72` branch (or `main` if dev/v0.1.72 doesn't exist yet).
 
 ### 1. Create Feature Branch
 

@@ -368,7 +368,7 @@ Create a `.env` file in the `massgen` directory as described in [README](README.
 git fetch upstream
 
 # Create feature branch from dev/v0.1.60 (or main if dev branch doesn't exist yet)
-git checkout -b feature/your-feature-name upstream/dev/v0.1.71
+git checkout -b feature/your-feature-name upstream/dev/v0.1.72
 ```
 
 ### 2. Make Your Changes
@@ -507,7 +507,7 @@ git push origin feature/your-feature-name
 ```
 
 Then create a pull request on GitHub:
-- Base branch: `dev/v0.1.71` (or `main` if dev branch doesn't exist yet)
+- Base branch: `dev/v0.1.72` (or `main` if dev branch doesn't exist yet)
 - Compare branch: `feature/your-feature-name`
 - Add clear description of changes
 - Link any related issues
@@ -617,7 +617,7 @@ Have a significant feature idea not covered by existing tracks?
 - [ ] Tests pass locally
 - [ ] Documentation is updated if needed
 - [ ] Commit messages follow convention
-- [ ] PR targets `dev/v0.1.71` branch (or `main` if dev branch doesn't exist yet)
+- [ ] PR targets `dev/v0.1.72` branch (or `main` if dev branch doesn't exist yet)
 
 ### PR Description Should Include
 

Makefile

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ test: test-fast
 
 test-fast:
 @echo "🧪 Running fast test lane..."
-@uv run pytest massgen/tests --run-integration -m "not live_api and not docker and not expensive" -q --tb=no
+@uv run pytest massgen/tests --run-integration -m "not live_api and not docker and not expensive" -k "not test_review_modal_snapshot" -q --tb=no
 @echo "✓ Fast test lane passed"
 
 test-all:

README.md

Lines changed: 26 additions & 26 deletions
@@ -69,7 +69,7 @@ This project started with the "threads of thought" and "iterative refinement" id
 <details open>
 <summary><h3>🆕 Latest Features</h3></summary>
 
-- [v0.1.70 Features](#-latest-features-v0170)
+- [v0.1.71 Features](#-latest-features-v0171)
 </details>
 
 <details open>
@@ -122,15 +122,15 @@ This project started with the "threads of thought" and "iterative refinement" id
 <details open>
 <summary><h3>🗺️ Roadmap</h3></summary>
 
-- [Recent Achievements (v0.1.70)](#recent-achievements-v0170)
-- [Previous Achievements (v0.0.3 - v0.1.69)](#previous-achievements-v003---v0169)
+- [Recent Achievements (v0.1.71)](#recent-achievements-v0171)
+- [Previous Achievements (v0.0.3 - v0.1.70)](#previous-achievements-v003---v0170)
 - [Key Future Enhancements](#key-future-enhancements)
 - Bug Fixes & Backend Improvements
 - Advanced Agent Collaboration
 - Expanded Model, Tool & Agent Integrations
 - Improved Performance & Scalability
 - Enhanced Developer Experience
-- [v0.1.71 Roadmap](#v0171-roadmap)
+- [v0.1.72 Roadmap](#v0172-roadmap)
 </details>
 
 <details open>
@@ -155,21 +155,20 @@ This project started with the "threads of thought" and "iterative refinement" id
 
 ---
 
-## 🆕 Latest Features (v0.1.70)
+## 🆕 Latest Features (v0.1.71)
 
-**🎉 Released: March 30, 2026**
+**🎉 Released: April 1, 2026**
 
-**What's New in v0.1.70:**
-- **📋 Evaluation Criteria Redesign** - Three-tier categorization (`primary`, `standard`, `stretch`) with anti-pattern definitions and aspiration statements.
-- **🔄 Improved Checklist-Gated Evaluation** - Tighter iterative submission cycles with improved scoring and improvement proposals.
-- **⚡ Fast Iteration Mode** - Streamlined multi-round submission phases via `fast_iteration.yaml`.
-- **🔍 WebUI Review Modal** - Approve and comment on outputs directly in the browser.
+**What's New in v0.1.71:**
+- **🔍 Trace Analyzer Subagents** - Launch in the background after each round to write insights from execution traces into memory.
+- **📋 Better Evaluation Criteria** - Improved criteria generation for higher-quality, more opinionated output.
+- **🧠 System Prompt Tuning** - Adjusted system prompts for better agent performance across coordination rounds.
+- **🔧 Stability Fixes** - Fixed final injection, eval criteria GPT pre-collab, trace analyzer launch, and memory handling.
 
-**Try v0.1.70 Features:**
+**Try v0.1.71 Features:**
 ```bash
-pip install massgen==0.1.70
-# Try fast iteration with redesigned evaluation criteria
-uv run massgen --config @examples/features/fast_iteration.yaml "Create an svg of an AI agent coding."
+pip install massgen==0.1.71
+uv run massgen --config @examples/features/trace_analyzer_background.yaml "Create an svg of an AI agent coding."
 ```
 
 [See full release history and examples](massgen/configs/README.md#release-history--examples)
@@ -1241,18 +1240,19 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch
 
 ⚠️ **Early Stage Notice:** As MassGen is in active development, please expect upcoming breaking architecture changes as we continue to refine and improve the system.
 
-### Recent Achievements (v0.1.70)
+### Recent Achievements (v0.1.71)
 
-**🎉 Released: March 30, 2026**
+**🎉 Released: April 1, 2026**
 
-#### Evaluation Criteria Redesign
-- **Evaluation Criteria Redesign** ([#1035](https://github.com/massgen/MassGen/pull/1035)): Three-tier categorization (`primary`, `standard`, `stretch`) with anti-pattern definitions and aspiration statements
-- **Improved Checklist-Gated Evaluation** ([#1035](https://github.com/massgen/MassGen/pull/1035)): Tighter iterative submission cycles with improved scoring and improvement proposals
-- **Fast Iteration Mode** ([#1035](https://github.com/massgen/MassGen/pull/1035)): Streamlined multi-round submission phases via `fast_iteration.yaml`
-- **WebUI Review Modal** ([#1035](https://github.com/massgen/MassGen/pull/1035)): Approve and comment on outputs directly in the browser when working in git
-- **Background Trace Analysis** ([#1035](https://github.com/massgen/MassGen/pull/1035)): Execution trace analyzer starts automatically from round 2
+#### Trace Memory & Evaluation Polish
+- **Trace Analyzer Subagents**: Background trace analysis after each round — writes insights from execution traces into memory for next-round continuity
+- **Better Evaluation Criteria**: Improved criteria generation for higher-quality, more opinionated output
+- **System Prompt Tuning**: Adjusted system prompts for better agent performance across coordination rounds
+- **Stability Fixes**: Fixed final injection, eval criteria GPT pre-collab, trace analyzer launch, trace memory, and auto round memory
 
-### Previous Achievements (v0.0.3 - v0.1.69)
+### Previous Achievements (v0.0.3 - v0.1.70)
+
+**Evaluation Criteria Redesign (v0.1.70)**: Redesigned three-tier evaluation criteria with anti-pattern definitions and aspiration statements. Improved checklist-gated evaluation. Fast iteration mode, WebUI review modal, and background trace analysis.
 
 **WebUI Automation & Improved Skill (v0.1.69)**: WebUI automation auto-starts without browser interaction. MassGen skill redesign for increased usability and WebUI integration. Quickstart Wizard rework and Workspace Browser expansion.
 

@@ -1537,9 +1537,9 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch
 
 We welcome community contributions to achieve these goals.
 
-### v0.1.71 Roadmap
+### v0.1.72 Roadmap
 
-Version 0.1.71 focuses on cloud execution:
+Version 0.1.72 focuses on cloud execution:
 
 #### Planned Features
 - **Cloud Modal MVP** ([#982](https://github.com/massgen/MassGen/issues/982)): Run MassGen as a cloud job on Modal — progress streams to terminal, results saved locally under `.massgen/cloud_jobs/`

README_PYPI.md

Lines changed: 26 additions & 26 deletions
@@ -68,7 +68,7 @@ This project started with the "threads of thought" and "iterative refinement" id
 <details open>
 <summary><h3>🆕 Latest Features</h3></summary>
 
-- [v0.1.70 Features](#-latest-features-v0170)
+- [v0.1.71 Features](#-latest-features-v0171)
 </details>
 
 <details open>
@@ -121,15 +121,15 @@ This project started with the "threads of thought" and "iterative refinement" id
 <details open>
 <summary><h3>🗺️ Roadmap</h3></summary>
 
-- [Recent Achievements (v0.1.70)](#recent-achievements-v0170)
-- [Previous Achievements (v0.0.3 - v0.1.69)](#previous-achievements-v003---v0169)
+- [Recent Achievements (v0.1.71)](#recent-achievements-v0171)
+- [Previous Achievements (v0.0.3 - v0.1.70)](#previous-achievements-v003---v0170)
 - [Key Future Enhancements](#key-future-enhancements)
 - Bug Fixes & Backend Improvements
 - Advanced Agent Collaboration
 - Expanded Model, Tool & Agent Integrations
 - Improved Performance & Scalability
 - Enhanced Developer Experience
-- [v0.1.71 Roadmap](#v0171-roadmap)
+- [v0.1.72 Roadmap](#v0172-roadmap)
 </details>
 
 <details open>
@@ -154,21 +154,20 @@ This project started with the "threads of thought" and "iterative refinement" id
 
 ---
 
-## 🆕 Latest Features (v0.1.70)
+## 🆕 Latest Features (v0.1.71)
 
-**🎉 Released: March 30, 2026**
+**🎉 Released: April 1, 2026**
 
-**What's New in v0.1.70:**
-- **📋 Evaluation Criteria Redesign** - Three-tier categorization (`primary`, `standard`, `stretch`) with anti-pattern definitions and aspiration statements.
-- **🔄 Improved Checklist-Gated Evaluation** - Tighter iterative submission cycles with improved scoring and improvement proposals.
-- **⚡ Fast Iteration Mode** - Streamlined multi-round submission phases via `fast_iteration.yaml`.
-- **🔍 WebUI Review Modal** - Approve and comment on outputs directly in the browser.
+**What's New in v0.1.71:**
+- **🔍 Trace Analyzer Subagents** - Launch in the background after each round to write insights from execution traces into memory.
+- **📋 Better Evaluation Criteria** - Improved criteria generation for higher-quality, more opinionated output.
+- **🧠 System Prompt Tuning** - Adjusted system prompts for better agent performance across coordination rounds.
+- **🔧 Stability Fixes** - Fixed final injection, eval criteria GPT pre-collab, trace analyzer launch, and memory handling.
 
-**Try v0.1.70 Features:**
+**Try v0.1.71 Features:**
 ```bash
-pip install massgen==0.1.70
-# Try fast iteration with redesigned evaluation criteria
-uv run massgen --config @examples/features/fast_iteration.yaml "Create an svg of an AI agent coding."
+pip install massgen==0.1.71
+uv run massgen --config @examples/features/trace_analyzer_background.yaml "Create an svg of an AI agent coding."
 ```
 
 [See full release history and examples](massgen/configs/README.md#release-history--examples)
@@ -1240,18 +1239,19 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch
 
 ⚠️ **Early Stage Notice:** As MassGen is in active development, please expect upcoming breaking architecture changes as we continue to refine and improve the system.
 
-### Recent Achievements (v0.1.70)
+### Recent Achievements (v0.1.71)
 
-**🎉 Released: March 30, 2026**
+**🎉 Released: April 1, 2026**
 
-#### Evaluation Criteria Redesign
-- **Evaluation Criteria Redesign** ([#1035](https://github.com/massgen/MassGen/pull/1035)): Three-tier categorization (`primary`, `standard`, `stretch`) with anti-pattern definitions and aspiration statements
-- **Improved Checklist-Gated Evaluation** ([#1035](https://github.com/massgen/MassGen/pull/1035)): Tighter iterative submission cycles with improved scoring and improvement proposals
-- **Fast Iteration Mode** ([#1035](https://github.com/massgen/MassGen/pull/1035)): Streamlined multi-round submission phases via `fast_iteration.yaml`
-- **WebUI Review Modal** ([#1035](https://github.com/massgen/MassGen/pull/1035)): Approve and comment on outputs directly in the browser when working in git
-- **Background Trace Analysis** ([#1035](https://github.com/massgen/MassGen/pull/1035)): Execution trace analyzer starts automatically from round 2
+#### Trace Memory & Evaluation Polish
+- **Trace Analyzer Subagents**: Background trace analysis after each round — writes insights from execution traces into memory for next-round continuity
+- **Better Evaluation Criteria**: Improved criteria generation for higher-quality, more opinionated output
+- **System Prompt Tuning**: Adjusted system prompts for better agent performance across coordination rounds
+- **Stability Fixes**: Fixed final injection, eval criteria GPT pre-collab, trace analyzer launch, trace memory, and auto round memory
 
-### Previous Achievements (v0.0.3 - v0.1.69)
+### Previous Achievements (v0.0.3 - v0.1.70)
+
+**Evaluation Criteria Redesign (v0.1.70)**: Redesigned three-tier evaluation criteria with anti-pattern definitions and aspiration statements. Improved checklist-gated evaluation. Fast iteration mode, WebUI review modal, and background trace analysis.
 
 **WebUI Automation & Improved Skill (v0.1.69)**: WebUI automation auto-starts without browser interaction. MassGen skill redesign for increased usability and WebUI integration. Quickstart Wizard rework and Workspace Browser expansion.
 

@@ -1536,9 +1536,9 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch
 
 We welcome community contributions to achieve these goals.
 
-### v0.1.71 Roadmap
+### v0.1.72 Roadmap
 
-Version 0.1.71 focuses on cloud execution:
+Version 0.1.72 focuses on cloud execution:
 
 #### Planned Features
 - **Cloud Modal MVP** ([#982](https://github.com/massgen/MassGen/issues/982)): Run MassGen as a cloud job on Modal — progress streams to terminal, results saved locally under `.massgen/cloud_jobs/`

ROADMAP.md

Lines changed: 20 additions & 8 deletions
@@ -1,10 +1,10 @@
 # MassGen Roadmap
 
-**Current Version:** v0.1.70
+**Current Version:** v0.1.71
 
 **Release Schedule:** Mondays, Wednesdays, Fridays @ 9am PT
 
-**Last Updated:** March 30, 2026
+**Last Updated:** April 1, 2026
 
 This roadmap outlines MassGen's development priorities for upcoming releases. Each release focuses on specific capabilities with real-world use cases.
 

@@ -42,14 +42,26 @@ Want to contribute or collaborate on a specific track? Reach out to the track ow
 
 | Release | Target | Feature | Owner | Use Case |
 |---------|--------|---------|-------|----------|
-| **v0.1.71** | 04/02/26 | Cloud Modal MVP | @ncrispino | Run MassGen as a cloud job on Modal ([#982](https://github.com/massgen/MassGen/issues/982)) |
-| **v0.1.72** | 04/04/26 | OpenAI Audio API | @ncrispino | Support OpenAI audio API for audio understanding ([#960](https://github.com/massgen/MassGen/issues/960)) |
-| **v0.1.73** | 04/07/26 | Image/Video Edit Capabilities | @ncrispino | Check and support img/video editing capabilities ([#959](https://github.com/massgen/MassGen/issues/959)) |
+| **v0.1.72** | 04/04/26 | Cloud Modal MVP | @ncrispino | Run MassGen as a cloud job on Modal ([#982](https://github.com/massgen/MassGen/issues/982)) |
+| **v0.1.73** | 04/07/26 | OpenAI Audio API | @ncrispino | Support OpenAI audio API for audio understanding ([#960](https://github.com/massgen/MassGen/issues/960)) |
+| **v0.1.74** | 04/09/26 | Image/Video Edit Capabilities | @ncrispino | Check and support img/video editing capabilities ([#959](https://github.com/massgen/MassGen/issues/959)) |
 
 *All releases ship on MWF @ 9am PT when ready*
 
 ---
 
+## ✅ v0.1.71 - Trace Memory & Evaluation Polish (Completed)
+
+**Released:** April 1, 2026
+
+### Features
+- **Trace Analyzer Subagents**: Background trace analysis after each round — writes insights from execution traces into memory for next-round continuity
+- **Better Evaluation Criteria**: Improved criteria generation for higher-quality, more opinionated output
+- **System Prompt Tuning**: Adjusted system prompts for better agent performance across coordination rounds
+- **Stability Fixes**: Fixed final injection, eval criteria GPT pre-collab, trace analyzer launch, trace memory, and auto round memory
+
+---
+
 ## ✅ v0.1.70 - Evaluation Criteria Redesign (Completed)
 
 **Released:** March 30, 2026 | PRs: [#1035](https://github.com/massgen/MassGen/pull/1035)
@@ -63,7 +75,7 @@ Want to contribute or collaborate on a specific track? Reach out to the track ow
 
 ---
 
-## 📋 v0.1.71 - Cloud Modal MVP
+## 📋 v0.1.72 - Cloud Modal MVP
 
 ### Features
 

@@ -79,7 +91,7 @@ Want to contribute or collaborate on a specific track? Reach out to the track ow
 
 ---
 
-## 📋 v0.1.72 - OpenAI Audio API
+## 📋 v0.1.73 - OpenAI Audio API
 
 ### Features
 

@@ -95,7 +107,7 @@ Want to contribute or collaborate on a specific track? Reach out to the track ow
 
 ---
 
-## 📋 v0.1.73 - Image/Video Edit Capabilities
+## 📋 v0.1.74 - Image/Video Edit Capabilities
 
 ### Features
 
