Commit bd8c86b

Update Kida component validation and parity (#114)
1 parent 32e5274 commit bd8c86b

15 files changed

Lines changed: 1186 additions & 14 deletions

Makefile

Lines changed: 32 additions & 2 deletions
@@ -4,7 +4,7 @@
 PYTHON_VERSION ?= 3.14t
 VENV_DIR ?= .venv
 
-.PHONY: all help setup install test test-cov test-thread test-async lint lint-fix format ty clean shell docs docs-serve build publish release gh-release action-tag
+.PHONY: all help setup install test test-cov test-thread test-async test-safety test-rc-safety lint lint-fix format format-check ty package-smoke verify-stability verify-rc clean shell docs docs-serve build publish release gh-release action-tag
 
 all: help
 
@@ -20,10 +20,15 @@ help:
 	@echo " make test-cov - Run tests with coverage report"
 	@echo " make test-thread - Run thread safety stress tests"
 	@echo " make test-async - Run async feature tests"
+	@echo " make test-safety - Run focused safety/concurrency tests"
 	@echo " make lint - Run ruff linter"
 	@echo " make lint-fix - Run ruff linter with auto-fix"
 	@echo " make format - Run ruff formatter"
+	@echo " make format-check - Check ruff formatting"
 	@echo " make ty - Run ty type checker (fast, Rust-based)"
+	@echo " make package-smoke - Build artifacts and smoke-test installed wheel"
+	@echo " make verify-stability - Run the full local stability gate"
+	@echo " make verify-rc - Alias for verify-stability"
 	@echo " make docs - Build documentation site (requires bengal)"
 	@echo " make docs-serve - Start dev server for docs (requires bengal)"
 	@echo " make build - Build distribution packages"
@@ -50,7 +55,7 @@ test:
 	uv run pytest -q --tb=short
 
 test-cov:
-	uv run pytest --cov=src/kida --cov-report=term-missing
+	uv run pytest --cov=kida --cov-report=term-missing --cov-fail-under=83
 
 test-thread:
 	@echo "Running thread safety stress tests..."
@@ -60,6 +65,18 @@ test-async:
 	@echo "Running async feature tests..."
 	PYTHON_GIL=0 uv run pytest tests/test_kida_async_features.py -v --tb=short
 
+test-safety:
+	@echo "Running focused safety/concurrency tests..."
+	PYTHON_GIL=0 uv run pytest \
+		tests/test_render_surface_parity.py \
+		tests/test_sandbox_fuzz.py \
+		tests/test_bytecode_cache_concurrency.py \
+		tests/test_lru_cache_concurrency.py \
+		tests/test_kida_stress_test.py \
+		-q --tb=short
+
+test-rc-safety: test-safety
+
 lint:
 	@echo "Running ruff linter..."
 	uv run ruff check src/ tests/
@@ -72,10 +89,23 @@ format:
 	@echo "Running ruff formatter..."
 	uv run ruff format src/ tests/
 
+format-check:
+	@echo "Checking ruff formatting..."
+	uv run ruff format --check src/ tests/
+
 ty:
 	@echo "Running ty type checker (Astral, Rust-based)..."
 	uv run ty check src/kida/
 
+package-smoke:
+	@echo "Building and smoke-testing package artifacts..."
+	uv run python scripts/package_smoke.py
+
+verify-stability: lint format-check ty test-cov test-safety package-smoke
+	@echo "✓ Stability verification gate passed"
+
+verify-rc: verify-stability
+
 docs:
 	@echo "Building documentation site..."
 	uv sync --group docs

docs/stability-gate.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
# Kida Stability Gate

Use this checklist when a change touches public contracts, diagnostics,
packaging, render surfaces, sandbox behavior, or free-threading assumptions.
It is a project ritual for keeping Kida boring in production, not a promise
that a major release is imminent.

## Local Stability Gate

Run the full local gate:

```bash
make verify-stability
```

This runs lint, format check, `ty`, full pytest with `--cov-fail-under=83`,
focused render/sandbox/concurrency safety tests under `PYTHON_GIL=0`, and a
wheel/sdist build plus clean-venv package smoke test.
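
The ordering and fail-fast behavior of the gate can be sketched in Python. This is only an illustration of how the `verify-stability` prerequisites chain together, not the project's implementation; `run_gate` and `run_make` are hypothetical helper names, and the step list is copied from the Makefile target's prerequisites.

```python
import subprocess
from collections.abc import Callable, Sequence

# Step names and order mirror the `verify-stability` prerequisites.
GATE_STEPS: tuple[str, ...] = (
    "lint",
    "format-check",
    "ty",
    "test-cov",
    "test-safety",
    "package-smoke",
)


def run_make(target: str) -> int:
    """Run one make target and return its exit code."""
    return subprocess.run(["make", target], check=False).returncode


def run_gate(
    steps: Sequence[str] = GATE_STEPS,
    runner: Callable[[str], int] = run_make,
) -> tuple[bool, list[str]]:
    """Run steps in order, stopping at the first non-zero exit code.

    Returns (passed, steps_that_ran).
    """
    ran: list[str] = []
    for step in steps:
        ran.append(step)
        if runner(step) != 0:
            return False, ran
    return True, ran
```

Passing a custom `runner` makes the ordering easy to exercise without invoking `make`; a serial `make` run without `-k` stops at the first failing prerequisite in the same way.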

`make verify-rc` remains an alias for contributors who already use that name.

The package smoke test verifies:

- import from the installed wheel
- template render
- CLI `check --validate-calls`
- CLI `components --json`
- component metadata
- sandbox denial for blocked reflection attributes
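
A minimal sketch of the command sequence such a smoke test might issue follows. It is not the content of `scripts/package_smoke.py`, which this diff does not show. It assumes a POSIX venv layout (`bin/python`, `bin/kida`), a console script named `kida`, and a positional templates directory for the CLI checks; `smoke_commands` and `templates_dir` are hypothetical names, and the render, metadata, and sandbox checks are left out because their API is not shown here.

```python
import sys
from pathlib import Path


def smoke_commands(venv_dir: Path, wheel: Path, templates_dir: Path) -> list[list[str]]:
    """Return argv lists for a clean-venv smoke run, in execution order.

    Assumptions: POSIX venv layout, a `kida` console script, and a positional
    templates directory argument for the CLI subcommands.
    """
    python = str(venv_dir / "bin" / "python")
    kida = str(venv_dir / "bin" / "kida")
    return [
        # Create an isolated environment so the source tree is not importable.
        [sys.executable, "-m", "venv", str(venv_dir)],
        # Install the built wheel, not an editable checkout.
        [python, "-m", "pip", "install", str(wheel)],
        # Import check against the installed package.
        [python, "-c", "import kida"],
        # CLI checks named in the list above.
        [kida, "check", "--validate-calls", str(templates_dir)],
        [kida, "components", "--json", str(templates_dir)],
    ]
```

Each argv list can be passed to `subprocess.run(..., check=True)` so the first failing command aborts the smoke run.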

## Benchmark Evidence

Linux 3.14t benchmark baselines are the performance comparison baseline. Darwin
or other local baselines are useful for development, but they must not replace
the committed Linux baseline used by CI.

Refresh the Linux baseline with the existing workflow or on a Linux 3.14t host:

```bash
./scripts/benchmark_baseline.sh baseline
```

Compare a candidate against the committed baseline:

```bash
./scripts/benchmark_compare.sh baseline
```

For stricter stability evidence, run the benchmark families that cover the
main runtime promises:

```bash
uv run pytest \
  benchmarks/test_benchmark_compile_pipeline.py \
  benchmarks/test_benchmark_render.py \
  benchmarks/test_benchmark_streaming.py \
  benchmarks/test_benchmark_inherited_blocks.py \
  benchmarks/test_benchmark_concurrent.py \
  --benchmark-only -v
```

Regression thresholds:

- fail on more than 5% compile/render/stream regression
- fail on more than 10% concurrency regression
- update the Linux baseline only with an explicit baseline drift rationale
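
The threshold rule can be written down as a small check. This sketch assumes the compared metric is a time where lower is better (for example mean seconds per round) and that a regression is measured relative to the baseline; the family names and the `exceeds_threshold` helper are illustrative, not part of the benchmark scripts.

```python
# Allowed fractional slowdown per benchmark family, from the list above.
THRESHOLDS: dict[str, float] = {
    "compile": 0.05,
    "render": 0.05,
    "stream": 0.05,
    "concurrency": 0.10,
}


def regression(baseline: float, candidate: float) -> float:
    """Fractional slowdown of candidate versus baseline (positive means slower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline


def exceeds_threshold(family: str, baseline: float, candidate: float) -> bool:
    """True when the slowdown is strictly greater than the family's threshold."""
    return regression(baseline, candidate) > THRESHOLDS[family]
```

With a 100 ms render baseline, a 106 ms candidate fails the gate while a 105 ms candidate sits exactly on the limit and passes, since the rule is "more than 5%".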

Include benchmark evidence in the PR when it matters:

- Linux platform and Python build
- GIL status
- benchmark command
- compare summary
- any baseline updates and why they are safe
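
The platform, Python build, and GIL status items can be gathered with the standard library. The following is a sketch; `benchmark_environment` is a hypothetical helper name. `sys._is_gil_enabled()` is available on CPython 3.13 and later, so the code falls back to "GIL enabled" when it is absent.

```python
import platform
import sys
import sysconfig


def benchmark_environment() -> dict[str, object]:
    """Collect the environment facts worth pasting next to benchmark results."""
    # Present on CPython 3.13+; older interpreters always run with the GIL.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    return {
        "platform": platform.platform(),
        "python_version": platform.python_version(),
        "python_build": sys.version,
        # True when the interpreter was compiled as a free-threaded build.
        "free_threaded_build": bool(sysconfig.get_config_var("Py_GIL_DISABLED")),
        # Runtime state, which PYTHON_GIL can override on free-threaded builds.
        "gil_enabled": gil_check() if gil_check is not None else True,
    }
```

Recording both the build flag and the runtime state matters because a free-threaded build can still run with the GIL re-enabled.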

plan/epic-pre-1.0-stabilization.md

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
# Kida Pre-1.0 Stabilization Rituals

Kida is moving from pre-1.0 feature turbulence to a framework-author-ready
surface. This is not a plan to cut a major release candidate; it is a plan to
turn the behaviors framework authors already depend on into visible, repeatable
project rituals. The stabilization posture is deliberately conservative: no new
runtime dependencies, public APIs, template tags, or config knobs unless they
are required to close a bug that blocks real users.

Current baseline after the `0.8.0` work on `main`:

- Pure Python remains the default contract; runtime dependencies stay empty.
- `make lint`, `make ty`, and the full test suite are the required local gates.
- Coverage currently targets the project floor in `pyproject.toml`; the
  stability gate should enforce the current 83% floor.
- Existing strict xfails should stay at zero for stabilization work.

## Phase 1: Stability Inventory and API Snapshot

Freeze the public framework-author surface before changing behavior.

- Add a public API snapshot test for:
  - `kida.__all__`
  - `ErrorCode` names and values
  - `Environment.__init__`
  - public `Template` render and metadata methods
  - metadata dataclass fields
  - CLI subcommands and flags
- Update API and CLI docs to name stable surfaces and separate them from
  internal or still-provisional surfaces.
- Treat snapshot changes as deliberate API changes that require changelog and
  docs updates in the same PR.

Acceptance:

- `make lint`, `make ty`, and focused tests pass.
- Snapshot tests fail when public contracts drift.

## Phase 2: Component and Metadata Contract Tests

Harden component validation and metadata for framework integration.

- Extend imported def validation tests for aliases, missing required props,
  unknown props, literal type mismatches, missing imported templates, and
  dynamic import skip behavior.
- Stabilize metadata across `list_defs()`, `def_metadata()`,
  `block_metadata()`, `template_metadata()`, inheritance, regions, slots, and
  CLI JSON output.
- Ensure `K-CMP-001` and `K-CMP-002` have docs entries with fix guidance.

Acceptance:

- Imported literal component validation is covered end to end.
- Dynamic imports are explicitly documented as skipped.

## Phase 3: Render, Sandbox, and Free-Threading Gates

Convert safety invariants into local verification gates.

- Add render-surface coverage meta-tests for every current render surface.
- Assert there are no active strict xfails.
- Add a fragment-render scaffold guard so block/fragment render methods use the
  shared scaffold path.
- Promote sandbox fuzz, bytecode-cache concurrency, LRU concurrency, and render
  stress tests into stability verification.

Acceptance:

- `PYTHON_GIL=0 make verify-stability` passes locally on 3.14t.
- Free-threaded shared state has an audit note for intentional sharing,
  locking, and copy-on-write behavior.

## Phase 4: Diagnostics and Docs Truth Audit

Make docs match actual behavior.

- Add an ErrorCode docs coverage test for every public `ErrorCode`, including
  `K-CMP-*`.
- Snapshot representative `kida check --validate-calls`,
  `kida components --json`, and error/warning formatting.
- Reconcile API, CLI, components, type-checking, sandbox, thread-safety, and
  Jinja2 migration docs with current behavior.

Acceptance:

- Every public diagnostic has docs, fix guidance, and stable formatting.
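
A docs coverage test of this kind can be as simple as checking that every code value appears in the docs text. In this sketch the `ErrorCode` enum is a stand-in defined locally: only the `K-CMP-001` and `K-CMP-002` values come from the plan, while the member names and the `undocumented_codes` helper are hypothetical.

```python
from enum import Enum


class ErrorCode(Enum):
    # Stand-in for the real enum; member names are invented for illustration.
    COMPONENT_ISSUE_A = "K-CMP-001"
    COMPONENT_ISSUE_B = "K-CMP-002"


def undocumented_codes(codes: type[Enum], docs_text: str) -> list[str]:
    """Return code values that never appear in the docs text, sorted."""
    return sorted(code.value for code in codes if code.value not in docs_text)
```

The test would load the error reference page, call `undocumented_codes`, and fail with the list of missing codes, which keeps the failure message actionable.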

## Phase 5: Benchmark Baseline and Packaging Smoke

Collect performance evidence and package smoke tests.

- Refresh Linux 3.14t benchmark baselines using existing benchmark scripts for
  render, compile pipeline, streaming, inherited blocks, and concurrency.
- Add benchmark comparison instructions to release docs.
- Build wheel/sdist, install from the wheel in a clean temporary environment,
  and smoke-test import, render, CLI check, component metadata, and sandbox
  denial.
- Fail the stability gate on more than 5% compile/render/stream regression or
  more than 10% concurrency regression versus the committed Linux 3.14t
  baseline unless the PR updates the baseline with justification.

Acceptance:

- Any user-facing behavior change has docs or changelog coverage.
- The repo has enough evidence to make a release decision later without
  reconstructing what was tested.

## Stability Gate Target

`make verify-stability` should become the single local stabilization gate and
run:

- `ruff check`
- `ruff format --check`
- `ty`
- full pytest
- coverage with `--cov-fail-under=83`
- sandbox fuzz and thread-safety tests
- `PYTHON_GIL=0` free-threading pass where supported
- wheel build/install smoke test

`make verify-rc` may remain as an alias, but the ritual is about stability
evidence rather than committing the project to a specific version tag.
