Skip to content

ostrowr/jsoncompat

Repository files navigation

jsoncompat logo

jsoncompat

crates.io docs.rs PyPI npm License: MIT

Check compatibility of evolving JSON schemas.

jsoncompat currently supports JSON Schema Draft 2020-12 only. If a schema declares $schema, it must be https://json-schema.org/draft/2020-12/schema with an optional trailing #.

Warning

Docs and examples at jsoncompat.com

This is alpha software. Not all incompatible changes are detected, and there may be false positives. Contributions are welcome!

Installation and basic usage

Install the CLI with Cargo:

cargo install jsoncompat

Check whether a schema change is compatible for a serializer:

jsoncompat compat old-schema.json new-schema.json --role serializer

Check whether a schema change is compatible in both serializer and deserializer directions, with fuzzing enabled to look for a concrete counterexample:

jsoncompat compat old-schema.json new-schema.json --role both --fuzz 1000 --depth 8

Generate example JSON values accepted by a schema:

jsoncompat generate schema.json --count 5 --pretty

Compare two golden schema files in CI:

jsoncompat ci old-golden.json new-golden.json --display table

Run the guided CLI smoke test:

jsoncompat demo --noninteractive

Motivation

Imagine you have an API that returns some JSON data, or JSON that you're storing in a database or file. You need to ensure that new code can read old data and that old code can read new data.

It's difficult to version JSON schemas in a traditional sense, because they can break in two directions:

  1. If a schema is used by the party generating the data, or "serializer", then a change to the schema that can break clients using an older version of the schema should be considered "breaking." For example, removing a required property from a serializer schema should be considered a breaking change for a schema with the serializer role.

    More formally, consider a serializer schema $S_A$ which is changed to $S_B$. This change should be considered breaking if there exists some JSON value that is valid against $S_B$ but invalid against $S_A$.

    As a concrete example, if you're a webserver that returns JSON data with the following schema:

    {
      "type": "object",
      "properties": {
        "id": { "type": "integer" },
        "name": { "type": "string" }
      },
      "required": ["id", "name"]
    }

    and you make name optional:

    {
      "type": "object",
      "properties": {
        "id": { "type": "integer" },
        "name": { "type": "string" }
      },
      "required": ["id"]
    }

    then you've made a breaking change for any client that is using the old schema.

    We assume that the serializer will not write additional properties that are not in the schema, even if additionalProperties is true. This allows us to consider a change to the schema that adds an optional property of some type not to be a breaking change.

  2. If a schema is used by a party receiving the data, or "deserializer", then a change to the schema that might fail to deserialize existing data should be considered "breaking." For example, adding a required property to a deserializer should be considered a breaking change.

    More formally, consider a deserializer schema $S_A$ which is changed to $S_B$. This change should be considered breaking if there exists some JSON value that is valid against $S_A$ but invalid against $S_B$.

    As a concrete example, imagine that you've been writing code that saves JSON data to a database with the following schema:

    {
      "type": "object",
      "properties": {
        "id": { "type": "integer" },
        "name": { "type": "string" }
      },
      "required": ["id"]
    }

    and you make name required, attempting to load that data into memory by deserializing it with the following schema:

    {
      "type": "object",
      "properties": {
        "id": { "type": "integer" },
        "name": { "type": "string" }
      },
      "required": ["id", "name"]
    }

    you'll be unable to deserialize any data that doesn't have a name property, which is a breaking change for the deserializer role.

    If a schema is used by both a serializer and a deserializer, then a change to the schema that can break either should be considered "breaking."

Rust API

The jsoncompat crate has a small document-level API:

use jsoncompat::{Role, SchemaDocument, check_compat};
use serde_json::json;

let old = SchemaDocument::from_json(&json!({ "type": "string" })).unwrap();
let new = SchemaDocument::from_json(&json!({ "type": ["string", "null"] })).unwrap();

let compatible = check_compat(&old, &new, Role::Deserializer).unwrap();
  • SchemaDocument::from_json(&Value) builds and canonicalizes a Draft 2020-12 schema document.
  • check_compat(&old, &new, Role::Serializer | Role::Deserializer | Role::Both) returns whether the schema change is compatible for that role.
  • CompatibilityError reports schema construction failures and compatibility features that are intentionally rejected instead of approximated.
  • json_schema_fuzz::ValueGenerator::generate(&schema, GenerationConfig::new(depth), rng) is the separate Rust value-generation API.

Support checklist

Legend:

  • ✅: the backcompat checker has an explicit subset rule, or the fuzzer has direct schema-guided candidate construction for that keyword.
  • 🟡: the feature is parsed/evaluated, but the checker is conservative or the fuzzer is heuristic/retry-based for that keyword.
  • ⚪: the keyword is accepted or stripped, but it is not represented in the resolved IR and does not contribute checker/fuzzer semantics.
  • ⛔: schema construction fails with a typed error before the checker/fuzzer runs.

Important global caveats:

  • jsoncompat::check_compat is a document-level structural subset checker over the resolved AST. It can return false negatives for conservative subset proofs and false positives for string-language constraints that are currently ignored when both sides are open string schemas.
  • Compatibility checks reject non-integral number.multipleOf constraints with a typed CompatibilityError instead of approximating fractional divisor inclusion with f64.
  • Runtime validation (SchemaDocument::is_valid) always delegates to the jsonschema backend compiled from the raw schema document, so non-f64 numeric schemas/values follow the backend's support limits rather than the compatibility checker's exact numeric model.
  • For serializer compatibility, the checker assumes serializers do not emit undeclared extra properties, even when additionalProperties: true.
  • json_schema_fuzz::ValueGenerator::generate walks the resolved AST heuristically, then validates every candidate with the raw jsonschema backend. If the resolved schema is known to have no valid instances, it returns GenerateError::Unsatisfiable deterministically; otherwise, if no valid candidate is found within the retry budget, it returns GenerateError::ExhaustedAttempts.
  • Boolean schema documents true and false are supported directly. false has no valid instances, so generation returns Unsatisfiable.
  • Unknown extension keywords are preserved only when needed to keep local ref targets addressable; otherwise they are ignored by both the checker and the fuzzer.

Scalar schemas

Keyword Compat Fuzz Notes
type Supports null, boolean, object, array, string, number, integer, and unions of those primitive types.
enum Numeric equality follows JSON Schema semantics, so 1 and 1.0 are treated as equal.
const Same semantic numeric equality as enum.
minimum Integer bounds are normalized exactly where possible; number bounds use f64.
maximum Integer bounds are normalized exactly where possible; number bounds use f64.
exclusiveMinimum 🟡 Integer exclusives are canonicalized into inclusive integer bounds; number generation may rely on validator retries around open lower bounds.
exclusiveMaximum 🟡 Integer exclusives are canonicalized into inclusive integer bounds; number generation may rely on validator retries around open upper bounds.
multipleOf 🟡 Exact integer divisibility is supported for integer schemas and integer-valued number.multipleOf. Compatibility checks reject non-integral number.multipleOf constraints instead of approximating fractional divisor inclusion; runtime validation and generation still accept those schemas.
minLength 🟡 The generic string generator respects this bound, but the format and pattern paths may ignore it and rely on validator retries.
maxLength 🟡 The generic string generator respects this bound, but the format and pattern paths may ignore it and rely on validator retries.
pattern 🟡 🟡 SchemaNode::accepts_value() enforces regexes for finite-value checks using a cached matcher per pattern, but check_compat does not prove regex-language inclusion between open string schemas. Unsupported ECMAScript regex constructs are preserved as source text but treated as non-matching by the internal evaluator. Regex generation is best-effort and does not guarantee coverage for complex constructs.
format 🟡 🟡 The checker does not prove format-language inclusion between open string schemas. The fuzzer only synthesizes date, date-time, time, email, idn-email, uri, iri, uri-reference, iri-reference, uuid, ipv4, ipv6, hostname, and idn-hostname; unknown formats fall back to generic strings.
contentEncoding Canonicalized for syntax, then stripped from the semantic IR.
contentMediaType Canonicalized for syntax, then stripped from the semantic IR.
contentSchema Nested schemas are canonicalized and ref-checked, but content decoding/validation is not modeled in the semantic IR.

Object schemas

Keyword Compat Fuzz Notes
properties Property schemas are compared recursively and generated directly.
required Missing required properties are synthesized before generation returns a candidate.
additionalProperties Checker support is subject to the serializer assumption above; the fuzzer generates additional keys only when this schema is not false.
patternProperties 🟡 🟡 Exact same regex keys are compared structurally, and property values are checked against every matching pattern schema. If the superset has no patternProperties, subset patterns fall back to additionalProperties; otherwise the checker does not prove regex-language inclusion between different patterns. The fuzzer does not synthesize property names from regexes.
propertyNames 🟡 🟡 Checker support depends on subset reasoning for the property-name schema itself, so regex/format caveats still apply. The fuzzer can generate keys from string/enum schemas and otherwise falls back to random names plus acceptance checks.
minProperties Canonicalization raises minProperties to at least the number of required keys.
maxProperties Checked structurally and used as a hard upper bound during generation.
dependentRequired 🟡 The checker accounts for trigger keys admitted by properties, patternProperties, or additionalProperties. The fuzzer does not proactively synthesize dependencies, but invalid candidates are rejected by the final validator pass.
dependentSchemas Nested schemas are canonicalized for dialect/ref validation, but SchemaNodeKind::Object does not store dependentSchemas, so checker/fuzzer semantics are absent.
unevaluatedProperties Canonicalized recursively but not represented with evaluated-location bookkeeping in the resolved IR.
dependencies Legacy keyword accepted by the parser but not represented in the resolved IR; use dependentRequired / dependentSchemas for Draft 2020-12 semantics.

Array schemas

Keyword Compat Fuzz Notes
items Schema-form items is compared recursively and used for tail-item generation. Tuple arrays must use Draft 2020-12 prefixItems; legacy tuple-form items: [...] is rejected.
prefixItems Compared positionally and generated positionally.
additionalItems Legacy keyword not represented in the resolved array node, so it does not affect compatibility or generation.
minItems Defaults to minContains (or 1) when contains is present, otherwise 0.
maxItems Checked structurally and used as a hard upper bound during generation.
contains 🟡 Checker reasoning handles item-match count constraints structurally. The fuzzer tries to force or avoid matching items, but this is heuristic and may rely on retries.
minContains 🟡 Participates in structural count reasoning for contains; the fuzzer uses best-effort witness construction plus validator retries.
maxContains 🟡 Participates in structural count reasoning for contains; the fuzzer uses best-effort witness avoidance plus validator retries.
uniqueItems 🟡 The checker handles the common sound cases; the fuzzer retries and then uses a synthetic fallback object for unconstrained item schemas if duplicates persist.
unevaluatedItems Canonicalized recursively but not represented with evaluated-location bookkeeping in the resolved IR.

Applicators and conditionals

Keyword Compat Fuzz Notes
allOf 🟡 🟡 Canonicalization simplifies trivial boolean branches. The checker is branch-local and can miss proofs such as allOf[A, B] <= A when only one branch is inspected. The fuzzer merges object branches heuristically and otherwise picks one non-trivial branch.
anyOf 🟡 🟡 The checker requires every sub branch to fit sup, and for sup anyOf it looks for one branch containing all of sub; that is conservative and can miss valid disjunctive proofs. The fuzzer samples one branch and retries.
oneOf 🟡 🟡 Canonicalization rewrites provably disjoint oneOf branches to anyOf. Outside that case, checker reasoning is conservative and does not symbolically prove exact-one exclusivity, and the fuzzer uses retry-based witness counting.
not 🟡 🟡 The checker only handles shallow boolean/Any cases symbolically; other not proofs fall back to conservative false. The fuzzer generates type-mismatch and fixed candidate witnesses, then retries against the full schema.
if 🟡 🟡 SchemaNode::accepts_value() evaluates the condition and the fuzzer samples branches, but the subset checker has no dedicated symbolic rule for conditional implication beyond exact-node equality and finite const/enum probes.
then 🟡 🟡 Stored and evaluated only as part of if / then / else; checker reasoning inherits the same conditional caveat as if.
else 🟡 🟡 Stored and evaluated only as part of if / then / else; checker reasoning inherits the same conditional caveat as if.

Document shape, references, and metadata

Keyword Compat Fuzz Notes
$ref 🟡 Same-document refs to "#" and "#/..." are supported, including recursive graphs. Pure alias cycles and non-local refs are rejected with typed resolver errors. Generation is depth-limited.
$anchor Plain-name fragment anchors are rejected; only JSON Pointer refs are supported today.
$dynamicRef Dynamic refs are rejected during schema resolution.
$dynamicAnchor Dynamic anchors are rejected during schema resolution.
$id Resource-scope identifiers are rejected because reference resolution is currently same-document only.
$defs 🟡 Acts as a local ref target container; nested schemas are canonicalized and resolved, but $defs is not a semantic node by itself.
definitions 🟡 Legacy ref target container with the same caveat as $defs.
title Preserved as metadata in canonical JSON, but not used by checker/fuzzer semantics.
$comment Stripped or preserved only as metadata; no compatibility or generation semantics.
description Stripped or preserved only as metadata; no compatibility or generation semantics.
default Stripped or preserved only as metadata; default values are not synthesized or used for compatibility.
examples Stripped or preserved only as metadata; examples do not guide fuzzing.
deprecated Annotation only; ignored by checker and fuzzer semantics.
readOnly Annotation only; ignored by checker and fuzzer semantics.
writeOnly Annotation only; ignored by checker and fuzzer semantics.
$vocabulary Annotation only; no dialect negotiation semantics are modeled.

Rust workspace architecture

The Rust code is split into five crates and one CLI binary. The website under web/ is a separate frontend and is not part of the compatibility engine.

Path Package Responsibility
schema/ json_schema_ast Draft 2020-12 dialect checks, schema canonicalization, AST construction, local $ref resolution, and direct validator compilation via jsonschema
src/ jsoncompat Backward-compatibility checking over resolved ASTs, plus the jsoncompat CLI in src/bin/jsoncompat.rs
fuzz/ json_schema_fuzz Schema-guided JSON instance generation for fuzz tests and example values
python/ jsoncompat_py PyO3 bindings that expose check_compat, generate_value, and role constants
wasm/ jsoncompat_wasm wasm-bindgen bindings that expose check_compat and generate_value to JavaScript

The primary schema APIs are:

  • json_schema_ast::compile(&raw_schema), which compiles the original schema document directly with the jsonschema validator backend. Before compilation, this crate rejects any $schema declaration other than Draft 2020-12 (https://json-schema.org/draft/2020-12/schema, with an optional trailing #).
  • json_schema_ast::SchemaDocument::from_json(&raw_schema) (alias: SchemaDocument::from_json), which eagerly stores the original raw schema JSON, eagerly canonicalizes and validates the schema document once, and then lazily materializes compiled validators on first use:
    • SchemaDocument::is_valid(&value) -> Result<bool, SchemaBuildError> lazily compiles a validator from the original raw schema document.
    • SchemaDocument::canonical_schema_json() -> Result<&Value, SchemaBuildError> returns the cached canonicalized schema document as JSON so humans can inspect the rewrite directly.
  • json_schema_fuzz::ValueGenerator::generate(&SchemaDocument, GenerationConfig, rng) -> Result<Value, GenerateError>, which is the Rust generation API.

The resolved IR extension API is public because jsoncompat and json_schema_fuzz are separate crates. SchemaDocument::root() lazily resolves local refs from the canonicalized JSON and exposes the immutable graph used by compatibility analysis and generation. Each SchemaNode has an opaque NodeId for cycle-safe identity. SchemaNodeKind only contains post-resolution semantic variants; parser-only $ref and declaration metadata variants are private implementation details. Scalar/object/array constraints are represented with normalized typed domains (IntegerBounds, NumberBounds, CountRange, ContainsConstraint, PatternConstraint) instead of loose keyword pairs, and regex support is explicit via PatternSupport.

At a high level, the runtime flow is:

flowchart TD
    raw["Raw JSON Schema document"]
    compile_raw["json_schema_ast::compile(raw)\nlazily compile raw validator"]
    validate_raw["SchemaDocument::is_valid(value)\nvalidate against raw schema"]
    canonicalize["schema/src/canonicalize.rs\neagerly canonicalize syntax, validate keyword shapes,\nand fill implicit constraints"]
    canonical_json["SchemaDocument::canonical_schema_json()\ndebug canonical JSON output"]
    compile_canonical["compile(canonicalized)\nparity/debug validator"]
    validate_canonical["compiled canonical validator\nparity/debug validation"]
    parse["schema/src/ast.rs\nparse canonical JSON into a private arena-backed graph"]
    resolve["schema/src/ast.rs\nresolve local # / #/... refs by node ID and freeze to SchemaNodeKind"]
    compat["src/subset.rs + src/subset/*.rs\njsoncompat::check_compat(old_doc, new_doc, role)"]
    fuzz["fuzz/src/lib.rs\njson_schema_fuzz::ValueGenerator::generate(&SchemaDocument, ...)"]

    raw --> compile_raw --> validate_raw
    raw --> canonicalize --> parse --> resolve
    canonicalize --> canonical_json
    canonicalize --> compile_canonical --> validate_canonical
    resolve --> compat
    resolve --> fuzz
Loading

What each crate owns

  • json_schema_ast is the schema frontend and resolved IR crate. schema/src/ast.rs stores the raw input schema immediately and canonicalizes it once inside SchemaDocument::from_json() so keyword-shape validation happens at schema construction without maintaining a second validator path. The resolved graph is still built lazily by parsing that cached canonical JSON into a private arena-backed graph, resolving local recursive references, normalizing applicators, and freezing it into SchemaNodeKind. Unsupported or non-productive refs fail with typed resolver errors. SchemaDocument::is_valid() is intentionally backed by the validator compiled from the original raw schema. SchemaNode::accepts_value() is the low-level evaluator for canonicalized AST subgraphs used by compatibility/fuzzing heuristics.
  • jsoncompat is the static compatibility checker. src/lib.rs defines Role and document-level check_compat, and src/subset.rs plus src/subset/{scalar,object,array}.rs implement the actual inclusion relation (sub ⊆ sup) over SchemaNode. The checker uses SchemaNode::accepts_value() for finite-value membership checks and keeps a cycle guard for recursive subset proofs.
  • json_schema_fuzz is the value-generation engine. Its public value-generation API is ValueGenerator::generate(&SchemaDocument, GenerationConfig, rng) and it only returns values accepted by SchemaDocument::is_valid(); if the resolved schema is known to be empty, generation returns a typed GenerateError::Unsatisfiable, and if the internal candidate generator cannot find a value within its retry budget, generation returns a typed GenerateError::ExhaustedAttempts. Internally it walks the canonicalized SchemaNode graph and uses SchemaNode::accepts_value() only as a pruning heuristic for recursive subgraphs.
  • jsoncompat_py and jsoncompat_wasm are thin adapters. They parse JSON strings, call the Rust core crates, and map Rust errors/results into Python or JavaScript types.

Test strategy

  • tests/backcompat.rs checks expected serializer/deserializer compatibility outcomes for hand-authored old/new schema pairs, then fuzzes each direction to look for concrete counterexamples.
  • tests/fuzz.rs runs the JSON-Schema-Test-Suite fixture corpus through SchemaDocument::from_json, asks json_schema_fuzz::ValueGenerator for raw-valid examples, compiles SchemaDocument::canonical_schema_json() to assert canonicalization parity, and additionally checks SchemaNode::accepts_value() against the canonicalized validator on schemas that stay within the internal evaluator's supported subset. Fixtures that rely on unsupported reference-resource features are skipped when schema build returns a typed resolver error, and known generation gaps are tracked by explicit GenerateError::ExhaustedAttempts whitelist entries.
  • schema/src/canonicalize/integration_tests.rs and schema/src/roundtrip_tests.rs test canonicalization and AST round-tripping directly.

Debugging canonicalization

To inspect the canonicalized schema document that backs compatibility checks and generation, compare the raw schema you passed to SchemaDocument::from_json() with SchemaDocument::canonical_schema_json(), and compile the canonical JSON with json_schema_ast::compile() if you need validator-level parity checks on representative instances. Canonicalization is an internal library facility, not a jsoncompat CLI subcommand.

Development

Requirements:

Run bootstrap.sh to install the necessary dependencies.

Run tests:

just check

Run the end-to-end CLI demo/smoke test:

cargo run --bin jsoncompat -- demo

By default the demo pauses before each step. For CI/non-interactive runs, pass:

cargo run --bin jsoncompat -- demo --noninteractive

Run the performance benchmark harnesses:

just bench

The schema operation benchmarks use a fixed handpicked corpus under benches/fixtures so broad fuzz fixture changes do not move the benchmark baseline.

For a fast smoke check of the benchmark binary:

just bench-check

See the Justfile for more commands

Releasing

just release will dry-run the release process for a patch release.

Right now, releases to PyPI and npm are done in CI via manual dispatch of the CI workflow on a tag. Releases to cargo are done manually for now.

Merging to main will trigger a release of the website.