feat: Enhance data regression fixture with JSON support and utilities#243

Open
MitchellAcoustics wants to merge 14 commits into ESSS:master from MitchellAcoustics:master

@MitchellAcoustics

Closes #242

This pull request adds support for using JSON as an output format for the data_regression fixture, in addition to YAML. It introduces a recursive dictionary sorting utility to ensure consistent key ordering, updates the regression check logic to handle the new format, and adds comprehensive tests for the new functionality.

New output format support:

  • The data_regression.check method now accepts an extension parameter (defaulting to ".yml"), allowing regression data to be written as either YAML or JSON. If ".json" is specified, data is serialized using json.dumps with sorted keys and written as UTF-8 text. If an unsupported extension is given, a clear error is raised. [1] [2] [3]

Data normalization:

  • A new utility function sort_dict_by_keys recursively sorts dictionary keys (including nested dicts within lists), ensuring consistent output for regression checks and JSON serialization.
  • The data_dict is now normalized with sort_dict_by_keys before being dumped, ensuring stable ordering across runs and formats.
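The normalization described above can be sketched as follows. This is a minimal illustration, not the PR's actual `sort_dict_by_keys`: the name `sort_nested_keys` is hypothetical, and the sketch assumes that all keys within a given dict are mutually comparable (mixed key types such as `int` and `str` would make `sorted` raise `TypeError`).

```python
from typing import Any


def sort_nested_keys(value: Any) -> Any:
    """Return a copy of ``value`` with every dict's keys sorted, recursively."""
    if isinstance(value, dict):
        # Sort this level, then recurse into each value.
        return {key: sort_nested_keys(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Keep the list order, but normalize any dicts inside the list.
        return [sort_nested_keys(item) for item in value]
    # Scalars and other types are returned unchanged.
    return value


data = {"b": 1, "a": [{"z": 0, "y": {"d": 1, "c": 2}}]}
normalized = sort_nested_keys(data)
print(list(normalized))  # ['a', 'b']
```

The sketch returns a new structure rather than mutating its input, so the caller's original ordering stays available.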

Testing improvements:

  • Test suite enhancements include parameterized tests to verify both YAML and JSON outputs, a test for the new dictionary sorting function, and a test to ensure unsupported extensions raise the correct error. [1] [2] [3] [4]
  • Adds new .json files to the test data to validate JSON output. [1] [2]

Imports and refactoring:

  • Imports for json and the new utility are added where needed. [1] [2]

These changes make the regression framework more flexible and robust, especially for users who prefer or require JSON output.

MitchellAcoustics and others added 12 commits February 12, 2026 14:23
When using the JSON path, the file is opened and json.dump writes directly to disk. If serialization fails (TypeError for non-serializable objects), this can leave an empty/partial expected/obtained file behind, unlike the YAML path, which renders to bytes before opening the file. Consider serializing to a string first (e.g., json.dumps) and only writing after successful serialization.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
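The concern in the commit message above can be illustrated with a small sketch. The helper name `dump_json_safely` and the temporary path are assumptions made for the example, not code from the PR; the point is only that `json.dumps` runs to completion before any file is opened.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def dump_json_safely(filename: Path, data: Any) -> None:
    # Serialize fully in memory first; a TypeError here leaves the disk untouched.
    text = json.dumps(data, indent=2, ensure_ascii=False)
    # Only touch the file once serialization has succeeded.
    filename.write_text(text, encoding="utf-8")


target = Path(tempfile.mkdtemp()) / "obtained.json"
try:
    # object() is not JSON serializable, so json.dumps raises TypeError.
    dump_json_safely(target, {"value": object()})
except TypeError:
    pass

file_was_created = target.exists()
print(file_was_created)  # False
```

With `json.dump(data, f)` inside an already opened file, the same failure would leave an empty or truncated file at `target`.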
The docstring says this function recursively sorts dicts, but the implementation only recurses into mapping values and won’t sort dicts nested inside sequences (e.g., [{...}, {...}]). Consider extending it to walk MutableSequence values too, or tightening the docstring/name to match the actual behavior.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <copilot@github.com>
Agent-Logs-Url: https://github.com/MitchellAcoustics/pytest-regressions/sessions/ecff2466-7c11-4a25-87cb-0ebd388921c3

Co-authored-by: MitchellAcoustics <22335636+MitchellAcoustics@users.noreply.github.com>
Enhance data regression fixture with JSON support and formatting
Comment on lines +64 to +66
:param extension: Extension of the file. Defaults to ".yml".
If equal to ".json", expects `data_dict` to be JSON serializable
and dumps it using standard `json.dump`.
Member

As I commented in the issue, let's use an enum instead. 👍

return f"'{libname}' library is an optional dependency and must be installed explicitly when the fixture 'check' is used"


def sort_dict_by_keys(data: MutableMapping[Any, Any]) -> MutableMapping[Any, Any]:
Member

@nicoddemus Apr 24, 2026

Why should we always sort the dict keys? Dicts are order-preserving; in fact, I can think of a few situations where users might want to regress against the original order, without sorting.

Let's remove key sorting altogether from this change. Users can sort the dict themselves if they require that, I think.
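To make the ordering point concrete, here is a small standard-library sketch (the sample data is invented for illustration). `json.dumps` keeps insertion order unless `sort_keys=True` is passed, and a caller who wants sorted output can sort the top level before handing the dict over.

```python
import json

data = {"zeta": 1, "alpha": 2}

# Default behaviour: keys appear in insertion order.
as_inserted = json.dumps(data)
# Opt-in sorting, as the JSON branch of the PR currently does.
as_sorted = json.dumps(data, sort_keys=True)
# Caller-side sorting of the top level, no help from the fixture needed.
sorted_by_caller = json.dumps(dict(sorted(data.items())))

print(as_inserted)  # {"zeta": 1, "alpha": 2}
print(as_sorted)    # {"alpha": 2, "zeta": 1}
```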

Comment on lines +79 to +100
if extension.lower() in [".yml", ".yaml"]:
    dumped_str = yaml.dump_all(
        [data_dict],
        Dumper=RegressionYamlDumper,
        default_flow_style=False,
        allow_unicode=True,
        indent=2,
        encoding="utf-8",
    )
    with filename.open("wb") as f:
        f.write(dumped_str)
elif extension.lower() == ".json":
    dumped_str = json.dumps(
        data_dict, indent=2, sort_keys=True, ensure_ascii=False
    )
    with filename.open("w", encoding="utf-8") as f:
        f.write(dumped_str)
else:
    raise NotImplementedError(
        f"file extension `{extension}` is not supported by data_regression; "
        "supported extensions are '.yml', '.yaml', '.json'"
    )
Member

Let's use a match against the enum, with an assert_never check for the default case.

Development

Successfully merging this pull request may close these issues.

Feature request: JSON as an alternative serialization format for data_regression

3 participants