feat: Enhance data regression fixture with JSON support and utilities#243
Open
MitchellAcoustics wants to merge 14 commits intoESSS:masterfrom
Open
feat: Enhance data regression fixture with JSON support and utilities#243MitchellAcoustics wants to merge 14 commits intoESSS:masterfrom
MitchellAcoustics wants to merge 14 commits intoESSS:masterfrom
Conversation
…egression fixture
When using the JSON path, the file is opened and json.dump writes directly to disk. If serialization fails (TypeError for non-serializable objects), this can leave an empty/partial expected/obtained file behind, unlike the YAML path which renders to bytes before opening the file. serializing to a string first (e.g., json.dumps) and only writing after successful serialization. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The docstring says this function recursively sorts dicts, but the implementation only recurses into mapping values and won’t sort dicts nested inside sequences (e.g., [{...}, {...}]). Consider extending it to walk MutableSequence values too, or tightening the docstring/name to match the actual behavior.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <copilot@github.com>
Agent-Logs-Url: https://github.com/MitchellAcoustics/pytest-regressions/sessions/ecff2466-7c11-4a25-87cb-0ebd388921c3 Co-authored-by: MitchellAcoustics <22335636+MitchellAcoustics@users.noreply.github.com>
Enhance data regression fixture with JSON support and formatting
for more information, see https://pre-commit.ci
nicoddemus
requested changes
Apr 24, 2026
Comment on lines
+64
to
+66
| :param extension: Extension of the file. Defaults to ".yml". | ||
| If equal to ".json", expects `data_dict` to be JSON serializable | ||
| and dumps it using standard `json.dump`. |
Member
There was a problem hiding this comment.
As I commented in the issue, lets use an enum instead. 👍
| return f"'{libname}' library is an optional dependency and must be installed explicitly when the fixture 'check' is used" | ||
|
|
||
|
|
||
| def sort_dict_by_keys(data: MutableMapping[Any, Any]) -> MutableMapping[Any, Any]: |
Member
There was a problem hiding this comment.
Why should we always sort the dict keys? Dicts are order preserving, in fact I can think of a few situations where users might want to regress against the original order, without sorting.
Let's remove key sorting altogether from this change, users can sort the dict themselves if they require that, I think.
Comment on lines
+79
to
+100
| if extension.lower() in [".yml", ".yaml"]: | ||
| dumped_str = yaml.dump_all( | ||
| [data_dict], | ||
| Dumper=RegressionYamlDumper, | ||
| default_flow_style=False, | ||
| allow_unicode=True, | ||
| indent=2, | ||
| encoding="utf-8", | ||
| ) | ||
| with filename.open("wb") as f: | ||
| f.write(dumped_str) | ||
| elif extension.lower() == ".json": | ||
| dumped_str = json.dumps( | ||
| data_dict, indent=2, sort_keys=True, ensure_ascii=False | ||
| ) | ||
| with filename.open("w", encoding="utf-8") as f: | ||
| f.write(dumped_str) | ||
| else: | ||
| raise NotImplementedError( | ||
| f"file extension `{extension}` is not supported by data_regression; " | ||
| "supported extensions are '.yml', '.yaml', '.json'" | ||
| ) |
Member
There was a problem hiding this comment.
Let's use a match against the enum, with an assert_never check for the default case.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #242
This pull request adds support for using JSON as an output format for the
data_regressionfixture, in addition to YAML. It introduces a recursive dictionary sorting utility to ensure consistent key ordering, updates the regression check logic to handle the new format, and adds comprehensive tests for the new functionality.New output format support:
data_regression.checkmethod now accepts anextensionparameter (defaulting to".yml"), allowing regression data to be written as either YAML or JSON. If".json"is specified, data is serialized usingjson.dumpswith sorted keys and written as UTF-8 text. If an unsupported extension is given, a clear error is raised. [1] [2] [3]Data normalization:
sort_dict_by_keysrecursively sorts dictionary keys (including nested dicts within lists), ensuring consistent output for regression checks and JSON serialization.data_dictis now normalized withsort_dict_by_keysbefore being dumped, ensuring stable ordering across runs and formats.Testing improvements:
.jsonfiles to the test data to validate JSON output. [1] [2]Imports and refactoring:
jsonand the new utility are added where needed. [1] [2]These changes make the regression framework more flexible and robust, especially for users who prefer or require JSON output.