What's Changed
- Bring https://github.com/gabotechs/datafusion-distributed-experiment code by @gabotechs in #68
- Adds error serialization-deserialization by @gabotechs in #69
- Remove stage delegation in favor of planning-time stage assignation by @gabotechs in #71
- Fix rust toolchain to 1.83.0 by @gabotechs in #72
- Completed execution path + failing test by @robtandy in #74
- Fix serialization error by @LiaCastaneda in #76
- Small cleanup after #74 by @gabotechs in #75
- Fix ArrowFlightReadExec result streaming by @gabotechs in #77
- Add stage planner tests by @gabotechs in #78
- Split ArrowFlightReadExec node placement for distributed planning by @gabotechs in #79
- Update DataFusion version from 48.0.0 to 49.0.0 by @gabotechs in #82
- add doc comment for execution stage struct by @robtandy in #80
- Support user provided codecs by @gabotechs in #81
- Move all test utils to src/ and hide them behind an "integration" feature by @gabotechs in #84
- Add test comparing distributed + single node execution on TPCH data by @jayshrivastava in #83
- Execution working on all 22 TPCH queries by @robtandy in #89
- Add delta report for benchmarks by @gabotechs in #91
- Removes an extra line jump in distributed explains by @gabotechs in #95
- Create TTL map with time wheel architecture by @jayshrivastava in #96
- Fix compilation errors and warnings by @gabotechs in #102
- Introduce
ConfigExtensionExt, allowing the propagation of arbitraryConfigExtensions across network boundaries by @gabotechs in #100 - Nested Loop Joins (fixes TPCH query 22) by @robtandy in #104
- Improve
SessionBuilderergonomy and fix clippy errors by @gabotechs in #103 - Collect Left Hash Joins by @robtandy in #105
- Introduce
DistributedExttrait that extends the capabilities of DataFusion's session building tools by @gabotechs in #106 - Add plan validations to TPCH tests by @gabotechs in #107
- do_get: use TTL map to store task state by @jayshrivastava in #108
- Refactor arrow_flight_read.rs and friends by @gabotechs in #109
- Add
localhost_run.rsandlocalhost_worker.rsexamples by @gabotechs in #111 - Add README.md and LICENSE.txt by @gabotechs in #114
- Fix panics in tests and un-ignore working tests by @gabotechs in #120
- Improve EXPLAIN render by @gabotechs in #121
- Bigger TPCH tests by @gabotechs in #122
- File name and folder restructure by @gabotechs in #124
- Refactor do_get.rs and adjacent files by @gabotechs in #125
- Adds in-memory example by @gabotechs in #132
- Add support for in-memory TPCH tests by @gabotechs in #129
- Comment flaky test by @gabotechs in #133
- Support
--threadsand--workerson TPCH benchmarks by @gabotechs in #130 - Report host stats on TPCH benchmarks by @gabotechs in #131
- Robtandy/better graphviz plans by @robtandy in #135
- changes to allow nice graphviz of single node plans too by @robtandy in #136
- metrics: add metrics module and protos by @jayshrivastava in #141
- fix bug in graphviz for determining output partitions by @robtandy in #142
- move chrono out of optional deps so project can compile by @robtandy in #143
- execution_plans: add metrics collector and re-writer by @jayshrivastava in #144
- Distributed planning overhaul by @gabotechs in #145
- Update README.md with new diagrams based on NetworkShuffleExec and NetworkCoalesceExec by @gabotechs in #153
- Do not require default datafusion features by @gabotechs in #154
- set msrv via Cargo.toml, use 2024 edition by @adriangb in #152
- remove feature flags around chrono::DateTime by @adriangb in #155
- fix: Move
error.rsto protobuf by @jonathanc-n in #156 - execution_plans: add MetricsCollectingStream by @jayshrivastava in #150
- flight_service: add TrailingFlightDataStream by @jayshrivastava in #157
- fix: Incorrect weather parquet path in examples by @zuston in #165
- Generalize functions for NetworkCoalesceExec creation by @jonathanc-n in #162
- fix: Enable distributed plan for localhost_run by @zuston in #166
- Address public api weak points by @gabotechs in #158
- Add partition coalescing at the head of the plan by @gabotechs in #164
- Fix early drop stateful nodes by @gabotechs in #159
- Remove unnecessary StageExec proto serde overhead by @gabotechs in #163
- flight_service: emit metrics from ArrowFlightEndpoint by @jayshrivastava in #160
- Evolve
ChannelResolvertrait for requiring aFlightServiceClientinstead of atonic::BoxSyncCloneChannelby @gabotechs in #172 - update to DataFusion 50 by @adriangb in #146
- Use upstream composed extension codec by @gabotechs in #176
- Fix Dictionary Encoded Values by @cetra3 in #174
- Rework execution plan hierarchy for better interoperability by @gabotechs in #178
- Fix in-memory example by @gabotechs in #183
- Misc improvements to public API by @gabotechs in #181
- Add DistributedPlanError::NonDistributable rule and do not distribute SHOW COLUMNS by @gabotechs in #195
- implement distributed EXPLAIN ANALYZE by @jayshrivastava in #182
- Refactor distributed planner into its own folder by @gabotechs in #196
- Fix user provided UDFs encoding by @gabotechs in #200
- Add dynamic task config based on DataFusion extension options by @gabotechs in #197
- Add arrow flight endpoint hooks by @gabotechs in #198
- Rollback
PlanDependentUsizeby @gabotechs in #201 - metrics: make MetricsWrapperExec transparent by @jayshrivastava in #202
- Return input task count in NetworkBoundary by @gabotechs in #204
- remove schema adapter by @adriangb in #205
- allow configuring message sizes by @adriangb in #207
- Adds test ensuring dictionary corruption does not occur anymore by @marc-pydantic in #208
- Struct is public only within crate. by @JSOD11 in #210
- Create stage key constructor by @JSOD11 in #211
- Update Readme with working git-lfs instructions by @JSOD11 in #212
- Add testing for distributed codec by @JSOD11 in #217
- fix(datafusion-distributed): relax ttl map timing constraints by @JSOD11 in #218
- Rework task assignation mechanism by @gabotechs in #216
- fix(ttl_map): two minor fixes for ttl_map configuration parsing by @ruchirK in #225
- fix(ttl_map): initialize time to 1 for correct wrapping sub by @ruchirK in #224
- Add AWS CDK-based benchmarking environment by @gabotechs in #227
- Improve TPCH gen command by @gabotechs in #230
- free up disk space in CI runners by @jayshrivastava in #234
- Split TPCH tests in correctness, planning and explain analyze by @gabotechs in #232
- Upgrade to DF v51 by @ahmed-mez in #236
- Upgrade benchmarks to v51 by @ahmed-mez in #238
- Faster CDK deploys by @gabotechs in #241
- Add batch coalescing in NetworkShuffleExec operations by @gabotechs in #242
- add tpc-ds tests and property-based testing utilities by @jayshrivastava in #231
- Add failing option extension propagation test by @gabotechs in #246
- Add Trino to CDK benchmarks by @gabotechs in #244
- Fix network boundary deadlocks by @gabotechs in #240
- Fix build without
integrationfeature by @gabotechs in #249 - Collect metrics optionally by @gabotechs in #245
- Rework integration tests for not constructing the plans manually by @gabotechs in #257
- Add failing test for UNIONs in the distributed planner by @gabotechs in #258
- Add Sphinx-based docs by @gabotechs in #254
- Add Distributed DataFusion cli by @gabotechs in #261
- Rework distributed planning logic by @gabotechs in #259
- Bump arrow version to 57.1.0 by @gabotechs in #268
- Make queries in TPC-DS get distributed by @gabotechs in #264
- Split channel resolver in two by @gabotechs in #265
- Refactor benchmarks crate and add TPC-DS benchmarks by @gabotechs in #269
- Add Clickbench tests and benchmarks by @gabotechs in #270
- Distribute UNION operations by @gabotechs in #262
- Update missing documentation by @gabotechs in #266
- Apply TaskEstimator on all nodes by @gabotechs in #271
- Rename all arrow flight endpoint references to "Worker" by @gabotechs in #274
- Improve default task estimator by @gabotechs in #275
- Adapt remote benchmarks to support more datasets by @gabotechs in #276
- Improve docs and add custom execution plan example by @gabotechs in #277
- Rework AWS CDK code for the benchmarking cluster and add ballista to benchmarks by @gabotechs in #278
- Add Spark to remote benchmarks by @gabotechs in #280
- Doc fixes by @JSOD11 in #288
- add binary executable crate for observability tool by @EdsonPetry in #290
- Identify benchmark queries as arbitrary strings by @gabotechs in #282
- Worker network stream optimizations by @gabotechs in #283
- Remove ballista from benchmarks by @gabotechs in #292
- Move forward to datafusion 52 by @marc-pydantic in #291
- Fix header extension propagation by @gabotechs in #298
- Replace explain_analyze.rs tests with more scoped metrics_collection.rs tests by @gabotechs in #294
- Allow sending compressed data over the wire by @gabotechs in #301
- Add new
batch_coalescing_below_network_boundariesoptimization pass by @gabotechs in #299 - Distributed hive-style join testing with superset satisfaction by @JSOD11 in #256
- Support OutputBytes, OutputBatches, PruningMetrics, and Ratio metrics by @jayshrivastava in #305
- Clippy fixes by @JSOD11 in #303
- Gene.bordegaray/2025/12/add broadcast exec by @gene-bordegaray in #279
- Add metrics to network boundaries by @gabotechs in #306
- Misc improvements for benchmarks by @gabotechs in #300
- Add functionality for worker-console communication by @EdsonPetry in #297
- Optionally display metrics per task (version 2) by @jayshrivastava in #309
- Add fast deploys by @gabotechs in #315
- Fix proto generation script
out_dirpath by @EdsonPetry in #317 - Fix metrics collection bug: preserve PartitionIsolatorExec in plan by @jayshrivastava in #314
- Change remote cluster benchmark instances to c5n.2xlarge by @gabotechs in #321
- Handle stream cancellations gracefully by @gabotechs in #320
- Branch name in benchmarks by @gabotechs in #325
- fix bench data generation and docs by @gene-bordegaray in #327
- Put arrow back to 57.1.0 by @gabotechs in #329
- fix: metrics displaying on new line by @jayshrivastava in #328
- Update README.md by @gabotechs in #334
- Replace TTL Map with moka async cache by @EdsonPetry in #339
- Fix metrics display on leaf nodes by @gabotechs in #336
- feat: Support multiple output tasks from NetworkCoalesceExec by @gabrielkerr in #323
- add
network_latency_*metrics for network* nodes by @jayshrivastava in #322 - Add shuffle batch size to distributed config by @gabotechs in #332
- feat: Add support for passthrough headers to worker nodes by @geoffreyclaude in #340
- Gene.bordegaray/2026/01/improve broadcast cache by @gene-bordegaray in #324
- Claude Skill for benchmarks by @gabotechs in #337
- test: reproduce metrics loss on early stream termination by @gferrate in #341
- Measure latency server-side in benchmarks by @gabotechs in #344
- metrics: add plan_bytes_sent to network* nodes by @jayshrivastava in #349
- Add task progression tracking in ddf console by @EdsonPetry in #330
- make DISTRIBUTED_DATAFUSION_TASK_ID_LABEL constant public by @jayshrivastava in #351
- Gene.bordegaray/2026/02/improve remote bench guide by @gene-bordegaray in #350
- Use CustomMetricValue implementations for latency related metrics by @gabotechs in #358
- Add max_tasks_per_sate config parameter by @gabotechs in #355
- Add shuffle benchmark by @gabotechs in #362
- Send subplans in the coordinator stage rather than in NetworkBoundary nodes by @gabotechs in #345
- add count and total to network latency metrics by @kurtvolmar in #364
- Fix metrics display
bytes_transferredas human-readable bytes by @EdsonPetry in #367 - Percentile latencies by @gabotechs in #368
- Upgrade datafusion version to v52.3.0 by @jayshrivastava in #369
- feat(console): rewrite TUI with worker-centric dashboard and live metrics by @EdsonPetry in #363
- Revert "Upgrade datafusion version to v52.3.0" by @gabotechs in #371
- Fix system metrics never collecting due to async closure in
thread::spawnby @EdsonPetry in #379 - Add automatic worker discovery to console by @EdsonPetry in #372
- Row count not propagated in RecordBatches by @gabotechs in #387
- Move to dedicated Worker gRPC specification instead of Arrow Flight by @gabotechs in #375
- Fixed exec_err! message by @sandugood in #388
- Upgrade to datafusion 53.0.0 by @sandugood in #391
- default and verbose DisplayAs formatting for PartitionIsolatorExec by @kurtvolmar in #398
- Add support for worker versioning. by @EdsonPetry in #389
- [docs] Fix a typo in variable name from 'sate' to 'state' by @martin-g in #403
- fix: restore Stage plan during deserialization round-trip by @gabrielkerr in #404
- Shuffle project files for preparing to crates.io release by @gabotechs in #405
New Contributors
- @adriangb made their first contribution in #152
- @jonathanc-n made their first contribution in #156
- @zuston made their first contribution in #165
- @cetra3 made their first contribution in #174
- @marc-pydantic made their first contribution in #208
- @JSOD11 made their first contribution in #210
- @ruchirK made their first contribution in #225
- @ahmed-mez made their first contribution in #236
- @EdsonPetry made their first contribution in #290
- @gene-bordegaray made their first contribution in #279
- @gabrielkerr made their first contribution in #323
- @geoffreyclaude made their first contribution in #340
- @gferrate made their first contribution in #341
- @kurtvolmar made their first contribution in #364
- @sandugood made their first contribution in #388
- @martin-g made their first contribution in #403
Full Changelog: https://github.com/datafusion-contrib/datafusion-distributed/commits/v1.0.0