fix: scd2 unmapped insert for nested tables#3812

Open
deschman wants to merge 2 commits into dlt-hub:devel from deschman:fix/3811-scd2-nested-table-inserts

Conversation

@deschman

Description

For nested tables loaded with the scd2 merge strategy, the follow-up SQL that inserts new rows from the staging table into the destination used INSERT INTO … SELECT *. That ties column order to whatever the bulk writer happened to produce in staging. On Spark/Databricks (ANSI mode), if staging and destination disagree on a column's type (for example because of schema drift or an evolving JSON payload), SELECT * can pair values with the wrong target columns, which surfaces as obscure DATATYPE_MISMATCH errors.
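The positional hazard can be sketched as follows; the table and column names here are hypothetical placeholders, not dlt's actual identifiers:

```python
# Illustrative column orders: the bulk writer wrote staging in one order,
# while the destination DDL declares another.
staging_cols = ["value", "id", "_dlt_load_id"]  # order produced in staging
dest_cols = ["id", "value", "_dlt_load_id"]     # order in the destination DDL

# The previous pattern pairs columns strictly by position:
old_sql = "INSERT INTO dest_table SELECT * FROM staging_table"

# Position 1 of staging ("value") would be assigned to "id" in the
# destination, so the engine tries to cast across unrelated columns.
positional_pairs = list(zip(staging_cols, dest_cols))
print(positional_pairs[0])
```

On an engine that assigns strictly by position (as Spark does in ANSI mode), the first misaligned pair is enough to trigger a type error.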

This change generates an explicit column list from the destination table schema (excluding the SCD2 from/to metadata columns) for both the INSERT target list and the SELECT, so values are aligned by column name with the destination DDL instead of by staging column order.
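A minimal sketch of the name-based approach, assuming hypothetical table names and illustrative SCD2 metadata column names (the actual validity column names in dlt are configurable and may differ):

```python
def scd2_insert_sql(
    dest_table: str,
    staging_table: str,
    dest_columns: list[str],
    validity_cols: tuple[str, ...] = ("_dlt_valid_from", "_dlt_valid_to"),
) -> str:
    """Build the follow-up INSERT with an explicit column list derived
    from the destination schema, excluding the SCD2 validity metadata
    columns, so values align by name rather than by staging order."""
    cols = [c for c in dest_columns if c not in validity_cols]
    col_list = ", ".join(cols)
    return (
        f"INSERT INTO {dest_table} ({col_list}) "
        f"SELECT {col_list} FROM {staging_table}"
    )

sql = scd2_insert_sql(
    "dest_table",
    "staging_table",
    ["id", "value", "_dlt_valid_from", "_dlt_valid_to"],
)
print(sql)
```

Because both the INSERT target list and the SELECT list are generated from the same destination schema, the statement is insensitive to the column order the bulk writer produced in staging.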

Related Issues

  • Fixes #3811: scd2 nested table inserts are nondeterministic

Additional Context

  • Repro context and discussion: weak/nested schemas where child fields appear or disappear between loads. DuckDB often tolerates this because it loads parquet by column name, while Databricks failed on the previous SELECT * pattern.
  • I have read CONTRIBUTING.md.
  • I ran make test-common locally and the suite was green.

@rudolfix rudolfix self-assigned this Mar 31, 2026

Development

Successfully merging this pull request may close these issues.

scd2 nested table inserts are nondeterministic

2 participants