command dump

zmworm edited this page May 7, 2026 · 3 revisions

dump

Serialize a document into a replayable batch script — the round-trip mechanism for editing a document by emit → modify → replay.

Synopsis

officecli dump <file> <path> [--format batch] [-o <out>] [--json]

Description

Walks the document and emits a JSON BatchItem[] array that, when replayed via officecli batch, reconstructs the source document. Currently .docx only.
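The "modify" step of emit → modify → replay can be sketched without knowing the BatchItem schema, which this page does not spell out. The sketch below treats the dump as generic JSON and rewrites every string value; the function names and the sample item shape are hypothetical, for illustration only.

```python
import json
from typing import Any, Callable


def map_strings(node: Any, fn: Callable[[str], str]) -> Any:
    """Return a copy of a JSON value with fn applied to every string value (keys untouched)."""
    if isinstance(node, str):
        return fn(node)
    if isinstance(node, list):
        return [map_strings(item, fn) for item in node]
    if isinstance(node, dict):
        return {key: map_strings(value, fn) for key, value in node.items()}
    return node  # numbers, booleans and null pass through unchanged


def rewrite_batch(batch_json: str, old: str, new: str) -> str:
    """Parse a dumped batch, replace old with new in every string value, re-serialize."""
    items = json.loads(batch_json)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of batch items")
    replaced = map_strings(items, lambda s: s.replace(old, new))
    return json.dumps(replaced, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Hypothetical item shape; the real BatchItem fields may differ.
    sample = '[{"op": "add", "props": {"text": "Q3 report"}}]'
    print(rewrite_batch(sample, "Q3", "Q4"))
```

Because the replacement touches every string value, including paths, style IDs and inlined picture data, a real edit would probably want to restrict itself to the specific props it means to change.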

The dump is portable: unstable IDs (paraId / rsidR / textId) and derived effective.* readbacks are filtered out. The OpenXML SDK regenerates these IDs on save, so the dump simply omits them rather than trying to preserve them.
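One way to sanity-check the portability claim on a dump you have produced is to scan it for the filtered names. This is a minimal sketch that assumes those names would show up as JSON object keys if they leaked through; the function name and the path notation in the results are illustrative, not part of OfficeCLI.

```python
import json
from typing import Any, Iterator

# Names listed above as filtered out of the dump.
UNSTABLE_KEYS = {"paraId", "rsidR", "textId"}


def find_unstable_keys(node: Any, path: str = "$") -> Iterator[str]:
    """Yield the location of every object key that looks like an unstable ID or effective.* readback."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in UNSTABLE_KEYS or key.startswith("effective."):
                yield child
            yield from find_unstable_keys(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_unstable_keys(item, f"{path}[{index}]")


if __name__ == "__main__":
    # Hypothetical item shape, used only to exercise the scan.
    batch = json.loads('[{"props": {"text": "hi", "paraId": "1A2B"}}]')
    print(list(find_unstable_keys(batch)))
```

An empty result on a real dump is consistent with the filtering described above; a non-empty one points at the items worth inspecting.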

Arguments

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| file | path | Yes | - | Document path (.docx only in current release) |
| path | string | Yes | - | DOM path to dump. / emits the whole document; subtree paths emit just that subtree without bundling sibling resources. Supported: /, /body, /body/p[N], /body/tbl[N], and resource parts /theme, /settings, /numbering, /styles. Subtree emit uses last() XPath predicates so the script is safe to replay onto non-blank documents. |
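A script that wraps dump may want to reject unsupported paths before invoking the CLI. The sketch below mirrors only the path forms listed in the table; it assumes N is a positive integer starting at 1 (the examples on this page use p[3] and tbl[1]), and the helper name is hypothetical. The CLI itself may accept or reject forms this check does not model.

```python
import re

# Resource parts named in the Arguments table.
_RESOURCE_PARTS = ("theme", "settings", "numbering", "styles")

# "/", "/body", "/body/p[N]", "/body/tbl[N]", or one of the resource parts.
_SUPPORTED = re.compile(
    r"/(?:body(?:/(?:p|tbl)\[[1-9][0-9]*\])?|" + "|".join(_RESOURCE_PARTS) + r")?"
)


def is_supported_dump_path(path: str) -> bool:
    """Return True if path matches one of the documented dump path forms."""
    return _SUPPORTED.fullmatch(path) is not None


if __name__ == "__main__":
    for candidate in ("/", "/body/p[3]", "/body/tbl[1]", "/numbering", "/header[1]"):
        print(candidate, is_supported_dump_path(candidate))
```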

Options

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| --format | string | No | batch | Output format. Currently only batch is supported. |
| -o / --out | path | No | - | Write the batch to a file instead of stdout. On success, stdout then carries the output path. |
| --json | bool | No | false | Wrap the result in the standard JSON envelope (the batch payload itself is always JSON). |

What's emitted

v1.0.73 hardened the round-trip extensively: bookmarks (cross-paragraph spans), TOC fields with \t/\b switches, page-background color, hyperlink tooltip/tgtFrame/history, eastAsianLayout, paragraph-mark-only run formatting (markRPr.*), tables in headers/footers, columns + vAlign on inline section breaks, fldSimple/oMath inside hyperlinks/ins/del/footnotes, ruby/smartTag/customXml wrappers, cantSplit rows, tcW percent semantics, asymmetric tcMar padding, w:sym runs, noBreakHyphen/softHyphen, soft <w:br/> line breaks, ListItem SDT, MERGEFIELD whitespace quoting, complex-field HYPERLINKs, comment dates, PAGE field, header/footer types from sections, lineRule (atLeast/exact/auto), char-based indents, w14 ligatures/numForm/numSpacing, ins/del track-change attribution.

| Layer | Mechanism |
|-------|-----------|
| /styles | Emitted before body so paragraph styleId refs resolve on replay |
| /body paragraphs | Single-run paragraphs collapse into one add p row; multi-run paragraphs split into paragraph + run child rows |
| Tables and mixed body content | Typed add rows |
| Section page layout | set / on the root for page width/height/margins/columns/etc. |
| Inline section breaks | Section breaks inside the body emitted alongside their paragraph |
| docDefaults and document protection | Emitted alongside section layout |
| Headers and footers | Seed paragraph + appended content per-part |
| Comments / footnote refs / endnote refs | Anchored to the body paragraphs they reference |
| Numbering | Emitted wholesale via raw-set when document has list templates |
| Settings part | Emitted wholesale via raw-set |
| Theme part | Emitted wholesale via raw-set |
| Charts | Typed add (chartType + data string), not raw-set |
| Pictures | Inlined as data URIs through the src= prop |
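Since pictures travel inside the batch as data URIs, it can be useful to pull one back out, for example to inspect or swap an image before replay. The following is a minimal sketch of decoding a data URI in the general RFC 2397 form; it assumes nothing about how OfficeCLI builds its URIs beyond that form, and the function name is hypothetical.

```python
import base64
from typing import Tuple
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a data URI into (media type, raw bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI has no comma separator")
    params = header.split(";")
    media_type = params[0] or "text/plain"  # the media type may be omitted
    if "base64" in params[1:]:
        return media_type, base64.b64decode(payload)
    return media_type, unquote_to_bytes(payload)  # percent-encoded payload


if __name__ == "__main__":
    sample = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode("ascii")
    media_type, raw = decode_data_uri(sample)
    print(media_type, len(raw))
```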

Format keys are forwarded as-is; the OOXML schema reflection fallback on the add side accepts arbitrary props, so emit needs no per-key allowlist.

Examples

# Whole document to stdout
officecli dump report.docx /

# Write to a batch file
officecli dump report.docx / -o report.batch.json

# Subtree: just one paragraph (quoted so the shell does not glob the brackets)
officecli dump report.docx "/body/p[3]"

# Subtree: a single table or a resource part
officecli dump report.docx "/body/tbl[1]"
officecli dump report.docx /numbering

# Round-trip: dump → batch
officecli dump report.docx / -o /tmp/r.json
officecli create rebuilt.docx --type docx
officecli batch rebuilt.docx --input /tmp/r.json
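The same round-trip can be driven from a script. This sketch only assembles the three command lines shown above and runs them in order; the Python function names are hypothetical, and it assumes officecli is on PATH.

```python
import subprocess
from typing import List


def round_trip_commands(source: str, rebuilt: str, batch_file: str) -> List[List[str]]:
    """Build the dump, create and batch invocations from the round-trip example."""
    return [
        ["officecli", "dump", source, "/", "-o", batch_file],
        ["officecli", "create", rebuilt, "--type", "docx"],
        ["officecli", "batch", rebuilt, "--input", batch_file],
    ]


def run_round_trip(source: str, rebuilt: str, batch_file: str) -> None:
    """Run the three steps in order, stopping at the first non-zero exit status."""
    for argv in round_trip_commands(source, rebuilt, batch_file):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    for argv in round_trip_commands("report.docx", "rebuilt.docx", "/tmp/r.json"):
        print(" ".join(argv))
```

Passing arguments as a list avoids shell quoting issues with paths such as /body/p[3]. Note that check=True only reacts to the process exit status; because batch defaults to continue-on-error, this page does not say whether individual item failures are reflected there.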

Notes

  • --out - is treated as stdout (not a file literally named -).
  • With --json, the envelope's data carries outputFile + itemCount metadata, not a bare path.
  • TOC PAGEREF page numbers are preserved on round-trip but not recalculated — run refresh afterward to update them.
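When combining -o with --json, a caller reads the output location from the envelope rather than from a bare path. A minimal sketch, assuming the envelope is a JSON object whose data member holds outputFile and itemCount as described in the note above; any other envelope fields are unknown here, and the sample envelope is made up for illustration.

```python
import json
from typing import Tuple


def read_dump_metadata(envelope_json: str) -> Tuple[str, int]:
    """Extract (outputFile, itemCount) from a --json envelope string."""
    envelope = json.loads(envelope_json)
    data = envelope["data"]  # assumed key, per the note on --json
    return data["outputFile"], int(data["itemCount"])


if __name__ == "__main__":
    # Hypothetical envelope; a real one may carry additional fields.
    sample = '{"data": {"outputFile": "report.batch.json", "itemCount": 42}}'
    output_file, item_count = read_dump_metadata(sample)
    print(output_file, item_count)
```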

See Also

  • batch — replay the emitted JSON (defaults to continue-on-error)
  • refresh — recalculate TOC / PAGE fields after replay
  • Word reference

Based on OfficeCLI v1.0.75
