command dump
Serialize a document into a replayable batch script — the round-trip mechanism for editing a document by emit → modify → replay.
officecli dump <file> <path> [--format batch] [-o <out>] [--json]

Walks the document and emits a JSON BatchItem[] array that, when replayed via officecli batch, reconstructs the source document. Currently .docx only.
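The "modify" step of emit → modify → replay can be any program that reads and writes the JSON array. The sketch below is one blunt way to do it, assuming only that the dump file is a JSON array; it makes no assumption about the fields of a BatchItem. The helper names `replace_in_strings` and `rewrite_batch` are hypothetical and not part of OfficeCLI.

```python
import json


def replace_in_strings(node, old, new):
    """Return a copy of a JSON value with `old` replaced by `new` in every string value."""
    if isinstance(node, str):
        return node.replace(old, new)
    if isinstance(node, list):
        return [replace_in_strings(child, old, new) for child in node]
    if isinstance(node, dict):
        # Keys are left untouched; only values are rewritten.
        return {key: replace_in_strings(value, old, new) for key, value in node.items()}
    return node  # numbers, booleans and null pass through unchanged


def rewrite_batch(in_path, out_path, old, new):
    """Load a dumped batch file, apply the replacement, write the result, return the item count."""
    with open(in_path, encoding="utf-8") as fh:
        items = json.load(fh)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of batch items")
    edited = replace_in_strings(items, old, new)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(edited, fh, ensure_ascii=False, indent=2)
    return len(edited)
```

Because the replacement touches every string value, it can also rewrite paths or embedded data URIs that happen to contain the search text. A real edit would likely target specific items once the item shape has been inspected.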
The dump is portable: unstable IDs (paraId / rsidR / textId) and derived effective.* readbacks are filtered out. The OpenXML SDK regenerates IDs on save, so emit just stays out of the way.
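As an illustration of the filtering rule, here is a minimal sketch that drops the unstable IDs and the derived effective.* readbacks from a property map. It assumes properties arrive as a flat dictionary keyed by name, which is a guess about the shape and not the tool's actual implementation; `portable_props` is a hypothetical name.

```python
# Keys named in the text above as unstable across saves.
UNSTABLE_KEYS = {"paraId", "rsidR", "textId"}


def portable_props(props):
    """Return a copy of `props` without unstable IDs or effective.* readbacks."""
    return {
        key: value
        for key, value in props.items()
        if key not in UNSTABLE_KEYS and not key.startswith("effective.")
    }
```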
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| file | path | Yes | - | Document path (.docx only in current release) |
| path | string | Yes | - | DOM path to dump. / emits the whole document; subtree paths emit just that subtree without bundling sibling resources. Supported: /, /body, /body/p[N], /body/tbl[N], and resource parts /theme, /settings, /numbering, /styles. Subtree emit uses last() xpath predicates so the script is safe to replay onto non-blank documents. |
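A script that drives dump can check the path argument before invoking the CLI. The sketch below encodes only the supported forms listed in the table; treating N as a positive integer is an assumption, and `is_supported_dump_path` is a hypothetical helper, not an OfficeCLI API.

```python
import re

# "/", "/body", "/body/p[N]", "/body/tbl[N]", or one of the resource parts.
# Assumption: N is a positive integer (1-based).
_SUPPORTED_PATH = re.compile(
    r"^/(?:|body(?:/(?:p|tbl)\[[1-9][0-9]*\])?|theme|settings|numbering|styles)$"
)


def is_supported_dump_path(path):
    """Return True if `path` matches one of the dump path forms listed above."""
    return _SUPPORTED_PATH.match(path) is not None
```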
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| --format | string | No | batch | Output format. Currently only batch is supported. |
| -o / --out | path | No | - | Write output to a file instead of stdout. On success, stdout prints the file path. |
| --json | bool | No | false | Standard JSON envelope wrapper (the batch payload itself is always JSON). |
v1.0.73 hardened the round-trip extensively: bookmarks (cross-paragraph spans), TOC fields with \t/\b switches, page-background color, hyperlink tooltip/tgtFrame/history, eastAsianLayout, paragraph-mark-only run formatting (markRPr.*), tables in headers/footers, columns + vAlign on inline section breaks, fldSimple/oMath inside hyperlinks/ins/del/footnotes, ruby/smartTag/customXml wrappers, cantSplit rows, tcW percent semantics, asymmetric tcMar padding, w:sym runs, noBreakHyphen/softHyphen, soft <w:br/> line breaks, ListItem SDT, MERGEFIELD whitespace quoting, complex-field HYPERLINKs, comment dates, PAGE field, header/footer types from sections, lineRule (atLeast/exact/auto), char-based indents, w14 ligatures/numForm/numSpacing, ins/del track-change attribution.
| Layer | Mechanism |
|---|---|
| /styles | Emitted before body so paragraph styleId refs resolve on replay |
| /body paragraphs | Single-run paragraphs collapse into one add p row; multi-run paragraphs split into paragraph + run child rows |
| Tables and mixed body content | Typed add rows |
| Section page layout | set / on the root for page width/height/margins/columns/etc. |
| Inline section breaks | Section breaks inside the body emitted alongside their paragraph |
| docDefaults and document protection | Emitted alongside section layout |
| Headers and footers | Seed paragraph + appended content per-part |
| Comments / footnote refs / endnote refs | Anchored to the body paragraphs they reference |
| Numbering | Emitted wholesale via raw-set when document has list templates |
| Settings part | Emitted wholesale via raw-set |
| Theme part | Emitted wholesale via raw-set |
| Charts | Typed add (chartType + data string), not raw-set |
| Pictures | Inlined as data URIs through the src= prop |
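The paragraph row in the table can be pictured with a small sketch. The item shape used here (the keys "command", "parent", "type", "props" and the type name "r" for runs) is purely an assumption for illustration and may not match the real BatchItem schema; only the collapse-or-split rule and the use of a last() predicate come from this page.

```python
def paragraph_rows(parent, para_props, runs):
    """Sketch of the collapse rule: one row for a single-run paragraph,
    otherwise a paragraph row followed by one child row per run.

    `runs` is a list of property dicts, one per run (hypothetical shape).
    """
    if len(runs) == 1:
        # Single run: fold the run's properties into the paragraph row.
        merged = {**para_props, **runs[0]}
        return [{"command": "add", "parent": parent, "type": "p", "props": merged}]
    # Zero or several runs: emit the paragraph, then its runs as children.
    rows = [{"command": "add", "parent": parent, "type": "p", "props": dict(para_props)}]
    for run in runs:
        rows.append({
            "command": "add",
            # last() targets the paragraph just added, wherever it landed.
            "parent": f"{parent}/p[last()]",
            "type": "r",
            "props": dict(run),
        })
    return rows
```

Addressing children through last() is what lets the same rows replay onto a document that already has paragraphs, since no absolute index is baked in.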
Format keys are forwarded as-is; the OOXML schema reflection fallback in the Add side accepts arbitrary props, so emit doesn't need a per-key allowlist.
# Whole document to stdout
officecli dump report.docx /
# Write to a batch file
officecli dump report.docx / -o report.batch.json
# Subtree: just one paragraph
officecli dump report.docx /body/p[3]
# Subtree: a single table or a resource part
officecli dump report.docx /body/tbl[1]
officecli dump report.docx /numbering
# Round-trip: dump → batch
officecli dump report.docx / -o /tmp/r.json
officecli create rebuilt.docx --type docx
officecli batch rebuilt.docx --input /tmp/r.json

- --out - is treated as stdout (not a file literally named -).
- With --json, the envelope's data carries outputFile + itemCount metadata, not a bare path.
- TOC PAGEREF page numbers are preserved on round-trip but not recalculated; run refresh afterward to update them.
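For scripted use, the round-trip commands and the --json envelope can be wrapped in a few lines. This is a sketch under stated assumptions: the argument lists mirror the example commands on this page exactly, and the envelope reader relies only on the data.outputFile and data.itemCount fields mentioned in the notes; any other envelope fields are unknown here. The function names are hypothetical.

```python
import json
import subprocess


def round_trip_commands(src, dst, batch_path):
    """Return the argv lists for dump -> create -> batch, as in the example commands."""
    return [
        ["officecli", "dump", src, "/", "-o", batch_path],
        ["officecli", "create", dst, "--type", "docx"],
        ["officecli", "batch", dst, "--input", batch_path],
    ]


def read_dump_envelope(stdout_text):
    """Extract (outputFile, itemCount) from the output of `dump -o ... --json`."""
    data = json.loads(stdout_text)["data"]
    return data["outputFile"], data["itemCount"]


def run_round_trip(src, dst, batch_path):
    """Run the three steps in order, stopping at the first non-zero exit status."""
    for argv in round_trip_commands(src, dst, batch_path):
        subprocess.run(argv, check=True)
```

Note that check=True only reacts to a non-zero exit status. Since batch defaults to continue-on-error, whether individual item failures surface that way is not stated on this page, so inspecting the batch output is advisable.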
- batch — replay the emitted JSON (defaults to continue-on-error)
- refresh — recalculate TOC / PAGE fields after replay
- Word reference
Based on OfficeCLI v1.0.75