
ingest/ledgerbackend: integrate XDR views into BSB + add GetLedgerRaw #5941

Merged
tamirms merged 15 commits into stellar:main from tamirms:xdr-views-pr2 on May 12, 2026

Conversation


@tamirms tamirms commented May 4, 2026

Summary

Builds on the XDR view types (#5937) by integrating them into BufferedStorageBackend, adding a GetLedgerRaw method to the LedgerBackend interface, and tuning the BSB pipeline for higher throughput.

Views integration + GetLedgerRaw

  • BSB rewrite to use views: Replaces eager full-batch decode (xdr.LedgerCloseMetaBatch) with view-based parse (xdr.LedgerCloseMetaBatchView). Per-ledger byte slices are extracted once via metas.All() and stored on the BSB struct. GetLedger decodes a single ledger on demand from the cached bytes; GetLedgerRaw returns the raw bytes without decoding.
  • GetLedgerRaw across all backends: Added to the LedgerBackend interface with implementations on every backend:
    • BufferedStorageBackend: returns a copy of the per-ledger view slice from the cached batch.
    • CaptiveStellarCore: cache changed from a decoded cachedMeta *xdr.LedgerCloseMeta to raw frame bytes plus sequence (cachedRaw []byte + cachedSeq uint32, extracted via view at frame time). GetLedger decodes lazily from cachedRaw; GetLedgerRaw returns a copy. Both share a fetchSequence helper. The meta-pipe reader (bufferedLedgerMetaReader) now returns raw frame bytes instead of pre-decoding.
    • RPCLedgerBackend: buffer changed from map[uint32]xdr.LedgerCloseMeta to map[uint32][]byte holding the base64-decoded raw bytes. GetLedger does the XDR decode on demand from the cached bytes; GetLedgerRaw returns a copy.
    • metricsLedgerBackend: passthrough that records the same ledger_fetch_duration_seconds summary as GetLedger.
    • loadtest.LedgerBackend: baseline implementation calling GetLedger + MarshalBinary.
    • MockDatabaseBackend and loadtest's mockLedgerBackend: testify-mock stubs.
  • xdr/ledger_close_meta_view.go (new): Convenience helpers on LedgerCloseMetaView (LedgerSequence(), LedgerHash(), PreviousLedgerHash()) that mirror the equivalents on LedgerCloseMeta. Used by BSB and CaptiveCore to extract sequence/hash fields without paying the full XDR decode.
  • Idempotent re-request consistency: All backends now consistently allow re-requesting the most-recently-served sequence without erroring or advancing state (BSB and RPC previously diverged from CaptiveStellarCore).
  • Per-frame allocation in meta pipe reader: bufferedLedgerMetaReader now allocates a fresh slice per frame (replacing the reusable frameBuf). Required for correctness — the raw frame bytes must outlive the reader's read state to be passed through to consumers.
  • DataStore.GetFile size return: Signature changed to (io.ReadCloser, int64, error) so ledger_buffer can pre-allocate the zstd decompression buffer using the compressed file size, avoiding repeated buffer growth/realloc on each ledger object. Size is -1 if unknown (chunked transfer). Affects all DataStore implementations (gcs, s3, filesystem, mocks) and their tests.
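The decode-on-demand and copy-out behavior described above can be sketched as follows. This is a simplified stand-in, not the PR's actual types: `rawLedgerStore` and its field are hypothetical, and the real backends hold the bytes in views or per-sequence caches rather than a bare map.

```go
package main

import "fmt"

// rawLedgerStore is a minimal sketch of the decode-on-demand pattern the PR
// applies across backends: cache raw XDR bytes per sequence, decode only in
// GetLedger, and have GetLedgerRaw hand out a defensive copy. Type and field
// names here are illustrative, not the PR's actual identifiers.
type rawLedgerStore struct {
	raw map[uint32][]byte // sequence -> raw LedgerCloseMeta frame bytes
}

// GetLedgerRaw returns a copy of the cached bytes without any XDR decode.
func (s *rawLedgerStore) GetLedgerRaw(seq uint32) ([]byte, error) {
	b, ok := s.raw[seq]
	if !ok {
		return nil, fmt.Errorf("ledger %d is not buffered", seq)
	}
	return append([]byte(nil), b...), nil // copy: callers cannot mutate the cache
}

func main() {
	s := &rawLedgerStore{raw: map[uint32][]byte{7: {0x01, 0x02}}}
	b, _ := s.GetLedgerRaw(7)
	b[0] = 0xFF // mutating the returned slice must not affect the cache
	b2, _ := s.GetLedgerRaw(7)
	fmt.Println(b2[0] == 0x01)
}
```

The copy is what the "copy semantics" tests below (TestRPCGetLedgerRaw, TestCaptiveGetLedgerRaw) exercise: a caller scribbling on the returned slice must not corrupt later fetches.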

BSB pipeline tuning

  • Default BufferSize = 10000 for small-file schemas (was 100). The old default throttled workers via the per-consumer-pull pushTaskQueue feedback loop. With the buffer sized to the range cap (newLedgerBuffer caps internally), workers see all queued tasks upfront. This change is independent of the views work and benefits upstream too if applied there.
  • Decompression moved from workers to consumer. Each zstd.DecodeAll allocates ~9 MB of internal sequenceDecs working memory; with N concurrent workers the GC pressure stole CPU from the pipeline. The consumer now decompresses one batch at a time on a single shared zstd.Decoder. The ~100 µs per-batch cost is dwarfed by the consumer's XDR work.
  • Channel-based buffer pools: compressedPool (~14 KB worker download buffers) and decompressedPool (~50 KB consumer batch buffers). Sidesteps sync.Pool's GC overhead on hot-path large allocations. The decompression buffer is pre-allocated using zstd.Header.FrameContentSize when present.
  • Cancellation safety in pushTaskQueue and storeObject: both now select on ctx.Done() during their taskQueue / ledgerQueue channel sends. Without this, a worker could hang on a full queue forever after cancellation (and subsequently block wg.Wait() in close()).
  • Granular currentLedgerLock in storeObject: held only for the brief currentLedger increment, not across the channel send. Holding it across the send would block GetLatestLedgerSequence callers whenever ledgerQueue is full.
  • Local bufferSize variable in newLedgerBuffer: instead of mutating bsb.config.BufferSize. Avoids a latent bug where a second PrepareRange with a different range would see the cap from the first call.

Performance

GCS-backed throughput at NumWorkers=100, BufferSize=10000 (10-run medians, pubnet, 1000 ledgers):

Configuration                                          lps    % of network ceiling
Network ceiling (parallel curl / direct GetFile)       207    100%
This branch, GetLedgerRaw                              198    96%
This branch, GetLedger                                 167    81%
Upstream main, GetLedger (BufferSize=10000)            165    80%
Upstream main, GetLedger (default BufferSize=100)      ~80    39%

The BufferSize default bump accounts for most of the lift (upstream goes from ~80 → 165 lps with the same change). GetLedgerRaw adds another ~19% over GetLedger by skipping XDR decoding.

buffered_storage_backend_bench_test.go is included for reproducibility — env-driven (BSB_BENCH_BUCKET / _FROM / _TO + GCS application-default credentials), skips silently in CI.

Test plan

  • All existing unit tests pass
  • Race detector passes (go test -race ./ingest/ledgerbackend/...)
  • New tests:
    • TestBSBGetLedgerRaw_SingleLedgerPerFile
    • TestBSBGetLedger_IdempotentReRequest
    • TestRPCGetLedgerRaw (round-trip + idempotent + copy semantics)
    • TestCaptiveGetLedgerRaw (round-trip + idempotent + copy semantics)
    • TestMetricsGetLedgerRaw (verifies same summary as GetLedger)
    • TestReadLedgerMetaFromPipe and TestReadLedgerMetaFromPipeMultipleFrames now assert raw frame bytes round-trip back to their LedgerCloseMeta

🤖 Generated with Claude Code

tamirms and others added 5 commits May 4, 2026 21:23
Three perf-related changes that work together:

BSB:
- Replace eager full-batch decode (xdr.LedgerCloseMetaBatch) with a
  view-based parse of the batch header (xdr.LedgerCloseMetaBatchView).
  Per-ledger byte slices are extracted once via metas.All() and stored
  on the BSB struct as []xdr.LedgerCloseMetaView.
- GetLedger now decodes a single ledger on demand from the cached bytes
  rather than holding a fully-decoded batch.
- Add GetLedgerRaw(ctx, sequence) ([]byte, error) on the BSB struct —
  returns a safe copy of the raw XDR for a ledger without decoding.
  Useful for forwarding/replication where decode is wasted work.

ledger_buffer:
- Pre-allocate the decompression destination buffer using the file size
  returned by datastore.GetFile, avoiding the zstd decoder's repeated
  buffer growth/realloc on each ledger object.

datastore:
- Change DataStore.GetFile to return (io.ReadCloser, int64, error). The
  size return is -1 when unknown (e.g., chunked transfer); each backend
  fills it in from its native metadata: GCS attrs.Size, S3 ContentLength,
  filesystem stat. Updates all backends and tests, plus producer_test.

GetLedgerRaw is not yet on the LedgerBackend interface — that addition
plus implementations on the other backends comes in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ions

GetLedgerRaw returns XDR wire bytes for a ledger without re-marshaling —
useful for forwarding/replication or callers that decode selectively.

Interface: ingest/ledgerbackend/ledger_backend.go gains GetLedgerRaw.

Backends:
- BufferedStorageBackend: already has it (returns a copy of the per-ledger
  view slice from the cached batch buffer).
- CaptiveStellarCore: bufferedLedgerMetaReader now returns the raw frame
  alongside the decoded LedgerCloseMeta. Both are cached on the backend
  (cachedRaw + cachedMeta). GetLedgerRaw returns a copy of cachedRaw.
  Refactored GetLedger and GetLedgerRaw to share fetchSequence — the
  validation + stream-advance loop was duplicated.
- RPCLedgerBackend: buffer changed from map[uint32]xdr.LedgerCloseMeta to
  map[uint32]rpcBufferedLedger holding both raw bytes and decoded form.
  base64 decode happens once at fetch; XDR decode also once. GetLedgerRaw
  returns a copy of the bytes. GetLedger returns the decoded form.
  Refactored both to share fetchSequenceLocked.
- MockDatabaseBackend, loadtest mockLedgerBackend: testify-mock stubs.
- loadtest.LedgerBackend: baseline impl that calls GetLedger + MarshalBinary.
  Avoids restructuring its merged-stream cache; can be optimized if needed.
- metricsLedgerBackend: passthrough that records the same
  ledgerFetchDurationSummary as GetLedger — both paths are "fetches" from
  the backend's perspective; latency profile is dominated by I/O.

The per-frame allocation in bufferedLedgerMetaReader replaces the previous
reusable frameBuf — the bytes need to outlive the read state to be passed
through metaResult.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tests for GetLedgerRaw across the ledger backends:

- BSB: TestBSBGetLedger_IdempotentReRequest verifies that re-requesting the
  most-recently-served sequence on GetLedger returns the same ledger and
  that GetLedgerRaw is also idempotent on cached entries.
- RPC: TestRPCGetLedgerRaw verifies the zero-copy fetch path (base64-decode
  once, return cached bytes), idempotent re-request, copy semantics
  (mutating returned bytes does not affect subsequent fetches), and round-
  trip back to LedgerCloseMeta.
- CaptiveStellarCore: TestCaptiveGetLedgerRaw verifies that GetLedgerRaw
  returns the raw frame captured by the meta pipe reader without re-
  marshaling, with idempotent + copy semantics.
- metricsLedgerBackend: TestMetricsGetLedgerRaw verifies GetLedgerRaw
  records into the same ledger_fetch_duration_seconds summary as GetLedger.

Also: adds a clarifying comment to GCSDataStore.GetFile noting that
r.Attrs.Size matches the byte count the caller will read because
ReadCompressed(true) prevents decompressive transcoding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing TestReadLedgerMetaFromPipe and TestReadLedgerMetaFromPipeMultipleFrames
discarded the raw return from readLedgerMetaFromPipe. A bug in raw-frame
extraction (off-by-one in length, bytes bleeding between frames) would leave
the decoded LedgerCloseMeta valid but the raw bytes wrong, slipping past the
existing assertions.

Add a round-trip check on the raw return: SafeUnmarshal the raw bytes back
to a LedgerCloseMeta and assert it equals the original input — guards the
zero-copy GetLedgerRaw path through metaResult.raw.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two reverts to keep PR scope clean and restore upstream behavior:

1. support/datastore: revert ListFilePaths changes in s3.go and gcs.go.
   The earlier branch had inadvertently rewritten the prefix-handling
   logic introduced by upstream PR stellar#5923. Restoring the upstream version
   and re-adding TestS3ListFilePaths_NoPrefix / TestGCSListFilePaths_NoPrefix
   that were deleted alongside the rewrite. The only datastore changes
   in this PR are now the GetFile signature additions (size return).

2. ingest/ledgerbackend: revert BSB lock changes to RLock for GetLedger,
   GetLedgerRaw, and Close, matching CaptiveStellarCore. The contract is
   that GetLedger is single-threaded; only Close runs concurrently with
   an in-flight GetLedger, and RLock is what allows Close to acquire the
   lock while GetLedger is blocked in the buffer queue, then call
   ledgerBuffer.close() to interrupt it. Switching to exclusive Lock
   would deadlock on concurrent Close.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 4, 2026 21:49

Copilot AI left a comment


Pull request overview

This PR extends the ingest LedgerBackend abstraction with a GetLedgerRaw method (raw XDR bytes) and updates backends to leverage XDR “view” types for less allocation-heavy batch handling, while also evolving DataStore.GetFile to return object size for more efficient buffering/decompression.

Changes:

  • Added LedgerBackend.GetLedgerRaw and implemented it across all ledger backends (buffered storage, captive core, RPC, metrics wrapper, loadtest, and mocks).
  • Reworked BufferedStorageBackend to parse LedgerCloseMetaBatch via LedgerCloseMetaBatchView and decode individual ledgers on demand.
  • Changed DataStore.GetFile to return (reader, size, error) and updated datastore implementations/tests accordingly; ledger_buffer now uses size + zstd header info to pre-allocate decompression buffers.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
support/datastore/datastore.go Updates DataStore interface: GetFile now returns (io.ReadCloser, int64, error) with size semantics.
support/datastore/configure.go Adapts manifest reading to new GetFile signature.
support/datastore/configure_test.go Updates mocks/expectations for new GetFile return values.
support/datastore/filesystem.go Implements size-returning GetFile using Stat() on opened files.
support/datastore/filesystem_test.go Verifies returned file size and updates call sites.
support/datastore/gcs.go Returns compressed object size from GCS reader attrs.
support/datastore/gcs_test.go Updates test call sites for new GetFile signature.
support/datastore/s3.go Returns ContentLength (or -1) from S3 GetObject result.
support/datastore/s3_test.go Updates test call sites for new GetFile signature.
support/datastore/mocks.go Updates MockDataStore.GetFile to include size return.
ingest/ledgerbackend/ledger_backend.go Adds GetLedgerRaw to the LedgerBackend interface with behavioral contract (copy semantics).
ingest/ledgerbackend/ledger_buffer.go Rewrites buffering pipeline and zstd decompression with reusable buffers and size-based preallocation.
ingest/ledgerbackend/buffered_storage_backend.go Integrates XDR views into BSB, caches per-ledger slices, adds GetLedgerRaw, and idempotent re-request behavior.
ingest/ledgerbackend/buffered_storage_backend_test.go Adds tests for GetLedgerRaw and idempotent re-request; updates for new buffer output type.
ingest/ledgerbackend/buffered_meta_pipe_reader.go Captures per-frame raw XDR bytes for captive core pipeline (needed for GetLedgerRaw).
ingest/ledgerbackend/buffered_meta_pipe_reader_test.go Extends tests to validate raw frame bytes round-trip correctly.
ingest/ledgerbackend/captive_core_backend.go Adds captive-core GetLedgerRaw using cached frame bytes and refactors sequence fetch path.
ingest/ledgerbackend/captive_core_backend_test.go Adds GetLedgerRaw tests including idempotency and copy semantics.
ingest/ledgerbackend/rpc_backend.go Buffers both decoded meta and raw bytes; adds GetLedgerRaw and idempotent re-request behavior.
ingest/ledgerbackend/rpc_backend_test.go Updates behavior expectations and adds GetLedgerRaw tests (round-trip/idempotent/copy).
ingest/ledgerbackend/metrics.go Adds metrics passthrough for GetLedgerRaw.
ingest/ledgerbackend/metrics_test.go New test ensuring GetLedger and GetLedgerRaw record into the same summary.
ingest/ledgerbackend/mock_database_backend.go Extends mock backend with GetLedgerRaw.
ingest/producer_test.go Updates datastore mock expectations for new GetFile signature.
ingest/loadtest/ledger_backend.go Implements baseline GetLedgerRaw via GetLedger + marshal.
ingest/loadtest/ledger_backend_test.go Extends loadtest mock backend with GetLedgerRaw.


Three real issues from Copilot's PR review:

1. BSB loadBatchForSequence: defer returning batchBytes to the pool unless
   the new batch is successfully installed. Previously, any error after
   getFromLedgerQueue (e.g., view parse failure on a malformed batch)
   would drop the buffer, leaking it from the pool.

2. RPC fetchSequenceLocked: the unprepared error message said "before
   calling GetLedger" but the same path is now also used by GetLedgerRaw
   since the GetLedger/GetLedgerRaw refactor.

3. ledger_buffer: cap zstd FrameContentSize before allocating from the
   decompression pool. The header is parsed from data the worker
   downloaded; a corrupt or hostile FCS could otherwise trigger an
   arbitrarily large allocation. New maxDecompressedBatchSize = 1 GiB
   guards the pre-alloc; oversized claims fall back to DecodeAll's
   natural growth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
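The size-capping guard from item 3 can be sketched like this. `preallocCapacity` is a hypothetical helper name; the commit's actual logic sits inline in ledger_buffer around the zstd header parse and the decompression-pool Get.

```go
package main

import "fmt"

// maxDecompressedBatchSize mirrors the 1 GiB guard described above: a claimed
// size (zstd FrameContentSize here, and later a datastore Content-Length) is
// only trusted for pre-allocation when it is positive and within the cap.
const maxDecompressedBatchSize = 1 << 30

// preallocCapacity returns the capacity to pre-allocate, or 0 when the claim
// is unknown (-1) or implausibly large, letting the buffer grow naturally.
func preallocCapacity(claimed int64) int {
	if claimed > 0 && claimed <= maxDecompressedBatchSize {
		return int(claimed)
	}
	return 0
}

func main() {
	fmt.Println(preallocCapacity(50_000)) // plausible batch size: trusted
	fmt.Println(preallocCapacity(-1))     // unknown size: no pre-alloc
	fmt.Println(preallocCapacity(1 << 40)) // hostile/corrupt claim: no pre-alloc
}
```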
@tamirms tamirms requested a review from Copilot May 4, 2026 22:06

Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.



Cleanup of stale references that mention only GetLedger when both
GetLedger and GetLedgerRaw share the same code path:

- BSB validateSequence: "cannot GetLedger" -> "cannot GetLedger or
  GetLedgerRaw" (and the matching TestBSBClose assertion).
- captive core: cachedMeta doc said "Updated in GetLedger()" but it's
  actually updated in fetchSequence(), shared by both methods.
- captive core: idempotent re-request comment in fetchSequence updated
  to mention both.
- RPC PrepareRange doc: historical-range note updated to mention both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 7 comments.



Three more findings from Copilot's third review:

1. ledger_buffer pushTaskQueue: guard the blocking send to taskQueue
   with lb.context.Done(). Pre-existing in upstream — workers exit on
   ctx cancel and stop draining taskQueue, so a consumer mid-pull from
   ledgerQueue could otherwise block forever on the next refill.

2. ledger_buffer downloadLedgerObject: cap compressedSize by the same
   maxDecompressedBatchSize bound used for FrameContentSize. A corrupt
   or hostile S3 ContentLength / GCS Attrs.Size could otherwise trigger
   an arbitrarily large allocation. Oversized claims fall back to the
   io.ReadAll path which grows naturally.

3. RPC PrepareRange doc: fix three typos ("wil", "withing", "gaurantee")
   in a comment I'm already editing in this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tamirms tamirms requested a review from Copilot May 5, 2026 03:01

Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 5 comments.



Four issues found while sweeping every goroutine, channel, and lock in
the new buffer pipeline:

1. Writer goroutine leak. Workers can call lb.cancel(...) directly on
   fatal errors (line 192/200) without lb.close() ever being called.
   The previous `for result := range lb.resultChan` loop only exited
   when resultChan was closed (only Close() does that), so the writer
   would block forever on the final receive. Refactor to a select that
   also watches lb.context.Done().

2. NumWorkers == 0 init deadlock. Pre-existing in upstream — the init
   loop sends bufferSize+1 tasks to a channel of capacity bufferSize,
   relying on workers to drain. With zero workers, the (bufferSize+1)-th
   send blocks forever. Add an explicit validation in
   NewBufferedStorageBackend.

3. ledgerBuffer.close() panics on second call. The new close() does
   close(lb.resultChan), which panics if called twice. BSB.Close() can
   be called multiple times (it doesn't early-return on bsb.closed) and
   doesn't nil bsb.ledgerBuffer, so a second Close() would re-enter
   close() and panic. Wrap the body in sync.Once. Upstream's close()
   was only cancel + Wait and was naturally idempotent.

4. maxDecompressedBatchSize was renamed to maxBatchObjectSize. The
   constant is now used for both compressed-bytes pre-allocation
   (data store Content-Length / Attrs.Size) and decompressed-bytes
   pre-allocation (zstd FrameContentSize). The previous name only
   referred to the decompressed side and was misleading.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
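Item 3's sync.Once fix can be sketched as follows. Field names are illustrative; the real ledgerBuffer also cancels its context and waits on its worker WaitGroup inside the same Once.

```go
package main

import (
	"fmt"
	"sync"
)

// ledgerBuffer sketch: wrapping close() in sync.Once makes repeated calls
// safe, even though close(chan) panics when executed a second time.
type ledgerBuffer struct {
	resultChan chan []byte
	closeOnce  sync.Once
}

func (lb *ledgerBuffer) close() {
	lb.closeOnce.Do(func() {
		close(lb.resultChan) // would panic if reached twice without the Once
	})
}

func main() {
	lb := &ledgerBuffer{resultChan: make(chan []byte)}
	lb.close()
	lb.close() // second call is a no-op, not a panic
	fmt.Println("closed twice safely")
}
```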

Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 9 comments.



tamirms and others added 2 commits May 5, 2026 05:53
Previously, the captive-core meta pipe reader and the RPC backend's fetch
loop both eagerly XDR-decoded every ledger before storing it. That meant
GetLedgerRaw still paid the full unmarshal cost upstream — defeating the
zero-decode promise.

Move the decode out of these hot paths and into GetLedger only:

- xdr/ledger_close_meta_view.go: new convenience helpers on
  LedgerCloseMetaView for the header fields needed during streaming
  validation (LedgerSequence, LedgerHash, PreviousLedgerHash). Reads
  bytes lazily without full decode.

- buffered_meta_pipe_reader.go: readLedgerMetaFromPipe returns just the
  raw frame bytes. metaResult drops the *xdr.LedgerCloseMeta field.

- captive_core_backend.go: handleMetaPipeResult uses the new view-based
  helpers for sequence/hash validation. Cache fields are now cachedRaw +
  cachedSeq + cachedHasLedger (cachedMeta removed). GetLedger decodes on
  demand from cachedRaw; GetLedgerRaw never decodes.

- rpc_backend.go: rpcBufferedLedger drops .meta — only the raw bytes are
  stored. GetLedger decodes from raw on demand; GetLedgerRaw returns a
  copy of raw. Mirrors the captive-core pattern.

Tests updated to match the new metaResult shape (raw bytes only).
A metaResultFor test helper marshals a LedgerCloseMeta into a metaResult
where tests previously held the decoded form directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two redundant abstractions removed:

1. CaptiveStellarCore: drop cachedHasLedger bool. cachedRaw != nil
   already encodes "is set" — the explicit flag was redundant.

2. RPCLedgerBackend: drop the rpcBufferedLedger struct wrapper. It
   held a single raw []byte field and added no semantics; just use
   map[uint32][]byte directly.

No behavior change. Tests + race detector still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 5 comments.



All() returns the materialized slices and walks the array anyway;
checking len(slices) gives the same empty-batch detection without the
extra array-length-prefix read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tamirms tamirms force-pushed the xdr-views-pr2 branch 2 times, most recently from 15c8cde to d5a1ef8 on May 5, 2026 14:21

Copilot AI left a comment


Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.



Three independent BSB pipeline changes plus an env-driven GCS
benchmark:

1. Bump DefaultBufferedStorageBackendConfig.BufferSize from 100 to
   10000 for small-file schemas. The old default throttled workers
   via the per-consumer-pull pushTaskQueue feedback loop; once the
   first 100 tasks completed every subsequent task waited for a
   consumer pull. With the buffer sized to the range cap
   (newLedgerBuffer caps internally), workers see all queued tasks
   upfront and stay saturated. This change is independent of the
   rest of the commit and benefits upstream too if applied there.

2. Move zstd decompression from workers to the consumer. Each
   DecodeAll call allocates ~9 MB of internal sequenceDecs working
   memory; with N concurrent workers, peak heap scaled with
   NumWorkers and the resulting GC pressure stole CPU from the
   pipeline. The consumer now decompresses one batch at a time on a
   single shared zstd.Decoder. The ~100 us per-batch cost is
   dwarfed by the consumer's XDR work and isn't a new bottleneck.

3. Replace the writer goroutine + reorder map with a priority-queue
   + storeObject pattern. Workers do the ordering work themselves
   under a brief mutex hold rather than handing off to a single
   writer; this eliminates an extra channel hop and the goroutine
   wake-up latency that dominated tail variance under load. The
   send to ledgerQueue happens under priorityQueueLock — when the
   queue fills this naturally throttles other workers, providing
   backpressure that caps in-flight memory.

Adds buffered_storage_backend_bench_test.go — env-driven so it
skips by default. Run with BSB_BENCH_BUCKET / _FROM / _TO + GCS
application-default credentials. Sweeps NumWorkers across
{5, 10, 20, 50, 100} with BufferSize=10000.

Benchmarked against GCS (sdf-ledger-close-meta/ledgers/pubnet,
1000 ledgers, NumWorkers=100, BufferSize=10000, 10 runs each, median):

  upstream main GetLedger:  165 lps  (5.75-6.36s)
  this branch GetLedger:    167 lps  (5.78-6.20s)
  this branch GetLedgerRaw: 198 lps  (4.95-5.18s)

vs upstream-default (BufferSize=100): ~80 lps. The buffer-size fix
accounts for most of the lift; GetLedgerRaw skips XDR decoding and
adds another +5% on top.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cjonas9 cjonas9 left a comment


First pass comments attached: amazing work and SO much to take in -- I wanted to hold off on submitting this review until I'd had a few more chances to look it over, but for the sake of getting a review out before too long, I figured now was better than later.

Additionally, I'm left with the sense that it may be worthwhile for someone with more long-term exposure and context to the SDK to review this, but beyond that (and considering the extra time that would take), this genuinely looks great to me. Your call on whether to merge today/Friday or not; if you hold off, I'll take another pass by ~EOD Monday.

tamirms and others added 2 commits May 12, 2026 17:49
- BSB: include sequence/range bounds in validateSequence error messages
- BSB: document that bsBackendLock only protects against concurrent Close;
  external serialization is the type's thread-safety contract
- LedgerBackend interface: document the not-thread-safe-except-Close
  contract so all implementations share it
- captive_core: bundle cachedRaw+cachedSeq into a cachedLedger struct so
  the two fields are always read/written together
- ledger_buffer: clarify that the init loop creates bufferSize+1 tasks
- filesystem: include "and the file's size in bytes" in GetFile doc
- xdr/ledger_close_meta_view: LedgerHash/PreviousLedgerHash return
  zero-copy []byte slices into the source bytes instead of copying into
  fixed Hash arrays; document the pin-source-alive requirement

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tamirms tamirms merged commit 8d34165 into stellar:main May 12, 2026
11 checks passed
@tamirms tamirms deleted the xdr-views-pr2 branch May 12, 2026 16:25