Support bounded parallel chunk transfers #29341

Open
tyler-french wants to merge 1 commit into bazelbuild:master from tyler-french:tfrench/cdc-concurrent

Conversation

@tyler-french
Contributor

@tyler-french tyler-french commented Apr 19, 2026

Description

Follow-up for --experimental_remote_cache_chunking, implemented in #28437.

This PR enables parallel uploads and downloads of the chunks of a chunked file to improve performance. Concurrency is already bounded globally by the total number of gRPC connections, so we add a separate per-file bound to keep a single file from fanning out too quickly. The bound is 32, which allows useful parallelism without being too high.

To avoid the problems of batch-based transfers, we use simple sliding-window-style transfer managers: a new chunk transfer starts as soon as any in-flight transfer finishes.
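
For readers unfamiliar with the pattern, here is a minimal sketch of a sliding-window transfer manager. It is not the code in this PR: `SlidingWindowTransfer` and its `run` method are hypothetical names, it uses the JDK's `CompletableFuture` rather than whatever future type the uploader and downloader use, and it does not cancel in-flight transfers on failure. It only shows the refill idea: keep up to `window` transfers in flight and start the next one whenever one completes.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical helper (not the PR's class): keeps at most `window` transfers in flight. */
final class SlidingWindowTransfer<T> {
  private final Iterator<T> remaining;
  private final int window;
  private final Function<T, CompletableFuture<Void>> transfer;
  private final CompletableFuture<Void> done = new CompletableFuture<>();
  private int inFlight = 0; // guarded by this
  private boolean pumping = false; // guarded by this

  private SlidingWindowTransfer(
      List<T> items, int window, Function<T, CompletableFuture<Void>> transfer) {
    this.remaining = items.iterator();
    this.window = window;
    this.transfer = transfer;
  }

  /** Starts the transfers; the result completes when all succeed or the first one fails. */
  static <T> CompletableFuture<Void> run(
      List<T> items, int window, Function<T, CompletableFuture<Void>> transfer) {
    SlidingWindowTransfer<T> state = new SlidingWindowTransfer<>(items, window, transfer);
    state.pump();
    return state.done;
  }

  private void pump() {
    synchronized (this) {
      if (pumping) {
        return; // the active pump loop will observe the freed slot
      }
      pumping = true;
    }
    while (true) {
      T item;
      synchronized (this) {
        boolean exhausted = !remaining.hasNext();
        if (done.isDone() || exhausted || inFlight >= window) {
          pumping = false;
          if (!exhausted || inFlight > 0) {
            return; // window full, already failed, or waiting for stragglers
          }
          break; // nothing left to start and nothing in flight
        }
        item = remaining.next();
        inFlight++;
      }
      CompletableFuture<Void> started;
      try {
        started = transfer.apply(item);
      } catch (RuntimeException e) {
        started = CompletableFuture.failedFuture(e);
      }
      started.whenComplete(
          (ignored, err) -> {
            synchronized (this) {
              inFlight--;
            }
            if (err != null) {
              done.completeExceptionally(err);
            }
            pump(); // refill the slot that just opened
          });
    }
    done.complete(null); // no-op if a failure already completed it
  }
}
```

Unlike a batch of `window` transfers that must all finish before the next batch starts, one slow chunk here occupies only a single slot while the other slots keep cycling.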

RELNOTES: CDC chunk uploads and downloads can now happen in parallel within a large blob.

Benchmarking:

With our synthetic benchmark, which simulates network delay and jitter, the parallelism yields roughly a 20x improvement. Real-world gains will not always match this.

After Change:

Benchmark                                 (avgChunkSizeBytes)  (chunkCount)  (chunkSizeBytes)  (delayMillis)  (fileSizeBytes)  (jitterMillis)  (schedulerThreads)  Mode  Cnt   Score   Error  Units
ChunkedTransferBenchmark.downloadChunked                  N/A            32              1024             25              N/A              10                   1  avgt    3  34.958 ± 0.092  ms/op
ChunkedTransferBenchmark.downloadChunked                  N/A            32              1024             25              N/A              10                   2  avgt    3  34.971 ± 0.085  ms/op
ChunkedTransferBenchmark.downloadChunked                  N/A            32              1024             25              N/A              10                   4  avgt    3  34.983 ± 0.127  ms/op
ChunkedTransferBenchmark.downloadChunked                  N/A            32              1024             25              N/A              10                   8  avgt    3  34.974 ± 0.213  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   1  avgt    3  35.006 ± 1.170  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   2  avgt    3  35.028 ± 1.280  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   4  avgt    3  35.071 ± 1.534  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   8  avgt    3  35.056 ± 1.407  ms/op

Before Change:

Benchmark                                 (avgChunkSizeBytes)  (chunkCount)  (chunkSizeBytes)  (delayMillis)  (fileSizeBytes)  (jitterMillis)  (schedulerThreads)  Mode  Cnt    Score     Error  Units
ChunkedTransferBenchmark.downloadChunked                  N/A            32              1024             25              N/A              10                   2  avgt    3  811.458 ± 466.502  ms/op
ChunkedTransferBenchmark.downloadChunked                  N/A            32              1024             25              N/A              10                   4  avgt    3  811.918 ± 453.385  ms/op
ChunkedTransferBenchmark.downloadChunked                  N/A            32              1024             25              N/A              10                   8  avgt    3  811.849 ± 453.511  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   1  avgt    3  741.295 ± 392.466  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   2  avgt    3  741.600 ± 404.457  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   4  avgt    3  742.135 ± 401.637  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   8  avgt    3  742.024 ± 398.510  ms/op

Big File:

CURRENT BRANCH (512 MiB)

Benchmark                                 (avgChunkSizeBytes)  (chunkCount)  (chunkSizeBytes)  (delayMillis)  (fileSizeBytes)  (jitterMillis)  (schedulerThreads)  Mode  Cnt    Score      Error  Units
ChunkedTransferBenchmark.downloadChunked                  N/A           512           1048576             25              N/A              10                   8  avgt    3  416.743 ±    8.043  ms/op
ChunkedTransferBenchmark.uploadChunked                1048576           N/A               N/A             25        536870912              10                   8  avgt    3  806.346 ± 1386.743  ms/op

MASTER BASELINE (512 MiB)

Benchmark                                 (avgChunkSizeBytes)  (chunkCount)  (chunkSizeBytes)  (delayMillis)  (fileSizeBytes)  (jitterMillis)  (schedulerThreads)  Mode  Cnt      Score      Error  Units
ChunkedTransferBenchmark.downloadChunked                  N/A           512           1048576             25              N/A              10                   8  avgt    3  12783.277 ± 1555.102  ms/op
ChunkedTransferBenchmark.uploadChunked                1048576           N/A               N/A             25        536870912              10                   8  avgt    3  11758.738 ± 2207.502  ms/op
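
To make the "roughly 20x" figure concrete, the small sketch below divides the baseline scores by the scores on this branch, using values copied from the tables above (the 2-thread rows for the 32 KiB case, since the baseline download table has no 1-thread row). `SpeedupFromBenchmark` is a hypothetical name used only for this illustration. The ratios come out at roughly 21x to 23x for the small case and roughly 15x (upload) to 31x (download) for the 512 MiB case; note that the 512 MiB upload score on this branch has a large error bar, so its ratio is the least reliable.

```java
/** Hypothetical helper: computes speedups from the JMH scores reported above. */
final class SpeedupFromBenchmark {
  /** Ratio of average time per operation before and after the change. */
  static double speedup(double beforeMsPerOp, double afterMsPerOp) {
    return beforeMsPerOp / afterMsPerOp;
  }

  public static void main(String[] args) {
    // Scores (ms/op) are copied from the benchmark tables in this description.
    System.out.printf("32 chunks, download:  %.1fx%n", speedup(811.458, 34.971));
    System.out.printf("32 KiB file, upload:  %.1fx%n", speedup(741.600, 35.028));
    System.out.printf("512 MiB, download:    %.1fx%n", speedup(12783.277, 416.743));
    System.out.printf("512 MiB, upload:      %.1fx%n", speedup(11758.738, 806.346));
  }
}
```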

@tyler-french tyler-french requested a review from a team as a code owner April 19, 2026 18:23
@github-actions github-actions Bot added the team-Remote-Exec (Issues and PRs for the Execution (Remote) team) and awaiting-review (PR is awaiting review from an assigned reviewer) labels Apr 19, 2026
@tyler-french
Contributor Author

FYI @tjgq I think this was a follow-up from the original PR

@tyler-french
Contributor Author

@bazel-io fork 9.2.0

Contributor

@sluongng sluongng left a comment

I'm a bit skeptical of the current approach, so I will skip reading the window-filling implementation for now (though Codex does suggest there is a problem there).

Each invocation may have multiple actions/spawns running in parallel, each creates multiple blob uploads/downloads, and some of those blobs are chunked blobs. Adding parallelism on the blob level feels like a local optimization, and the new flag does not offer strong control over the total parallelism of the invocation.

I wonder if we need something higher-level that lets us effectively enforce a global parallelism for uploads and downloads🤔

@tyler-french tyler-french force-pushed the tfrench/cdc-concurrent branch from 8969051 to 33689c3 Compare April 22, 2026 15:44
Copilot AI review requested due to automatic review settings April 22, 2026 15:44

Copilot AI left a comment

Pull request overview

Enables bounded parallelism for content-defined chunk (CDC) blob uploads/downloads, improving throughput while avoiding unbounded per-blob fanout.

Changes:

  • Implement sliding-window style, per-blob bounded concurrency for chunk uploads in ChunkedBlobUploader.
  • Implement sliding-window style, per-blob bounded concurrency for chunk downloads (including in-flight dedup) in ChunkedBlobDownloader.
  • Add/expand unit tests for window refill, cancellation, and failure propagation; add a JMH benchmark binary target.
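
As an illustration of what "in-flight dedup" can mean for downloads, here is a minimal sketch under stated assumptions. `InFlightDedup` and `fetch` are hypothetical names, the key stands in for a chunk digest, and `CompletableFuture` stands in for whatever future type `ChunkedBlobDownloader` actually uses. The idea is that a second request for a chunk that is already being downloaded attaches to the existing future instead of issuing another RPC.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper (not the PR's code): shares one download per key while it is in flight. */
final class InFlightDedup<K, V> {
  private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

  /** Returns the in-flight future for `key`, starting `download` only if none exists. */
  CompletableFuture<V> fetch(K key, Function<K, CompletableFuture<V>> download) {
    CompletableFuture<V> shared = new CompletableFuture<>();
    CompletableFuture<V> existing = inFlight.putIfAbsent(key, shared);
    if (existing != null) {
      return existing; // someone else is already downloading this chunk
    }
    CompletableFuture<V> source;
    try {
      source = download.apply(key);
    } catch (RuntimeException e) {
      source = CompletableFuture.failedFuture(e);
    }
    source.whenComplete(
        (value, err) -> {
          // Remove first so a later request starts a fresh download.
          inFlight.remove(key, shared);
          if (err != null) {
            shared.completeExceptionally(err);
          } else {
            shared.complete(value);
          }
        });
    return shared;
  }
}
```

The sketch registers the shared future with `putIfAbsent` before starting the download, rather than starting it inside `computeIfAbsent`, so that a download that completes synchronously cannot modify the map from within its own mapping function.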

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/main/java/com/google/devtools/build/lib/remote/ChunkedBlobUploader.java | Adds bounded in-flight chunk upload window and cancellation on failure. |
| src/main/java/com/google/devtools/build/lib/remote/ChunkedBlobDownloader.java | Adds bounded in-flight chunk download window with reassembly and in-flight dedup. |
| src/test/java/com/google/devtools/build/lib/remote/ChunkedBlobUploaderTest.java | Adds tests for window refill, cancellation, and failure handling for parallel uploads. |
| src/test/java/com/google/devtools/build/lib/remote/ChunkedBlobDownloaderTest.java | Updates tests for new download API and adds parallel-window behavior tests. |
| src/test/java/com/google/devtools/build/lib/remote/ChunkedTransferBenchmark.java | Introduces a JMH benchmark for chunked upload/download with latency + jitter. |
| src/test/java/com/google/devtools/build/lib/remote/BUILD | Adds a java_opt_binary target to run the new benchmark. |

@tyler-french
Contributor Author

I'm a bit skeptical of the current approach, so I will skip reading the window-filling implementation for now (though Codex does suggest there is a problem there).

Each invocation may have multiple actions/spawns running in parallel, each creates multiple blob uploads/downloads, and some of those blobs are chunked blobs. Adding parallelism on the blob level feels like a local optimization, and the new flag does not offer strong control over the total parallelism of the invocation.

I wonder if we need something higher-level that lets us effectively enforce a global parallelism for uploads and downloads🤔

Updated this in the follow-up direction you suggested: I removed the flag and its plumbing and kept only a small hardcoded per-blob window of 32 as a guard against huge single-blob fanout. The actual global limit on active RPCs is still the shared gRPC channel pool (--remote_max_connections / --remote_max_concurrency_per_connection), so this isn't intended to be a new invocation-level concurrency control. As far as I can tell, the combined cache has no such restriction.
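
A rough way to read the reply above, as a sketch rather than a statement about Bazel internals: if the transport allows at most connections × per-connection concurrency active RPCs, then the chunk RPCs that can actually be active are bounded by the smaller of that product and the sum of the per-blob windows. `EffectiveChunkConcurrency` is a hypothetical name, and the numbers in the usage note are made-up inputs, not Bazel defaults.

```java
/** Hypothetical illustration: upper bound on simultaneously active chunk RPCs. */
final class EffectiveChunkConcurrency {
  /**
   * Assumes the transport caps active RPCs at maxConnections * maxConcurrencyPerConnection,
   * and that each chunked blob requests at most perBlobWindow chunk transfers at once.
   */
  static long upperBound(
      int activeChunkedBlobs,
      int perBlobWindow,
      int maxConnections,
      int maxConcurrencyPerConnection) {
    long requestedByBlobs = (long) activeChunkedBlobs * perBlobWindow;
    long transportLimit = (long) maxConnections * maxConcurrencyPerConnection;
    return Math.min(requestedByBlobs, transportLimit);
  }
}
```

With made-up inputs, `upperBound(1, 32, 10, 10)` is 32 (the per-blob window is the binding limit), while `upperBound(100, 32, 10, 10)` is 100 (the transport is the binding limit, and the per-blob window only shapes how that capacity is shared).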

@tyler-french tyler-french force-pushed the tfrench/cdc-concurrent branch from 33689c3 to 8363b37 Compare April 22, 2026 17:15
@tyler-french tyler-french force-pushed the tfrench/cdc-concurrent branch from 8363b37 to 976a25f Compare April 27, 2026 14:46
// Guard against pathological fanout from a single large chunked blob. This is only a per-blob
// cap; chunk requests still flow through CombinedCache and the shared remote cache transport
// stack below it, which is what bounds active remote RPC concurrency across blobs.
private static final int MAX_IN_FLIGHT_CHUNK_DOWNLOADS = 16;
Contributor Author

@fmeum @tjgq One option is to make this a flag, e.g. --experimental_chunk_transfer_concurrency or something similar. I think just keeping it at a reasonable fixed value is simpler. Open to suggestions.

Collaborator

I would stick to a reasonable default value for now. Interested parties can still tune the value and compile from source to test it out.
