[C++] Explore SIMD optimization for base64_decode #49800

@Reranko05

Describe the enhancement requested

The current base64_decode implementation processes input byte-by-byte using scalar operations.

This came up while I was working on a recent change to improve validation performance (PR #49660): replacing find() with a lookup table made it clear that the decoding itself is still done sequentially, one character at a time.

Since base64 decoding follows a regular pattern (4 chars → 3 bytes), it seems like it could benefit from SIMD/vectorized approaches (e.g., AVX2), especially for larger inputs.
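To make the 4 chars → 3 bytes pattern concrete, here is a minimal scalar sketch of decoding one group with a 256-entry lookup table. The names `MakeDecodeTable` and `DecodeQuad` are made up for illustration and this is not Arrow's actual code; it also assumes the standard alphabet and ignores `=` padding.

```cpp
#include <array>
#include <cstdint>

// Illustrative names only; this is not Arrow's actual base64 code.
// Builds a 256-entry table: alphabet characters map to 0..63, all others to -1.
std::array<std::int8_t, 256> MakeDecodeTable() {
  std::array<std::int8_t, 256> table;
  table.fill(-1);
  const char alphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  for (int i = 0; i < 64; ++i) {
    table[static_cast<unsigned char>(alphabet[i])] = static_cast<std::int8_t>(i);
  }
  return table;
}

// Decodes one unpadded group of 4 characters into 3 bytes.
// Returns false if any character is outside the alphabet
// ('=' padding is deliberately not handled in this sketch).
bool DecodeQuad(const char* in, std::uint8_t* out) {
  static const std::array<std::int8_t, 256> table = MakeDecodeTable();
  const int a = table[static_cast<unsigned char>(in[0])];
  const int b = table[static_cast<unsigned char>(in[1])];
  const int c = table[static_cast<unsigned char>(in[2])];
  const int d = table[static_cast<unsigned char>(in[3])];
  // Each value is either -1 or in 0..63, so the OR is negative
  // exactly when at least one character was invalid.
  if ((a | b | c | d) < 0) return false;
  // Concatenate four 6-bit values into one 24-bit word.
  const std::uint32_t bits = (static_cast<std::uint32_t>(a) << 18) |
                             (static_cast<std::uint32_t>(b) << 12) |
                             (static_cast<std::uint32_t>(c) << 6) |
                             static_cast<std::uint32_t>(d);
  out[0] = static_cast<std::uint8_t>(bits >> 16);
  out[1] = static_cast<std::uint8_t>((bits >> 8) & 0xFF);
  out[2] = static_cast<std::uint8_t>(bits & 0xFF);
  return true;
}
```

Every group goes through the same lookups, shifts, and ORs, independent of its neighbours, which is what makes the loop a candidate for vectorization.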

I wanted to check:

  • Is exploring a SIMD-based decoding path something that would be in scope for Arrow?
  • Have there been any prior attempts or discussions around this?
  • Would a CPU-dispatched approach (SIMD + scalar fallback) be acceptable here?
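To illustrate what I mean by a CPU-dispatched path, here is a rough sketch, not a proposal for the final design. It covers only the bit-packing step (four 6-bit values → 3 bytes) and assumes the characters have already been translated and validated. For brevity it uses 128-bit SSSE3 instead of AVX2, relies on the GCC/Clang builtin `__builtin_cpu_supports` rather than Arrow's own dispatch machinery, and all function names (`PackScalar`, `PackSsse3`, `ChoosePackFn`) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_HAVE_X86_SIMD 1
#else
#define SKETCH_HAVE_X86_SIMD 0
#endif

// Packs n_groups groups of four 6-bit values (one per input byte, each
// already in 0..63) into 3 output bytes per group. Hypothetical signature.
using PackFn = void (*)(const std::uint8_t*, std::size_t, std::uint8_t*);

// Scalar fallback: one group per iteration.
void PackScalar(const std::uint8_t* s, std::size_t n_groups, std::uint8_t* out) {
  for (std::size_t g = 0; g < n_groups; ++g, s += 4, out += 3) {
    const std::uint32_t bits = (std::uint32_t{s[0]} << 18) |
                               (std::uint32_t{s[1]} << 12) |
                               (std::uint32_t{s[2]} << 6) | std::uint32_t{s[3]};
    out[0] = static_cast<std::uint8_t>(bits >> 16);
    out[1] = static_cast<std::uint8_t>((bits >> 8) & 0xFF);
    out[2] = static_cast<std::uint8_t>(bits & 0xFF);
  }
}

#if SKETCH_HAVE_X86_SIMD
// SSSE3 path: four groups (16 input bytes -> 12 output bytes) per iteration.
__attribute__((target("ssse3")))
void PackSsse3(const std::uint8_t* s, std::size_t n_groups, std::uint8_t* out) {
  const __m128i mul1 = _mm_set1_epi32(0x01400140);  // byte pairs: s0*64 + s1
  const __m128i mul2 = _mm_set1_epi32(0x00011000);  // 16-bit pairs: lo*4096 + hi
  // Reorder each little-endian 24-bit result into big-endian byte order.
  const __m128i shuf = _mm_setr_epi8(2, 1, 0, 6, 5, 4, 10, 9, 8, 14, 13, 12,
                                     -1, -1, -1, -1);
  std::size_t g = 0;
  for (; g + 4 <= n_groups; g += 4) {
    const __m128i v =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + 4 * g));
    const __m128i pairs = _mm_maddubs_epi16(v, mul1);
    const __m128i words = _mm_madd_epi16(pairs, mul2);
    const __m128i packed = _mm_shuffle_epi8(words, shuf);
    std::uint8_t tmp[16];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(tmp), packed);
    std::memcpy(out + 3 * g, tmp, 12);  // only the first 12 bytes are payload
  }
  PackScalar(s + 4 * g, n_groups - g, out + 3 * g);  // leftover groups
}
#endif

// Picks an implementation once, based on what the running CPU supports.
PackFn ChoosePackFn() {
#if SKETCH_HAVE_X86_SIMD
  if (__builtin_cpu_supports("ssse3")) return PackSsse3;
#endif
  return PackScalar;
}
```

A caller would resolve `ChoosePackFn()` once and reuse the returned pointer. On other architectures or compilers the sketch simply compiles down to the scalar path.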

I haven’t explored SIMD in this area yet, but I’m happy to prototype something or run benchmark comparisons if this aligns with the project’s direction.

Component(s)

C++
