generic cpu compilation and fallback #227

Closed
shaleenji wants to merge 3 commits into master from generic_cpu_compilation

@shaleenji
Collaborator

Pull Request

Summary

This PR removes the need to be extremely rigid about how compilation for different CPU capabilities (AVX2, AVX-512, etc.) is done. A better approach is to compile the code generically and enable the higher capabilities at runtime, based on the CPU the binary is actually running on.
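
To illustrate the idea in the summary, here is a minimal sketch of runtime dispatch. It is not the code from this branch: the `SimdTier` enum, the `dot_*` kernels, and `select_dot` are hypothetical names, and the AVX2/AVX-512 kernels are stand-ins that simply forward to the scalar version. The only real API used is the GCC/Clang builtin `__builtin_cpu_supports`, which is available on x86 targets.

```cpp
#include <cstddef>

// Hypothetical capability tiers, ordered from least to most capable.
enum class SimdTier { Scalar, Avx2, Avx512 };

// Probe the CPU once at runtime and report the best tier it supports.
// On non-x86 targets (or other compilers) this falls through to Scalar.
SimdTier detect_simd_tier() {
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return SimdTier::Avx512;
    if (__builtin_cpu_supports("avx2"))    return SimdTier::Avx2;
#endif
    return SimdTier::Scalar;
}

using DotFn = float (*)(const float*, const float*, std::size_t);

// Portable baseline kernel.
float dot_scalar(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// Stand-ins for the wider kernels. A real build would implement these with
// intrinsics and compile them with the matching target flags or attributes.
float dot_avx2(const float* a, const float* b, std::size_t n)   { return dot_scalar(a, b, n); }
float dot_avx512(const float* a, const float* b, std::size_t n) { return dot_scalar(a, b, n); }

// Map a tier to the kernel that should serve it.
DotFn select_dot(SimdTier tier) {
    switch (tier) {
        case SimdTier::Avx512: return dot_avx512;
        case SimdTier::Avx2:   return dot_avx2;
        default:               return dot_scalar;
    }
}

// Public entry point: the kernel is resolved on first call and then reused.
float dot(const float* a, const float* b, std::size_t n) {
    static const DotFn impl = select_dot(detect_simd_tier());
    return impl(a, b, n);
}
```

Resolving the function pointer once keeps the per-call overhead to a single indirect call, and the same binary can run on machines with or without AVX-512.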

Type of Change

  • New feature

@github-actions

github-actions Bot commented Apr 23, 2026

VectorDB Benchmark - Ready To Run

CI Passed ([lint + unit tests](https://github.com/endee-io/endee/actions/runs/25301136918)) - benchmark options unlocked.

Post one of the commands below. Only members with write access can trigger runs.


Available Modes

| Mode | Command | What runs |
|---|---|---|
| Dense | /correctness_benchmarking dense | HNSW insert throughput · query P50/P95/P99 · recall@10 · concurrent QPS |
| Hybrid | /correctness_benchmarking hybrid | Dense + sparse BM25 fusion · same suite + fusion latency overhead |

Infrastructure

| Server | Role | Instance |
|---|---|---|
| Endee Server | Endee VectorDB (code from this branch) | t2.large |
| Benchmark Server | Benchmark runner | t3a.large |

Both servers start on demand and are always terminated after the run — pass or fail.


How Correctness Benchmarking Works

1. Post /correctness_benchmarking <mode>
2. Endee Server Create  →  this branch's code deployed  →  Endee starts in chosen mode
3. Benchmark Server Create  →  benchmark suite transferred
4. Benchmark Server runs correctness benchmarking against Endee Server
5. Results posted back here  →  pass/fail + full metrics table
6. Both servers terminated   →  always, even on failure

After a new push, CI must pass again before this menu reappears.

- Skip the post-build ndd symlink when the binary is already named
  'ndd' to prevent a self-referential symlink on generic CPU builds
- Add AVX2 SIMD paths for fp16↔fp32 vector conversion and scaled
  quantization, filling the gap between AVX-512 and scalar fallback
- Refactor AVX-512 quantization block to use scoped variables and
  support runtime dispatch via NDD_RUNTIME_X86_DISPATCH
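
As a rough illustration of what an AVX2 path between AVX-512 and the scalar fallback can look like, here is a sketch of scaled float-to-int8 quantization. It is an assumption-laden example, not the code from these commits: the function names, the `SKETCH_X86_SIMD` macro, and the clamp range of [-127, 127] are all made up for the sketch. It relies on the GCC/Clang `target("avx2")` function attribute so the file builds without `-mavx2`, and it picks the path at runtime.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_X86_SIMD 1  // hypothetical macro, local to this sketch
#include <immintrin.h>
#endif

// Scale, clamp to [-127, 127], then round to nearest (ties to even under the
// default rounding mode), which matches _mm256_cvtps_epi32 below.
inline std::int8_t quantize_one(float x, float inv_scale) {
    float v = std::fmin(std::fmax(x * inv_scale, -127.0f), 127.0f);
    return static_cast<std::int8_t>(std::nearbyint(v));
}

// Scalar fallback: works on any CPU.
void quantize_scalar(const float* src, std::int8_t* dst, std::size_t n, float inv_scale) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = quantize_one(src[i], inv_scale);
}

#ifdef SKETCH_X86_SIMD
// AVX2 path: 8 floats per iteration, scalar loop for the tail.
__attribute__((target("avx2")))
void quantize_avx2(const float* src, std::int8_t* dst, std::size_t n, float inv_scale) {
    const __m256 vscale = _mm256_set1_ps(inv_scale);
    const __m256 vmin = _mm256_set1_ps(-127.0f);
    const __m256 vmax = _mm256_set1_ps(127.0f);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_mul_ps(_mm256_loadu_ps(src + i), vscale);
        v = _mm256_min_ps(_mm256_max_ps(v, vmin), vmax);
        const __m256i q32 = _mm256_cvtps_epi32(v);             // 8 x int32
        const __m128i lo = _mm256_castsi256_si128(q32);        // elements 0..3
        const __m128i hi = _mm256_extracti128_si256(q32, 1);   // elements 4..7
        const __m128i q16 = _mm_packs_epi32(lo, hi);           // 8 x int16, in order
        const __m128i q8 = _mm_packs_epi16(q16, q16);          // low 8 bytes = result
        _mm_storel_epi64(reinterpret_cast<__m128i*>(dst + i), q8);
    }
    for (; i < n; ++i) dst[i] = quantize_one(src[i], inv_scale);
}
#endif

// Runtime dispatch: use AVX2 when the CPU reports it, otherwise fall back.
void quantize(const float* src, std::int8_t* dst, std::size_t n, float inv_scale) {
#ifdef SKETCH_X86_SIMD
    static const bool has_avx2 = [] {
        __builtin_cpu_init();
        return __builtin_cpu_supports("avx2") != 0;
    }();
    if (has_avx2) {
        quantize_avx2(src, dst, n, inv_scale);
        return;
    }
#endif
    quantize_scalar(src, dst, n, inv_scale);
}
```

Keeping the rounding and clamping rules identical in both paths means the SIMD and scalar results can be compared element by element in a unit test.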
Introduces a new --x86 / USE_X86=ON build option for users without a
specific SIMD target. Enables NDD_RUNTIME_X86_DISPATCH at configure time,
produces the ndd-x86 binary, and documents the option in the install
script help text, getting-started guide, and README.
@Vaibhav-Endee

Closing the PR for the time being, as multiple internal conditions need to be re-implemented for desired performance.

Currently, the system compiles and runs in -x86, taking AVX2 as the baseline when the AVX-512 flags are missing, which causes a dip in performance.



4 participants