generic cpu compilation and fallback by shaleenji · Pull Request #227 · endee-io/endee

shaleenji · 2026-04-23T07:33:59Z

Pull Request

Summary

This PR removes the need to compile rigidly for specific CPU capabilities (AVX2, AVX-512, etc.). A better approach is to compile the code generically and enable higher capabilities based on the CPU the binary is actually running on.
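As a rough illustration of the idea in the summary, the sketch below builds one generic binary and picks a kernel at startup. It is not the code from this PR: `dot_scalar`, `dot_avx2`, `select_dot`, and the `SKETCH_X86_DISPATCH` macro are hypothetical names, and it assumes GCC or Clang on x86, where `__builtin_cpu_supports` and the `target` function attribute are available. On any other platform it falls back to the scalar path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_X86_DISPATCH 1  // hypothetical macro, used only in this sketch
#include <immintrin.h>
#else
#define SKETCH_X86_DISPATCH 0
#endif

using DotFn = float (*)(const float*, const float*, std::size_t);

// Portable baseline that every build can run.
static float dot_scalar(const float* a, const float* b, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) acc += a[i] * b[i];
    return acc;
}

#if SKETCH_X86_DISPATCH
// AVX2 is enabled for this one function only, so the rest of the binary
// stays generic. Call it only after the runtime check in select_dot().
__attribute__((target("avx2")))
static float dot_avx2(const float* a, const float* b, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        const __m256 va = _mm256_loadu_ps(a + i);
        const __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (float lane : lanes) sum += lane;   // horizontal reduction
    for (; i < n; ++i) sum += a[i] * b[i];  // scalar tail
    return sum;
}
#endif

// Pick the best kernel that the CPU we are running on supports.
static DotFn select_dot() {
#if SKETCH_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return dot_avx2;
#endif
    return dot_scalar;
}

// The kernel is resolved once and reused on every later call.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    static const DotFn kernel = select_dot();
    return kernel(a.data(), b.data(), a.size());
}
```

Resolving the function pointer once keeps the per-call cost to a single indirect call. An AVX-512 tier could be added to `select_dot` in the same way, checked before the AVX2 tier.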

Type of Change

New feature

github-actions · 2026-04-23T07:34:34Z

VectorDB Benchmark - Ready To Run

CI Passed ([lint + unit tests](https://github.com/endee-io/endee/actions/runs/25301136918)) - benchmark options unlocked.

Post one of the commands below. Only members with write access can trigger runs.

Available Modes

| Mode | Command | What runs |
| --- | --- | --- |
| Dense | `/correctness_benchmarking dense` | HNSW insert throughput · query P50/P95/P99 · recall@10 · concurrent QPS |
| Hybrid | `/correctness_benchmarking hybrid` | Dense + sparse BM25 fusion · same suite + fusion latency overhead |

Infrastructure

| Server | Role | Instance |
| --- | --- | --- |
| Endee Server | Endee VectorDB (code from this branch) | `t2.large` |
| Benchmark Server | Benchmark runner | `t3a.large` |

Both servers start on demand and are always terminated after the run — pass or fail.

How Correctness Benchmarking Works

1. Post `/correctness_benchmarking <mode>`
2. Endee Server is created  →  this branch's code is deployed  →  Endee starts in the chosen mode
3. Benchmark Server is created  →  benchmark suite is transferred
4. Benchmark Server runs correctness benchmarking against Endee Server
5. Results are posted back here  →  pass/fail + full metrics table
6. Both servers are terminated  →  always, even on failure

After a new push, CI must pass again before this menu reappears.

- Skip the post-build ndd symlink when the binary is already named 'ndd' to prevent a self-referential symlink on generic CPU builds
- Add AVX2 SIMD paths for fp16↔fp32 vector conversion and scaled quantization, filling the gap between AVX-512 and scalar fallback
- Refactor AVX-512 quantization block to use scoped variables and support runtime dispatch via NDD_RUNTIME_X86_DISPATCH
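To make the AVX2 conversion path mentioned in this commit message more concrete, here is a minimal sketch of one direction only (fp16 to fp32) with a scalar fallback. It is an assumption about what such a path could look like, not the code in this PR: every function name and the `SKETCH_X86_F16C` macro are hypothetical. It assumes GCC or Clang on x86 and uses the F16C instruction `_mm256_cvtph_ps`. F16C is a separate CPU feature from AVX2, so the sketch checks CPUID leaf 1, ECX bit 29 in addition to AVX2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_X86_F16C 1  // hypothetical macro, used only in this sketch
#include <cpuid.h>
#include <immintrin.h>
#else
#define SKETCH_X86_F16C 0
#endif

// Scalar fallback: decode one IEEE 754 binary16 value by hand.
static float half_to_float(std::uint16_t h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exp = (h >> 10) & 0x1Fu;
    std::uint32_t man = h & 0x3FFu;
    std::uint32_t bits;
    if (exp == 0 && man == 0) {
        bits = sign;                                       // signed zero
    } else if (exp == 0) {
        std::uint32_t shift = 0;                           // subnormal: renormalize
        while ((man & 0x400u) == 0) { man <<= 1; ++shift; }
        bits = sign | ((113u - shift) << 23) | ((man & 0x3FFu) << 13);
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);           // infinity or NaN
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);  // normal: rebias 15 -> 127
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

static void fp16_to_fp32_scalar(const std::uint16_t* src, float* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = half_to_float(src[i]);
}

#if SKETCH_X86_F16C
// Eight halves per iteration; the remainder goes through the scalar decoder.
__attribute__((target("avx2,f16c")))
static void fp16_to_fp32_avx2(const std::uint16_t* src, float* dst, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        const __m128i halves = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm256_storeu_ps(dst + i, _mm256_cvtph_ps(halves));
    }
    for (; i < n; ++i) dst[i] = half_to_float(src[i]);
}

static bool cpu_has_avx2_and_f16c() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    const bool f16c = (ecx & (1u << 29)) != 0;  // CPUID.01H:ECX bit 29
    __builtin_cpu_init();
    return f16c && __builtin_cpu_supports("avx2");
}
#endif

// Runtime-dispatched entry point: AVX2 + F16C when available, scalar otherwise.
void fp16_to_fp32(const std::uint16_t* src, float* dst, std::size_t n) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
#if SKETCH_X86_F16C
    static const bool use_avx2 = cpu_has_avx2_and_f16c();
    if (use_avx2) { fp16_to_fp32_avx2(src, dst, n); return; }
#endif
    fp16_to_fp32_scalar(src, dst, n);
}
```

Widening fp16 to fp32 is exact, so for non-NaN inputs both paths should produce identical results. That makes the scalar function a convenient reference when testing the SIMD path.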

Introduces a new --x86 / USE_X86=ON build option for users without a specific SIMD target. Enables NDD_RUNTIME_X86_DISPATCH at configure time, produces the ndd-x86 binary, and documents the option in the install script help text, getting-started guide, and README.

Vaibhav-Endee · 2026-05-04T05:42:23Z

Closing the PR for the time being, as multiple internal conditions need to be re-implemented for desired performance.

Currently, the system compiles and runs with -x86, taking AVX2 as the baseline when the AVX-512 flags are missing, which causes a dip in performance.

generic cpu compilation and fallback

fafaf60

PareekVaibhav added 2 commits April 23, 2026 17:06

Vaibhav-Endee closed this May 4, 2026

shaleenji wants to merge 3 commits into master from generic_cpu_compilation

4 participants
